10417 Commits

Author SHA1 Message Date
George Hotz
b80cacb416 fix GPU efficientnet example 2021-05-26 17:29:35 -07:00
George Hotz
1ae0e88627 nvidia notes 2021-05-26 14:27:00 -07:00
20kdc
2653d33292 vgg7 (image upscaling) implementation - not the best, but it works (#255)
* vgg7 implementation - not the best, but it works

* VGG7 implementation: Spread nansbane to deter NaNs, maybe improved training experience

* VGG7 implementation: Fix training, for real this time

Results actually attempt to approximate the input

* VGG7 implementation: Sample probability management
2021-05-12 23:48:51 -07:00
Skosh
81bf933a91 Improved __getitem__ (#254)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…

* Improved __getitem__

* Updated

* Updated __getitem__

* Linebreaks

* Maybe this works?

* Added MNIST locally, tests run now
2021-05-05 22:15:22 -07:00
Skosh
78aa147b39 [WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
ziofil
155ec1f18e saving 50 LOC with automatic @staticmethod for forward and backward (#252)
* automatic @staticmethod for forward and backward

* triggering unit tests
2021-04-25 18:04:16 -07:00
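Illustrative note (not part of the commit): one way to get "automatic @staticmethod for forward and backward" is to wrap those two methods when a subclass is created. This is a minimal sketch under that assumption; `Function`, `Context`, and `Mul` are hypothetical stand-ins, not the code from #252.

```python
class Function:
    """Hypothetical base class that wraps forward/backward automatically."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name in ("forward", "backward"):
            fn = cls.__dict__.get(name)
            # Wrap only plain functions defined directly on this subclass.
            if fn is not None and not isinstance(fn, staticmethod):
                setattr(cls, name, staticmethod(fn))


class Context:
    """Hypothetical container for values saved between the two passes."""

    def __init__(self):
        self.saved = ()

    def save_for_backward(self, *xs):
        self.saved = xs


class Mul(Function):
    # No @staticmethod decorators needed on the subclass.
    def forward(ctx, x, y):
        ctx.save_for_backward(x, y)
        return x * y

    def backward(ctx, grad_output):
        x, y = ctx.saved
        return y * grad_output, x * grad_output


ctx = Context()
# Calling through an instance works because no `self` is bound.
out = Mul().forward(ctx, 3.0, 4.0)
grads = Mul().backward(ctx, 1.0)
```

Without the wrapping, the instance call `Mul().forward(ctx, 3.0, 4.0)` would pass the instance as the first argument and fail with a `TypeError`.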
"freedom" Koan-Sin Tan
f0cc2b66f8 add an aneccompile example in Objective-C (#240)
* add an aneccompile example in Objective-C

add a compile.m corresponding to compile.mm

build with
```clang compile.m -F /System/Library/PrivateFrameworks/ -framework ANECompiler -framework Foundation```

CoreFoundation framework is a C library.
Foundation is an Objective-C framework.

CF data structures in CoreFoundation usually have corresponding NS data structures in Foundation, e.g.,
NSDictionary is "toll-free bridged" with its Core Foundation counterpart, CFDictionary.
See [1].

[1] https://developer.apple.com/library/archive/documentation/General/Conceptual/CocoaEncyclopedia/Toll-FreeBridgin/Toll-FreeBridgin.html

* figure out how to use param_3 of ANECCompile

add a simple param_3 blocks callback, which dumps the status
dictionary when status != 0
2021-01-31 08:31:16 -08:00
Göktuğ Karakaşlı
eabe0b9017 remove deepwalk args (#243) 2021-01-31 08:30:17 -08:00
George Hotz
ce77dda805 yolov5 v4 2021-01-05 07:56:17 -08:00
George Hotz
62e3a8558c fix tolerance maybe 2021-01-05 07:45:47 -08:00
Asim
1c148f2fe4 fixed example broken after gpu refactor (#238) 2021-01-05 07:41:54 -08:00
George Hotz
8a38e0d207 only mish failed 2021-01-03 09:47:11 -08:00
George Hotz
a337f7780e smarter way to write sign 2021-01-03 09:46:00 -08:00
George Hotz
1a4487965a remove negative from things w/o negative 2021-01-03 09:43:34 -08:00
George Hotz
0531b848eb second class sign 2021-01-03 09:33:12 -08:00
George Hotz
0702e0c763 nah, no sign, it's not what you want. use relu 2021-01-03 09:30:33 -08:00
George Hotz
29655609d5 fix GPU sign...these tests aren't very good 2021-01-03 09:00:49 -08:00
George Hotz
ea9c9af5d7 faster sign 2021-01-03 08:54:21 -08:00
George Hotz
c2eeb6950b add support for sign. technically relu can be second class now 2021-01-03 08:29:57 -08:00
George Hotz
6842ad9ec8 minor cleanups, yolo work 2021-01-03 08:14:16 -08:00
NeuralLink
0825cf7f79 Added softplus and mish non stable (#220)
*  Added softplus and mish CPU

* 🔨 refactor

* 🔨 second class softplus and mish

* 🔨 test fix

* no need of device in testing
2021-01-03 08:08:41 -08:00
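Illustrative note (not part of the commit): the standard definitions are softplus(x) = log(1 + e^x) and mish(x) = x * tanh(softplus(x)). The sketch below uses the naive, numerically non-stable form on plain floats; it is not the tinygrad implementation from #220.

```python
import math


def softplus(x: float) -> float:
    # Naive form: math.exp overflows for large positive x.
    return math.log(1.0 + math.exp(x))


def mish(x: float) -> float:
    # mish(x) = x * tanh(softplus(x))
    return x * math.tanh(softplus(x))


value_at_zero = softplus(0.0)  # equals log(2)
```

A stable variant would typically avoid calling `exp` on large positive inputs, for example by rewriting softplus as `max(x, 0) + log1p(exp(-abs(x)))`.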
George Hotz
ac229ea750 remove print 2021-01-02 12:53:30 -08:00
George Hotz
895d142503 start trying to load yolo v5 2021-01-02 12:51:55 -08:00
NeuralLink
ece07a3d12 🔨 refactor register ops (#233)
* 🔨 refactor register ops

* 🔨 reorder and register for ANE

* 🔨 refactor

* 🔨 conflicts

* 🔨 minor fix

* ane fix

* extra reshape weird
2021-01-02 07:47:16 -08:00
Marcel Bischoff
42b4761025 transformer >99.98% test accuracy in ~30s (#230)
* transformer

* BS might divide len(Y_test)

* output when accuracy is high

* more readable

* fixed loss in serious_mnist for new API
2021-01-02 07:45:09 -08:00
Liam
ebd72ff437 Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done by setting the default Tensor Device to Device.GPU if
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
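Illustrative note (not part of the commit): the `GPU=1` switch can be read as an environment check that picks the default device. This sketch assumes the GPU is selected only when `GPU` is exactly `1` and the CPU otherwise; the `Device` enum and `default_device` helper here are hypothetical stand-ins.

```python
import os
from enum import Enum


class Device(Enum):
    CPU = 0
    GPU = 1


def default_device(environ=os.environ) -> Device:
    # Assumption: only the exact string "1" enables the GPU.
    return Device.GPU if environ.get("GPU") == "1" else Device.CPU


# Passing a dict makes the choice easy to check without touching the real environment.
gpu_choice = default_device({"GPU": "1"})
cpu_choice = default_device({})
```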
George Hotz
4a7cf2e420 more reordering 2020-12-31 09:58:02 -05:00
George Hotz
92abe43683 reduce before binary because of unbroadcasting 2020-12-31 09:49:52 -05:00
George Hotz
4291002881 reorder GPU ops 2020-12-31 09:46:39 -05:00
George Hotz
de7fe085de no read out of bounds 2020-12-31 09:41:36 -05:00
George Hotz
1fb5fcafce GPU slice should fix tests 2020-12-31 09:37:03 -05:00
Liam
e972a45456 Dynamically register ops to Tensor (#232)
* Dynamically register ops to Tensor

This saves lines. And reduces redundant repetition.

* ffs spacing

you don't pay me enough!
2020-12-31 09:10:19 -05:00
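Illustrative note (not part of the commit): dynamically registering ops usually means attaching methods to the class in a loop instead of writing each one by hand. A minimal sketch under that assumption; this `Tensor`, the `register` helper, and the op table are hypothetical and not the code from #232.

```python
import operator


class Tensor:
    """Hypothetical minimal tensor wrapping a plain Python value."""

    def __init__(self, data):
        self.data = data


def register(name, fxn):
    # Build a method that unwraps its arguments, applies fxn, and rewraps the result.
    def dispatch(self, *args):
        return Tensor(fxn(self.data, *(a.data for a in args)))

    dispatch.__name__ = name
    setattr(Tensor, name, dispatch)


# One table entry per op replaces one hand-written method per op.
for op_name, op_fxn in {"add": operator.add, "mul": operator.mul}.items():
    register(op_name, op_fxn)

result = Tensor(2).add(Tensor(3)).mul(Tensor(4))
```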
Marcel Bischoff
e2f833f58f max to behave on ties like torch (#229)
* checkpoint

* fixing pow

* undo pow

* backward max on GPU and CPU rewrite

* indentation

* changing seed for curiosity

* max replaced equality

* undo seed

* rebase

* fixed tests

* merge error
2020-12-30 18:52:50 -05:00
George Hotz
30f8132646 reorder ops in ops cpu 2020-12-30 11:00:01 -05:00
George Hotz
e5b2803b5d ops in readme 2020-12-30 10:48:55 -05:00
George Hotz
2d44bf7f1a Dot -> Matmul 2020-12-30 10:41:51 -05:00
George Hotz
10fc3ff5b9 cleaner syntax 2020-12-30 10:35:37 -05:00
George Hotz
fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
George Hotz
47504004fd ane ops 2020-12-29 18:00:53 -05:00
George Hotz
1f5c9618ef refactor in readme and issue #225 2020-12-29 17:30:04 -05:00
George Hotz
f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz
6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz
27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz
4bbad11afe link to papers 2020-12-29 14:15:46 -05:00
George Hotz
3f8e137b6f extra/transformer 2020-12-29 14:14:00 -05:00
George Hotz
c4e7a1ae59 accessors are dumb 2020-12-29 14:10:26 -05:00
George Hotz
fb6aaefb9b save 2 lines 2020-12-29 14:02:50 -05:00
George Hotz
ea341c84fe logsoftmax good, div bad 2020-12-29 13:59:39 -05:00
George Hotz
f18801c7db simple pool. swimming is very easy now 2020-12-29 13:48:50 -05:00
George Hotz
8f9232d59b readmee 2020-12-29 13:40:34 -05:00