Commit Graph

11106 Commits

Author SHA1 Message Date
George Hotz
8a38e0d207 only mish failed 2021-01-03 09:47:11 -08:00
George Hotz
a337f7780e smarter way to write sign 2021-01-03 09:46:00 -08:00
George Hotz
1a4487965a remove negative from things w/o negative 2021-01-03 09:43:34 -08:00
George Hotz
0531b848eb second class sign 2021-01-03 09:33:12 -08:00
George Hotz
0702e0c763 nah, no sign, it's not what you want. use relu 2021-01-03 09:30:33 -08:00
George Hotz
29655609d5 fix GPU sign...these tests aren't very good 2021-01-03 09:00:49 -08:00
George Hotz
ea9c9af5d7 faster sign 2021-01-03 08:54:21 -08:00
George Hotz
c2eeb6950b add support for sign. technically relu can be second class now 2021-01-03 08:29:57 -08:00
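A "second class" op here is one composed from existing ops rather than implemented as a new primitive; with sign available, relu itself can be written that way. A minimal NumPy sketch of the composition (illustrative only, not the tinygrad code):

```python
import numpy as np

def relu_via_sign(x):
    # relu(x) = x * (sign(x) + 1) / 2
    # x > 0: mask is 1, x < 0: mask is 0, x == 0: x * 0.5 is still 0
    return x * (np.sign(x) + 1) / 2

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_via_sign(x))  # [0.  0.  0.  0.5 2. ]
```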
George Hotz
6842ad9ec8 minor cleanups, yolo work 2021-01-03 08:14:16 -08:00
NeuralLink
0825cf7f79 Added softplus and mish non stable (#220)
*  Added softplus and mish CPU

* 🔨 refactor

* 🔨 second class softplus and mish

* 🔨 test fix

* no need of device in testing
2021-01-03 08:08:41 -08:00
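For reference, softplus and mish are plain compositions of existing elementwise ops, which is what makes them candidates for "second class" status; the "non stable" in the PR title refers to the naive log(1 + exp(x)) form, which overflows for large x. An illustrative NumPy sketch of the formulas:

```python
import numpy as np

def softplus(x):
    # naive (numerically non-stable) form: log(1 + exp(x))
    return np.log(1.0 + np.exp(x))

def mish(x):
    # mish(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

x = np.array([-3.0, 0.0, 3.0])
print(softplus(x))  # ~[0.049 0.693 3.049]
print(mish(x))      # ~[-0.146 0.    2.987]
```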
George Hotz
ac229ea750 remove print 2021-01-02 12:53:30 -08:00
George Hotz
895d142503 start trying to load yolo v5 2021-01-02 12:51:55 -08:00
NeuralLink
ece07a3d12 🔨 refactor register ops (#233)
* 🔨 refactor register ops

* 🔨 reorder and register for ANE

* 🔨 refactor

* 🔨 conflicts

* 🔨 minor fix

* ane fix

* extra reshape weird
2021-01-02 07:47:16 -08:00
Marcel Bischoff
42b4761025 transformer >99.98% test accuracy in ~30s (#230)
* transformer

* BS might divide len(Y_test)

* output when accuracy is high

* more readable

* fixed loss in serious_mnist for new API
2021-01-02 07:45:09 -08:00
Liam
ebd72ff437 Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done by setting the default Tensor Device to Device.GPU if GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
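The GPU=1 flag keeps the test suite itself device-agnostic: an environment variable picks the default Tensor device instead of maintaining a separate test matrix. A minimal sketch of that pattern, with an illustrative Device enum (names are assumptions, not the actual tinygrad API):

```python
import os
from enum import Enum

class Device(Enum):
    CPU = 0
    GPU = 1

# pick the default device from the environment, e.g. GPU=1 pytest -s -v
DEFAULT_DEVICE = Device.GPU if os.environ.get("GPU", "0") == "1" else Device.CPU

print(f"running tests on {DEFAULT_DEVICE}")
```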
George Hotz
4a7cf2e420 more reordering 2020-12-31 09:58:02 -05:00
George Hotz
92abe43683 reduce before binary because of unbroadcasting 2020-12-31 09:49:52 -05:00
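Reduce ops have to exist before the binary ops because the binary ops' backward passes need them: a tensor that was broadcast in the forward pass must have its gradient summed back down to the original shape. A small NumPy sketch of that unbroadcast step (an illustrative helper, not tinygrad's actual function):

```python
import numpy as np

def unbroadcast(grad, shape):
    # sum a broadcast gradient back down to `shape`
    # broadcasting forward means summing backward, hence the reduce
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)          # drop leading axes broadcasting added
    for i, s in enumerate(shape):
        if s == 1:
            grad = grad.sum(axis=i, keepdims=True)  # collapse size-1 axes
    return grad

g = np.ones((4, 3))             # gradient of a broadcasted (4, 3) result
print(unbroadcast(g, (1, 3)))   # [[4. 4. 4.]]
print(unbroadcast(g, (4, 1)))   # [[3.] [3.] [3.] [3.]]
```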
George Hotz
4291002881 reorder GPU ops 2020-12-31 09:46:39 -05:00
George Hotz
de7fe085de no read out of bounds 2020-12-31 09:41:36 -05:00
George Hotz
1fb5fcafce GPU slice should fix tests 2020-12-31 09:37:03 -05:00
Liam
e972a45456 Dynamically register ops to Tensor (#232)
* Dynamically register ops to Tensor

This saves lines and reduces repetition.

* ffs spacing

you don't pay me enough!
2020-12-31 09:10:19 -05:00
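Dynamic registration replaces one hand-written wrapper method per op with a single loop that attaches each op to Tensor. A hedged sketch of the pattern with made-up Add/Mul ops (not the actual tinygrad register code):

```python
class Tensor:
    def __init__(self, data):
        self.data = data

class Add:
    @staticmethod
    def forward(x, y): return x + y

class Mul:
    @staticmethod
    def forward(x, y): return x * y

def register(name, fxn):
    # attach a Tensor method that runs fxn.forward on the raw data
    def op(self, *others):
        return Tensor(fxn.forward(self.data, *[o.data for o in others]))
    setattr(Tensor, name, op)

# one loop instead of one wrapper method per op
for name, fxn in [("add", Add), ("mul", Mul)]:
    register(name, fxn)

a, b = Tensor(2.0), Tensor(3.0)
print(a.add(b).data, a.mul(b).data)  # 5.0 6.0
```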
Marcel Bischoff
e2f833f58f max to behave on ties like torch (#229)
* checkpoint

* fixing pow

* undo pow

* backward max on GPU and CPU rewrite

* indentation

* changing seed for curiosity

* max replaced equality

* undo seed

* rebase

* fixed tests

* merge error
2020-12-30 18:52:50 -05:00
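The tie behaviour only matters in max's backward pass, where the incoming gradient has to be routed back to whichever input element(s) produced the maximum. An equality mask is the usual way to do that ("max replaced equality" above); splitting the gradient evenly across ties is one convention, shown here as an illustrative NumPy sketch rather than the exact tinygrad or torch rule:

```python
import numpy as np

def max_backward(x, grad_out):
    # route the gradient to every position equal to the max,
    # splitting it evenly when there are ties
    mask = (x == x.max()).astype(x.dtype)
    return mask * grad_out / mask.sum()

x = np.array([1.0, 3.0, 3.0, 2.0])
print(max_backward(x, grad_out=1.0))  # [0.  0.5 0.5 0. ]
```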
George Hotz
30f8132646 reorder ops in ops cpu 2020-12-30 11:00:01 -05:00
George Hotz
e5b2803b5d ops in readme 2020-12-30 10:48:55 -05:00
George Hotz
2d44bf7f1a Dot -> Matmul 2020-12-30 10:41:51 -05:00
George Hotz
10fc3ff5b9 cleaner syntax 2020-12-30 10:35:37 -05:00
George Hotz
fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
George Hotz
47504004fd ane ops 2020-12-29 18:00:53 -05:00
George Hotz
1f5c9618ef refactor in readme and issue #225 2020-12-29 17:30:04 -05:00
George Hotz
f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz
6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz
27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz
4bbad11afe link to papers 2020-12-29 14:15:46 -05:00
George Hotz
3f8e137b6f extra/transformer 2020-12-29 14:14:00 -05:00
George Hotz
c4e7a1ae59 accessors are dumb 2020-12-29 14:10:26 -05:00
George Hotz
fb6aaefb9b save 2 lines 2020-12-29 14:02:50 -05:00
George Hotz
ea341c84fe logsoftmax good, div bad 2020-12-29 13:59:39 -05:00
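One reading of "logsoftmax good, div bad" is that logsoftmax composes from sub, exp, sum and log and so avoids a division op entirely. A small NumPy sketch, with the usual max-subtraction for numerical stability:

```python
import numpy as np

def logsoftmax(x, axis=-1):
    # logsoftmax(x) = x - log(sum(exp(x)))
    # subtracting the max first keeps exp() from overflowing
    m = x.max(axis=axis, keepdims=True)
    return x - (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True)))

x = np.array([[1.0, 2.0, 3.0]])
print(logsoftmax(x))                 # ~[[-2.408 -1.408 -0.408]]
print(np.exp(logsoftmax(x)).sum())   # ~1.0, i.e. softmax sums to one
```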
George Hotz
f18801c7db simple pool. swimming is very easy now 2020-12-29 13:48:50 -05:00
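A pool really can be "simple" when the spatial dims divide evenly by the kernel size: reshape so each pooling window gets its own axes, then reduce over them. A hedged NumPy sketch of 2x2 max pooling in that style (not necessarily the exact tinygrad code):

```python
import numpy as np

def maxpool2x2(x):
    # x has shape (N, C, H, W) with H and W divisible by 2
    N, C, H, W = x.shape
    # expose each 2x2 window as its own pair of axes, then reduce over them
    return x.reshape(N, C, H // 2, 2, W // 2, 2).max(axis=(3, 5))

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
print(maxpool2x2(x)[0, 0])
# [[ 5.  7.]
#  [13. 15.]]
```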
George Hotz
8f9232d59b readme 2020-12-29 13:40:34 -05:00
George Hotz
837aaacfbf Unpad2D on GPU: 2020-12-29 13:16:14 -05:00
George Hotz
02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz
061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz
a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff
dc8fa7999c Transpose on GPU (#221)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up

* transformer eval

* axis=-1

* transpose

* test for permutation using torch.movedims

* another test

* line
2020-12-29 10:40:11 -05:00
George Hotz
36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz
bcb3ceeca3 set training in functions 2020-12-28 22:45:46 -05:00
George Hotz
51bf164b72 dropout, training 2020-12-28 22:12:23 -05:00
George Hotz
7b8fee038d it works! forgot the sqrt 2020-12-28 16:23:52 -05:00
George Hotz
1faf05ef67 ahh, it's better if i don't train the embedding 2020-12-28 16:07:02 -05:00
George Hotz
c3832e1bde hmm, fix layernorm to not be batchnorm and it breaks 2020-12-28 13:06:21 -05:00
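The layernorm/batchnorm confusion in the last message comes down to the normalization axis: layernorm normalizes each sample over its feature axis, batchnorm normalizes each feature over the batch axis (and "forgot the sqrt" above is the sqrt in the denominator). An illustrative NumPy sketch of the distinction, not the model code:

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # normalize each row (sample) over its features: axis=-1
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)   # the easy-to-forget sqrt

def batchnorm(x, eps=1e-5):
    # normalize each column (feature) over the batch: axis=0
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
print(layernorm(x))   # each row has mean ~0, std ~1
print(batchnorm(x))   # each column has mean ~0, std ~1
```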