Commit Graph

10417 Commits

George Hotz
1290e01e2c all ops supported on GPU now 2020-12-03 10:43:11 -08:00
George Hotz
621a93b777 ane in readme 2020-12-03 10:40:31 -08:00
George Hotz
1dcaecacc4 Support for Apple Neural Engine (#130)
* ane query is success

* cite and build instructions

* low level access, need to disable AMFI

* coreml_ane works

* coreml fun

* more work

* compiled example

* progress

* compiler works

* model flow

* TODOs in the readme

* put some real weights in

* we are learning objc

* much progress i think

* signed model still doesn't work

* working example

* there are float16

* clean up: part 1

* h11ane header, more cleanup

* cleanup DeviceController creation

* remove the stupid sleep

* notes

* start a hwx parser

* no tabs

* compare stuff

* hmm, why don't inputs work

* cache doesn't seem to fix it

* hmm, the issue was the compiler

* fix the compiler, guess i didn't put in weights

* logging for compiler

* uselessness in plist

* remove hwx before compile, weights are converted to float16

* better compare

* better compare

* last line in compare

* opcodes from compiler

* notes
2020-12-03 10:32:26 -08:00
baplou
c83cebccda Made the readme more consistent (#136) 2020-11-28 08:20:02 -06:00
Marcel Bischoff
541330c42a Update README.md (#133)
Should we put `ipython3`? Otherwise the path doesn't work, or we have to add the env; not sure which is nicer.
2020-11-25 07:53:54 -08:00
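
A note on the path issue above: running an example with plain `python3` doesn't put the repo root on `sys.path` unless the package is installed, while `ipython3` effectively does. A minimal workaround sketch, assuming the script sits one directory below the repo root:

```python
# Top of an example script: make the repo root importable without installing.
import os, sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from tinygrad.tensor import Tensor  # now resolves without pip install
```
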
Mufeed VH
0bbf66627c Define ProfileOp class once (#131)
* define `ProfileOp` class once

* clean `ProfileOp` class

* removed `else: pass`
2020-11-24 19:39:13 -08:00
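
The fix above addresses a common Python pitfall: defining a class inside a frequently called function creates a fresh class object on every call. A sketch of the define-once pattern, with illustrative timing fields rather than tinygrad's exact class:

```python
import time

# Defined once at module level instead of inside every profiled call.
class ProfileOp:
  def __init__(self, name): self.name = name
  def __enter__(self):
    self.start = time.time()
    return self
  def __exit__(self, *exc):
    print(f"{self.name} took {(time.time() - self.start) * 1000:.2f} ms")

# usage: with ProfileOp("conv2d"): out = conv(x)
```
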
George Hotz
03994e0011 load torch files without torch 2020-11-21 13:43:53 -08:00
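
Loading a torch checkpoint without torch comes down to driving `pickle` directly and intercepting the torch-specific globals. A minimal sketch of the idea (the stub is illustrative; real legacy checkpoints additionally need a `persistent_load` hook to pull in the raw storage bytes):

```python
import pickle

class FakeTorchUnpickler(pickle.Unpickler):
  # Resolve torch globals to stand-ins so unpickling never imports torch.
  def find_class(self, module, name):
    if module.startswith("torch"):
      return lambda *args: ("stub", module, name, args)
    return super().find_class(module, name)

def fake_torch_load(path):
  with open(path, "rb") as f:
    return FakeTorchUnpickler(f).load()
```
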
Marcel Bischoff
26899869a2 Update tensor.py (#128)
Otherwise `.cpu()` is broken if the default device is GPU
2020-11-21 09:16:03 -08:00
adamritter
f190ca446d Detach (#123)
* Detach

* Torch.detach reuses the buffer in the

* Fix test

* wakey wakey GitHub Actions

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-19 19:03:42 -08:00
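
Detach returns a tensor that reuses the underlying buffer but is cut out of the autograd graph, so gradients stop flowing through it. A self-contained sketch of the semantics (field names are illustrative, not tinygrad's exact internals):

```python
class Tensor:
  def __init__(self, data, requires_grad=True):
    self.data = data                # raw buffer, e.g. a numpy array
    self.requires_grad = requires_grad
    self._ctx = None                # link to the op that produced this tensor

  def detach(self):
    # Same buffer, no graph link: backward() stops here.
    return Tensor(self.data, requires_grad=False)
```
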
Colin Manko
8383ff40ad fix pyopencl (#125) 2020-11-19 19:03:04 -08:00
adamritter
5797e63d9b Train efficientnet should respect NUM environment variable (#122)
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-16 20:02:31 -08:00
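
"Respect NUM" presumably means reading the knob from the environment the way the other example scripts do. A one-line sketch (the default value is an assumption):

```python
import os
NUM = int(os.getenv("NUM", "2"))  # overridable per run: NUM=5 python train_efficientnet.py
```
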
dustcollector12
ee99d016e9 tensor implementation for rmsprop and adam (#121)
* tensor implementation for rmsprop and adam

* test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu

* number of steps reduced for adam from 1000 to 200
2020-11-16 15:07:49 -08:00
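
For reference, the update rules these two optimizers implement, sketched with numpy on plain arrays (hyperparameter defaults are the conventional ones, not necessarily tinygrad's):

```python
import numpy as np

def rmsprop_step(w, g, v, lr=0.001, decay=0.9, eps=1e-8):
  # v: running average of squared gradients.
  v[:] = decay * v + (1 - decay) * g * g
  w -= lr * g / (np.sqrt(v) + eps)

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
  # m, v: first and second moment estimates; t: 1-based step count.
  m[:] = b1 * m + (1 - b1) * g
  v[:] = b2 * v + (1 - b2) * g * g
  mhat = m / (1 - b1 ** t)
  vhat = v / (1 - b2 ** t)
  w -= lr * mhat / (np.sqrt(vhat) + eps)
```
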
George Hotz
17bf90dbe4 unbroadcasting works on the GPU 2020-11-16 09:16:55 -08:00
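
Unbroadcasting is the backward half of broadcasting: the gradient flowing into a broadcast input must be summed back down to that input's original shape. A numpy sketch of the rule:

```python
import numpy as np

def unbroadcast(grad, shape):
  # Collapse extra leading dims, then sum every axis that was stretched from 1.
  while grad.ndim > len(shape):
    grad = grad.sum(axis=0)
  for i, s in enumerate(shape):
    if s == 1 and grad.shape[i] != 1:
      grad = grad.sum(axis=i, keepdims=True)
  return grad

assert unbroadcast(np.ones((4, 3)), (1, 3)).shape == (1, 3)
```
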
George Hotz
17eab716b6 unbroadcast GPU template 2020-11-16 08:16:36 -08:00
George Hotz
2ffb8de1ea move efficientnet to extra 2020-11-16 08:08:07 -08:00
George Hotz
13d34373d1 move gradcheck to extra, clean up unbroadcast 2020-11-16 08:03:31 -08:00
George Hotz
ed4c35e2e9 channels on the inside 2020-11-15 21:19:59 -08:00
adamritter
fb1df81c7d Fix train_efficientnet (#120)
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:50:31 -08:00
George Hotz
1207fe4c7d cleanup LogSoftmax 2020-11-15 20:49:57 -08:00
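
LogSoftmax is usually written with the log-sum-exp trick so large logits don't overflow exp(). A numpy sketch of the numerically stable form:

```python
import numpy as np

def logsoftmax(x, axis=-1):
  # log softmax(x) = x - logsumexp(x); subtracting the max keeps exp() finite.
  m = x.max(axis=axis, keepdims=True)
  return x - (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True)))
```
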
George Hotz
d1441de3a6 minor cleanups 2020-11-15 20:39:19 -08:00
George Hotz
37a210f868 touchups and lines 2020-11-15 20:26:52 -08:00
adamritter
5ea3d76dfb Topological sort, zero_grads (#119)
* Topological sort, zero_grads

* Bug fix, add test

* Add zero_grads

* Put deepwalk function in backward

* Move zero_grad to optim

* Fix gradcheck hack

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:25:29 -08:00
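
Backward needs the graph in reverse topological order, so every node's gradient is complete before being propagated to its parents. A sketch of the deepwalk idea; the `_ctx.parents` field is an assumption matching a typical define-by-run autograd:

```python
def deepwalk(node, visited=None, order=None):
  # Post-order DFS: parents land in `order` before the node itself,
  # so reversed(order) is a valid order for backprop.
  if visited is None: visited, order = set(), []
  visited.add(node)
  if node._ctx is not None:
    for p in node._ctx.parents:
      if p not in visited:
        deepwalk(p, visited, order)
  order.append(node)
  return order
```
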
George Hotz
a35425189d binop fast path for no broadcast 2020-11-15 19:12:14 -08:00
Marcel Bischoff
c7b7f8ccc8 Backwards ops supporting broadcasting (#118)
* streamlined numerical_jacobian

* Got rid of the g loop in Conv2D.forward

* erased stupid line

* nothing

* no loops in Conv2D forward

* Conv2D backprop improved

* stupid things in examples

* alternative to einsum

* Conv2D backward einsum alternative

* tidying up

* tidied up

* no ravel

* got rid of print

* Update efficientnet.py

* Update efficientnet.py

* Update efficientnet.py

* only tensordot

* 255.0

* whitespace

* aspect ratio error in efficientnet

* noprint

* efficientnet wrong strides

* broadcasting for backward ops

* Update ops.py

* Update ops.py

- was wrong

* broadcast test for backward enabled

* function adBC + not summing over axes that are already 1

* spacing

Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
2020-11-15 15:21:10 -08:00
adamritter
55d93017e4 Simplify more (#117)
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-14 06:15:31 -08:00
dustcollector12
28474949b8 refactoring of forward in reshape (#115)
* refactoring of forward in reshape

* test case for reshape added
2020-11-13 13:20:43 -08:00
dustcollector12
6f033ea30a enable local images for efficientnet.py (#116) 2020-11-13 07:00:12 -08:00
pb1729
420af82888 General broadcasting of binary operations (#114)
* Allow for general broadcasting of binary operations: any case is handled where corresponding dimensions between the tensors match, or at least one of them is of size 1. If a tensor has fewer dimensions than the other, its shape is padded with 1s until the two have the same number of dimensions (see the sketch after this entry). Also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array.

* remove extra tabs

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-11-12 22:27:48 -08:00
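
The rule described in this entry is the standard numpy broadcasting rule. A small sketch of just the shape computation:

```python
def broadcast_shape(a, b):
  # Left-pad the shorter shape with 1s, then each dim pair must match
  # or contain a 1.
  a, b = list(a), list(b)
  while len(a) < len(b): a.insert(0, 1)
  while len(b) < len(a): b.insert(0, 1)
  out = []
  for x, y in zip(a, b):
    if x != y and 1 not in (x, y):
      raise ValueError(f"cannot broadcast {a} with {b}")
    out.append(max(x, y))
  return tuple(out)

assert broadcast_shape((3, 1, 5), (4, 5)) == (3, 4, 5)
```
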
damianzim
2b1286eef6 Don't wrap np.int32 in a function, use an alias (#113) 2020-11-12 19:32:19 -08:00
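
The change above illustrates a small idiom: a function that only forwards its argument to a constructor can just be an alias for the constructor itself.

```python
import numpy as np

# Instead of:  def int32(x): return np.int32(x)
int32 = np.int32

assert int32(7) == 7
```
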
adamritter
08aa60d9d0 broadcasting 1s at the start, 1 kernel/4 divs version (#110)
* Pad2d backward pass on GPU

* Faster Pad2D GPU backward pass (no zeroing needed)

* Fix out of bounds error

* Don't save prg

* Let compiler optimize division by 1

* More generic broadcasting (1s at the start)

* Bug fix

* Add comment

* Try to fix flaky test with other method

* Add mixed broadcast support

* 1kernel

* Separate broadcast tests

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-12 13:33:35 -08:00
NeuralLink
f773ef3996 tanh non first class op (#111)
* tanh non first class op

* tanh test with 1e-6 tol

Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>
2020-11-12 13:32:50 -08:00
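
Making tanh a non-first-class op means composing it from existing ops instead of writing a dedicated forward/backward pair, so its gradient comes for free. One standard identity, sketched with numpy (tinygrad's exact composition may differ):

```python
import numpy as np

def sigmoid(x): return 1 / (1 + np.exp(-x))

def tanh(x):
  # tanh(x) = 2*sigmoid(2x) - 1
  return 2 * sigmoid(2 * x) - 1

xs = np.linspace(-3, 3, 7)
assert np.allclose(tanh(xs), np.tanh(xs), atol=1e-6)
```
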
Ryan Neph
608bdd4872 adds broadcasting test cases (#106)
refs: #80, #90, #104, #105
2020-11-12 07:08:28 -08:00
adamritter
f1d21afe88 Somewhat more generic broadcasting (#105)
* Somewhat more generic broadcasting

* Add TODO

* Set Torch to deterministic in test

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-11 20:33:00 -08:00
Ryan Neph
8827a536e0 GPU MaxPool2D.backward(); TinyConvNet train passes (#103)
* no trailing whitespace

* GPU MaxPool2D.backward(); TinyConvNet train passes!

* Fix GPU avgpool.forward() init_val

Doesn't change the result but is simpler.

* Fix MaxPool GPU init_val

Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.
2020-11-11 07:58:43 -08:00
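
The init_val fix is the classic reduction-identity mistake: a max accumulator must start at -inf (or the dtype minimum), not 0, or all-negative pooling windows get clamped to 0. A tiny sketch:

```python
import numpy as np

def naive_max(xs):
  acc = -np.inf  # init_val; starting at 0 silently clamps negative inputs
  for x in xs:
    acc = max(acc, x)
  return acc

assert naive_max([-3.0, -1.5, -2.0]) == -1.5  # with init_val=0 this would be 0.0
```
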
Marcel Bischoff
a3989f9e18 Supporting .png files in efficientnet (#102)
* to make it work locally

* definitely not working

* Conv2D GPU passes some of the tests

* Conv2D GPU passes more of the tests

* passes some tests and mnist

* removed unnecessary code

* Conv2D Backpass works

* wrong test_ops.py

* white space + test backward

* erased useless code

* removed default argument

* long lines

* works also with 4 channel .png files

* commenting out

* track
2020-11-10 20:06:24 -08:00
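
Supporting .png inputs mostly means handling the alpha channel, since the model expects 3-channel RGB. A sketch with PIL; the 224x224 resize and /255.0 scaling are assumptions about the preprocessing, not the commit's exact code:

```python
import numpy as np
from PIL import Image

def load_image(path):
  # convert("RGB") drops the alpha channel of 4-channel PNGs.
  img = Image.open(path).convert("RGB").resize((224, 224))
  return np.asarray(img, dtype=np.float32) / 255.0
```
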
George Hotz
d93cd945aa reshape makes copies 2020-11-10 16:18:59 -08:00
George Hotz
d1284fa817 stride tests and i32 2020-11-10 16:10:14 -08:00
Marcel Bischoff
7bb803c5e0 Conv2D backward on GPU (#93)
* to make it work locally

* definitely not working

* Conv2D GPU passes some of the tests

* Conv2D GPU passes more of the tests

* passes some tests and mnist

* removed unnecessary code

* Conv2D Backpass works

* wrong test_ops.py

* white space + test backward

* erased useless code

* removed default argument

* long lines
2020-11-10 16:07:33 -08:00
George Hotz
5577b9d3a0 clean up imports 2020-11-10 15:53:05 -08:00
George Hotz
db755fa103 promote swish to a tensor op 2020-11-10 15:48:11 -08:00
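
Swish is simple enough to state inline: swish(x) = x * sigmoid(x), the activation EfficientNet uses. A numpy sketch:

```python
import numpy as np

def swish(x):
  # x * sigmoid(x) == x / (1 + exp(-x))
  return x / (1 + np.exp(-x))
```
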
George Hotz
5f4b76a21b touch ups 2020-11-10 15:44:47 -08:00
George Hotz
52ee913c98 move the mnist loader out of tinygrad proper 2020-11-10 15:37:39 -08:00
George Hotz
498b4d2f27 i32 and reduce line count a bit 2020-11-10 15:35:30 -08:00
George Hotz
df64658a2c weee, opencl tests in CI 2020-11-10 10:04:45 -08:00
George Hotz
d47a128812 pocl 2020-11-10 10:02:13 -08:00
George Hotz
c05401a9ca sudo maybe 2020-11-10 09:53:49 -08:00
George Hotz
09bc8eddfe clinfo 2020-11-10 09:51:38 -08:00
George Hotz
58e703d099 fix tests 2020-11-10 09:49:19 -08:00
George Hotz
23405cec43 intel opencl 2020-11-10 09:41:40 -08:00
George Hotz
33090c4b0d install more 2020-11-10 09:34:56 -08:00