403 Commits

Author SHA1 Message Date
George Hotz
bc5df477de readme and .ane() 2020-12-12 16:15:38 -08:00
George Hotz
da873cd556 Single ReLU in ANE (#188)
* aneworks

* cleanup
2020-12-12 16:11:34 -08:00
George Hotz
07ece2105e actually move it 2020-12-12 15:26:58 -08:00
George Hotz
1d10559d1d tinygrad.utils -> extra.utils 2020-12-12 15:26:07 -08:00
George Hotz
59358304a3 ane 2020-12-12 15:23:21 -08:00
George Hotz
36d4eee323 fix compiler segfault 2020-12-12 15:10:47 -08:00
George Hotz
abb7b74208 relu in python 2020-12-12 14:50:05 -08:00
George Hotz
d3886035dd ane dylib 2020-12-12 13:41:09 -08:00
George Hotz
cf66d549c1 fix example ane 2020-12-12 13:32:49 -08:00
George Hotz
566045cefc uint8 nope 2020-12-12 13:14:06 -08:00
pb1729
8c25431619 Faster but still general binop broadcasting (#159)
* allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, then its shape is padded with 1s until both have the same number of dimensions. also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array

* remove extra tabs

* messy loop unrolling

* fix loop unrolling bugs

* revert loop unrolling changes, new plan here

* binary_op(): avoid having a loop in the GPU C code, instead compute indices with nested expressions. simple broadcasts should have a similar level of performance to the simple-broadcast-specific code that was there before. broke out codegen and compilation into get_binop_prg(), which has a larger cache and depends only on the operation type and complist (this avoids doing a bunch of python string ops every time we want to compile something we've already compiled). the larger cache is needed since there will end up being quite a few possible types of broadcasts (sum_i^N 3**i is a loose upper bound, N being the maximum number of dimensions). I assumed 5 kinds of binary operations when sizing the cache here, +, -, *, /, and **. More may be needed in the future.

* add .cl to binop arguments

* solved edge case where len(dimlist)==0. still problems when len(dimlist) > CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS

* pyopencl can't handle more than 3 gids, so we just use 1 gid and compute the indices into the returned tensor in the kernel. this means more computation for the individual indices, but less for the index into the flattened tensor (last line of kernel), since it's just gid0

* trim some lines

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-12-12 12:19:46 -08:00
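
The broadcasting rule and the single-gid indexing described in the commit above can be sketched in plain Python. This is an illustration only, not the code from the commit: the function names `broadcast_shape` and `source_indices` are hypothetical, shapes are assumed to be padded with 1s on the left (the NumPy convention, which the commit message does not state), and buffers are assumed to be row-major.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper: dimensions are compared from the right, and a
    missing dimension is treated as size 1 (assumed left-padding).
    """
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        # Dimensions are compatible if they match or one of them is 1.
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def source_indices(gid, out_shape, a_shape, b_shape):
    """Map one flat output index to flat indices into both inputs.

    Mirrors the idea of using a single global id and recovering the
    per-dimension coordinates with arithmetic instead of a loop over gids.
    """
    n = len(out_shape)
    a = (1,) * (n - len(a_shape)) + tuple(a_shape)
    b = (1,) * (n - len(b_shape)) + tuple(b_shape)
    ia = ib = 0
    stride_a = stride_b = 1
    # Peel coordinates off from the last dimension to the first.
    for d in reversed(range(n)):
        coord = gid % out_shape[d]
        gid //= out_shape[d]
        # A size-1 dimension is broadcast, so it contributes no offset.
        if a[d] != 1:
            ia += coord * stride_a
        if b[d] != 1:
            ib += coord * stride_b
        stride_a *= a[d]
        stride_b *= b[d]
    return ia, ib


out = broadcast_shape((2, 1), (3,))
print(out)
print(source_indices(4, out, (2, 1), (3,)))
```

The output index is simply `gid` itself, which is the trade-off the commit message mentions: more arithmetic per input index, none for the index into the flattened result.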
Liam
bf9ba8718a Profile GPU and CPU copying. (#182)
Moving memory is slow, and therefore monitoring the time spent converting
and limiting the number of copy operations can improve performance.
2020-12-12 12:15:47 -08:00
James Roberts
8e8cbc74b3 Minor clean up (#184)
* Removes unused imports

* Minor clean up
2020-12-11 14:25:29 -08:00
Skosh
f4faf401bc require_init_gpu() function selects GPU as device and falls back to CPU if none are available (#180)
* require_init_gpu() function selects GPU as device and falls back to CPU if none are available

* Small fix for CPU specific code

* Should work...
2020-12-11 09:21:59 -08:00
Daulet
c7e95ddb21 Add diamond model test (#181)
* add backward pass test for diamond model

* fix train_efficientnet example
2020-12-11 09:21:36 -08:00
Marcel Bischoff
38b29f49dd abs (#172) 2020-12-10 09:24:35 -08:00
Liam
e79cda6dad Add pyopencl to dependency installs (#174)
* Add pyopencl to dependency installs

OpenCL was not actually being tested as pyopencl was not installed.

* Reduce installation to 1 liner
2020-12-10 09:24:08 -08:00
NeuralLink
8ab8a71d5d refactor (#178) 2020-12-10 09:23:36 -08:00
Marcel Bischoff
d204f09316 some progress on batchnorms (draft) (#147)
* no of categories for efficientnet

* need layer_init_uniform

* merge fail

* merge fail

* batchnorms

* needs work

* needs work: how to determine training

* pow

* needs work

* reshape was needed

* sum with axis

* sum with axis and tests

* broken

* works again

* clean up

* Update test_ops.py

* using sum

* don't always update running_stats

* space

* self

* default return running_stats

* passes test

* need to use mean

* merge

* testing

* fixing pow

* test_ops had a line dropped

* undo pow

* rebase
2020-12-09 22:14:27 -08:00
Marcel Bischoff
5d46df638a abs as non-first class operation using relu (#171)
* abs (non-first class)

* whitespace
2020-12-09 12:20:34 -08:00
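
One way to build abs out of relu, as the commit title above suggests, is the identity abs(x) = relu(x) + relu(-x). The sketch below assumes that identity and works on plain Python lists; the commit's actual implementation is not shown in this log, and `abs_via_relu` is a hypothetical name.

```python
def relu(xs):
    """Element-wise max(x, 0) over a list of floats."""
    return [max(v, 0.0) for v in xs]


def abs_via_relu(xs):
    """Compute |x| element-wise using only relu, negation, and addition."""
    pos = relu(xs)                 # keeps the positive part of x
    neg = relu([-v for v in xs])   # keeps the magnitude of the negative part
    return [p + n for p, n in zip(pos, neg)]


print(abs_via_relu([-2.0, 0.0, 3.5]))
```

Because it is composed from existing operations, a version like this needs no backward pass of its own, at the cost of evaluating relu twice.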
George Hotz
4c55c7208f no pow if mul will do 2020-12-09 08:19:29 -08:00
George Hotz
b85f17f247 more optim cleanup 2020-12-09 08:18:10 -08:00
George Hotz
9a64d13b94 add conv biases and max pool 2020-12-09 08:01:20 -08:00
George Hotz
99fa65f057 enable batchnorm in serious mnist 2020-12-09 03:29:40 -08:00
George Hotz
ffb96b2d0b batchnorm by marcelbischoff 2020-12-09 03:23:04 -08:00
NeuralLink
00e376f36c leaky relu as geohot suggested (#167) 2020-12-09 02:58:35 -08:00
George Hotz
c225e62dd2 touchups 2020-12-09 02:52:28 -08:00
Liam
89d0ff6989 Consistent testing (#137)
* Consistent GPU classes

Convert the existing GPU classes into one standard format.

Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.

Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.

* Optim Testing now supports GPU

* Tensor testing now supports GPU

jacobian and gradcheck auto skipped until GPU float64 support added.

* GPU support for custom constructor methods

* Remove GPU flag from Model constructors

It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.

This also required the conversion of Optimizer parameters as they are
constructed prior to execution of the `train` function and are dependent
on the model GPU state.

* Fix typo: float32->float64

* Clean `get_parameters` utility

Just a quick refactor w/ the new support for optimizers.

* Remove GPU kwarg from TinyNet

Remove `gpu` kwarg from tiny net to match test_mnist `train` function.
2020-12-09 02:25:27 -08:00
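
The `@unittest.skipUnless(GPU, "Requires GPU")` pattern mentioned in the commit above can be sketched as follows. The `GPU` flag is hard-coded to `False` here and the test classes are hypothetical stand-ins; the repository's real test classes are not reproduced in this log.

```python
import unittest

# Hypothetical flag; a real project would set this by probing for a device.
GPU = False


class TestOps(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)


# Decorating the subclass skips every test in it, including inherited ones,
# and each skip is reported by the runner with the given reason.
@unittest.skipUnless(GPU, "Requires GPU")
class TestOpsGPU(TestOps):
    def test_gpu_only(self):
        self.assertTrue(GPU)


def run_suite():
    """Run both classes and return the unittest.TestResult."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestOps))
    suite.addTests(loader.loadTestsFromTestCase(TestOpsGPU))
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_suite()
print(len(result.skipped), "skipped;", "ok" if result.wasSuccessful() else "failed")
```

Compared with wrapping the class in `if GPU:`, the decorator keeps the tests visible to the runner, so a missing GPU shows up as skips rather than as silently absent tests.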
Liam
34b38dd4d0 Extra install requirements. (#164)
* Testing install requirements

* GPU install requirements
2020-12-09 02:22:47 -08:00
George Hotz
0e02f394ee serious_mnist 2020-12-08 21:43:05 -08:00
Daulet
24d688c184 win more lines for core library (#158)
...and sacrifice test speed
2020-12-08 14:18:45 -08:00
NeuralLink
9f77fd6135 🔨 refactor optim (#156)
* 🔨 refactor optim

* 🔨 refactor optim

* 🔨 more clean up
2020-12-08 14:16:31 -08:00
George Hotz
4e1a0de392 fix rsub 2020-12-08 10:05:21 -08:00
George Hotz
c4540f1b8c Support scalars by kartik4949 2020-12-08 09:52:07 -08:00
George Hotz
97fd9c1237 zero_grad there to match readme 2020-12-07 23:12:18 -08:00
George Hotz
c63f950348 need zero grad now 2020-12-07 23:10:43 -08:00
George Hotz
b355cd2571 Mean axis (doesn't work) (#154)
* mean axis

* fixed
2020-12-07 22:58:34 -08:00
George Hotz
38f97c8c80 prepare for ops_ane 2020-12-07 21:54:22 -08:00
George Hotz
7f249ec76d touch up 2020-12-07 21:51:32 -08:00
Marcel Bischoff
58ccebd7cd Sum with axis (#153)
* sum with axis and tests

* broken

* works again

* clean up

* Update test_ops.py
2020-12-07 21:49:18 -08:00
George Hotz
ac9fecb05d lots of notes 2020-12-07 21:40:31 -08:00
George Hotz
8d1500f497 conv neuron 2020-12-07 21:12:52 -08:00
George Hotz
e4bb53b0e9 work out more 2020-12-07 20:32:50 -08:00
George Hotz
4927ad1897 float16 weights in min.weights 2020-12-07 20:15:15 -08:00
George Hotz
3aac9aefce fix GPU profiling 2020-12-07 20:03:28 -08:00
James Roberts
b2eca6d45f Format debug output (#152) 2020-12-07 14:07:14 -08:00
George Hotz
c7973cb0a1 ugh buffer_np is bad 2020-12-07 08:07:00 -08:00
George Hotz
088f280dc3 touchups 2020-12-07 07:50:27 -08:00
George Hotz
0cf21881b7 hwx parse w/o macho mods 2020-12-06 23:13:28 -08:00
Josh Smith
aa4161f63e use classmethods for Tensor helper funcs (#146) 2020-12-06 22:35:43 -08:00