Commit Graph

469 Commits

Author SHA1 Message Date
George Hotz
f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz
6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz
27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz
02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz
061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz
a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff
dc8fa7999c Transpose on GPU (#221)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up

* transformer eval

* axis=-1

* transpose

* test for permutation using torch.movedims

* another test

* line
2020-12-29 10:40:11 -05:00
George Hotz
36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz
fafece9db7 avgpool2d is a second class op 2020-12-28 10:41:59 -05:00
George Hotz
593233b668 log and exp are first class ops 2020-12-28 10:00:30 -05:00
George Hotz
f15bec6dbc make multidot work on CPU 2020-12-27 17:25:37 -05:00
George Hotz
131e04c90c cpu only decorator 2020-12-27 17:18:55 -05:00
George Hotz
2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
iainwo
56d44637f3 fixed pylint, formatted python files with cblack on localhost (#204)
* fixed pylint, formatted python files with cblack on localhost

* Revert "fixed pylint, formatted python files with cblack on localhost"

This reverts commit 07e2b88466.

* dedented 4-spaces added linter

Co-authored-by: Iain Wong <iainwong@outlook.com>
2020-12-17 14:37:31 -08:00
Liam
bcf1518309 All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However, tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU tests are not an issue caused by this update. Tests have not
been passing due to a missing required dependency, "six".

OpenCL Tests have not been run since commit: 1a1c63a08b

Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`.)

All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.

Refactored the conversion code to allow any-device-to-any-device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with updated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00
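The device-enum design described in the PR body above can be sketched as follows. This is a minimal illustration, not the actual tinygrad code: the `Device` name and the `.to`/`.to_` methods follow the commit message, everything else is assumed.

```python
from enum import Enum

class Device(Enum):
    # three device types handled by one enum, as the commit describes
    CPU = 0
    GPU = 1
    ANE = 2

class Tensor:
    # minimal sketch; a real tensor would also carry buffers and grads
    def __init__(self, data, device=Device.CPU):
        self.data, self.device = data, device

    def to(self, device):
        # any-device-to-any-device conversion, returning a new Tensor
        return Tensor(self.data, device)

    def to_(self, device):
        # in-place variant, used to shorten the refactored conversion code
        self.device = device
        return self

t = Tensor([1, 2, 3]).to(Device.GPU)
```

The enum also gives the keyword-argument default the commit mentions: `device=Device.CPU` works where a bare `Tensor.CPU` class attribute could not be referenced before the class body finishes.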
Marcel Bischoff
5d46df638a abs as non-first class operation using relu (#171)
* abs (non-first class)

* whitespace
2020-12-09 12:20:34 -08:00
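The composition behind #171 is the identity |x| = relu(x) + relu(−x); for each element exactly one of the two terms is nonzero. A minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def abs_via_relu(x):
    # |x| = relu(x) + relu(-x): no new first-class op needed
    return relu(x) + relu(-x)

print(abs_via_relu(np.array([-2.0, 0.0, 3.0])))  # [2. 0. 3.]
```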
NeuralLink
00e376f36c leaky relu as geohot suggested (#167) 2020-12-09 02:58:35 -08:00
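Leaky ReLU can likewise be composed from relu alone, as relu(x) − α·relu(−x); the exact composition used in #167 may differ, this is one standard way to write it:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def leaky_relu(x, neg_slope=0.01):
    # positives pass through via relu(x); negatives come out of the
    # second term scaled by neg_slope
    return relu(x) - neg_slope * relu(-x)

print(leaky_relu(np.array([-1.0, 0.5])))  # [-0.01  0.5 ]
```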
Liam
89d0ff6989 Consistent testing (#137)
* Consistent GPU classes

Convert the existing GPU classes into one standard format.

Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.

Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.

* Optim Testing now supports GPU

* Tensor testing now supports GPU

jacobian and gradcheck auto skipped until GPU float64 support added.

* GPU support for custom constructor methods

* Remove GPU flag from Model constructors

It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.

This also required the conversion of Optimizer parameters, as they are
constructed prior to execution of the `train` function and are dependent
on the model's GPU state.

* Fix typo: float32->float64

* Clean `get_parameters` utility

Just a quick refactor w/ the new support for optimizers.

* Remove GPU kwarg from TinyNet

Remove `gpu` kwarg from tiny net to match test_mnist `train` function.
2020-12-09 02:25:27 -08:00
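The skip pattern described in #137 looks like this; the `GPU` flag and the test body are placeholders (in the real suite the flag would come from probing for an OpenCL device):

```python
import unittest

GPU = False  # assumption: set by detecting a usable GPU at import time

class TestMNISTGPU(unittest.TestCase):
    @unittest.skipUnless(GPU, "Requires GPU")
    def test_mnist_train(self):
        # never runs when GPU is False; unlike a bare `if GPU:` guard,
        # pytest reports the test as skipped instead of silently omitting it
        self.fail("should have been skipped")

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.TestLoader().loadTestsFromTestCase(TestMNISTGPU))
```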
George Hotz
4e1a0de392 fix rsub 2020-12-08 10:05:21 -08:00
George Hotz
c4540f1b8c Support scalars by kartik4949 2020-12-08 09:52:07 -08:00
George Hotz
b355cd2571 Mean axis (doesn't work) (#154)
* mean axis

* fixed
2020-12-07 22:58:34 -08:00
Marcel Bischoff
58ccebd7cd Sum with axis (#153)
* sum with axis and tests

* broken

* works again

* clean up

* Update test_ops.py
2020-12-07 21:49:18 -08:00
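A sum over one axis needs its gradient broadcast back over the reduced axis in the backward pass. A NumPy sketch of both directions (the helper names here are illustrative, not tinygrad's):

```python
import numpy as np

def sum_forward(x, axis):
    return x.sum(axis=axis)

def sum_backward(grad_out, input_shape, axis):
    # re-insert the reduced axis, then broadcast the gradient across it:
    # every input element contributed once, so each gets the same grad
    return np.broadcast_to(np.expand_dims(grad_out, axis), input_shape)

x = np.ones((2, 3))
y = sum_forward(x, axis=1)                      # shape (2,)
g = sum_backward(np.ones(2), x.shape, axis=1)   # shape (2, 3), all ones
```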
adamritter
f190ca446d Detach (#123)
* Detach

* Torch.detach reuses the buffer in the

* Fix test

* wakey wakey GitHub Actions

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-19 19:03:42 -08:00
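The detach semantics described above (reuse the buffer, as torch does, but cut the tensor out of the autograd graph) can be sketched minimally; this is not the actual tinygrad class:

```python
import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=True):
        self.data = data
        self.requires_grad = requires_grad

    def detach(self):
        # reuse the same underlying buffer (no copy), but the returned
        # tensor no longer participates in gradient computation
        return Tensor(self.data, requires_grad=False)

a = Tensor(np.ones(3))
b = a.detach()
assert b.data is a.data and not b.requires_grad
```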
George Hotz
17bf90dbe4 unbroadcasting works on the GPU 2020-11-16 09:16:55 -08:00
George Hotz
17eab716b6 unbroadcast GPU template 2020-11-16 08:16:36 -08:00
adamritter
5ea3d76dfb Topological sort, zero_grads (#119)
* Topological sort, zero_grads

* Bug fix, add test

* Add zero_grads

* Put deepwalk function in backward

* Move zero_grad to optim

* Fix gradcheck hack

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-15 20:25:29 -08:00
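Putting a topological sort (the commit's `deepwalk`) inside `backward` guarantees every node's gradient is fully accumulated before its parents consume it. A sketch with a stand-in `Node` class (names are illustrative):

```python
def deepwalk(node, visited=None, order=None):
    # post-order DFS: a node is appended only after everything that feeds
    # it, so iterating the reversed list visits nodes in a valid backward
    # order
    if visited is None:
        visited, order = set(), []
    visited.add(node)
    for parent in node._parents:
        if parent not in visited:
            deepwalk(parent, visited, order)
    order.append(node)
    return order

class Node:
    def __init__(self, parents=()):
        self._parents = parents

a, b = Node(), Node()
c = Node((a, b))     # c depends on a and b
order = deepwalk(c)  # a and b appear before c
```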
Marcel Bischoff
c7b7f8ccc8 Backwards ops supporting broadcasting (#118)
* streamlined numerical_jacobian

* Got rid of the g loop in Conv2D.forward

* erased stupid line

* nothing

* no loops in Conv2D forward

* Conv2D backprop improved

* stupid things in examples

* alternative to einsum

* Conv2D backward einsum alternative

* tidying up

* tidied up

* no ravel

* got rid of print

* Update efficientnet.py

* Update efficientnet.py

* Update efficientnet.py

* only tensordot

* 255.0

* whitespace

* aspect ratio error in efficientnet

* noprint

* efficient net wrong strides

* broadcasting for backward ops

* Update ops.py

* Update ops.py

- was wrong

* broadcast test for backward enabled

* function adBC + not summing over axes that are already 1

* spacing

Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
2020-11-15 15:21:10 -08:00
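The core of broadcasting-aware backward ops is summing the upstream gradient back down to each input's original shape, undoing whatever the forward pass broadcast. A NumPy sketch of such a helper (the name `unbroadcast` is an assumption):

```python
import numpy as np

def unbroadcast(grad, shape):
    # first sum away leading dims the input never had, then sum over dims
    # that were stretched from size 1 (keeping them as 1 in the result)
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    for i, s in enumerate(shape):
        if s == 1 and grad.shape[i] != 1:
            grad = grad.sum(axis=i, keepdims=True)
    return grad

g = unbroadcast(np.ones((4, 3)), (1, 3))  # [[4., 4., 4.]]
```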
dustcollector12
28474949b8 refactoring of forward in reshape (#115)
* refactoring of forward in reshape

* test case for reshape added
2020-11-13 13:20:43 -08:00
pb1729
420af82888 General broadcasting of binary operations (#114)
* Allow for general broadcasting of binary operations. Can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. If a tensor has fewer dimensions than the other, its shape is padded with 1s until both have the same number of dimensions. Also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array.

* remove extra tabs

Co-authored-by: phillip <phillip_bement@reedbement.com>
2020-11-12 22:27:48 -08:00
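The rule described in #114 (corresponding dims must match or be 1, shorter shapes left-padded with 1s) can be written down directly as a shape computation:

```python
def broadcast_shape(a, b):
    # left-pad the shorter shape with 1s so both have the same rank
    a, b = list(a), list(b)
    while len(a) < len(b): a.insert(0, 1)
    while len(b) < len(a): b.insert(0, 1)
    out = []
    for x, y in zip(a, b):
        # dims must match, or at least one of them must be 1
        assert x == y or x == 1 or y == 1, f"incompatible dims {x} vs {y}"
        out.append(max(x, y))
    return tuple(out)

broadcast_shape((3, 1, 5), (4, 5))  # (3, 4, 5)
```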
adamritter
08aa60d9d0 broadcasting 1s at the start, 1 kernel/4 divs version (#110)
* Pad2d backward pass on GPU

* Faster Pad2D GPU backward pass (no zeroing needed)

* Fix out of bounds error

* Don't save prg

* Let compiler optimize division by 1

* More generic broadcasting (1s at the start)

* Bug fix

* Add comment

* Try to fix flaky test with other method

* Add mixed broadcast support

* 1kernel

* Separate broadcast tests

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-12 13:33:35 -08:00
NeuralLink
f773ef3996 tanh non first class op (#111)
*  tanh non first class op

* tanh test with 1e-6 tol

Co-authored-by: Kartik Sharma <kartik.sharma@claimgenius.com>
2020-11-12 13:32:50 -08:00
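A tanh built only from first-class ops (exp, add, sub, div) matches the library tanh to well under the 1e-6 tolerance the PR tests against. The exact composition used in #111 may differ; this is one standard form:

```python
import numpy as np

def tanh_composed(x):
    # tanh(x) = (e^{2x} - 1) / (e^{2x} + 1)
    e = np.exp(2 * x)
    return (e - 1) / (e + 1)

x = np.linspace(-3, 3, 13)
assert np.allclose(tanh_composed(x), np.tanh(x), atol=1e-6)
```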
Ryan Neph
608bdd4872 adds broadcasting test cases (#106)
refs: #80, #90, #104, #105
2020-11-12 07:08:28 -08:00
adamritter
f1d21afe88 Somewhat more generic broadcasting (#105)
* Somewhat more generic broadcasting

* Add TODO

* Set Torch to deterministic in test

Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
2020-11-11 20:33:00 -08:00
Ryan Neph
8827a536e0 GPU MaxPool2D.backward(); TinyConvNet train passes (#103)
* no trailing whitespace

* GPU MaxPool2D.backward(); TinyConvNet train passes!

* Fix GPU avgpool.forward() init_val

Doesn’t change result but is simpler.

* Fix MaxPool GPU init_val

Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.
2020-11-11 07:58:43 -08:00
George Hotz
d1284fa817 stride tests and i32 2020-11-10 16:10:14 -08:00
Marcel Bischoff
7bb803c5e0 Conv2D backward on GPU (#93)
* to make it work locally

* definitely not working

* Conv2D GPU passes some of the tests

* Conv2D GPU passes more of the tests

* passes some tests and mnist

* removed unnecessary code

* Conv2D Backpass works

* wrong test_ops.py

* white space + test backward

* erased useless code

* removed default argument

* long lines
2020-11-10 16:07:33 -08:00
George Hotz
866b759d3b match torch api for pad2d 2020-11-09 17:48:56 -08:00
Ryan Neph
16d564a53c finish unsupporting strided pool, add global avg pool test (#92) 2020-11-09 17:31:22 -08:00
George Hotz
870b84a893 test pad2d backward on GPU 2020-11-09 15:50:43 -08:00
George Hotz
e46d122f65 not supporting stride 2020-11-09 15:06:58 -08:00
Ryan Neph
c21c2a0b62 revert b0c0c5d: Strided Pool funcs (#74) (#87)
Strided CPU Pooling was introduced but assumes a small kernel size
(<=(10,10)), while efficientnet.py feeds kernel_size=(112,112).

This causes a huge array buffer allocation in stack_for_pool() that
hangs inference for a long time or until the system OOMs.

Revert CPU Pooling for now, and re-introduce #74 later with a new
global-average-pooling op that can be used instead of avgpool2d with
large kernel size for efficientnet inference.

Co-authored-by: Ryan Neph <ryanneph@google.com>
2020-11-09 14:58:18 -08:00
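The global-average-pooling replacement the revert proposes is just a mean over the spatial axes, with no stacked window buffer at all. A NumPy sketch for NCHW input (the function name is illustrative):

```python
import numpy as np

def global_avg_pool2d(x):
    # equivalent to avgpool2d with kernel_size == (H, W), but computed as
    # one mean over the spatial dims instead of stacking kernel windows,
    # so memory stays O(N*C) regardless of H and W
    return x.mean(axis=(2, 3), keepdims=True)

x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
global_avg_pool2d(x).shape  # (2, 3, 1, 1)
```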
Ryan Neph
7e515308a5 label op subtests by params (#83) 2020-11-09 06:25:06 -08:00
Ryan Neph
5bedf566d1 tests should use rtol unless special case (#82) 2020-11-08 17:25:11 -08:00
Ryan Neph
04b9312a34 Fix GPU Pooling bug at boundary + better Pooling test coverage (#81)
* fixed Pooling bug

* Clarify Pooling tests
2020-11-08 17:25:01 -08:00
Ryan Neph
b0c0c5d0d6 strided Pool funcs (#74)
* *Pool2D GPU forward supports stride

* kernel_size from ctx instead of saved_tensors

* *Pool2D CPU forward supports stride

* update ctx.stride properly
2020-11-08 11:45:55 -08:00
ziofil
db3eccc16b implemented backward for Pad2D & test (#73) 2020-11-07 21:58:42 -08:00
Ryan Neph
5265f6c578 add AvgPool2D backward pass on GPU (#68) 2020-11-07 12:27:29 -08:00
George Hotz
30442a086a some broadcasting, pool test is fail 2020-11-07 11:29:42 -08:00
George Hotz
94d44c97bf add pad2d on GPU 2020-11-07 10:46:36 -08:00
George Hotz
fbff6ab2e5 fix strided convs, GPU env var for enet 2020-11-07 10:26:37 -08:00