Commit Graph

4433 Commits

Author SHA1 Message Date
George Hotz
58ed46963e fix broadcastdot 2021-11-29 18:54:57 -05:00
George Hotz
dca076dbf1 remove dumb nn ops 2021-11-29 18:05:31 -05:00
George Hotz
f909ab194f gelu with broken test 2021-11-29 15:00:50 -05:00
George Hotz
c752033283 fix GPU OOM in test 2021-11-29 13:05:59 -05:00
George Hotz
99b6051467 add ff_dim to transformer 2021-11-29 12:40:52 -05:00
George Hotz
29dee59368 cat: forward only not required 2021-11-29 00:14:56 -05:00
George Hotz
3cdc77f526 add cat support 2021-11-28 23:21:49 -05:00
George Hotz
ce3d198bb7 less lines and fix default device 2021-11-27 11:18:49 -05:00
George Hotz
7ae14179d3 refactor ops 2021-11-27 11:12:23 -05:00
George Hotz
c162e748f5 fix float64 warning on training 2021-10-30 20:07:31 -07:00
George Hotz
b0f14b4af8 move datasets into datasets 2021-10-30 19:55:50 -07:00
George Hotz
7472a7ebe2 not forcing 3.9 for a stupid type 2021-10-30 16:52:40 -07:00
George Hotz
fc6597a6d9 only resnet18, it's too slow otherwise 2021-10-30 16:48:39 -07:00
Evan Mays
285621aeda Cherry backprop for conv2d (#281)
* quick math: 0 + x = x.

* gradient w.r.t. x using cherry for conv

* gradient w.r.t. w for conv on cherry but doing vector dot products

* small optimization

* [cherry] optimize conv backpass for large channel count

* get rid of numpy einsum
2021-10-30 16:12:19 -07:00
Sebastian Kreft
8113eec4cf feat: add efficientnet test (#285)
Simple test using the Chicken example from https://upload.wikimedia.org/wikipedia/commons/4/41/Chicken.jpg and the image preprocessing from example/efficientnet.py

Note that EfficientNet loads the weights from the internet so running the tests may be slow the first time. We could speed up the tests by caching the /tmp folder.

Fixes #234
2021-10-30 15:53:51 -07:00
Guglielmo Camporese
2b7589db64 Added ResNet-{18, 34, 50, 101, 152} (#271)
* added resnets

* fix minor

* fix minor

* resnet in models

* added resnet test

* added resnet train test

* added linear, conv2d nn tests

* fix minor in extra/training

* resnet in models

* fix minor

* fix tolerance for linear in nn test

* fix eval, this causes cpu and gpu UT failing

* revert transformer test

* fix minor for CPU test

* improved model get_params for sequential layer

* fix minor for params counting

* commented broken ops tests

* improved train for resnet
2021-06-21 09:37:24 -07:00
George Hotz
89798d2f43 some flags 2021-06-19 11:46:31 -07:00
George Hotz
d3f169b267 move good models to models, add a training step test 2021-06-19 11:24:15 -07:00
Jacky Lee
3a91d5434f Add dropout test (#265)
* Add dropout test

* Remove condition where training is false

* Skip dropout test when on GPU

* Revert changes to tensor.py and fix test case

* Revert change on whitespace

* Convert Tensor to cpu for testing

* Fix whitespace in tensor.py
2021-06-19 08:49:13 -07:00
George Hotz
2affd226b3 speed up sum 2021-06-17 16:38:34 -07:00
George Hotz
c1d469d440 sum op 2021-06-17 16:19:35 -07:00
George Hotz
2075fdeb4f FPGA Based Accelerator for Tinygrad (#258)
* ops_risk

* risk sim

* guessing is for winners

* minor

* better

* matmul with risk

* conv doesn't work

* closer

* conv2d works

* ops_risk

* opt2 works

* opt1 may not be possible

* opt1 is a mulacc

* arty

* attosoc example building on mac

* minor

* riscv assembler

* gucci gang

* we got C code

* not a scam

* hello

* make risk mergeable into master

* unop support
2021-06-07 17:45:09 -07:00
Skosh
81bf933a91 Improved __getitem__ (#254)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrad's output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…

* Improved __getitem__

* Updated

* Updated __getitem__

* Linebreaks

* Maybe this works?

* Added MNIST locally, tests run now
2021-05-05 22:15:22 -07:00
Skosh
78aa147b39 [WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrad's output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
George Hotz
62e3a8558c fix tolerance maybe 2021-01-05 07:45:47 -08:00
George Hotz
8a38e0d207 only mish failed 2021-01-03 09:47:11 -08:00
George Hotz
1a4487965a remove negative from things w/o negative 2021-01-03 09:43:34 -08:00
George Hotz
0702e0c763 nah, no sign, it's not what you want. use relu 2021-01-03 09:30:33 -08:00
George Hotz
c2eeb6950b add support for sign. technically relu can be second class now 2021-01-03 08:29:57 -08:00
NeuralLink
0825cf7f79 Added softplus and mish non stable (#220)
* Added softplus and mish CPU

* 🔨 refactor

* 🔨 second class softplus and mish

* 🔨 test fix

* no need of device in testing
2021-01-03 08:08:41 -08:00
Liam
ebd72ff437 Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done by setting the default Tensor Device to Device.GPU if
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
George Hotz
4291002881 reorder GPU ops 2020-12-31 09:46:39 -05:00
Marcel Bischoff
e2f833f58f max to behave on ties like torch (#229)
* checkpoint

* fixing pow

* undo pow

* backward max on GPU and CPU rewrite

* indentation

* changing seed for curiosity

* max replaced equality

* undo seed

* rebase

* fixed tests

* merge error
2020-12-30 18:52:50 -05:00
George Hotz
fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
George Hotz
f9170505b3 if you like your transformers twice as slow, use the GPU 2020-12-29 17:14:23 -05:00
George Hotz
6a6a82e999 support multidot on GPU 2020-12-29 16:56:30 -05:00
George Hotz
27208d729b add GPU max thanks to marcelbischoff 2020-12-29 16:44:14 -05:00
George Hotz
02655c07d5 break maxpool2d on GPU 2020-12-29 13:05:57 -05:00
George Hotz
061e37de39 touchups 2020-12-29 12:41:21 -05:00
George Hotz
a2e6562330 fix max op, less lines 2020-12-29 10:47:04 -05:00
Marcel Bischoff
dc8fa7999c Transpose on GPU (#221)
* 2serious

* load/save

* fixing GPU

* added DEBUG

* needs BatchNorm or doesn't learn anything

* old file not needed

* added conv biases

* added extra/training.py and checkpoint

* assert in test only

* save

* padding

* num_classes

* checkpoint

* checkpoints for padding

* training was broken

* merge

* rotation augmentation

* more aug

* needs testing

* streamline augment, augment is fast thus bicubic

* tidying up

* transformer eval

* axis=-1

* transpose

* test for permutation using torch.movedims

* another test

* line
2020-12-29 10:40:11 -05:00
George Hotz
36579f66bf max op 2020-12-28 23:54:52 -05:00
George Hotz
fafece9db7 avgpool2d is a second class op 2020-12-28 10:41:59 -05:00
George Hotz
593233b668 log and exp are first class ops 2020-12-28 10:00:30 -05:00
George Hotz
a361ef6861 fixup training loop 2020-12-27 18:35:56 -05:00
George Hotz
f15bec6dbc make multidot work on CPU 2020-12-27 17:25:37 -05:00
George Hotz
131e04c90c cpu only decorator 2020-12-27 17:18:55 -05:00
George Hotz
2f1b2c0a3b add transpose, start on transformer 2020-12-27 16:59:12 -05:00
iainwo
56d44637f3 fixed pylint, formatted python files with cblack on localhost (#204)
* fixed pylint, formatted python files with cblack on localhost

* Revert "fixed pylint, formatted python files with cblack on localhost"

This reverts commit 07e2b88466.

* dedented 4-spaces added linter

Co-authored-by: Iain Wong <iainwong@outlook.com>
2020-12-17 14:37:31 -08:00
Liam
bcf1518309 All devices are equal! (#196)
* Update all devices to be tested

ANE, CPU and OCL all now support all tests.

However tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU tests are not an issue caused by this update. Tests have not
been passing due to a missing "six" required installation.

OpenCL Tests have not been run since commit: 1a1c63a08b

Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`)

All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.

Refactor of the conversion code to allow for any device to any device
conversion.

* Add six dependency in requirements.txt

* Resolve failure to run tests

Move six into gpu required installs. Remove six from standard
installation.

* Remove repeated data conversion

* Refactor method names

Also reduce code with .to and .to_

* Dynamic device handlers

* Refactor DeviceTypes -> Device

* Add mem copy profiling back

* test_backward_pass_diamond_model passing

* Resolve Sum issue on GPU

* Revert batchnorm2d tests

* Update README with updated API

* ANE testing with

* Last minute line gains
2020-12-15 23:44:08 -08:00