Commit Graph

527 Commits

Author SHA1 Message Date
Dinesh Kumar Gnanasekaran
2146860307 fixed OpenCL installation while running tests (#262)
Co-authored-by: dinesh <dinesh-GDK>
2021-06-12 11:14:21 -07:00
George Hotz
a89d12d735 wow, way faster 2021-06-10 17:11:39 -07:00
George Hotz
10b1306525 binops 2021-06-10 16:52:37 -07:00
George Hotz
4535d39baa comments and pow 2021-06-10 09:03:40 -07:00
George Hotz
2075fdeb4f FPGA Based Accelerator for Tinygrad (#258)
* ops_risk

* risk sim

* guessing is for winners

* minor

* better

* matmul with risk

* conv doesn't work

* closer

* conv2d works

* ops_risk

* opt2 works

* opt1 may not be possible

* opt1 is a mulacc

* arty

* attosoc example building on mac

* minor

* riscv assembler

* gucci gang

* we got C code

* not a scam

* hello

* make risk mergeable into master

* unop support
2021-06-07 17:45:09 -07:00
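The FPGA PR notes that "opt1 is a mulacc". The sketch below is illustrative only. It assumes a scalar multiply-accumulate primitive; the name `mulacc` and its signature are hypothetical and are not the accelerator's actual interface. It shows how a matrix multiply reduces to repeated calls of that one operation:

```python
from typing import List

Matrix = List[List[float]]

def mulacc(acc: float, a: float, b: float) -> float:
    """Hypothetical multiply-accumulate primitive: returns acc + a * b."""
    return acc + a * b

def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix using only mulacc."""
    n, k, m = len(x), len(w), len(w[0])
    assert all(len(row) == k for row in x), "inner dimensions must match"
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            # Each output element is a chain of k multiply-accumulates.
            for t in range(k):
                acc = mulacc(acc, x[i][t], w[t][j])
            out[i][j] = acc
    return out

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

In hardware, the innermost loop is the part a dedicated multiply-accumulate unit would absorb.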
George Hotz
77ba198b57 Revert "Update README.md (#259)" (#260)
This reverts commit 5a69c5db6d.
2021-06-04 14:41:41 -07:00
Gabriel Rojas
5a69c5db6d Update README.md (#259) 2021-06-04 14:41:07 -07:00
Josh Smith
ad756f6112 minor optimizations & cleaning (#257)
* use isinstance, some optimizations & whitespace removal

* revert whitespace changes

* revert more whitespace

* some more cleanup

* revert fstring (not a fan of the {{}})

* fix typo

* fix typo
2021-06-02 09:57:15 -07:00
George Hotz
74e874cc0d comment 2021-05-26 18:06:55 -07:00
George Hotz
343c5f13c7 add output shape to DEBUG 2021-05-26 17:42:38 -07:00
George Hotz
b80cacb416 fix GPU efficientnet example 2021-05-26 17:29:35 -07:00
George Hotz
1ae0e88627 nvidia notes 2021-05-26 14:27:00 -07:00
20kdc
2653d33292 vgg7 (image upscaling) implementation - not the best, but it works (#255)
* vgg7 implementation - not the best, but it works

* VGG7 implementation: Spread nansbane to deter NaNs, maybe improved training experience

* VGG7 implementation: Fix training, for real this time

Results actually attempt to approximate the input

* VGG7 implementation: Sample probability management
2021-05-12 23:48:51 -07:00
Skosh
81bf933a91 Improved __getitem__ (#254)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrad's output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…

* Improved __getitem__

* Updated

* Updated __getitem__

* Linebreaks

* Maybe this works?

* Added MNIST locally, tests run now
2021-05-05 22:15:22 -07:00
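Improving `__getitem__` comes down to turning each Python index into concrete bounds per dimension. The following is a minimal sketch, not the PR's code. It assumes a hypothetical helper `normalize_index` that accepts ints and step-1 slices, and it relies on Python's built-in `slice.indices` for clamping and negative offsets:

```python
from typing import List, Sequence, Tuple, Union

Index = Union[int, slice]

def normalize_index(idx: Union[Index, Tuple[Index, ...]],
                    shape: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn an int/slice index into one (start, stop) pair per dimension.

    Hypothetical helper; step sizes other than 1 are rejected.
    """
    if not isinstance(idx, tuple):
        idx = (idx,)
    if len(idx) > len(shape):
        raise IndexError("too many indices")
    bounds = []
    for dim, size in enumerate(shape):
        # Dimensions without an explicit index are taken whole.
        i = idx[dim] if dim < len(idx) else slice(None)
        if isinstance(i, int):
            if not -size <= i < size:
                raise IndexError(f"index {i} out of range for dim {dim}")
            i = i % size  # map negative indices to positive ones
            bounds.append((i, i + 1))
        elif isinstance(i, slice):
            start, stop, step = i.indices(size)  # clamps to [0, size]
            if step != 1:
                raise NotImplementedError("only step 1 is supported")
            bounds.append((start, max(start, stop)))  # empty slices stay empty
        else:
            raise TypeError(f"unsupported index type {type(i).__name__}")
    return bounds

print(normalize_index((1, slice(-2, None)), (3, 4)))  # [(1, 2), (2, 4)]
```

A full implementation would also drop each dimension that was indexed by an integer, which this sketch leaves out.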
Skosh
78aa147b39 [WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrad's output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
ziofil
155ec1f18e saving 50 LOC with automatic @staticmethod for forward and backward (#252)
* automatic @staticmethod for forward and backward

* triggering unit tests
2021-04-25 18:04:16 -07:00
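Commit #252 wraps `forward` and `backward` in `@staticmethod` automatically, so subclasses can omit the decorator. One way to do this in Python is the `__init_subclass__` hook. The sketch below assumes that mechanism, although the commit may use a different hook, and the minimal `Function` and `Mul` classes are hypothetical:

```python
import inspect

class Function:
    """Base class whose subclasses get forward/backward wrapped as staticmethods."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name in ("forward", "backward"):
            fn = cls.__dict__.get(name)
            # Only wrap plain functions defined on this subclass; anything
            # already decorated (staticmethod, classmethod) is left alone.
            if inspect.isfunction(fn):
                setattr(cls, name, staticmethod(fn))


class Mul(Function):
    # No @staticmethod needed here: the base class adds it.
    def forward(ctx, x, y):
        ctx["saved"] = (x, y)
        return x * y

    def backward(ctx, grad):
        x, y = ctx["saved"]
        return grad * y, grad * x


ctx = {}
print(Mul.forward(ctx, 3.0, 4.0))  # 12.0
print(Mul.backward(ctx, 1.0))      # (4.0, 3.0)
```

Each op definition saves one decorator line per method, which is where savings like the commit's "50 LOC" come from.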
"freedom" Koan-Sin Tan
f0cc2b66f8 add an aneccompile example in Objective-C (#240)
* add an aneccompile example in Objective-C

add a compile.m corresponding to compile.mm

build with
```clang compile.m -F /System/Library/PrivateFrameworks/ -framework ANECompiler -framework Foundation```

CoreFoundation framework is a C library.
Foundation is an Objective-C framework.

CF data structures in CoreFoundation usually have corresponding NS data structures in Foundation, e.g.,
NSDictionary is "toll-free bridged" with its Core Foundation counterpart, CFDictionary.
See [1].

[1] https://developer.apple.com/library/archive/documentation/General/Conceptual/CocoaEncyclopedia/Toll-FreeBridgin/Toll-FreeBridgin.html

* figure out how to use param_3 of ANECCompile

add a simple param_3 blocks callback, which dumps the status
dictionary when status != 0
2021-01-31 08:31:16 -08:00
Göktuğ Karakaşlı
eabe0b9017 remove deepwalk args (#243) 2021-01-31 08:30:17 -08:00
George Hotz
ce77dda805 yolov5 v4 2021-01-05 07:56:17 -08:00
George Hotz
62e3a8558c fix tolerance maybe 2021-01-05 07:45:47 -08:00
Asim
1c148f2fe4 fixed example broken after gpu refactor (#238) 2021-01-05 07:41:54 -08:00
George Hotz
8a38e0d207 only mish failed 2021-01-03 09:47:11 -08:00
George Hotz
a337f7780e smarter way to write sign 2021-01-03 09:46:00 -08:00
George Hotz
1a4487965a remove negative from things w/o negative 2021-01-03 09:43:34 -08:00
George Hotz
0531b848eb second class sign 2021-01-03 09:33:12 -08:00
George Hotz
0702e0c763 nah, no sign, it's not what you want. use relu 2021-01-03 09:30:33 -08:00
George Hotz
29655609d5 fix GPU sign...these tests aren't very good 2021-01-03 09:00:49 -08:00
George Hotz
ea9c9af5d7 faster sign 2021-01-03 08:54:21 -08:00
George Hotz
c2eeb6950b add support for sign. technically relu can be second class now 2021-01-03 08:29:57 -08:00
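The message "technically relu can be second class now" follows from the identity relu(x) = x·(sign(x) + 1)/2. Once sign exists, relu can therefore be composed from other ops instead of being a primitive. Here is a scalar sketch of that identity. It is not the repository's implementation, and the function names are illustrative:

```python
def sign(x: float) -> float:
    """Return -1.0, 0.0 or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))

def relu_from_sign(x: float) -> float:
    """relu(x) = x * (sign(x) + 1) / 2: gives x for x > 0 and 0 otherwise."""
    return x * (sign(x) + 1.0) / 2.0

# The composed version agrees with a direct max(x, 0).
for v in (-2.0, 0.0, 3.5):
    assert relu_from_sign(v) == max(v, 0.0)
```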
George Hotz
6842ad9ec8 minor cleanups, yolo work 2021-01-03 08:14:16 -08:00
NeuralLink
0825cf7f79 Added softplus and mish non stable (#220)
*  Added softplus and mish CPU

* 🔨 refactor

* 🔨 second class softplus and mish

* 🔨 test fix

* no need of device in testing
2021-01-03 08:08:41 -08:00
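The PR title calls these versions "non stable". The usual definitions are softplus(x) = log(1 + eˣ) and mish(x) = x·tanh(softplus(x)), and the naive softplus overflows for large x. The scalar sketch below contrasts that naive form with the standard rewrite max(x, 0) + log1p(e^(−|x|)). The function names are illustrative and not taken from the PR:

```python
import math

def softplus_naive(x: float) -> float:
    """log(1 + e^x); math.exp raises OverflowError for x above about 709."""
    return math.log(1.0 + math.exp(x))

def softplus(x: float) -> float:
    """Numerically stable form: max(x, 0) + log1p(e^-|x|)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x: float) -> float:
    """mish(x) = x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

print(softplus(1000.0))  # 1000.0, whereas softplus_naive(1000.0) raises OverflowError
```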
George Hotz
ac229ea750 remove print 2021-01-02 12:53:30 -08:00
George Hotz
895d142503 start trying to load yolo v5 2021-01-02 12:51:55 -08:00
NeuralLink
ece07a3d12 🔨 refactor register ops (#233)
* 🔨 refactor register ops

* 🔨 reorder and register for ANE

* 🔨 refactor

* 🔨 conflicts

* 🔨 minor fix

* ane fix

* extra reshape weird
2021-01-02 07:47:16 -08:00
Marcel Bischoff
42b4761025 transformer >99.98% test accuracy in ~30s (#230)
* transformer

* BS might divide len(Y_test)

* output when accuracy is high

* more readable

* fixed loss in serious_mnist for new API
2021-01-02 07:45:09 -08:00
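The note "BS might divide len(Y_test)" concerns batched evaluation when the batch size and the test-set length do not line up. The exact fix in the PR is not shown here. One way to avoid dropping or double-counting samples is to let the last batch be shorter. In this sketch, `batches` and `accuracy` are hypothetical helpers:

```python
from typing import Iterator, Sequence, Tuple

def batches(n: int, bs: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) bounds covering range(n), with a shorter final batch."""
    for start in range(0, n, bs):
        yield start, min(start + bs, n)

def accuracy(preds: Sequence[int], labels: Sequence[int], bs: int) -> float:
    """Batched accuracy that counts every sample even if bs does not divide n."""
    correct = 0
    for start, stop in batches(len(labels), bs):
        correct += sum(p == y for p, y in zip(preds[start:stop], labels[start:stop]))
    return correct / len(labels)

print(list(batches(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```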
Liam
ebd72ff437 Test split (#231)
* Split tests

Split tests into "Test CPU" and "Test GPU".

Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE

* Run tests based on provided TEST_DEVICES flag

By default will run all "CPU,GPU,ANE"

* fix bad quote

* Revert changes and use GPU=1

This is done through setting the default Tensor Device to Device.CPU of
GPU=1 is set.

Run GPU tests: GPU=1 pytest -s -v
2021-01-01 09:19:03 -05:00
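This PR first parsed a comma-separated `TEST_DEVICES` list and was later simplified to a `GPU=1` switch. The sketch below shows both ideas. It assumes the intent is that `GPU=1` selects the GPU as the default device. The `Device` enum here is a hypothetical stand-in, and `default_device` and `parse_test_devices` are illustrative names:

```python
import os
from enum import Enum
from typing import List, Mapping, Optional

class Device(Enum):
    # Hypothetical stand-in for the project's device enum.
    CPU = "CPU"
    GPU = "GPU"
    ANE = "ANE"

def default_device(env: Optional[Mapping[str, str]] = None) -> Device:
    """Return Device.GPU when GPU=1 is set in the environment, else Device.CPU."""
    env = os.environ if env is None else env
    return Device.GPU if env.get("GPU") == "1" else Device.CPU

def parse_test_devices(value: str = "CPU,GPU,ANE") -> List[Device]:
    """Parse a comma-separated TEST_DEVICES string such as "CPU,GPU"."""
    return [Device(name.strip().upper()) for name in value.split(",") if name.strip()]

print(default_device({"GPU": "1"}))  # Device.GPU
```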
George Hotz
4a7cf2e420 more reordering 2020-12-31 09:58:02 -05:00
George Hotz
92abe43683 reduce before binary because of unbroadcasting 2020-12-31 09:49:52 -05:00
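"Unbroadcasting" is the backward-pass step for binary ops on broadcast inputs. The incoming gradient must be summed back down along every axis the input was stretched over. That step is itself a reduction, which explains why reduce ops come before binary ops. The following is a 2-D sketch on nested lists, with `unbroadcast` as an illustrative name rather than the repository's function:

```python
from typing import List, Tuple

Matrix = List[List[float]]

def unbroadcast(grad: Matrix, shape: Tuple[int, int]) -> Matrix:
    """Sum a 2-D gradient back down to `shape` along axes that were broadcast.

    An axis was broadcast when the input had size 1 there but grad does not.
    """
    rows, cols = len(grad), len(grad[0])
    if shape[0] == 1 and rows != 1:
        # Reduce over rows: one summed row remains.
        grad = [[sum(grad[r][c] for r in range(rows)) for c in range(cols)]]
        rows = 1
    if shape[1] == 1 and cols != 1:
        # Reduce over columns: one summed column remains.
        grad = [[sum(row)] for row in grad]
    return grad

# y = x + b with x of shape (2, 3) and b of shape (1, 3): db sums over rows.
print(unbroadcast([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], (1, 3)))  # [[5.0, 7.0, 9.0]]
```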
George Hotz
4291002881 reorder GPU ops 2020-12-31 09:46:39 -05:00
George Hotz
de7fe085de no read out of bounds 2020-12-31 09:41:36 -05:00
George Hotz
1fb5fcafce GPU slice should fix tests 2020-12-31 09:37:03 -05:00
Liam
e972a45456 Dynamically register ops to Tensor (#232)
* Dynamically register ops to Tensor

This saves lines. And reduces redundant repetition.

* ffs spacing

you don't pay me enough!
2020-12-31 09:10:19 -05:00
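Dynamic registration replaces one hand-written `Tensor` method per op with a loop that attaches methods from a table. The minimal sketch below uses `setattr` on a hypothetical list-backed `Tensor`. The `register` helper and the op table are illustrative and do not reproduce tinygrad's actual signatures:

```python
from typing import Callable, Dict

class Tensor:
    """Minimal hypothetical Tensor holding a flat list of floats."""

    def __init__(self, data):
        self.data = list(data)

    def __repr__(self) -> str:
        return f"Tensor({self.data})"

def register(name: str, fxn: Callable[..., list]) -> None:
    """Attach fxn to Tensor as a method called `name`."""
    def method(self, *others):
        return Tensor(fxn(self.data, *(o.data for o in others)))
    method.__name__ = name
    setattr(Tensor, name, method)

# One table instead of one hand-written method per op.
OPS: Dict[str, Callable[..., list]] = {
    "relu": lambda x: [max(v, 0.0) for v in x],
    "add": lambda x, y: [a + b for a, b in zip(x, y)],
    "mul": lambda x, y: [a * b for a, b in zip(x, y)],
}
for op_name, op_fxn in OPS.items():
    register(op_name, op_fxn)

t = Tensor([-1.0, 2.0])
print(t.add(Tensor([3.0, 4.0])).relu())  # Tensor([2.0, 6.0])
```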
Marcel Bischoff
e2f833f58f max to behave on ties like torch (#229)
* checkpoint

* fixing pow

* undo pow

* backward max on GPU and CPU rewrite

* indentation

* changing seed for curiosity

* max replaced equality

* undo seed

* rebase

* fixed tests

* merge error
2020-12-30 18:52:50 -05:00
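When several entries tie for the maximum, the backward pass of `max` must decide which of them receive the gradient. PyTorch documents that `torch.amax` distributes the gradient evenly between equal values. The 1-D sketch below implements that even split. Whether this PR matches that rule exactly in every case is an assumption:

```python
from typing import List

def max_backward(x: List[float], grad: float) -> List[float]:
    """Gradient of max(x): split grad evenly among all entries equal to the max."""
    m = max(x)
    mask = [1.0 if v == m else 0.0 for v in x]  # 1.0 marks each tied maximum
    count = sum(mask)
    return [grad * k / count for k in mask]

print(max_backward([1.0, 3.0, 3.0, 2.0], 1.0))  # [0.0, 0.5, 0.5, 0.0]
```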
George Hotz
30f8132646 reorder ops in ops cpu 2020-12-30 11:00:01 -05:00
George Hotz
e5b2803b5d ops in readme 2020-12-30 10:48:55 -05:00
George Hotz
2d44bf7f1a Dot -> Matmul 2020-12-30 10:41:51 -05:00
George Hotz
10fc3ff5b9 cleaner syntax 2020-12-30 10:35:37 -05:00
George Hotz
fcfe3dae01 write slice for CPU 2020-12-30 10:32:53 -05:00
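The slice commits here (together with "no read out of bounds" above) suggest a slice op that may be asked for ranges extending past the tensor. The sketch below assumes that out-of-range positions read as zero, which makes a slice double as padding; this is a guess, not a statement of the repository's semantics. The helper name `slice_1d` is hypothetical:

```python
from typing import List

def slice_1d(x: List[float], start: int, stop: int) -> List[float]:
    """Return positions start..stop-1 of x, where indices outside [0, len(x)) read as 0.0.

    start may be negative and stop may exceed len(x); such positions are
    zero-filled rather than read, so nothing out of bounds is accessed.
    """
    return [x[i] if 0 <= i < len(x) else 0.0 for i in range(start, stop)]

print(slice_1d([1.0, 2.0, 3.0], -1, 4))  # [0.0, 1.0, 2.0, 3.0, 0.0]
```

The explicit bounds check matters in Python: without it, a negative index like `x[-1]` would silently wrap around to the last element instead of padding.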
George Hotz
47504004fd ane ops 2020-12-29 18:00:53 -05:00
George Hotz
1f5c9618ef refactor in readme and issue #225 2020-12-29 17:30:04 -05:00