Commit Graph

18 Commits

Author SHA1 Message Date
George Hotz
f28df9900f multidevice works (#763)
* basic multigpu working

* better multigpu test

* upper

* touchups

* cl sync
2023-05-04 01:04:58 -07:00
Joqsan
0b9d4126d0 Add Tensor.stack() and Tensor.repeat() (...trying to make einops work with tinygrad) (#758)
* add stack() and repeat() methods

* make stack a static method
2023-05-01 09:37:46 -07:00
George Hotz
1240c12ac5 download cifar to datasets dir 2023-03-29 12:25:42 +04:00
George Hotz
1a039306d2 good changes from llama branch (#671)
* good changes from llama

* transpose behavior changed
2023-03-09 20:51:22 -08:00
George Hotz
b1ba78ac38 move applegpu disassembler 2023-03-05 11:21:12 -08:00
George Hotz
262f81d795 applegpu everywhere 2023-02-27 22:54:59 -08:00
Marcello Fuschi
6d97d62ab3 Add PyCharm's .idea to .gitignore (#597) 2023-02-24 20:14:38 -08:00
George Hotz
714bf4b108 clang backend (#572)
* start clang backend

* mostly working

* no group for reduce w clang

* it compiles

* compiles

* a11y

* minor fixups

* formatting

* add a test

* rename test
2023-02-20 18:18:18 -08:00
George Hotz
5e6265be6e metal timing, fix speed test 2023-02-17 12:31:54 -08:00
George Hotz
2844482a60 Mypy fun (#541)
* mypy fun

* things are just faster

* running fast

* mypy is fast

* compile.sh

* no gpu hack

* refactor ops_cpu and ops_torch to not subclass

* make weak buffer work

* tensor works

* fix test failing

* cpu/torch cleanups

* no or operator on dict in python 3.8

* that was junk

* fix warnings

* comment and touchup
2023-02-08 09:56:51 -06:00
George Hotz
682dc64430 works at work 2022-09-06 08:06:11 -07:00
George Hotz
121d5a17ee use tinynn for Conv2d 2021-10-30 19:40:44 -07:00
Skosh
78aa147b39 [WIP] YOLO working on tinygrad! (#245)
* Some progress on yolov3

* Removed some debugging comments… Also, the forward pass eats all RAM for some reason

* forward pass almost runs

* forward pass runs almost

* forward pass runs, now we gotta load the weights

* loading weights works

* fetches config and weights

* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but it's kind of terrible, and not how things should be done

* some changes

* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly

* Something is wrong with the forward pass, Conv2d tests added

* forward pass almost outputs correct values, gotta fix one more thing

* yolo works

* some final changes

* reverting changes

* removed dataloader

* fixed some indentation

* comment out failing test, somehow it fails CI even though it passes on my computer…

* fixed wrong probabilities

* added webcam option to YOLO, now just need to add bounding boxes and speed it up

* some progress towards adding bounding boxes

* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage

* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrad's output on the classic dog image

* removed some debugging print statements

* updated result image

* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
2021-04-25 18:06:52 -07:00
George Hotz
1dcaecacc4 Support for Apple Neural Engine (#130)
* ane query is success

* cite and build instructions

* low level access, need to disable AMFI

* coreml_ane works

* coreml fun

* more work

* compiled example

* progress

* compiler works

* model flow

* TODOs in the readme

* put some real weights in

* we are learning objc

* much progress i think

* signed model still doesn't work

* working example

* there are float16

* clean up: part 1

* h11ane header, more cleanup

* cleanup DeviceController creation

* remove the stupid sleep

* notes

* start a hwx parser

* no tabs

* compare stuff

* hmm, why don't inputs work

* cache doesn't seem to fix it

* hmm, the issue was the compiler

* fix the compiler, guess i didn't put in weights

* logging for compiler

* uselessness in plist

* remove hwx before compile, weights are converted to float16

* better compare

* better compare

* last line in compare

* opcodes from compiler

* notes
2020-12-03 10:32:26 -08:00
George Hotz
94d44c97bf add pad2d on GPU 2020-11-07 10:46:36 -08:00
Rene Delgado
cd54697fd8 fix gpu sum forward (#61)
* ignore venv

* add sum test

* fix sum forward
2020-11-05 21:59:16 -08:00
Göktuğ Karakaşlı
cc9bd45b44 add setup.py and change imports to relative 2020-10-26 18:19:50 +03:00
George Hotz
1bb2583500 start tinygrad 2020-10-17 22:57:01 -07:00