Commit Graph

1179 Commits

George Hotz
732884653c osx in hlb_cifar10_torch 2023-04-14 13:12:08 -07:00
George Hotz
584ee6f616 don't graph consts 2023-04-14 03:32:20 -07:00
George Hotz
9a39ebefde hlb_cifar10_torch gets 80% 2023-04-14 02:47:03 -07:00
Jacky Lee
06ed958abd Fix train_resnet example (#744)
* Fix ResNet example

* Scientific notation
2023-04-12 13:48:39 +05:30
Jacky Lee
7a45b989a1 Device: make GPU default and METAL/CUDA if possible (#732)
* Make GPU the default device

* Compile EfficientNet with CPU

* don't print device

* use METAL and CUDA if possible

* Revert some changes to workflow

* Fix import error when checking device availability

* device lookup is now optional

* hopefully fix linter and tests

* fix workflow

* Skip device if not available

* don't change default if CPU=1

* simplify device selection

* Default to CPU if no GPU

* don't print device name...

* No need to change default in llama

* run github workflow

* Fix logic to select default

* pass if an error occurs

* use separate function for try except
2023-04-04 09:41:52 +05:30
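
For context on the pattern the bullets in #732 describe: probe each accelerator in order, treat any failure as "not available" ("pass if an error occurs", "use separate function for try except"), and fall back to CPU, with CPU=1 overriding everything. A minimal sketch under those assumptions; the probe bodies and helper names are illustrative, not tinygrad's actual device-lookup code:

```python
# Hypothetical sketch of ordered device probing with a per-backend try/except.
# _try_device and the probe lambdas are illustrative assumptions.
import os

def _try_device(probe) -> bool:
    try:
        probe()
        return True
    except Exception:
        return False  # "pass if an error occurs": any failure means unavailable

def pick_default_device() -> str:
    if os.getenv("CPU") == "1":   # don't change the default if CPU=1
        return "CPU"
    probes = [
        ("METAL", lambda: __import__("Metal")),         # macOS, via pyobjc
        ("CUDA",  lambda: __import__("pycuda.driver")),
        ("GPU",   lambda: __import__("pyopencl")),      # OpenCL
    ]
    for name, probe in probes:
        if _try_device(probe):
            return name
    return "CPU"  # default to CPU if no GPU is available

print(pick_default_device())
```
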
Jacky Lee
156640e90d Permute examples (#731)
* examples: use permute instead of transpose

* Use transpose but change args
2023-03-29 05:07:06 +04:00
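
For context on #731: a transpose-style op swaps exactly two axes, while a permute-style op takes the full new axis order in one call, which reads more directly in the examples. A small illustration using numpy as a stand-in (the shapes are made up; tinygrad's Tensor exposes analogous ops):

```python
import numpy as np

x = np.zeros((2, 3, 4))          # e.g. (batch, height, width)
# transpose-style: swap exactly two axes
y = np.swapaxes(x, 0, 1)         # -> (3, 2, 4)
# permute-style: give the complete new axis order at once
z = np.transpose(x, (2, 0, 1))   # -> (4, 2, 3)
print(y.shape, z.shape)
```
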
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
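
The "mul is cheaper than div" bullet in #723 refers to standard strength reduction: division by a constant becomes multiplication by its reciprocal, which is faster on most hardware at the cost of one ulp-level rounding difference. A one-line illustration:

```python
# Strength reduction: divide-by-constant becomes multiply-by-reciprocal.
c, x = 3.0, 7.5
assert abs(x / c - x * (1.0 / c)) < 1e-6  # same value up to float rounding
```
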
Fernando Vidal
73bd0b217b add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change, it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
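
A sketch of the kind of numpy-dtype table #699 extends; the names below are illustrative assumptions, not the actual contents of helpers.py:

```python
# Illustrative lookup from numpy dtypes to a framework's internal dtypes.
# SUPPORTED_DTYPES is hypothetical; the commit's point is that np.int64
# inputs failed before an entry like the last one existed.
import numpy as np

SUPPORTED_DTYPES = {
    np.dtype(np.float32): "float32",
    np.dtype(np.int32):   "int32",
    np.dtype(np.int64):   "int64",  # newly added, lets examples/transformer.py run
}

def from_numpy_dtype(d: np.dtype) -> str:
    if d not in SUPPORTED_DTYPES:
        raise KeyError(f"unsupported numpy dtype {d}")
    return SUPPORTED_DTYPES[d]

print(from_numpy_dtype(np.array([1], dtype=np.int64).dtype))  # -> int64
```
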
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre-commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in its proper place
2023-03-18 14:40:23 -07:00
Kirill
26a3888ab8 Fix llama 13B RAM usage (#710) 2023-03-18 13:50:09 -07:00
Kirill
0fe5014b1f Use pathlib (#711)
* Use pathlib in llama

* Use pathlib in stablediffusion
2023-03-18 13:49:21 -07:00
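
The pathlib change in #711 is the usual swap from os.path string handling to Path objects; a minimal before/after sketch with illustrative file names:

```python
# Before: string-based path handling (illustrative file names)
import os
weights = os.path.join(os.path.dirname(__file__), "weights", "model.bin")

# After: pathlib makes the same path portable and composable
from pathlib import Path
weights = Path(__file__).parent / "weights" / "model.bin"
print(weights)
```
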
Kirill
0532025b04 Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
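
Background for the storage-offset and stride fixes in #700: a torch checkpoint stores flat buffers, and each tensor is a view described by (storage, offset, shape, strides). A hedged numpy sketch of materializing such a view (all numbers are made up):

```python
# Reconstructing a strided tensor view from a flat storage buffer,
# the way torch checkpoints describe tensors. Values are illustrative.
import numpy as np

storage = np.arange(20, dtype=np.float32)    # flat buffer read from disk
offset, shape, strides = 2, (3, 4), (4, 1)   # strides in elements, not bytes
view = np.lib.stride_tricks.as_strided(
    storage[offset:], shape=shape,
    strides=tuple(s * storage.itemsize for s in strides))
print(view)
```
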
Ayushman Kumar
e28bd11ff1 Cast Tensor data to float32 (#703)
* Cast Tensor data to float32

* astype('float32') --> Tensor.randn()
2023-03-14 23:09:41 -07:00
Jacky Lee
5e820818e9 Cast image to float32 (#702) 2023-03-14 08:13:19 -07:00
George Hotz
fe0e8a306f jittable llama 2023-03-12 14:15:04 -07:00
George Hotz
15e0b56e39 compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
Kirill
af7745073f Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz
046b3952c3 get_state_dict 2023-03-11 23:46:53 -08:00
George Hotz
803b0aef28 track memory for numpy/torch 2023-03-11 20:39:10 -08:00
George Hotz
61071f881a fix bug, and add unit test to catch failure 2023-03-11 16:57:25 -08:00
George Hotz
3ec457248c failing llama test 2023-03-11 16:28:10 -08:00
George Hotz
8aa63847c7 llama: up max tokens to 1000 2023-03-11 13:39:33 -08:00
George Hotz
5ea44cefcc llama: add lexie personality 2023-03-11 10:23:33 -08:00
George Hotz
c908f911a7 llama defaults to metal on osx 2023-03-11 09:30:13 -08:00
George Hotz
5e1380df6a profiling llama + cache is_contiguous 2023-03-11 08:23:21 -08:00
George Hotz
f3ac52aee8 Mypyc (#680)
* building shapetracker

* default ENABLE_METHOD_CACHE

* symbolic compiles

* improve types

* tensor compiles

* oops, that's a bug

* best of both worlds

* find legit typing bugs

* pad2d can take list or tuple

* sub 200ms when compiled
2023-03-11 07:33:30 -08:00
George Hotz
b1206bcb18 third try at torch loading (#677)
* third try at torch loading

* numpy fixed

* fix enet compile

* load_single_weight supports empty weights

* oops, CPU wasn't the default

* so many bugs
2023-03-10 19:11:29 -08:00
George Hotz
8bf75a7fdd fix stable diffusion and CI 2023-03-10 17:48:12 -08:00
George Hotz
4780f9a6df llama runs (slowly) in master 2023-03-10 17:36:51 -08:00
jspieler
da7fb4b227 Fixed DDPG example (#667) 2023-03-09 11:49:52 -08:00
George Hotz
c22afc52db move the custom function example to a test 2023-03-08 10:05:04 -08:00
George Hotz
7d3b9d0e95 oops, things relied on that API. the global cache needs access to the ASTRunner class 2023-03-08 08:39:31 -08:00
George Hotz
4f957423c3 jitting custom ops + OPTLOCAL assignment bugfix 2023-03-08 08:30:37 -08:00
George Hotz
7285de41a1 tinygrad supports CUSTOM functions 2023-03-08 07:50:33 -08:00
Pankaj Doharey
9d97d97b26 Opens image in default viewer after saving. (#612) 2023-03-03 17:28:49 -08:00
George Hotz
2e26286294 speed like you wouldn't believe (#626)
* speed like you wouldn't believe

* fix tests
2023-03-02 07:49:19 -08:00
George Hotz
bfcec234a2 Refactor ASTs (#622)
* ugh worst branch name

* compiler refactor continues

* scc -> cloc

* buf -> _buf

* finish _buf, and program -> runtime

* gpu is still working, clang isn't

* clang in new style

* ops_metal

* something broke it

* improve metal

* clean up tons of cl crap

* hack fix sync

* cleaner gpu

* gpu metal clang

* cleanups

* minor refactor

* GPUCodegen

* fix up LLVM

* blind CUDA refactor

* codegen / runtime

* keep ops naming

* linter passes

* woah, llvm was allocing 4x what it needed to

* bugfixes

* fix openpilot compiler

* fix compile_efficientnet

* method cache should fix tests

* deal with duped functions
2023-03-01 18:57:29 -08:00
George Hotz
c4856aa193 fix yolo webcam 2023-02-26 17:24:05 -08:00
Jacky Lee
0f58c4c648 Cleanup yolo and remove stateless classes (#604)
* Add AvgPool2d as a layer

* Clean up a bit

* Remove stateless layers in yolo_nn

* More cleanup

* Save label for test

* Add test for YOLO

* Test without cv2

* Don't fail if cv2 not installed

* Better import

* Fix image read

* Use opencv :)

* Don't download the file

* Fix errors

* Use same version

* Set higher confidence

* Why is the confidence so low?

* Start over

* Remove stateless layers

* Remove extra lines

* Revert changes

* Save a few more lines
2023-02-26 16:55:21 -08:00
voidz
94bec40110 moved extras/jit.py -> tinygrad/jit.py (#599)
* moved extras/jit.py to tinygrad/jit.py

* fixed indent

* removed tinygrad.helpers.DEBUG from jit.py
2023-02-25 08:32:33 -08:00
Benedikt Mandelkow
7348e9a6c6 add restrict qualifier to inputs in c backend (#593)
* add restrict qualifier for clang backend convolution inputs/outputs
see https://godbolt.org/z/Tb9jMxWfx for generated assembly

* enable more checks

* inline fmax to motivate the compiler to inline some more

* fix if else binding power
2023-02-25 08:32:21 -08:00
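
The restrict qualifier added in #593 promises the C compiler that the kernel's buffer pointers never alias, which enables more aggressive optimization (the godbolt link above shows the effect on the generated assembly). A sketch of how a codegen might emit such a signature; render_signature is a hypothetical helper, not tinygrad's actual clang backend:

```python
# Illustrative: render a C kernel signature with restrict-qualified buffers.
def render_signature(name: str, bufs: list[str]) -> str:
    args = ", ".join(f"float* restrict {b}" for b in bufs)
    return f"void {name}({args})"

print(render_signature("conv2d", ["out", "in0", "weight"]))
# -> void conv2d(float* restrict out, float* restrict in0, float* restrict weight)
```
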
George Hotz
2e56a4793e rename log_softmax, support dim, fix onnx Softmax 2023-02-24 10:11:24 -08:00
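For reference, a dim-aware log-softmax follows the standard numerically stable formula log_softmax(x) = x - max(x) - log(sum(exp(x - max(x)))) along the chosen axis. A numpy sketch with the same semantics (not tinygrad's implementation):

```python
# Numerically stable log-softmax along an arbitrary axis (numpy sketch).
import numpy as np

def log_softmax(x: np.ndarray, dim: int = -1) -> np.ndarray:
    m = x.max(axis=dim, keepdims=True)  # subtract the max for stability
    s = np.log(np.exp(x - m).sum(axis=dim, keepdims=True))
    return x - m - s

x = np.random.randn(2, 5).astype(np.float32)
print(np.exp(log_softmax(x, dim=1)).sum(axis=1))  # ~1.0 per row
```
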
George Hotz
94ccab941e compile_tensorflow: no cast required 2023-02-22 21:14:21 -08:00
George Hotz
135d0ddb78 compile_tensorflow: read weights from disk 2023-02-22 21:12:35 -08:00
George Hotz
0615dcffe7 compile_tensorflow: save the weights 2023-02-22 21:05:45 -08:00
George Hotz
c537fd0614 compile_tensorflow: add initialize and tests 2023-02-22 20:50:53 -08:00
George Hotz
dc914cde50 compile_tensorflow 2023-02-22 20:08:58 -08:00
George Hotz
76b4d0577d yolov8 works up to the MaxPool 2023-02-22 19:32:13 -08:00
Mischa Untaga
14bb2c40a2 Fix yolov3 example (#577) 2023-02-21 09:24:00 -08:00
George Hotz
d9fa47ecc9 use the TinyJit in the efficientnet runner, 200ms -> 20ms 2023-02-20 19:58:16 -08:00
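
TinyJit (moved into tinygrad/jit.py in #599 above) records the kernels a function launches and replays them on later calls, which is where the 200ms -> 20ms win comes from. A hedged usage sketch assuming a jit-capable backend such as GPU or METAL; the weights and shapes are illustrative stand-ins for the EfficientNet runner:

```python
from tinygrad.jit import TinyJit
from tinygrad.tensor import Tensor

w = Tensor.randn(16, 8)           # stand-in weights; the real runner jits EfficientNet

@TinyJit
def run(x: Tensor) -> Tensor:
    return x.dot(w).realize()     # realize() inside the jit so kernels are captured

x = Tensor.randn(4, 16)
for _ in range(3):                # early calls record kernels; later calls replay them
    out = run(x)
print(out.numpy().shape)
```
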