Commit Graph

1783 Commits

George Hotz
1cb5b2d015 test_enet_se 2023-03-24 10:04:30 -07:00
Jacky Lee
fafe8e9ce2 casting: support all backends and implement half (#726)
* casting: support all backends and implement half

* map torch types in ops_torch

* reuse type map for torch buffer

* inverse dict lookup
2023-03-24 09:58:03 -07:00
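
A minimal sketch of what the casting/half support above enables, written against today's tinygrad import path (older versions exposed dtypes via tinygrad.helpers); the exact module layout at this commit may differ:

```python
# Hedged sketch of half-precision casting; assumes the modern
# `from tinygrad import Tensor, dtypes` import path, not necessarily the one used at this commit.
from tinygrad import Tensor, dtypes

t = Tensor([1.0, 2.0, 3.0])
h = t.half()                   # convenience cast to float16
h2 = t.cast(dtypes.float16)    # explicit, equivalent cast
print(h.dtype, h2.numpy())
```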
George Hotz
e88b9bfe1e print gflops avg with DEBUG=2 2023-03-23 16:07:08 -07:00
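DEBUG is an environment variable that tinygrad reads when it is imported, so the usual way to see the per-kernel stats mentioned above is to set it on the command line (e.g. DEBUG=2 python3 script.py); the snippet below is an equivalent in-process sketch:

```python
# Set DEBUG before tinygrad is imported; at DEBUG>=2 tinygrad prints per-kernel
# stats (including GFLOPS) as work is realized. Equivalent to `DEBUG=2 python3 ...`.
import os
os.environ["DEBUG"] = "2"

from tinygrad import Tensor
(Tensor.rand(1024, 1024) @ Tensor.rand(1024, 1024)).realize()
```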
George Hotz
de04208247 hotcast bug fix 2023-03-23 11:49:47 -07:00
Jacky Lee
e009b6f341 Add tests for casting (#724)
* Add tests for casting

* Skip half_matmul_upcast when TORCH=1

* Fix promotion on torch

* Fix spacing
2023-03-23 08:02:52 -07:00
George Hotz
68e45fca18 metal_matmul: bw and torch sync 2023-03-23 08:02:04 -07:00
George Hotz
bd6c3c31a9 compare to torch 2023-03-22 23:58:37 -07:00
George Hotz
c3a3db75c7 fix metal matmul example 2023-03-22 23:42:51 -07:00
George Hotz
f5aea472a3 latest torch and onnx should be fine 2023-03-22 23:33:50 -07:00
George Hotz
51e19ac25c OPTLOCAL=2 makes stable diffusion a usable speed after the cache builds 2023-03-22 19:19:11 -07:00
George Hotz
2e18469fd4 clean up display name 2023-03-22 18:32:05 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
d6f4219952 LayerNorm2d for 2 lines 2023-03-20 16:58:43 -07:00
George Hotz
128ca160ac lazy: remove required device 2023-03-20 16:31:45 -07:00
George Hotz
120d7072bd indexing merge almost works 2023-03-20 16:17:07 -07:00
George Hotz
06abbbfe7c remove the stupid register class (#721)
* remove the stupid register class

* touchups

* colorful display name
2023-03-20 15:45:12 -07:00
George Hotz
30b795874a remove RMSprop, nobody uses it anymore 2023-03-20 12:31:34 -07:00
George Hotz
25287a974e types (#720)
* types

* cleanups

* don't use None, use LocalBuffer

* eh
2023-03-20 12:31:02 -07:00
George Hotz
9b314c6342 factor uops transformers into functions 2023-03-20 08:19:48 -07:00
George Hotz
623fb1ef28 do test_conv_with_bn test 2023-03-19 23:53:56 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
Cyril Roumégous
b629fd4cd8 add AdamW optimizer (#716)
* add AdamW optimizer

* one liner Adam optimizer
2023-03-19 12:51:06 -07:00
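
A hedged usage sketch of the new optimizer, assuming the tinygrad.nn.optim module path and the usual zero_grad/backward/step loop; keyword arguments beyond lr are omitted since their names may differ by version:

```python
# Minimal AdamW training step; a sketch, not the exact API added in #716.
from tinygrad.tensor import Tensor
from tinygrad.nn.optim import AdamW

w = Tensor.uniform(784, 10)
w.requires_grad = True
opt = AdamW([w], lr=1e-3)

Tensor.training = True                      # required by newer tinygrad versions
x, y = Tensor.randn(64, 784), Tensor.randn(64, 10)
loss = ((x.matmul(w) - y) ** 2).mean()      # toy MSE loss
opt.zero_grad()
loss.backward()
opt.step()
```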
George Hotz
1012b68f7e finally, some speedups 2023-03-18 18:17:33 -07:00
George Hotz
902906f909 Fix constant folding (#713)
* fix

* codegen

* contiguous is real

* no bufs_to_delete

* don't assign rawconst

* remove neg and not

* need exec to fix custom function jit
2023-03-18 17:52:46 -07:00
Fernando Vidal
73bd0b217b add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
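
A small sketch of the behavior the commit above enables, i.e. constructing a Tensor directly from an int64 numpy array (how the dtype is ultimately stored may depend on backend and version):

```python
# Hedged illustration: numpy int64 input accepted when constructing a Tensor.
import numpy as np
from tinygrad.tensor import Tensor

idx = np.arange(5, dtype=np.int64)   # e.g. token indices as in examples/transformer.py
t = Tensor(idx)
print(t.numpy())
```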
George Hotz
f355b02987 remove comments and reorder 2023-03-18 14:48:39 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
Kirill
26a3888ab8 Fix llama 13B RAM usage (#710) 2023-03-18 13:50:09 -07:00
Kirill
0fe5014b1f Use pathlib (#711)
* Use pathlib in llama

* Use pathlib in stablediffusion
2023-03-18 13:49:21 -07:00
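
The pathlib change is a generic Python pattern; a rough sketch of the style those scripts moved to (the filename below is purely illustrative, not the actual weight file):

```python
# Build weight paths relative to the script with pathlib instead of string concatenation.
from pathlib import Path

WEIGHTS_DIR = Path(__file__).parent / "weights"
ckpt = WEIGHTS_DIR / "model.bin"          # hypothetical filename, for illustration only
if ckpt.exists():
    data = ckpt.read_bytes()
```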
Connor Henderson
5e8fdfa956 Update path for test_mnist in README (#706) 2023-03-15 18:42:17 -07:00
George Hotz
3a8af99adb i understand ClassVar now 2023-03-15 09:00:25 -07:00
Kirill
0532025b04 Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
Pasan Perera
df48753692 fixed the import error for latest changes in master (#705) 2023-03-15 08:59:42 -07:00
Ayushman Kumar
e28bd11ff1 Cast Tensor data to float32 (#703)
* Cast Tensor data to float32

* astype('float32') --> Tensor.randn()
2023-03-14 23:09:41 -07:00
Jacky Lee
5e820818e9 Cast image to float32 (#702) 2023-03-14 08:13:19 -07:00
George Hotz
54f499b623 Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz
cbc5a7222a symbolic is now a 6/10 due to the infinite loop. do better. 2023-03-13 00:07:59 -07:00
George Hotz
aca244194f bufs not none 2023-03-12 23:57:41 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
a4abcf0969 improve test_example 2023-03-12 22:59:40 -07:00
George Hotz
5577634cf3 tests in pre commit 2023-03-12 22:42:26 -07:00
George Hotz
ce1564b05e fix shapetracker test 2023-03-12 22:33:25 -07:00
George Hotz
153cce0f7e tutorial 2023-03-12 22:31:46 -07:00
George Hotz
8d16ebaea7 we have docs: 2023-03-12 19:05:44 -07:00
George Hotz
b512edc9ff no decorators for image methods. move out RawMallocBuffer. -7 lines 2023-03-12 16:28:45 -07:00
George Hotz
ed9ab6ff03 move image to nn/image.py 2023-03-12 16:21:42 -07:00
George Hotz
fe0e8a306f jittable llama 2023-03-12 14:15:04 -07:00
George Hotz
dcac618515 stop wasting time with the compiler. tinygrad needs to just jit 2023-03-12 12:08:46 -07:00
George Hotz
46b49d50bd llvm was using wrong shapetracker 2023-03-12 11:49:03 -07:00
George Hotz
fdde87afda Revert "Revert "late simplify on st""
This reverts commit c8508e359d.
2023-03-12 11:47:44 -07:00