1771 Commits

Author SHA1 Message Date
George Hotz
d6f4219952 LayerNorm2d for 2 lines 2023-03-20 16:58:43 -07:00
George Hotz
128ca160ac lazy: remove required device 2023-03-20 16:31:45 -07:00
George Hotz
120d7072bd indexing merge almost works 2023-03-20 16:17:07 -07:00
George Hotz
06abbbfe7c remove the stupid register class (#721)
* remove the stupid register class

* touchups

* colorful display name
2023-03-20 15:45:12 -07:00
George Hotz
30b795874a remove RMSprop, nobody uses it anymore 2023-03-20 12:31:34 -07:00
George Hotz
25287a974e types (#720)
* types

* cleanups

* don't use None, use LocalBuffer

* eh
2023-03-20 12:31:02 -07:00
George Hotz
9b314c6342 factor uops transformers into functions 2023-03-20 08:19:48 -07:00
George Hotz
623fb1ef28 do test_conv_with_bn test 2023-03-19 23:53:56 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
Cyril Roumégous
b629fd4cd8 add AdamW optimizer (#716)
* add AdamW optimizer

* one liner Adam optimizer
2023-03-19 12:51:06 -07:00
George Hotz
1012b68f7e finally, some speedups 2023-03-18 18:17:33 -07:00
George Hotz
902906f909 Fix constant folding (#713)
* fix

* codegen

* contiguous is real

* no bufs_to_delete

* don't assign rawconst

* remove neg and not

* need exec to fix custom function jit
2023-03-18 17:52:46 -07:00
Fernando Vidal
73bd0b217b add int64 as supported dtype from numpy (#699)
* add int64 as supported dtype from numpy

Without this, examples/transformer.py didn't run. With this change it runs successfully.

* Update helpers.py

* Update transformer.py

* Update training.py
2023-03-18 17:15:04 -07:00
George Hotz
f355b02987 remove comments and reorder 2023-03-18 14:48:39 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
Kirill
26a3888ab8 Fix llama 13B RAM usage (#710) 2023-03-18 13:50:09 -07:00
Kirill
0fe5014b1f Use pathlib (#711)
* Use pathlib in llama

* Use pathlib in stablediffusion
2023-03-18 13:49:21 -07:00
Connor Henderson
5e8fdfa956 Update path for test_mnist in README (#706) 2023-03-15 18:42:17 -07:00
George Hotz
3a8af99adb i understand ClassVar now 2023-03-15 09:00:25 -07:00
Kirill
0532025b04 Fix llama 13B weights loading (#700)
* Fix llama 13B weights loading

* refactor more

* add test

* test storage offset

* fix spacing

* fix strides

* llama 13B working?

* yolo?

* better test for seeks
2023-03-15 08:59:52 -07:00
Pasan Perera
df48753692 fixed the import error for latest changes in master (#705) 2023-03-15 08:59:42 -07:00
Ayushman Kumar
e28bd11ff1 Cast Tensor data to float32 (#703)
* Cast Tensor data to float32

* astype('float32') --> Tensor.randn()
2023-03-14 23:09:41 -07:00
Jacky Lee
5e820818e9 Cast image to float32 (#702) 2023-03-14 08:13:19 -07:00
George Hotz
54f499b623 Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz
cbc5a7222a symbolic is now a 6/10 due to the infinite loop. do better. 2023-03-13 00:07:59 -07:00
George Hotz
aca244194f bufs not none 2023-03-12 23:57:41 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
a4abcf0969 improve test_example 2023-03-12 22:59:40 -07:00
George Hotz
5577634cf3 tests in pre commit 2023-03-12 22:42:26 -07:00
George Hotz
ce1564b05e fix shapetracker test 2023-03-12 22:33:25 -07:00
George Hotz
153cce0f7e tutorial 2023-03-12 22:31:46 -07:00
George Hotz
8d16ebaea7 we have docs: 2023-03-12 19:05:44 -07:00
George Hotz
b512edc9ff no decorators for image methods. move out RawMallocBuffer. -7 lines 2023-03-12 16:28:45 -07:00
George Hotz
ed9ab6ff03 move image to nn/image.py 2023-03-12 16:21:42 -07:00
George Hotz
fe0e8a306f jittable llama 2023-03-12 14:15:04 -07:00
George Hotz
dcac618515 stop wasting time with the compiler. tinygrad needs to just jit 2023-03-12 12:08:46 -07:00
George Hotz
46b49d50bd llvm was using wrong shapetracker 2023-03-12 11:49:03 -07:00
George Hotz
fdde87afda Revert "Revert "late simplify on st""
This reverts commit c8508e359d.
2023-03-12 11:47:44 -07:00
George Hotz
c8508e359d Revert "late simplify on st"
This reverts commit 606550474c.
2023-03-12 11:46:10 -07:00
George Hotz
606550474c late simplify on st 2023-03-12 11:38:56 -07:00
George Hotz
de6f1695a3 only allow exact buffer name 2023-03-12 11:13:36 -07:00
George Hotz
15e0b56e39 compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
Kirill
af7745073f Add comments to SD (#686)
* Add explanation for empty lambdas

* Fix my_unpickle if pytorch_lightning is installed

* oops
2023-03-12 10:56:49 -07:00
George Hotz
58d3824cbe better get_state_dict 2023-03-12 00:10:48 -08:00
George Hotz
046b3952c3 get_state_dict 2023-03-11 23:46:53 -08:00
George Hotz
6c3675c01c _mmap loads to gpu fast 2023-03-11 23:00:13 -08:00
George Hotz
dc9a6b4bb7 fix float16 in CLANG on linux 2023-03-11 21:51:22 -08:00
George Hotz
803b0aef28 track memory for numpy/torch 2023-03-11 20:39:10 -08:00
George Hotz
37cf6fc4c0 err, external_test_opt.py broke...fusing will have to wait. correctness over speed 2023-03-11 17:54:47 -08:00
George Hotz
305b9f2d21 multistep optim tests passing 2023-03-11 17:49:53 -08:00