Commit Graph

1259 Commits

Author SHA1 Message Date
Sohaib
70b9072663 add Pad onnx operator and rework _padding (#740) 2023-04-06 17:07:36 +05:30
jintuzhang
8e40ff8c8d Do not specify errors when trying to load devices. (#741) 2023-04-06 17:05:36 +05:30
Jacky Lee
7a45b989a1 Device: make GPU default and METAL/CUDA if possible (#732)
* Make GPU the default device

* Compile EfficientNet with CPU

* don't print device

* use METAL and CUDA if possible

* Revert some changes to workflow

* Fix import error when checking device availability

* device lookup is now optional

* hopefully fix linter and tests

* fix workflow

* Skip device if not available

* don't change default if CPU=1

* simplify device selection

* Default to CPU if no GPU

* don't print device name...

* No need to change default in llama

* run github workflow

* Fix logic to select default

* pass if an error occurs

* use separate function for try except
2023-04-04 09:41:52 +05:30
George Hotz
b05c2828f7 better cacheline test 2023-03-30 06:08:54 +04:00
George Hotz
76db1af6fc better archprobe 2023-03-30 05:52:00 +04:00
Jacky Lee
e5f430d8c6 Device: move below LazyBuffer (#733) 2023-03-29 10:35:11 +04:00
George Hotz
b99798f08e acc function not needed 2023-03-29 08:03:46 +04:00
George Hotz
20894991ed good changes from the M1 Tensor Core project (#730)
* good changes

* working except llvm

* llvm types

* nice acc

* archprobe

* lang.float4

* use self.acc for late acc

* fix store bug
2023-03-29 05:11:02 +04:00
Andre Slavescu
39d6e1525f Added activation ops + tests (#729)
* activation ops

* type hints + more testing

* formatting correction + parameter testing

* fixes to shape testing

* hardtanh to use clip + removed type hints

* assign val fix
2023-03-28 13:17:53 +04:00
George Hotz
fa5516dda0 fix lint, installed pre-commit on new computer 2023-03-24 11:15:59 -07:00
George Hotz
ebc4ad6223 color the jit nicer 2023-03-24 10:54:20 -07:00
George Hotz
23f88fb026 synchronize for honest speed compare 2023-03-24 10:24:27 -07:00
Jacky Lee
fafe8e9ce2 casting: support all backends and implement half (#726)
* casting: support all backends and implement half

* map torch types in ops_torch

* reuse type map for torch buffer

* inverse dict lookup
2023-03-24 09:58:03 -07:00
George Hotz
e88b9bfe1e print gflops avg with DEBUG=2 2023-03-23 16:07:08 -07:00
George Hotz
de04208247 hotcast bug fix 2023-03-23 11:49:47 -07:00
Jacky Lee
e009b6f341 Add tests for casting (#724)
* Add tests for casting

* Skip half_matmul_upcast when TORCH=1

* Fix promotion on torch

* Fix spacing
2023-03-23 08:02:52 -07:00
George Hotz
51e19ac25c OPTLOCAL=2 makes stable diffusion a usable speed after the cache builds 2023-03-22 19:19:11 -07:00
George Hotz
2e18469fd4 clean up display name 2023-03-22 18:32:05 -07:00
George Hotz
b12b60af20 fix binop, other tests failure (#723)
* fix binop, other tests failure

* that was a bad idea

* better layernorm

* inference kernel count tests

* new style reshape pushing

* fixup replacement

* 199 kernels is okay. fix flops

* push reshape through unaryops only

* GRAPH=2 draws the phantom ops

* found resnet issue

* non working test

* mul is cheaper than div

* OPT inflation

* SHUFFLE_PAD_OPS in OPT=2
2023-03-22 18:15:07 -07:00
George Hotz
d6f4219952 LayerNorm2d for 2 lines 2023-03-20 16:58:43 -07:00
George Hotz
128ca160ac lazy: remove required device 2023-03-20 16:31:45 -07:00
George Hotz
120d7072bd indexing merge almost works 2023-03-20 16:17:07 -07:00
George Hotz
06abbbfe7c remove the stupid register class (#721)
* remove the stupid register class

* touchups

* colorful display name
2023-03-20 15:45:12 -07:00
George Hotz
30b795874a remove RMSprop, nobody uses it anymore 2023-03-20 12:31:34 -07:00
George Hotz
25287a974e types (#720)
* types

* cleanups

* don't use None, use LocalBuffer

* eh
2023-03-20 12:31:02 -07:00
George Hotz
9b314c6342 factor uops transformers into functions 2023-03-20 08:19:48 -07:00
George Hotz
5495c7d64e linearizer! (#714)
* linearizer outputs something

* working ish

* cstyle codegen

* clang mostly works

* fix load valid

* fix numberless loop

* fancy gen

* working

* fix enet compiler

* cleanups

* float4 upcasting

* less lines

* supports_float4

* constant folding

* mulacc

* internet tests flaky in CI

* 90% image support

* fix image generic

* bugs exposed with shapetracker and single view

* new llvm

* use vload, remove OLD

* that's really poorly done

* ending up being more lines
2023-03-19 23:43:49 -07:00
Cyril Roumégous
b629fd4cd8 add AdamW optimizer (#716)
* add AdamW optimizer

* one liner Adam optimizer
2023-03-19 12:51:06 -07:00
George Hotz
1012b68f7e finally, some speedups 2023-03-18 18:17:33 -07:00
George Hotz
902906f909 Fix constant folding (#713)
* fix

* codegen

* contiguous is real

* no bufs_to_delete

* don't assign rawconst

* remove neg and not

* need exec to fix custom function jit
2023-03-18 17:52:46 -07:00
George Hotz
f355b02987 remove comments and reorder 2023-03-18 14:48:39 -07:00
George Hotz
f5467cfedc Devicebufferless (#708)
* runs one metal kernel

* conv2d works

* ops tests are passing

* const folding

* all ops work

* pre commit always passes

* torch works

* working still

* fix graph test

* tests passing

* image almost works

* image conv works

* most images

* fix custom

* fix assignment

* fix compile enet

* clean up comments

* fix realize return value

* include shapetracker in LB repr

* copy should make a copy

* reenable method cache

* fix lna

* dtypes in graph

* forward only for IMAGE=2

* simple realize

* getting close

* fixup new api, it's good except the kernel count

* back to 197 kernels

* tests should pass

* go to a real float

* no type_on_cpu

* fix the docs

* put shapetracker back in it's proper place
2023-03-18 14:40:23 -07:00
George Hotz
3a8af99adb i understand ClassVar now 2023-03-15 09:00:25 -07:00
George Hotz
54f499b623 Move rawbuffer (#697)
* move GlobalCounters to helpers

* that's not part of the public api

* move InterpretedBuffer

* remove fromCPU from devicebuffer
2023-03-13 22:30:36 -07:00
George Hotz
aca244194f bufs not none 2023-03-12 23:57:41 -07:00
George Hotz
c594a0a835 fix flip bug, add new unit tests 2023-03-12 23:55:31 -07:00
George Hotz
153cce0f7e tutorial 2023-03-12 22:31:46 -07:00
George Hotz
8d16ebaea7 we have docs: 2023-03-12 19:05:44 -07:00
George Hotz
b512edc9ff no decorators for image methods. move out RawMallocBuffer. -7 lines 2023-03-12 16:28:45 -07:00
George Hotz
ed9ab6ff03 move image to nn/image.py 2023-03-12 16:21:42 -07:00
George Hotz
fe0e8a306f jittable llama 2023-03-12 14:15:04 -07:00
George Hotz
dcac618515 stop wasting time with the compiler. tinygrad needs to just jit 2023-03-12 12:08:46 -07:00
George Hotz
46b49d50bd llvm was using wrong shapetracker 2023-03-12 11:49:03 -07:00
George Hotz
fdde87afda Revert "Revert "late simplify on st""
This reverts commit c8508e359d.
2023-03-12 11:47:44 -07:00
George Hotz
c8508e359d Revert "late simplify on st"
This reverts commit 606550474c.
2023-03-12 11:46:10 -07:00
George Hotz
606550474c late simplify on st 2023-03-12 11:38:56 -07:00
George Hotz
de6f1695a3 only allow exact buffer name 2023-03-12 11:13:36 -07:00
George Hotz
15e0b56e39 compile works (#688)
* compile works

* runtimes

* line count

* fix custom, to tg dtype

* meh, that's fine with lazy import
2023-03-12 11:01:25 -07:00
George Hotz
58d3824cbe better get_state_dict 2023-03-12 00:10:48 -08:00
George Hotz
046b3952c3 get_state_dict 2023-03-11 23:46:53 -08:00