10417 Commits

Author SHA1 Message Date
George Hotz
62affbd9ce add CONTIGUOUS loadop 2022-10-20 15:55:19 -07:00
George Hotz
bb288e6938 safe_numpy and warning for broken matmul 2022-10-20 15:40:22 -07:00
George Hotz
50c95c7d9a add assert to catch issue in attention 2022-10-20 15:13:00 -07:00
George Hotz
26c78ccf7d remove useless buffer 2022-10-20 14:07:28 -07:00
George Hotz
a18c1f3178 zero out the inputs 2022-10-20 13:46:52 -07:00
George Hotz
61ee428e4c rerun 2022-10-20 13:29:14 -07:00
George Hotz
5dae64b7b0 read input shapes and break down the layers 2022-10-20 13:11:24 -07:00
George Hotz
e00601faea fix thneed self test 2022-10-20 12:55:02 -07:00
George Hotz
ace8db29f8 ReduceSum 2022-10-20 12:48:14 -07:00
George Hotz
c400ee0beb refactoring thneed (#400)
* refactoring thneed

* continue

* minor update

* looks like it's working

* big refactor

* confirm thneed got the right output

* code is there but it's broken

* works now

* always OPTWG, input -> dat

* fix type issue
2022-10-20 12:35:59 -07:00
George Hotz
0514594083 fix openpilot test 2022-10-20 11:56:26 -07:00
YassineYousfi
ae0f9b17df openpilot: new models and onnx ops (#401)
* ngrl stuff

* fngrl

* fix typo in compile script

* workflow dispatch

* new models in tests

* dont need to up this threshold

Co-authored-by: HaraldSchafer <harald.the.engineer@gmail.com>
2022-10-20 11:49:19 -07:00
Drew Hintz
a4ad1d774a enable tests in test_ops.py that are disabled but now work. (#396)
remove custom tolerances that don't appear to be needed.
2022-10-13 09:58:53 -07:00
Drew Hintz
165fb4d631 remove redundant list comprehension from inside all. (#397)
remove explicit inherit from object.
2022-10-13 09:58:35 -07:00
George Hotz
793edf8900 touchup 2022-10-10 16:13:34 -07:00
George Hotz
d54a45b50d measure speed vs torch 2022-10-10 16:06:00 -07:00
George Hotz
b7f748c15a Fix GPU 2**31 virtual size limit (#392)
* in progress

* big conv test works

* that's unneeded

* fix opencl with reduce

* rewrite contiguous_view_constant_fold

* clean up mids in loop code

* subidx

* print cl kernel before run

* no reduce, no loop

* Revert "no reduce, no loop"

This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz
392e57aea7 ugh, why did that fail 2022-10-01 13:38:43 -04:00
George Hotz
8382c51c12 always MATMUL, test the ops in OPENCL 2022-10-01 13:31:29 -04:00
George Hotz
7a61dc7ee9 test_sd_big_conv 2022-10-01 13:26:05 -04:00
George Hotz
178ba50c03 some args for stable diffusion 2022-09-29 01:52:04 -04:00
Ollin Boer Bohan
3b1767e013 Fix OpenCL Metal texture issues (#378)
* Fix OpenCL Metal texture issues

Tile CL images when needed, to fit into the 16384 max Metal image size;
gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1.

* Minor cleanup

* Fix mish in CI, or no-op?

* Is mish being framed?

* It would help if any of this reproduced locally

* ???

* OPT is reverted; use original mish

* Cleanup post-review

* Fix some shape usage

* Tiler tests, shouldn't oom or overflow either

* Can't CL if there's no CL?

* Run tiler tests even if GPU=1

* relu6 segfault binary chop; revert test

* relu6 segfault binary chop; revert accel

* relu6 segfault binary chop; revert . (???)

* end relu6 segfault binary chop; repo's haunted
2022-09-29 01:21:54 -04:00
George Hotz
e737513c52 external_test_opt 2022-09-28 23:29:41 -04:00
George Hotz
650c011646 notrain test 2022-09-28 23:27:20 -04:00
George Hotz
af87d692e4 should this be 10? 2022-09-28 23:25:52 -04:00
George Hotz
0fd459b24e ugh, global state 2022-09-28 23:10:49 -04:00
George Hotz
fa4eff9cc1 Device.GPU isn't definied 2022-09-28 23:00:15 -04:00
George Hotz
0b6537a572 fix tests 2022-09-28 22:57:58 -04:00
George Hotz
726cca78cd fix bn folding issue, add new test 2022-09-28 22:52:18 -04:00
George Hotz
a0d169eb59 fix efficientnet 2022-09-28 14:23:01 -07:00
George Hotz
dec5334da9 revert layernorm to have axis param 2022-09-26 10:11:38 -04:00
George Hotz
dc80bf6f85 layernorm is all axis but the first 2022-09-25 17:55:48 -04:00
George Hotz
60df954377 Fix weight init: this work? (#391)
* this work?

* glorot uniform

* requies_grad broke

* propagate the None correctly

* so this weight init works

* ahh, i think it's this

* can't beat this

* glorot is best for ae

* remove comments
2022-09-25 16:46:33 -04:00
George Hotz
ff11c4316b move get_parameters to optim.py 2022-09-25 13:16:58 -04:00
George Hotz
a0c0239ff1 fix mnist load from other dirs 2022-09-25 12:50:28 -04:00
Jacky Lee
2c01a66265 Reshape dataset from fetch_mnist (#390) 2022-09-24 21:16:29 -04:00
George Hotz
acae9a20c1 clipnorm support 2022-09-24 13:26:38 -04:00
George Hotz
271446e3eb set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
George Hotz
29ae21bb0d import tests from CL metal texture fix 2022-09-19 20:01:47 -04:00
George Hotz
a8aa1f9589 that's simpler 2022-09-18 20:40:46 -04:00
George Hotz
57e804a9bf add min support 2022-09-18 20:39:41 -04:00
YassineYousfi
2f0f91ba3d support float16 onnx weights (#384) 2022-09-15 09:12:18 -04:00
Comma Device
75f937227a add barrier 2022-09-13 11:39:48 -04:00
George Hotz
3c3534736e fix matmul kernel and tests 2022-09-13 08:31:04 -07:00
Comma Device
62e9419206 fix test failure on MATMUL=1 backward pass 2022-09-13 11:18:52 -04:00
Comma Device
3b82afc6a0 simple on device failing test 2022-09-13 10:59:15 -04:00
George Hotz
4efde1ba0a test_matmul 2022-09-13 07:51:33 -07:00
George Hotz
894a7cee79 forgot a few 2022-09-12 09:21:46 -07:00
George Hotz
801ecd4a07 cleanup clip tokenizer 2022-09-12 09:20:12 -07:00
Fernand Pajot
ff0da4c802 Added standalone CLIP tokenizer (#382)
* Added standalone CLIP tokenizer.

* Fixed empty phrase.

* Truncating long prompts.

* Keeping two slots for the start and end token.

* Fixed empty phrase.

* Using tokenizer for empty phrase.

* Typo.
2022-09-12 09:12:55 -07:00