Drew Hintz
165fb4d631
remove redundant list comprehension from inside all. (#397)
remove explicit inherit from object.
2022-10-13 09:58:35 -07:00
George Hotz
793edf8900
touchup
2022-10-10 16:13:34 -07:00
George Hotz
d54a45b50d
measure speed vs torch
2022-10-10 16:06:00 -07:00
George Hotz
b7f748c15a
Fix GPU 2**31 virtual size limit (#392)
* in progress
* big conv test works
* that's unneeded
* fix opencl with reduce
* rewrite contiguous_view_constant_fold
* clean up mids in loop code
* subidx
* print cl kernel before run
* no reduce, no loop
* Revert "no reduce, no loop"
This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz
392e57aea7
ugh, why did that fail
2022-10-01 13:38:43 -04:00
George Hotz
8382c51c12
always MATMUL, test the ops in OPENCL
2022-10-01 13:31:29 -04:00
George Hotz
7a61dc7ee9
test_sd_big_conv
2022-10-01 13:26:05 -04:00
George Hotz
178ba50c03
some args for stable diffusion
2022-09-29 01:52:04 -04:00
Ollin Boer Bohan
3b1767e013
Fix OpenCL Metal texture issues (#378)
* Fix OpenCL Metal texture issues
Tile CL images when needed, to fit into the 16384 max Metal image size;
gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1.
* Minor cleanup
* Fix mish in CI, or no-op?
* Is mish being framed?
* It would help if any of this reproduced locally
* ???
* OPT is reverted; use original mish
* Cleanup post-review
* Fix some shape usage
* Tiler tests, shouldn't oom or overflow either
* Can't CL if there's no CL?
* Run tiler tests even if GPU=1
* relu6 segfault binary chop; revert test
* relu6 segfault binary chop; revert accel
* relu6 segfault binary chop; revert . (???)
* end relu6 segfault binary chop; repo's haunted
2022-09-29 01:21:54 -04:00
George Hotz
e737513c52
external_test_opt
2022-09-28 23:29:41 -04:00
George Hotz
650c011646
notrain test
2022-09-28 23:27:20 -04:00
George Hotz
af87d692e4
should this be 10?
2022-09-28 23:25:52 -04:00
George Hotz
0fd459b24e
ugh, global state
2022-09-28 23:10:49 -04:00
George Hotz
fa4eff9cc1
Device.GPU isn't defined
2022-09-28 23:00:15 -04:00
George Hotz
0b6537a572
fix tests
2022-09-28 22:57:58 -04:00
George Hotz
726cca78cd
fix bn folding issue, add new test
2022-09-28 22:52:18 -04:00
George Hotz
a0d169eb59
fix efficientnet
2022-09-28 14:23:01 -07:00
George Hotz
dec5334da9
revert layernorm to have axis param
2022-09-26 10:11:38 -04:00
George Hotz
dc80bf6f85
layernorm is all axis but the first
2022-09-25 17:55:48 -04:00
George Hotz
60df954377
Fix weight init: this work? (#391)
* this work?
* glorot uniform
* requires_grad broke
* propagate the None correctly
* so this weight init works
* ahh, i think it's this
* can't beat this
* glorot is best for ae
* remove comments
2022-09-25 16:46:33 -04:00
George Hotz
ff11c4316b
move get_parameters to optim.py
2022-09-25 13:16:58 -04:00
George Hotz
a0c0239ff1
fix mnist load from other dirs
2022-09-25 12:50:28 -04:00
Jacky Lee
2c01a66265
Reshape dataset from fetch_mnist (#390)
2022-09-24 21:16:29 -04:00
George Hotz
acae9a20c1
clipnorm support
2022-09-24 13:26:38 -04:00
George Hotz
271446e3eb
set requires_grad to None (#387)
* set requires_grad to None
* some things need gradients
* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
George Hotz
29ae21bb0d
import tests from CL metal texture fix
2022-09-19 20:01:47 -04:00
George Hotz
a8aa1f9589
that's simpler
2022-09-18 20:40:46 -04:00
George Hotz
57e804a9bf
add min support
2022-09-18 20:39:41 -04:00
YassineYousfi
2f0f91ba3d
support float16 onnx weights (#384)
2022-09-15 09:12:18 -04:00
Comma Device
75f937227a
add barrier
2022-09-13 11:39:48 -04:00
George Hotz
3c3534736e
fix matmul kernel and tests
2022-09-13 08:31:04 -07:00
Comma Device
62e9419206
fix test failure on MATMUL=1 backward pass
2022-09-13 11:18:52 -04:00
Comma Device
3b82afc6a0
simple on device failing test
2022-09-13 10:59:15 -04:00
George Hotz
4efde1ba0a
test_matmul
2022-09-13 07:51:33 -07:00
George Hotz
894a7cee79
forgot a few
2022-09-12 09:21:46 -07:00
George Hotz
801ecd4a07
cleanup clip tokenizer
2022-09-12 09:20:12 -07:00
Fernand Pajot
ff0da4c802
Added standalone CLIP tokenizer (#382)
* Added standalone CLIP tokenizer.
* Fixed empty phrase.
* Truncating long prompts.
* Keeping two slots for the start and end token.
* Fixed empty phrase.
* Using tokenizer for empty phrase.
* Typo.
2022-09-12 09:12:55 -07:00
David Redmon
a1810c8617
update serious_mnist.py (#380)
2022-09-11 13:37:40 -07:00
George Hotz
ce348f0c92
Revert "change default opt to 2"
This reverts commit 726f4e98e9.
2022-09-11 13:35:42 -07:00
George Hotz
726f4e98e9
change default opt to 2
2022-09-09 07:50:25 -07:00
YassineYousfi
1a7bdc51f8
support more onnx ops (#376)
* broadcast from right to left
* add another broadcasted add test
* more onnx ops
* use float32 range in clip
2022-09-07 15:15:24 -07:00
George Hotz
0b8c2221b5
relax mnist test a tiny bit
2022-09-07 07:52:05 -07:00
George Hotz
ecc1a0470d
add Linear to tinygrad.nn
2022-09-07 07:40:48 -07:00
George Hotz
d26bd73c1e
have to ignore that type
2022-09-07 07:24:27 -07:00
George Hotz
b7783565af
cpu line savings and cleaner
2022-09-06 21:24:22 -07:00
George Hotz
1c92a6da22
make gpu code readable
2022-09-06 21:17:36 -07:00
George Hotz
790af99a48
fix slice one multi, and linear can be simpler with new broadcasting
2022-09-06 19:51:33 -07:00
George Hotz
4f4ecbec97
add div to operators
2022-09-06 17:39:26 -07:00
George Hotz
5a76e652b8
simpler movement op
2022-09-06 17:27:33 -07:00
George Hotz
896f9f74a9
hmm, need this with broadcast change
2022-09-06 16:54:01 -07:00