Commit Graph

4433 Commits

Author SHA1 Message Date
George Hotz
2f602a92ff seperate STRIDED and EXPAND 2022-10-30 13:23:58 -07:00
George Hotz
544cb0a069 oops, remove while(1) 2022-10-29 14:05:13 -07:00
George Hotz
fdb43fe553 gemm is 1.7 TFLOPS on a single M1 core 2022-10-29 13:42:33 -07:00
George Hotz
52bfbc31be vectorization 2022-10-29 12:47:52 -07:00
George Hotz
e473d35f90 llvm doesn't vectorize 2022-10-29 11:59:48 -07:00
George Hotz
7909786dbf one more opt test 2022-10-28 18:37:53 -07:00
George Hotz
294ab9e2f8 more test opt 2022-10-28 18:04:12 -07:00
George Hotz
f885ceb695 test speed w/o bias 2022-10-28 11:22:15 -07:00
George Hotz
df31dde174 hasattr and DeviceBuffer type fixups 2022-10-28 09:05:45 -07:00
George Hotz
b65b70812a Exec AST (#404)
* working exec ast

* exec_ast is staticmethod

* GenericExecAST

* fold that sometimes

* ExplicitExecAST

* exec_ast for GPU

* gpu working

* get_lazyop_shape

* now gpubuffer is ExplicitExecAST

* dedup

* add a type

* RESHAPE in opencl code

* fix linter

* that too for linter

* cleanups

* remove dead code

* GenericShape is less lines

* add ALLOWED_KERNEL_COUNT to tests

* fix mypy

* that's gotta be recursive

* fix opencl shape processing

* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz
6a15fd3844 LLVM Backend take 2 (#403)
* take 2 llvm

* get_lazybuffers -> get_buffers

* llvm tests pass

* fix type issues and refactor LLVM
2022-10-26 20:32:31 -07:00
George Hotz
10921a60c4 more imports from llvm branch 2022-10-26 18:02:36 -07:00
Drew Hintz
a4ad1d774a enable tests in test_ops.py that are disabled but now work. (#396)
remove custom tolerances that don't appear to be needed.
2022-10-13 09:58:53 -07:00
George Hotz
793edf8900 touchup 2022-10-10 16:13:34 -07:00
George Hotz
d54a45b50d measure speed vs torch 2022-10-10 16:06:00 -07:00
George Hotz
b7f748c15a Fix GPU 2**31 virtual size limit (#392)
* in progress

* big conv test works

* that's unneeded

* fix opencl with reduce

* rewrite contiguous_view_constant_fold

* clean up mids in loop code

* subidx

* print cl kernel before run

* no reduce, no loop

* Revert "no reduce, no loop"

This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz
392e57aea7 ugh, why did that fail 2022-10-01 13:38:43 -04:00
George Hotz
7a61dc7ee9 test_sd_big_conv 2022-10-01 13:26:05 -04:00
Ollin Boer Bohan
3b1767e013 Fix OpenCL Metal texture issues (#378)
* Fix OpenCL Metal texture issues

Tile CL images when needed, to fit into the 16384 max Metal image size;
gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1.

* Minor cleanup

* Fix mish in CI, or no-op?

* Is mish being framed?

* It would help if any of this reproduced locally

* ???

* OPT is reverted; use original mish

* Cleanup post-review

* Fix some shape usage

* Tiler tests, shouldn't oom or overflow either

* Can't CL if there's no CL?

* Run tiler tests even if GPU=1

* relu6 segfault binary chop; revert test

* relu6 segfault binary chop; revert accel

* relu6 segfault binary chop; revert . (???)

* end relu6 segfault binary chop; repo's haunted
2022-09-29 01:21:54 -04:00
George Hotz
e737513c52 external_test_opt 2022-09-28 23:29:41 -04:00
George Hotz
650c011646 notrain test 2022-09-28 23:27:20 -04:00
George Hotz
af87d692e4 should this be 10? 2022-09-28 23:25:52 -04:00
George Hotz
0fd459b24e ugh, global state 2022-09-28 23:10:49 -04:00
George Hotz
fa4eff9cc1 Device.GPU isn't definied 2022-09-28 23:00:15 -04:00
George Hotz
0b6537a572 fix tests 2022-09-28 22:57:58 -04:00
George Hotz
726cca78cd fix bn folding issue, add new test 2022-09-28 22:52:18 -04:00
George Hotz
60df954377 Fix weight init: this work? (#391)
* this work?

* glorot uniform

* requies_grad broke

* propagate the None correctly

* so this weight init works

* ahh, i think it's this

* can't beat this

* glorot is best for ae

* remove comments
2022-09-25 16:46:33 -04:00
George Hotz
271446e3eb set requires_grad to None (#387)
* set requires_grad to None

* some things need gradients

* hmm, why was get_parameters filtering
2022-09-21 11:16:02 -04:00
George Hotz
29ae21bb0d import tests from CL metal texture fix 2022-09-19 20:01:47 -04:00
George Hotz
57e804a9bf add min support 2022-09-18 20:39:41 -04:00
YassineYousfi
2f0f91ba3d support float16 onnx weights (#384) 2022-09-15 09:12:18 -04:00
George Hotz
3c3534736e fix matmul kernel and tests 2022-09-13 08:31:04 -07:00
Comma Device
62e9419206 fix test failure on MATMUL=1 backward pass 2022-09-13 11:18:52 -04:00
Comma Device
3b82afc6a0 simple on device failing test 2022-09-13 10:59:15 -04:00
George Hotz
4efde1ba0a test_matmul 2022-09-13 07:51:33 -07:00
George Hotz
0b8c2221b5 relax mnist test a tiny bit 2022-09-07 07:52:05 -07:00
George Hotz
ecc1a0470d add Linear to tinygrad.nn 2022-09-07 07:40:48 -07:00
George Hotz
790af99a48 fix slice one multi, and linear can be simpler with new broadcasting 2022-09-06 19:51:33 -07:00
YassineYousfi
5aad460c7a broadcast from right to left (#375)
* broadcast from right to left

* add another broadcasted add test
2022-09-06 16:36:13 -07:00
George Hotz
bcb867cdd6 better idea for numbers, do the division in python 2022-09-03 16:23:39 -07:00
George Hotz
033a3ecccf found tinygrad bug 2022-09-03 12:32:43 -07:00
George Hotz
7f15779942 t.assign in optim 2022-08-20 14:04:33 -07:00
George Hotz
1eb12dafbc reduce axis at the end 2022-08-20 07:40:56 -07:00
George Hotz
b132de677d tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
George Hotz
18fde22dac fix that soon 2022-07-20 09:07:09 -07:00
George Hotz
5d45c6e516 Fold reduce (#362)
* folding reduce

* fold through movementops

* fixup shapes

* was too aggressive

* i knew we needed that

* don't recompute reduce

* working

* fix openpilot compile

* prunegraph openpilot

* types and reduce_shape

* refactor

* cleanups

* neater

* 1009

* 1004

* clean up reduce for 998
2022-07-19 09:24:02 -07:00
George Hotz
f76d41812b prune graph 2022-07-17 15:38:43 -07:00
George Hotz
73b0471b25 join expands 2022-07-17 13:42:05 -07:00
George Hotz
cfabbbd6bb more crap to remove without convs 2022-07-17 13:02:27 -07:00
George Hotz
5e96ed523a fix opencl bug, no training on opencl 2022-07-17 12:55:26 -07:00