George Hotz
8c849e637c
that was in there twice, DEBUG>=4 to see loop opt
2022-10-30 15:31:39 -07:00
George Hotz
cfdf803b52
fix llvm vectorization by adding analysis passes from the target machine
2022-10-30 15:28:36 -07:00
George Hotz
2f602a92ff
separate STRIDED and EXPAND
2022-10-30 13:23:58 -07:00
George Hotz
4b6097f81d
more amx notes
2022-10-29 14:04:10 -07:00
George Hotz
fdb43fe553
gemm is 1.7 TFLOPS on a single M1 core
2022-10-29 13:42:33 -07:00
George Hotz
52bfbc31be
vectorization
2022-10-29 12:47:52 -07:00
George Hotz
e473d35f90
llvm doesn't vectorize
2022-10-29 11:59:48 -07:00
George Hotz
86eb06eb76
accurate flop estimation
2022-10-28 19:13:20 -07:00
George Hotz
dd543fbc7a
MovementOps is unused
2022-10-28 18:26:08 -07:00
George Hotz
71b336503f
no RESHAPEs in the AST
2022-10-28 18:25:30 -07:00
George Hotz
b65b70812a
Exec AST (#404)
* working exec ast
* exec_ast is staticmethod
* GenericExecAST
* fold that sometimes
* ExplicitExecAST
* exec_ast for GPU
* gpu working
* get_lazyop_shape
* now gpubuffer is ExplicitExecAST
* dedup
* add a type
* RESHAPE in opencl code
* fix linter
* that too for linter
* cleanups
* remove dead code
* GenericShape is less lines
* add ALLOWED_KERNEL_COUNT to tests
* fix mypy
* that's gotta be recursive
* fix opencl shape processing
* remove unneeded lambda
2022-10-28 08:27:03 -07:00
George Hotz
6a15fd3844
LLVM Backend take 2 (#403)
* take 2 llvm
* get_lazybuffers -> get_buffers
* llvm tests pass
* fix type issues and refactor LLVM
2022-10-26 20:32:31 -07:00
George Hotz
6a8fb53304
move ops.py into lazy.py (#402)
* move ops.py into lazy.py
* fix graph and linter
* ugh, didn't add
2022-10-25 13:58:03 -07:00
George Hotz
1bec4651b3
fix nonstatic weights
2022-10-20 17:04:14 -07:00
George Hotz
9f8c414589
might fix tests
2022-10-20 16:27:11 -07:00
George Hotz
fd6ba8e7ac
don't recopy backing
2022-10-20 16:06:11 -07:00
George Hotz
0514594083
fix openpilot test
2022-10-20 11:56:26 -07:00
George Hotz
b7f748c15a
Fix GPU 2**31 virtual size limit (#392)
* in progress
* big conv test works
* that's unneeded
* fix opencl with reduce
* rewrite contiguous_view_constant_fold
* clean up mids in loop code
* subidx
* print cl kernel before run
* no reduce, no loop
* Revert "no reduce, no loop"
This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz
8382c51c12
always MATMUL, test the ops in OPENCL
2022-10-01 13:31:29 -04:00
Ollin Boer Bohan
3b1767e013
Fix OpenCL Metal texture issues (#378)
* Fix OpenCL Metal texture issues
Tile CL images when needed, to fit into the 16384 max Metal image size;
gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1.
* Minor cleanup
* Fix mish in CI, or no-op?
* Is mish being framed?
* It would help if any of this reproduced locally
* ???
* OPT is reverted; use original mish
* Cleanup post-review
* Fix some shape usage
* Tiler tests, shouldn't oom or overflow either
* Can't CL if there's no CL?
* Run tiler tests even if GPU=1
* relu6 segfault binary chop; revert test
* relu6 segfault binary chop; revert accel
* relu6 segfault binary chop; revert . (???)
* end relu6 segfault binary chop; repo's haunted
2022-09-29 01:21:54 -04:00
Comma Device
75f937227a
add barrier
2022-09-13 11:39:48 -04:00
George Hotz
3c3534736e
fix matmul kernel and tests
2022-09-13 08:31:04 -07:00
Comma Device
62e9419206
fix test failure on MATMUL=1 backward pass
2022-09-13 11:18:52 -04:00
George Hotz
0516359af8
fix stupid OPENCL=1 OOM
2022-09-06 14:29:23 -07:00
George Hotz
f215534a64
1100 lines, but sane linter rules
2022-09-06 13:47:45 -07:00
George Hotz
f683b26eef
bring back native exp log
2022-09-06 07:59:04 -07:00
George Hotz
d6f499fd69
improve opencl, why is it OOMing
2022-09-05 20:14:31 -07:00
Comma Device
c07bf72d6e
save free 200ms
2022-08-31 20:31:42 -04:00
Comma Device
a734df98fa
TEST_ENET for openpilot compiler
2022-08-31 13:23:36 -04:00
George Hotz
e194ae0c1d
typos
2022-08-30 19:52:21 -07:00
George Hotz
5efab7cf1d
add reciprocal
2022-08-29 18:00:24 -07:00
George Hotz
dc7af8c3ac
thneed run float32
2022-08-28 11:03:35 -07:00
Comma Device
9678cb8a1a
hmm, the native exp/log breaks it too much
2022-08-22 17:13:08 -07:00
George Hotz
2162cd3383
fix typing
2022-08-22 16:25:15 -07:00
Comma Device
e0a8d0f836
image input works
2022-08-22 16:04:17 -07:00
George Hotz
18340e7d30
remove from_image
2022-08-22 15:52:26 -07:00
Comma Device
1b5f4e52d9
refactor getters
2022-08-22 13:29:08 -07:00
George Hotz
a8734df030
add openpilot tests to tinygrad
2022-08-21 12:03:37 -07:00
George Hotz
b132de677d
tinygrad.nn (#367)
* tinygrad.nn
* flake8
* working on pylint
* more pylint
* more pylint
* pylint passes
* networkx
* mypy can't infer that type
* junk
2022-08-18 07:41:00 -07:00
George Hotz
783c120a8c
rawcpu (#365)
* rawcpu
* add should work when we respect shapetracker
* now that's true
* still have to handle shapetracker
* copyin
* Fix mypy
2022-08-17 11:33:20 +02:00
George Hotz
57e5df9f28
ane: procPath issue. don't waste more time with this, focus on core tinygrad
2022-08-16 10:36:13 +02:00
George Hotz
bdfdbc8f8d
broken amfi patch
2022-08-13 10:41:25 +02:00
George Hotz
262efe5784
update readme
2022-08-09 11:08:52 +02:00
George Hotz
6267a3c8c2
notes
2022-08-09 00:42:14 +02:00
George Hotz
f4ff130947
docs
2022-08-09 00:06:24 +02:00
George Hotz
01de17eeb8
amfi note
2022-08-08 13:17:36 +02:00
George Hotz
136706169d
fix ane on new mac os x
2022-08-06 19:10:22 +00:00
George Hotz
f300caa486
notes
2022-08-06 15:21:26 +00:00
George Hotz
94d526f8fc
fix op estimate
2022-08-06 14:15:50 +00:00
George Hotz
f2847cb710
remove useless init, add ops counter
2022-08-06 14:05:25 +00:00