49 Commits

Author SHA1 Message Date
George Hotz
0514594083 fix openpilot test 2022-10-20 11:56:26 -07:00
George Hotz
b7f748c15a Fix GPU 2**31 virtual size limit (#392)
* in progress

* big conv test works

* that's unneeded

* fix opencl with reduce

* rewrite contiguous_view_constant_fold

* clean up mids in loop code

* subidx

* print cl kernel before run

* no reduce, no loop

* Revert "no reduce, no loop"

This reverts commit 92777e40e9.
2022-10-05 00:55:20 -04:00
George Hotz
8382c51c12 always MATMUL, test the ops in OPENCL 2022-10-01 13:31:29 -04:00
Ollin Boer Bohan
3b1767e013 Fix OpenCL Metal texture issues (#378)
* Fix OpenCL Metal texture issues

Tile CL images when needed, to fit into the 16384 max Metal image size;
gets me to ~4.8s/iteration for SD on M1 Pro with OPENCL=1 FLOAT16=1.

* Minor cleanup

* Fix mish in CI, or no-op?

* Is mish being framed?

* It would help if any of this reproduced locally

* ???

* OPT is reverted; use original mish

* Cleanup post-review

* Fix some shape usage

* Tiler tests, shouldn't oom or overflow either

* Can't CL if there's no CL?

* Run tiler tests even if GPU=1

* relu6 segfault binary chop; revert test

* relu6 segfault binary chop; revert accel

* relu6 segfault binary chop; revert . (???)

* end relu6 segfault binary chop; repo's haunted
2022-09-29 01:21:54 -04:00
Comma Device
75f937227a add barrier 2022-09-13 11:39:48 -04:00
George Hotz
3c3534736e fix matmul kernel and tests 2022-09-13 08:31:04 -07:00
Comma Device
62e9419206 fix test failure on MATMUL=1 backward pass 2022-09-13 11:18:52 -04:00
George Hotz
0516359af8 fix stupid OPENCL=1 OOM 2022-09-06 14:29:23 -07:00
George Hotz
f215534a64 1100 lines, but sane linter rules 2022-09-06 13:47:45 -07:00
George Hotz
f683b26eef bring back native exp log 2022-09-06 07:59:04 -07:00
George Hotz
d6f499fd69 improve opencl, why is it OOMing 2022-09-05 20:14:31 -07:00
Comma Device
c07bf72d6e save free 200ms 2022-08-31 20:31:42 -04:00
Comma Device
a734df98fa TEST_ENET for openpilot compiler 2022-08-31 13:23:36 -04:00
George Hotz
e194ae0c1d typos 2022-08-30 19:52:21 -07:00
George Hotz
5efab7cf1d add reciprocal 2022-08-29 18:00:24 -07:00
George Hotz
dc7af8c3ac thneed run float32 2022-08-28 11:03:35 -07:00
Comma Device
9678cb8a1a hmm, the native exp/log breaks it too much 2022-08-22 17:13:08 -07:00
George Hotz
2162cd3383 fix typing 2022-08-22 16:25:15 -07:00
Comma Device
e0a8d0f836 image input works 2022-08-22 16:04:17 -07:00
George Hotz
18340e7d30 remove from_image 2022-08-22 15:52:26 -07:00
Comma Device
1b5f4e52d9 refactor getters 2022-08-22 13:29:08 -07:00
George Hotz
a8734df030 add openpilot tests to tinygrad 2022-08-21 12:03:37 -07:00
George Hotz
b132de677d tinygrad.nn (#367)
* tinygrad.nn

* flake8

* working on pylint

* more pylint

* more pylint

* pylint passes

* networkx

* mypy can't infer that type

* junk
2022-08-18 07:41:00 -07:00
George Hotz
5d45c6e516 Fold reduce (#362)
* folding reduce

* fold through movementops

* fixup shapes

* was too aggressive

* i knew we needed that

* don't recompute reduce

* working

* fix openpilot compile

* prunegraph openpilot

* types and reduce_shape

* refactor

* cleanups

* neater

* 1009

* 1004

* clean up reduce for 998
2022-07-19 09:24:02 -07:00
George Hotz
5e96ed523a fix opencl bug, no training on opencl 2022-07-17 12:55:26 -07:00
George Hotz
608e2431f7 test opencl, commit to removing the crap conv code from GPU 2022-07-17 11:55:37 -07:00
George Hotz
3c4565fa21 SLICE -> PAD,SHRINK 2022-07-17 11:33:59 -07:00
George Hotz
bcf422dfdd Device2 (#358)
* option for matmul

* fixups

* fast like a nascar

* running

* thneed runner

* no buffer id makes no backing buffer

* move constant folding to the top

* runs on mac

* folded biases

* was v slow

* maybe just that

* elu touchup

* speed and float32

Co-authored-by: Comma Device <device@comma.ai>
2022-07-16 07:26:19 -07:00
George Hotz
817b64f5e5 A conv is a reduce op (#356)
* universal strided conv

* more correct

* hmm, CPU works

* cleaner cl code output

* make noconv a flag

* cleanup __getitem__

* refactor broadcasting

* put that back

* unneeded reshape in getitem

* fix strided for torch
2022-07-10 19:58:50 -07:00
George Hotz
68959be05d precompute weights for opencl 2022-07-08 10:56:48 -07:00
George Hotz
d8e7f1f8bc opencl type ignore 2022-07-08 10:33:55 -07:00
George Hotz
ae335b6d3e opencl works, but tons of kernels 2022-07-08 10:22:04 -07:00
George Hotz
5b66d1bb0b begin fixing up opencl 2022-07-08 10:20:14 -07:00
George Hotz
e3c2579537 flip stride to match canonical 2022-06-26 19:19:53 -07:00
George Hotz
3e13e3330a UNSAFE_FLOAT4 env 2022-06-22 08:20:29 -07:00
George Hotz
73415e20ab this fixes 2 of the conv recomputes...but it's ugh 2022-06-22 08:18:12 -07:00
George Hotz
b2d5df6049 3 convs are being recomputed 2022-06-22 07:54:52 -07:00
George Hotz
ba2defcdef elif False 2022-06-21 23:54:09 -07:00
George Hotz
9cb0522574 noargs 2022-06-21 23:48:58 -07:00
George Hotz
1074dfbb71 unstrided 2022-06-21 23:42:21 -07:00
George Hotz
9ae01290ba pass in shorts 2022-06-21 23:33:23 -07:00
George Hotz
18d74c01b1 float4 opt 2022-06-21 21:27:51 -07:00
George Hotz
ff3d5fe962 debugging while we compile 2022-06-21 21:12:04 -07:00
George Hotz
9d06a86f7f CL class, debugging 2022-06-21 20:16:29 -07:00
George Hotz
1ebc2b5545 lazy opencl works 2022-06-21 19:41:08 -07:00
George Hotz
c53c91f949 opencl tests passed (#347) 2022-06-21 18:57:09 -07:00
George Hotz
77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz
52505faaf4 minor 2022-06-16 15:53:45 -07:00
George Hotz
d5b3e18540 Accelerate with CL (#325)
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
2022-06-16 15:40:52 -07:00