Commit Graph

68 Commits

Author · SHA1 · Message · Date
George Hotz
bdfdbc8f8d broken amfi patch 2022-08-13 10:41:25 +02:00
George Hotz
262efe5784 update readme 2022-08-09 11:08:52 +02:00
George Hotz
6267a3c8c2 notes 2022-08-09 00:42:14 +02:00
George Hotz
f4ff130947 docs 2022-08-09 00:06:24 +02:00
George Hotz
01de17eeb8 amfi note 2022-08-08 13:17:36 +02:00
George Hotz
136706169d fix ane on new mac os x 2022-08-06 19:10:22 +00:00
George Hotz
f300caa486 notes 2022-08-06 15:21:26 +00:00
George Hotz
94d526f8fc fix op estimate 2022-08-06 14:15:50 +00:00
George Hotz
f2847cb710 remove useless init, add ops counter 2022-08-06 14:05:25 +00:00
George Hotz
5d45c6e516 Fold reduce (#362)
* folding reduce

* fold through movementops

* fixup shapes

* was too aggressive

* i knew we needed that

* don't recompute reduce

* working

* fix openpilot compile

* prunegraph openpilot

* types and reduce_shape

* refactor

* cleanups

* neater

* 1009

* 1004

* clean up reduce for 998
2022-07-19 09:24:02 -07:00
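
For context on what the "Fold reduce" merge above is after: a reduce can consume the output of the elementwise ops feeding it as each value is produced, so the intermediate buffer never has to be materialized. A minimal sketch of that fusion follows; the function names are hypothetical and this is not tinygrad's kernel code.

```python
# Illustrative sketch of reduce folding; names are hypothetical, not tinygrad code.

def unfused_dot(xs, ys):
    # two passes: materialize the elementwise product, then reduce it
    prods = [x * y for x, y in zip(xs, ys)]   # intermediate buffer
    return sum(prods)

def fused_dot(xs, ys):
    # one pass: the reduce consumes each product as it is produced,
    # so no intermediate buffer exists
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

assert unfused_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) == fused_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```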
George Hotz
5e96ed523a fix opencl bug, no training on opencl 2022-07-17 12:55:26 -07:00
George Hotz
608e2431f7 test opencl, commit to removing the crap conv code from GPU 2022-07-17 11:55:37 -07:00
George Hotz
3c4565fa21 SLICE -> PAD,SHRINK 2022-07-17 11:33:59 -07:00
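
The SLICE -> PAD,SHRINK commit above replaces one movement op that allowed out-of-range bounds with two simpler ones. A hedged 1-D NumPy sketch of that decomposition (illustrative only, not tinygrad code): an arbitrary slice becomes a zero PAD followed by an in-bounds SHRINK.

```python
# Illustrative sketch: a general slice with out-of-range bounds expressed as
# PAD (zero-extend) followed by SHRINK (in-bounds crop). Not tinygrad code.
import numpy as np

def slice_as_pad_shrink(x, start, end):
    # 1-D case: start/end may run past either end of x
    pad_before, pad_after = max(0, -start), max(0, end - len(x))
    padded = np.pad(x, (pad_before, pad_after))           # PAD
    return padded[start + pad_before:end + pad_before]    # SHRINK

x = np.array([1.0, 2.0, 3.0])
print(slice_as_pad_shrink(x, -1, 4))  # [0. 1. 2. 3. 0.]
```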
George Hotz
bcf422dfdd Device2 (#358)
* option for matmul

* fixups

* fast like a nascar

* running

* thneed runner

* no buffer id makes no backing buffer

* move constant folding to the top

* runs on mac

* folded biases

* was v slow

* maybe just that

* elu touchup

* speed and float32

Co-authored-by: Comma Device <device@comma.ai>
2022-07-16 07:26:19 -07:00
George Hotz
817b64f5e5 A conv is a reduce op (#356)
* universal strided conv

* more correct

* hmm, CPU works

* cleaner cl code output

* make noconv a flag

* cleanup __getitem__

* refactor broadcasting

* put that back

* unneeded reshape in getitem

* fix strided for torch
2022-07-10 19:58:50 -07:00
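
"A conv is a reduce op" above points at expressing convolution as a strided window view of the input, an elementwise multiply with the kernel, and a reduce over the window axes. A hedged NumPy sketch of the idea (illustrative only, not the tinygrad implementation):

```python
# Illustrative sketch: 2D convolution as strided view + multiply + reduce.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_as_reduce(x, w):
    # x: (H, W) input, w: (kh, kw) kernel
    windows = sliding_window_view(x, w.shape)    # (H-kh+1, W-kw+1, kh, kw), no copy
    return (windows * w).sum(axis=(-2, -1))      # multiply + reduce over kernel dims

x = np.arange(16, dtype=np.float32).reshape(4, 4)
w = np.ones((2, 2), dtype=np.float32)
out = conv2d_as_reduce(x, w)
print(out.shape)  # (3, 3)
```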
George Hotz
68959be05d precompute weights for opencl 2022-07-08 10:56:48 -07:00
George Hotz
d8e7f1f8bc opencl type ignore 2022-07-08 10:33:55 -07:00
George Hotz
ae335b6d3e opencl works, but tons of kernels 2022-07-08 10:22:04 -07:00
George Hotz
5b66d1bb0b begin fixing up opencl 2022-07-08 10:20:14 -07:00
George Hotz
e822aae9ec reorg opts, nicer graph 2022-07-02 22:29:09 -07:00
George Hotz
7276f8d6bf improve constant folding, detach before moving tensor 2022-07-02 15:29:40 -07:00
George Hotz
07b438aa8b move that to resolve time 2022-07-02 14:26:13 -07:00
George Hotz
dbf4aa09db assert and tuple 2022-06-27 09:19:54 -07:00
George Hotz
37a6c0ef59 create with new ShapeTracker 2022-06-27 09:07:45 -07:00
George Hotz
e55a9833fb a little more readable 2022-06-27 08:54:04 -07:00
George Hotz
3a414d7f50 cleanup, add flops tracking 2022-06-26 22:43:39 -07:00
George Hotz
a699f7cb0b debug cleanups 2022-06-26 21:58:44 -07:00
George Hotz
15a16b98e6 remove get_root 2022-06-26 21:18:02 -07:00
George Hotz
e3c2579537 flip stride to match canonical 2022-06-26 19:19:53 -07:00
George Hotz
49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz
8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz
bdde95f16e CACHE_LAZYBUFFERS options + benchmark. only a couple x from torch 2022-06-24 22:33:53 -07:00
George Hotz
3e13e3330a UNSAFE_FLOAT4 env 2022-06-22 08:20:29 -07:00
George Hotz
73415e20ab this fixes 2 of the conv recomputes...but it's ugh 2022-06-22 08:18:12 -07:00
George Hotz
b2d5df6049 3 convs are being recomputed 2022-06-22 07:54:52 -07:00
George Hotz
ba2defcdef elif False 2022-06-21 23:54:09 -07:00
George Hotz
9cb0522574 noargs 2022-06-21 23:48:58 -07:00
George Hotz
1074dfbb71 unstrided 2022-06-21 23:42:21 -07:00
George Hotz
9ae01290ba pass in shorts 2022-06-21 23:33:23 -07:00
George Hotz
18d74c01b1 float4 opt 2022-06-21 21:27:51 -07:00
George Hotz
ff3d5fe962 debugging while we compile 2022-06-21 21:12:04 -07:00
George Hotz
9d06a86f7f CL class, debugging 2022-06-21 20:16:29 -07:00
George Hotz
0b820f7966 FOLD_CONSTANTS_INTO_KERNELS and shapetracker OOB tweak 2022-06-21 19:47:15 -07:00
George Hotz
1ebc2b5545 lazy opencl works 2022-06-21 19:41:08 -07:00
George Hotz
c53c91f949 opencl tests passed (#347) 2022-06-21 18:57:09 -07:00
George Hotz
8fbe2e4aed No ctx in llops (#345)
* remove ctx from gpu ops

* ctx for the others

* this is okay

* mlops are not static. fix lazy

* cl is property, _processing_op is class method

* kernel_name

* contiguous_op
2022-06-21 10:07:49 -07:00
George Hotz
159a2d1a80 Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
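
The Simple Lazy merge above is where ops stop executing eagerly: each tensor records a node in an op graph, movement ops can be merged or dropped before anything runs, and computation only happens at realize. A minimal sketch of that pattern follows; the class and method names are illustrative, not tinygrad's actual API, and the only rewrite shown is a MERGE_MOVEMENT_OPS-style collapse of back-to-back reshapes.

```python
# Illustrative sketch of lazy evaluation with one movement-op rewrite.
# Names are illustrative; this is not tinygrad's implementation.
import numpy as np

class LazyBuffer:
    def __init__(self, op, srcs=(), arg=None):
        self.op, self.srcs, self.arg = op, srcs, arg
        self.realized = None

    def reshape(self, shape):
        # MERGE_MOVEMENT_OPS-style rewrite: RESHAPE(RESHAPE(x)) -> RESHAPE(x)
        if self.op == "RESHAPE":
            return LazyBuffer("RESHAPE", self.srcs, shape)
        return LazyBuffer("RESHAPE", (self,), shape)

    def realize(self):
        # nothing runs until realize walks the recorded graph
        if self.realized is None:
            if self.op == "LOAD":
                self.realized = self.arg
            elif self.op == "RESHAPE":
                self.realized = self.srcs[0].realize().reshape(self.arg)
        return self.realized

x = LazyBuffer("LOAD", arg=np.arange(6.0))
y = x.reshape((2, 3)).reshape((3, 2))        # the two reshapes collapse into one node
assert y.op == "RESHAPE" and y.srcs == (x,)  # a single movement node over the load
print(y.realize())                           # [[0. 1.] [2. 3.] [4. 5.]]
```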
George Hotz
77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz
52505faaf4 minor 2022-06-16 15:53:45 -07:00
George Hotz
d5b3e18540 Accelerate with CL (#325)
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
2022-06-16 15:40:52 -07:00