Commit Graph

7979 Commits

Author SHA1 Message Date
George Hotz
e3c2579537 flip stride to match canonical 2022-06-26 19:19:53 -07:00
George Hotz
53ab09de79 remove the SLICE on conv dw 2022-06-26 19:09:36 -07:00
George Hotz
149581b0b2 Cdx without SLICE 2022-06-26 18:51:53 -07:00
George Hotz
a04813ffe3 1 line less in cpu, fix torch tests 2022-06-26 18:11:53 -07:00
George Hotz
dffde3de5a support both asymmetric and negative padding 2022-06-26 17:59:25 -07:00
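The padding commit above can be illustrated in isolation. A minimal, hypothetical NumPy sketch (`pad2d` is not the repo's API, just an illustration): asymmetric padding adds a different amount on each side, and a negative amount is treated as a crop (a slice) before padding.

```python
import numpy as np

# Hedged sketch: pads = ((top, bottom), (left, right)); a negative
# entry crops that side instead of padding it.
def pad2d(x, pads):
    slices, np_pads = [], []
    for (lo, hi), n in zip(pads, x.shape):
        slices.append(slice(max(-lo, 0), n - max(-hi, 0)))  # crop negatives
        np_pads.append((max(lo, 0), max(hi, 0)))            # pad positives
    return np.pad(x[tuple(slices)], np_pads)

a = np.ones((2, 2))
assert pad2d(a, ((1, 0), (0, 2))).shape == (3, 4)   # asymmetric pad
assert pad2d(a, ((-1, 0), (0, 0))).shape == (1, 2)  # negative = crop
```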
George Hotz
49c954b389 comments 2022-06-26 17:20:25 -07:00
George Hotz
8c483fbdc9 maxpool lazy fix 2022-06-26 17:07:03 -07:00
George Hotz
f607f18006 fix backward 2022-06-25 00:00:53 -07:00
George Hotz
ec30f0402f improve benchmark_train_efficientnet 2022-06-24 23:46:38 -07:00
George Hotz
3a147137ee CL_DEVICE option 2022-06-24 23:22:10 -07:00
George Hotz
d748353ce5 err, okay, a bit more off 2022-06-24 22:44:57 -07:00
George Hotz
bdde95f16e CACHE_LAZYBUFFERS options + benchmark. only a couple x from torch 2022-06-24 22:33:53 -07:00
George Hotz
6847eaf5b6 comments 2022-06-22 09:37:50 -07:00
George Hotz
1d4fb3527e cleanups to Tensor class 2022-06-22 09:33:30 -07:00
George Hotz
3e13e3330a UNSAFE_FLOAT4 env 2022-06-22 08:20:29 -07:00
George Hotz
73415e20ab this fixes 2 of the conv recomputes...but it's ugh 2022-06-22 08:18:12 -07:00
George Hotz
b2d5df6049 3 convs are being recomputed 2022-06-22 07:54:52 -07:00
George Hotz
ba2defcdef elif False 2022-06-21 23:54:09 -07:00
George Hotz
9cb0522574 noargs 2022-06-21 23:48:58 -07:00
George Hotz
1074dfbb71 unstrided 2022-06-21 23:42:21 -07:00
George Hotz
9ae01290ba pass in shorts 2022-06-21 23:33:23 -07:00
George Hotz
18d74c01b1 float4 opt 2022-06-21 21:27:51 -07:00
George Hotz
ff3d5fe962 debugging while we compile 2022-06-21 21:12:04 -07:00
George Hotz
b12985b013 openpilot compiler 2022-06-21 20:31:18 -07:00
George Hotz
98a730dd00 benchmark on different inputs 2022-06-21 20:20:58 -07:00
George Hotz
9d06a86f7f CL class, debugging 2022-06-21 20:16:29 -07:00
George Hotz
0b820f7966 FOLD_CONSTANTS_INTO_KERNELS and shapetracker OOB tweak 2022-06-21 19:47:15 -07:00
George Hotz
83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz
1ebc2b5545 lazy opencl works 2022-06-21 19:41:08 -07:00
George Hotz
c833886bf5 improved shapetracker 2022-06-21 19:17:25 -07:00
George Hotz
c53c91f949 opencl tests passed (#347) 2022-06-21 18:57:09 -07:00
George Hotz
8fbe2e4aed No ctx in llops (#345)
* remove ctx from gpu ops

* ctx for the others

* this is okay

* mlops are not static. fix lazy

* cl is property, _processing_op is class method

* kernel_name

* contiguous_op
2022-06-21 10:07:49 -07:00
George Hotz
159a2d1a80 Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
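The bullet list of PR #340 above walks through building a lazy op graph with caching ("simple lazy", "cache lazyop"). A toy sketch of that idea, assuming nothing about tinygrad's actual classes (`LazyBuffer` here is a stand-in, not the repo's implementation): ops only record a graph, and `realize()` computes each node at most once.

```python
# Hypothetical sketch of "simple lazy": ops build a graph instead of
# executing, and realize() walks it once, caching node results so
# shared subtrees are not recomputed.
class LazyBuffer:
    def __init__(self, op, srcs=(), value=None):
        self.op, self.srcs, self.value = op, srcs, value
        self._realized = None          # cache of the computed result

    @staticmethod
    def const(v):
        return LazyBuffer("CONST", value=v)

    def __add__(self, other):
        return LazyBuffer("ADD", (self, other))

    def __mul__(self, other):
        return LazyBuffer("MUL", (self, other))

    def realize(self):
        if self._realized is None:     # compute each node at most once
            if self.op == "CONST":
                self._realized = self.value
            else:
                a, b = (s.realize() for s in self.srcs)
                self._realized = a + b if self.op == "ADD" else a * b
        return self._realized

x = LazyBuffer.const(3)
y = (x + x) * x        # nothing computed yet, just a graph
assert y.realize() == 18
```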
George Hotz
a3538e225a Simple Lazy Pieces (#343)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* mergable without this

* ops torch
2022-06-20 20:28:10 -07:00
George Hotz
2ee85812f7 intel opencl (#342)
* intel opencl

* run clinfo

* that fix it?

* meh

* think it's the same

* basekit fix

* it wasn't basekit

* more minimal

* no clinfo
2022-06-20 19:25:55 -07:00
George Hotz
3e7416163d batch from lazy branch (#341) 2022-06-20 17:42:35 -07:00
George Hotz
a7131b6a46 Non contig (#339)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig

* improve contiguous
2022-06-19 22:40:48 -07:00
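The `contiguous_view` work in PR #339 above revolves around the standard contiguity test. A hedged sketch of that check (helper names are illustrative, not the repo's): a view is contiguous when its strides are exactly the row-major strides implied by its shape, so it can be read as one flat buffer.

```python
# Canonical row-major strides: innermost dimension has stride 1,
# each outer stride is the product of the inner dimensions.
def rowmajor_strides(shape):
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def is_contiguous(shape, strides):
    return list(strides) == rowmajor_strides(shape)

assert is_contiguous((2, 3, 4), [12, 4, 1])   # plain view
assert not is_contiguous((3, 2), [1, 3])      # transposed view
```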
George Hotz
d05e7c291a contiguous_view (#336)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig
2022-06-19 20:37:28 -07:00
George Hotz
fb72ea3fbd gpu uses shapetracker (fix tests) (#335)
* shapetracker

* movement_op

* hmm, that's why repr failed
2022-06-19 17:32:07 -07:00
George Hotz
ce2e20b768 fix test 2022-06-19 17:07:09 -07:00
George Hotz
f5f21ecb86 gpu buffer is shapetracker 2022-06-19 17:02:24 -07:00
George Hotz
6b652dafb2 touchups 2022-06-19 16:57:14 -07:00
George Hotz
e364849b3b stuff from lazy 2022-06-19 09:57:16 -07:00
Tim Lügger
2069fef292 unnecessary assign add in cpu processing_op (#334)
We can replace += with = since we only change tmp once.
Now np.empty() can replace np.zeros() which might be slightly faster.
This saves a few milliseconds, best case ~60ms.

(However, most of the time in ops_cpu.processing_op() seems to be spent on np.reshape())
2022-06-19 07:41:40 -07:00
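The before/after described in #334 can be sketched directly. A hypothetical minimal version (function names are illustrative): when a temporary is written exactly once, the zero-initialized buffer plus `+=` can become an uninitialized buffer plus a plain assignment, skipping the zero-fill.

```python
import numpy as np

# Before: pay for the zero-fill, then read-modify-write on top of it.
def tmp_before(x):
    tmp = np.zeros(x.shape)
    tmp += x
    return tmp

# After: np.empty skips the zero-fill; one plain write is enough
# because tmp is only ever changed once.
def tmp_after(x):
    tmp = np.empty(x.shape)
    tmp[:] = x
    return tmp

arr = np.arange(6.0).reshape(2, 3)
assert np.array_equal(tmp_before(arr), tmp_after(arr))
```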
George Hotz
8d08e41c21 print time in test 2022-06-19 00:59:09 -07:00
George Hotz
395eb60f46 less lines, and oddly faster 2022-06-18 21:48:42 -07:00
George Hotz
aa164d901e remove ctx from buffers (#333) 2022-06-18 17:27:10 -07:00
George Hotz
77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz
3faf8353ca remove out_shape from processing_op 2022-06-16 17:07:57 -07:00
George Hotz
a11deb5150 shapetracker check for noop 2022-06-16 16:29:18 -07:00