Commit Graph

4667 Commits

Author SHA1 Message Date
George Hotz  e6e43e820e  should fix tests  2022-07-03 16:06:11 -07:00
George Hotz  d7aad46758  test lazy also, make TestMNIST faster  2022-07-03 15:19:19 -07:00
George Hotz  93c378dffc  add test for slice_one  2022-07-03 12:14:20 -07:00
George Hotz  f9a8412b68  make contiguous ops yellow  2022-07-02 17:54:04 -07:00
George Hotz  207b9e1df3  padding is now a param to conv2d  2022-07-02 17:11:12 -07:00
George Hotz  cde137d163  simple shapetracker tests  2022-07-02 16:02:15 -07:00
George Hotz  368c0ce2f6  NUM=-2 for ants  2022-07-02 15:47:10 -07:00
George Hotz  7276f8d6bf  improve constant folding, detach before moving tensor  2022-07-02 15:29:40 -07:00
George Hotz  e55a9833fb  a little more readable  2022-06-27 08:54:04 -07:00
George Hotz  3a414d7f50  cleanup, add flops tracking  2022-06-26 22:43:39 -07:00
George Hotz  dffde3de5a  support both asymmetric and negative padding  2022-06-26 17:59:25 -07:00
George Hotz  49c954b389  comments  2022-06-26 17:20:25 -07:00
George Hotz  8c483fbdc9  maxpool lazy fix  2022-06-26 17:07:03 -07:00
George Hotz  98a730dd00  benchmark on different inputs  2022-06-21 20:20:58 -07:00
George Hotz  83d50e2687  move to extra.onnx  2022-06-21 19:43:44 -07:00
George Hotz  c833886bf5  improved shapetracker  2022-06-21 19:17:25 -07:00
George Hotz  159a2d1a80  Simple Lazy (#340)  2022-06-20 22:45:11 -07:00
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
George Hotz  a3538e225a  Simple Lazy Pieces (#343)  2022-06-20 20:28:10 -07:00
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* mergable without this

* ops torch
George Hotz  a7131b6a46  Non contig (#339)  2022-06-19 22:40:48 -07:00
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig

* improve contiguous
George Hotz  d05e7c291a  contiguous_view (#336)  2022-06-19 20:37:28 -07:00
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig
George Hotz  fb72ea3fbd  gpu uses shapetracker (fix tests) (#335)  2022-06-19 17:32:07 -07:00
* shapetracker

* movement_op

* hmm, that's why repr failed
George Hotz  ce2e20b768  fix test  2022-06-19 17:07:09 -07:00
George Hotz  6b652dafb2  touchups  2022-06-19 16:57:14 -07:00
George Hotz  e364849b3b  stuff from lazy  2022-06-19 09:57:16 -07:00
George Hotz  8d08e41c21  print time in test  2022-06-19 00:59:09 -07:00
George Hotz  77f5cef8a6  First batch from lazy branch (#332)  2022-06-18 17:26:59 -07:00
* test and helpers from lazy

* lazy pt2
George Hotz  a11deb5150  shapetracker check for noop  2022-06-16 16:29:18 -07:00
George Hotz  52505faaf4  minor  2022-06-16 15:53:45 -07:00
George Hotz  d5b3e18540  Accelerate with CL (#325)  2022-06-16 15:40:52 -07:00
* accelerated opencl

* it's running, it's just wrong

* bugfix

* model is correct in opencl

* lazy image convert

* add padding support to convolution

* that stuff was all upstreamed

* remove HEAD

* oops

* test_simple_conv2d_4 passes, add dilation support

* put logic in ops_opencl

* fix crash

* hmm, stride seems okay

* padding for batched inputs

* just an issue now with cout%4

* op model still passes

* fix startPackedInputChannel

* pre and post processing ops for graph

* don't break other llops

* shapetrackering

* reshapes are free

* lazy movement ops
George Hotz  bd7068f635  fix tests hopefully  2022-06-16 14:07:37 -07:00
George Hotz  ce15bf2bdb  the big memory gradient didn't even need to be computed  2022-06-16 11:41:29 -07:00
George Hotz  2e58948f6a  Revert "can put that test back"  2022-06-16 11:25:49 -07:00
This reverts commit 51b082b41a.
George Hotz  51b082b41a  can put that test back  2022-06-16 11:18:14 -07:00
George Hotz  85fe25e27b  add stride support to shapetracker  2022-06-15 17:48:41 -07:00
George Hotz  3d4657167b  fix tests hopefully  2022-06-15 17:26:37 -07:00
George Hotz  2a14befb74  support padding  2022-06-15 14:46:44 -07:00
George Hotz  fef6c82491  wow dilation support was simple  2022-06-15 11:38:23 -07:00
George Hotz  0b182029dd  support dilated convolution in torch  2022-06-14 18:03:35 -07:00
George Hotz  a690ba4588  add test for padding  2022-06-14 17:41:22 -07:00
George Hotz  e057ca23bb  add flip  2022-06-14 17:28:43 -07:00
George Hotz  6261a0639b  ShapeTracker (#328)  2022-06-14 16:08:22 -07:00
* start shapetracker

* that late reshape is crushing our hopes

* simple failure

* DumbShapeTracker passes tests

* improve st tests

* stacked view tracker works

* flip works

* tests pass

* shapetracker works

* use ShapeTracker in ops_gpu

* a couple lines

* fix 0 shape

* less lines

* use shapetracker for new_shape in ops.py

* simpler still

* padding with a ZeroView

* gamed it a little
George Hotz  dcbca4fdf1  Expand Operator (#327)  2022-06-12 12:31:48 -07:00
* replace broadcasting with expand

* Tensor, not self

* remove broadcasting from mlops

* delete useless A operator

* expand, not repeat

* remove A op

* expand on gpu

* binary_op doesn't broadcast anymore

* expand is still total junk, but the tests should pass
George Hotz  33f18c61a1  test_broadcasted_add  2022-06-12 10:19:58 -07:00
George Hotz  af300b121b  refactor to pass conv args into llops  2022-06-11 23:08:46 -07:00
George Hotz  d747a4b9e2  add padding to conv2d function, other minor things  2022-06-11 22:29:42 -07:00
George Hotz  9a3c048724  skip broken tests, no float64 allowed  2022-06-11 17:12:04 -07:00
George Hotz  9ebd472375  move ops to ops.py  2022-06-11 15:58:56 -07:00
George Hotz  b5b68e75ff  simpler onnx  2022-06-11 15:35:45 -07:00
George Hotz  2305a5347b  test_onnx works with enet also  2022-06-11 14:30:26 -07:00
George Hotz  6fdb276886  flip batchnorm function order  2022-06-11 13:20:41 -07:00