Commit Graph

21 Commits

Author SHA1 Message Date
YassineYousfi 2f0f91ba3d support float16 onnx weights (#384) 2022-09-15 09:12:18 -04:00
George Hotz 18fde22dac fix that soon 2022-07-20 09:07:09 -07:00
George Hotz 44848ee5dc prints show we can precompute from the outside 2022-07-08 10:59:20 -07:00
George Hotz 001cfe83a2 local 2022-07-07 10:05:26 -07:00
George Hotz 2720ef49ca extra and test and tuple 2022-07-07 10:01:33 -07:00
George Hotz 81b73f97a3 Optimization (#355)
* constant folding into kernels

* that opt worth it?

* fix mypy

* ast one kernel

* save 2 lines in conv kernel

* debug print kernel count

* cl debugging

* early realize inputs

* refactor Device
2022-07-04 08:58:57 -07:00
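The "constant folding into kernels" bullet in the commit above refers to a standard technique: all-constant subtrees of the op graph are evaluated at build time so only the variable part reaches the generated kernel. A minimal sketch, using hypothetical names rather than tinygrad's actual code:

```python
# Illustrative sketch of constant folding over a tiny expression graph
# (hypothetical Const/Var/BinOp types, not tinygrad's internals).
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str          # "add" or "mul"
    left: "Node"
    right: "Node"

Node = Union[Const, Var, BinOp]

def fold(node: Node) -> Node:
    """Replace every all-constant subtree with a single Const leaf."""
    if isinstance(node, (Const, Var)):
        return node
    left, right = fold(node.left), fold(node.right)
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(left.value + right.value if node.op == "add"
                     else left.value * right.value)
    return BinOp(node.op, left, right)

# x * (2 + 3): the constant subtree collapses to Const(5.0), leaving a
# single multiply for the kernel to execute.
expr = BinOp("mul", Var("x"), BinOp("add", Const(2.0), Const(3.0)))
print(fold(expr))
```

The payoff mirrors the commit's "save 2 lines in conv kernel" bullet: folded constants become immediates in the kernel instead of loads.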
George Hotz e6e43e820e should fix tests 2022-07-03 16:06:11 -07:00
George Hotz 98a730dd00 benchmark on different inputs 2022-06-21 20:20:58 -07:00
George Hotz 83d50e2687 move to extra.onnx 2022-06-21 19:43:44 -07:00
George Hotz 159a2d1a80 Simple Lazy (#340)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* and that

* cmp is fast

* 18ms on mac

* it's a lot of lines, but it's faster

* minor

* tests pass

* LoadOps.CONTIGUOUS

* remove dups

* torch converter doesn't support slice

* move lazy out for merge

* LoadOps are only for lazy
2022-06-20 22:45:11 -07:00
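The Simple Lazy commit above builds on one core idea: ops only record a graph, and no computation happens until a buffer is realized. A toy sketch of that pattern, with hypothetical names that are not tinygrad's API:

```python
# Illustrative lazy-evaluation sketch: each op returns a LazyBuffer node;
# work is deferred until realize() walks the graph (names are hypothetical).
class LazyBuffer:
    def __init__(self, op, srcs=(), data=None):
        self.op, self.srcs, self.data = op, srcs, data
        self.realized = None   # cache so shared subgraphs compute once

    def realize(self):
        if self.realized is None:
            if self.op == "load":
                self.realized = self.data
            else:
                a, b = (s.realize() for s in self.srcs)
                if self.op == "add":
                    self.realized = [x + y for x, y in zip(a, b)]
                elif self.op == "mul":
                    self.realized = [x * y for x, y in zip(a, b)]
        return self.realized

def load(data): return LazyBuffer("load", data=data)
def add(a, b):  return LazyBuffer("add", (a, b))
def mul(a, b):  return LazyBuffer("mul", (a, b))

# Building the graph is instant; arithmetic runs only at realize().
out = mul(add(load([1, 2]), load([3, 4])), load([10, 10]))
print(out.realize())  # [40, 60]
```

Deferring execution like this is what lets passes such as SHUFFLE_MOVEMENT_OPS, MERGE_MOVEMENT_OPS, and REMOVE_MOVEMENT_NOPS (named in the commit bullets) rewrite the graph before any kernel runs.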
George Hotz a3538e225a Simple Lazy Pieces (#343)
* simple lazy

* simple

* fix graph and make realize simpler

* SHUFFLE_MOVEMENT_OPS already works

* MERGE_MOVEMENT_OPS and REMOVE_MOVEMENT_NOPS

* it works, but it's slow

* constant inlining

* cache misses are the reason for loss

* fix non determinism

* cleanup, a few tests fail

* profile

* cache lazyop

* cleanups

* create namedtuple once

* bunch of caches

* it's not deleting

* nograd

* caching allocator

* reduce_op

* fromCPU if you want fromCPU

* complain

* nvidia fix

* realized on Tensor

* numpy is very slow

* no loads in second run

* caching in View

* 10ms speedups on batman

* remove old profiler

* bunch of refactors

* contiguous on view

* elementwise_op_compile for conv

* support ewop after processing op

* this still works

* conv folding works

* all we do is conv conv conv no matter what

* all args to the conv

* still works

* unify conv and ewop

* ops_gpu cleanup

* move around ops_gpu

* remove caching allocator

* remove unused

* find_conv shorten

* gpu refactors

* simpler gpu

* mergable without this

* ops torch
2022-06-20 20:28:10 -07:00
George Hotz d05e7c291a contiguous_view (#336)
* contiguous_view

* non contig reduce too

* conv fast

* maybe faster valid

* improve test_onnx

* improve params

* elementwise_op

* draw non contig
2022-06-19 20:37:28 -07:00
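A contiguous_view step of the kind named in the commit above gathers a strided, non-contiguous view into a fresh row-major buffer so later elementwise kernels can index it linearly. A minimal 2D sketch (illustrative function, not tinygrad's implementation):

```python
# Illustrative sketch: copy a 2D strided view into a contiguous list
# (hypothetical helper, not tinygrad's actual contiguous_view).
def make_contiguous(buf, shape, strides, offset=0):
    """Gather view elements in row-major order of `shape`."""
    out = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            out.append(buf[offset + i * strides[0] + j * strides[1]])
    return out

# A transpose of a 2x3 row-major buffer is the view shape (3, 2) with
# strides (1, 3); gathering it yields a contiguous transposed copy.
buf = [0, 1, 2, 3, 4, 5]
print(make_contiguous(buf, (3, 2), (1, 3)))  # [0, 3, 1, 4, 2, 5]
```

The "non contig reduce too" bullet extends the same idea to reductions, which otherwise assume linear indexing.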
George Hotz 8d08e41c21 print time in test 2022-06-19 00:59:09 -07:00
George Hotz 77f5cef8a6 First batch from lazy branch (#332)
* test and helpers from lazy

* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz d747a4b9e2 add padding to conv2d function, other minor things 2022-06-11 22:29:42 -07:00
George Hotz 9ebd472375 move ops to ops.py 2022-06-11 15:58:56 -07:00
George Hotz b5b68e75ff simpler onnx 2022-06-11 15:35:45 -07:00
George Hotz 2305a5347b test_onnx works with enet also 2022-06-11 14:30:26 -07:00
George Hotz 6fdb276886 flip batchnorm function order 2022-06-11 13:20:41 -07:00
George Hotz 85d17a2acd running resnet onnx 2022-06-11 13:17:15 -07:00
George Hotz db5a632e8c multicat + test onnx is generic onnx 2022-06-11 11:50:47 -07:00