George Hotz
2ee85812f7
intel opencl (#342)
* intel opencl
* run clinfo
* that fix it?
* meh
* think it's the same
* basekit fix
* it wasn't basekit
* more minimal
* no clinfo
2022-06-20 19:25:55 -07:00
George Hotz
3e7416163d
batch from lazy branch (#341)
2022-06-20 17:42:35 -07:00
George Hotz
a7131b6a46
Non contig (#339)
* contiguous_view
* non contig reduce too
* conv fast
* maybe faster valid
* improve test_onnx
* improve params
* elementwise_op
* draw non contig
* improve contiguous
2022-06-19 22:40:48 -07:00
George Hotz
d05e7c291a
contiguous_view (#336)
* contiguous_view
* non contig reduce too
* conv fast
* maybe faster valid
* improve test_onnx
* improve params
* elementwise_op
* draw non contig
2022-06-19 20:37:28 -07:00
George Hotz
fb72ea3fbd
gpu uses shapetracker (fix tests) (#335)
* shapetracker
* movement_op
* hmm, that's why repr failed
2022-06-19 17:32:07 -07:00
George Hotz
ce2e20b768
fix test
2022-06-19 17:07:09 -07:00
George Hotz
f5f21ecb86
gpu buffer is shapetracker
2022-06-19 17:02:24 -07:00
George Hotz
6b652dafb2
touchups
2022-06-19 16:57:14 -07:00
George Hotz
e364849b3b
stuff from lazy
2022-06-19 09:57:16 -07:00
Tim Lügger
2069fef292
unnecessary assign add in cpu processing_op (#334)
We can replace += with = since we only change tmp once.
Now np.empty() can replace np.zeros() which might be slightly faster.
This saves a few milliseconds, best case ~60ms.
(However, most of the time in ops_cpu.processing_op() seems to be spent on np.reshape().)
2022-06-19 07:41:40 -07:00
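The optimization in that commit can be sketched as follows. This is a minimal illustration with a made-up shape and a stand-in result array, not the actual ops_cpu.processing_op() code:

```python
import numpy as np

shape = (64, 64)                          # hypothetical buffer shape
grad = np.ones(shape, dtype=np.float32)   # stand-in for the computed result

# Before: zero-fill the buffer, then accumulate into it.
# The += is unnecessary when tmp is only written once.
tmp = np.zeros(shape, dtype=np.float32)
tmp += grad

# After: since tmp is assigned exactly once, allocate uninitialized
# memory and assign directly. np.empty skips the zero-fill, which can
# be slightly faster for large buffers.
tmp2 = np.empty(shape, dtype=np.float32)
tmp2[:] = grad

assert np.array_equal(tmp, tmp2)
```

Both paths produce identical contents; the only difference is that the second avoids initializing memory that is immediately overwritten.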
George Hotz
8d08e41c21
print time in test
2022-06-19 00:59:09 -07:00
George Hotz
395eb60f46
less lines, and oddly faster
2022-06-18 21:48:42 -07:00
George Hotz
aa164d901e
remove ctx from buffers (#333)
2022-06-18 17:27:10 -07:00
George Hotz
77f5cef8a6
First batch from lazy branch (#332)
* test and helpers from lazy
* lazy pt2
2022-06-18 17:26:59 -07:00
George Hotz
3faf8353ca
remove out_shape from processing_op
2022-06-16 17:07:57 -07:00
George Hotz
a11deb5150
shapetracker check for noop
2022-06-16 16:29:18 -07:00
George Hotz
52505faaf4
minor
2022-06-16 15:53:45 -07:00
George Hotz
d5b3e18540
Accelerate with CL (#325)
* accelerated opencl
* it's running, it's just wrong
* bugfix
* model is correct in opencl
* lazy image convert
* add padding support to convolution
* that stuff was all upstreamed
* remove HEAD
* oops
* test_simple_conv2d_4 passes, add dilation support
* put logic in ops_opencl
* fix crash
* hmm, stride seems okay
* padding for batched inputs
* just an issue now with cout%4
* op model still passes
* fix startPackedInputChannel
* pre and post processing ops for graph
* don't break other llops
* shapetrackering
* reshapes are free
* lazy movement ops
2022-06-16 15:40:52 -07:00
George Hotz
bd7068f635
fix tests hopefully
2022-06-16 14:07:37 -07:00
George Hotz
9306759cbc
put the allocations back in the ops
2022-06-16 12:12:55 -07:00
George Hotz
ce15bf2bdb
the big memory gradient didn't even need to be computed
2022-06-16 11:41:29 -07:00
George Hotz
2e58948f6a
Revert "can put that test back"
This reverts commit 51b082b41a.
2022-06-16 11:25:49 -07:00
George Hotz
51b082b41a
can put that test back
2022-06-16 11:18:14 -07:00
George Hotz
73bc181fbe
cleaner output shape
2022-06-16 10:24:03 -07:00
George Hotz
b5796ae4f9
remove useless reshape
2022-06-16 10:15:43 -07:00
George Hotz
89db797e57
get rid of reduce using channels
2022-06-16 10:01:54 -07:00
George Hotz
38d6cfec2a
remove the expand
2022-06-16 09:54:56 -07:00
George Hotz
bcfbb4c81b
minor cleanups
2022-06-15 22:27:46 -07:00
George Hotz
3667200df5
remove unused unstride
2022-06-15 20:03:43 -07:00
George Hotz
ff648e9510
remove convt and compute dx with conv
2022-06-15 19:54:15 -07:00
George Hotz
142c88f2e3
move to mlops
2022-06-15 18:06:07 -07:00
George Hotz
85fe25e27b
add stride support to shapetracker
2022-06-15 17:48:41 -07:00
George Hotz
827e8f67eb
comment
2022-06-15 17:31:27 -07:00
George Hotz
3d4657167b
fix tests hopefully
2022-06-15 17:26:37 -07:00
George Hotz
e4ab57e39d
oops, only stride
2022-06-15 15:25:58 -07:00
George Hotz
86f55b078d
transpose dilation was simple
2022-06-15 15:20:51 -07:00
George Hotz
2a14befb74
support padding
2022-06-15 14:46:44 -07:00
George Hotz
6d98366214
move CONVDW out of llops
2022-06-15 12:05:11 -07:00
George Hotz
fef6c82491
wow dilation support was simple
2022-06-15 11:38:23 -07:00
George Hotz
0b182029dd
support dilated convolution in torch
2022-06-14 18:03:35 -07:00
George Hotz
a690ba4588
add test for padding
2022-06-14 17:41:22 -07:00
George Hotz
e057ca23bb
add flip
2022-06-14 17:28:43 -07:00
George Hotz
a8aeebfb0c
use shapetracker to combine adj reduce axis
2022-06-14 17:08:12 -07:00
George Hotz
906cce9916
reduce with loops
2022-06-14 16:38:33 -07:00
George Hotz
6261a0639b
ShapeTracker (#328)
* start shapetracker
* that late reshape is crushing our hopes
* simple failure
* DumbShapeTracker passes tests
* improve st tests
* stacked view tracker works
* flip works
* tests pass
* shapetracker works
* use ShapeTracker in ops_gpu
* a couple lines
* fix 0 shape
* less lines
* use shapetracker for new_shape in ops.py
* simpler still
* padding with a ZeroView
* gamed it a little
2022-06-14 16:08:22 -07:00
George Hotz
e58b5711ec
simpler convdw
2022-06-13 17:56:54 -07:00
George Hotz
dcbca4fdf1
Expand Operator (#327)
* replace broadcasting with expand
* Tensor, not self
* remove broadcasting from mlops
* delete useless A operator
* expand, not repeat
* remove A op
* expand on gpu
* binary_op doesn't broadcast anymore
* expand is still total junk, but the tests should pass
2022-06-12 12:31:48 -07:00
George Hotz
5cf7649eda
register the operators outside
2022-06-12 10:26:34 -07:00
George Hotz
33f18c61a1
test_broadcasted_add
2022-06-12 10:19:58 -07:00
George Hotz
d47a421970
add cout to conv_args, don't change the first 12
2022-06-12 00:10:15 -07:00