* in progress
* big conv test works
* that's unneeded
* fix opencl with reduce
* rewrite contiguous_view_constant_fold
* clean up mids in loop code
* subidx
* print cl kernel before run
* no reduce, no loop
* Revert "no reduce, no loop"
This reverts commit 92777e40e9.
* option for matmul
* fixups
* fast like a nascar
* running
* thneed runner
* no buffer id makes no backing buffer
* move constant folding to the top
* runs on mac
* folded biases
* was v slow
* maybe just that
* elu touchup
* speed and float32
Co-authored-by: Comma Device <device@comma.ai>
* accelerated opencl
* it's running, it's just wrong
* bugfix
* model is correct in opencl
* lazy image convert
* add padding support to convolution
* that stuff was all upstreamed
* remove HEAD
* oops
* test_simple_conv2d_4 passes, add dilation support
* put logic in ops_opencl
* fix crash
* hmm, stride seems okay
* padding for batched inputs
* just an issue now with cout%4
* op model still passes
* fix startPackedInputChannel
* pre and post processing ops for graph
* don't break other llops
* shapetrackering
* reshapes are free
* lazy movement ops
* replace broadcasting with expand
* Tensor, not self
* remove broadcasting from mlops
* delete useless A operator
* expand, not repeat
* remove A op
* expand on gpu
* binary_op doesn't broadcast anymore
* expand is still total junk, but the tests should pass
* quick math: 0 + x = x.
* gradient w.r.t. x using cherry for conv
* gradient w.r.t. w for conv on cherry but doing vector dot products
* small optimization
* [cherry] optimize conv backpass for large channel count
* get rid of numpy einsum
* added resnets
* fix minor
* fix minor
* resnet in models
* added resnet test
* added resnet train test
* added linear, conv2d nn tests
* fix minor in extra/training
* resnet in models
* fix minor
* fix tolerance for linear in nn test
* fix eval, which was causing the cpu and gpu unit tests to fail
* revert transformer test
* fix minor for CPU test
* improved model get_params for sequential layer
* fix minor for params counting
* commented broken ops tests
* improved train for resnet
* ops_risk
* risk sim
* guessing is for winners
* minor
* better
* matmul with risk
* conv doesn't work
* closer
* conv2d works
* ops_risk
* opt2 works
* opt1 may not be possible
* opt1 is a mulacc
* arty
* attosoc example building on mac
* minor
* riscv assembler
* gucci gang
* we got C code
* not a scam
* hello
* make risk mergeable into master
* unop support