* quick math: 0 + x = x.
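The `0 + x = x` identity can be exploited to skip the first add in a gradient accumulator, which is presumably the kind of shortcut this commit refers to; a minimal sketch under that assumption (`accumulate` is a hypothetical helper, not tinygrad's API):

```python
import numpy as np

# Skip the add while the accumulator is still the additive identity:
# 0 + x == x, so the first contribution can be taken as-is.
def accumulate(acc, x):
    return x if acc is None else acc + x

acc = None
for g in [np.ones(3), np.full(3, 2.0)]:
    acc = accumulate(acc, g)
print(acc)  # [3. 3. 3.]
```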
* gradient w.r.t. x using cherry for conv
* gradient w.r.t. w for conv on cherry, but using vector dot products
* small optimization
* [cherry] optimize conv backward pass for large channel count
* get rid of numpy einsum
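Dropping `np.einsum` is often a win because many einsum patterns are just reshapes around a plain matmul; a hedged illustration (shapes are made up, not tinygrad's actual tensors):

```python
import numpy as np

# "ijk,kl->ijl" is a batched matmul in disguise: flatten the batch
# dims, do one dot, and reshape back. This removes the einsum call
# and is typically at least as fast.
x = np.random.randn(4, 5, 6)
w = np.random.randn(6, 7)

out_einsum = np.einsum("ijk,kl->ijl", x, w)
out_matmul = x.reshape(-1, 6).dot(w).reshape(4, 5, 7)

assert np.allclose(out_einsum, out_matmul)
```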
* added resnets
* fix minor
* fix minor
* resnet in models
* added resnet test
* added resnet train test
* added linear, conv2d nn tests
* fix minor in extra/training
* resnet in models
* fix minor
* fix tolerance for linear in nn test
* fix eval; this was causing CPU and GPU unit tests to fail
* revert transformer test
* fix minor for CPU test
* improved model get_params for sequential layer
* fix minor for params counting
* commented broken ops tests
* improved train for resnet
* ops_risk
* risk sim
* guessing is for winners
* minor
* better
* matmul with risk
* conv doesn't work
* closer
* conv2d works
* ops_risk
* opt2 works
* opt1 may not be possible
* opt1 is a mulacc
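"opt1 is a mulacc" presumably means the op reduces to a fused multiply-accumulate, the core primitive of conv and matmul inner loops; a minimal sketch (not the actual RISK instruction semantics):

```python
# A multiply-accumulate (mulacc) fuses a multiply and an add:
# acc = acc + a * b. Dot products, and hence conv/matmul inner
# loops, are just chains of mulaccs.
def mulacc(acc, a, b):
    return acc + a * b

acc = 0
for a, b in zip([1, 2, 3], [4, 5, 6]):
    acc = mulacc(acc, a, b)
print(acc)  # 32
```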
* arty
* attosoc example building on mac
* minor
* riscv assembler
* gucci gang
* we got C code
* not a scam
* hello
* make risk mergeable into master
* unop support
* use isinstance, some optimizations & whitespace removal
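The switch to `isinstance` matters because it respects subclasses and accepts a tuple of types, unlike a direct `type()` comparison; a small illustration (class names are hypothetical):

```python
# isinstance handles subclasses and tuples of types; type() does not.
class Tensor: pass
class GPUTensor(Tensor): pass

t = GPUTensor()
assert isinstance(t, Tensor)          # subclass counts
assert type(t) is not Tensor          # type() misses the subclass
assert isinstance(1.5, (int, float))  # one check for several types
```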
* revert whitespace changes
* revert more whitespace
* some more cleanup
* revert fstring (not a fan of the {{}})
* fix typo
* fix typo
* vgg7 implementation - not the best, but it works
* VGG7 implementation: Spread nansbane to deter NaNs, maybe improved training experience
* VGG7 implementation: Fix training, for real this time
Results actually attempt to approximate the input
* VGG7 implementation: Sample probability management
* Split tests
Split tests into "Test CPU" and "Test GPU".
Add test flag "TEST_DEVICES", which is a comma-separated list of devices:
CPU,GPU,ANE
* Run tests based on provided TEST_DEVICES flag
By default it will run all of "CPU,GPU,ANE"
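Parsing such a flag can be sketched as below; this is a hypothetical reconstruction of the harness logic, not the actual test code:

```python
import os

ALL_DEVICES = ["CPU", "GPU", "ANE"]

def devices_to_test():
    # TEST_DEVICES unset -> run everything; otherwise run only the
    # devices named in the comma-separated list.
    flag = os.environ.get("TEST_DEVICES")
    if flag is None:
        return ALL_DEVICES
    requested = [d.strip().upper() for d in flag.split(",") if d.strip()]
    return [d for d in ALL_DEVICES if d in requested]

os.environ["TEST_DEVICES"] = "cpu, gpu"
print(devices_to_test())  # ['CPU', 'GPU']
```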
* fix bad quote
* Revert changes and use GPU=1
This is done by setting the default Tensor Device to Device.CPU unless
GPU=1 is set.
Run GPU tests: GPU=1 pytest -s -v
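The GPU=1 convention can be sketched as an environment-variable check that picks the default device; a hedged illustration (the `Device` class here is illustrative, not tinygrad's exact definition):

```python
import os

class Device:
    CPU, GPU = "CPU", "GPU"

def default_device():
    # CPU unless GPU=1 is present in the environment.
    return Device.GPU if os.environ.get("GPU", "0") == "1" else Device.CPU

assert default_device() == Device.CPU
os.environ["GPU"] = "1"
assert default_device() == Device.GPU
```

With this scheme, `GPU=1 pytest -s -v` flips every test onto the GPU without touching the test code itself.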