* tensor implementation for rmsprop and adam
* test_mnist.py extended to cover sgd, rmsprop and adam on cpu and gpu
* number of steps reduced for adam from 1000 to 200
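A minimal sketch of the RMSprop and Adam update rules the optimizer commits above implement, written with plain NumPy as a stand-in for tensor ops; function and variable names are illustrative, not tinygrad's actual optimizer API.

```python
import numpy as np

def rmsprop_step(param, grad, v, lr=0.001, decay=0.9, eps=1e-8):
    # v is a running average of squared gradients
    v = decay * v + (1 - decay) * grad * grad
    return param - lr * grad / (np.sqrt(v) + eps), v

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # m and v are first/second moment estimates, t is the 1-based step count
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)   # bias correction
    v_hat = v / (1 - b2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```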
* streamlined numerical_jacobian
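A minimal sketch of a central-difference numerical Jacobian, the kind of gradient check numerical_jacobian performs; this illustrates the technique, not the streamlined implementation from the commit.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    # f maps a flat input vector to a flat output vector
    y = f(x)
    J = np.zeros((y.size, x.size))
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        J[:, i] = (f(xp) - f(xm)) / (2 * eps)
    return J
```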
* Got rid of the g loop in Conv2D.forward
* erased a stray line
* nothing
* no loops in Conv2D forward
* Conv2D backprop improved
* fixed minor issues in examples
* alternative to einsum
* Conv2D backward einsum alternative
* tidying up
* tidied up
* no ravel
* got rid of print
* Update efficientnet.py
* Update efficientnet.py
* Update efficientnet.py
* only tensordot
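The einsum-alternative commits above swap np.einsum contractions for np.tensordot; a small sketch of the equivalence (the shapes are illustrative, not the exact Conv2D contraction):

```python
import numpy as np

a = np.random.randn(4, 5, 6)
b = np.random.randn(6, 7)

out_einsum = np.einsum('ijk,kl->ijl', a, b)
# tensordot contracts axis 2 of a against axis 0 of b, giving the same result
out_tensordot = np.tensordot(a, b, axes=([2], [0]))

assert np.allclose(out_einsum, out_tensordot)
```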
* 255.0
* whitespace
* aspect ratio error in efficientnet
* removed print
* fix wrong strides in efficientnet
* broadcasting for backward ops
* Update ops.py
* Update ops.py
- the previous version was wrong
* broadcast test for backward enabled
* add adBC function; skip summing over axes that are already size 1
* spacing
Co-authored-by: Marcel Bischoff <marcel@Marcels-iMac.local>
* allow for general broadcasting of binary operations: handles any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. If a tensor has fewer dimensions than the other, its shape is padded with 1s until both have the same number of dimensions (sketch below). Also refactored buffer_zeros() by adding a function buff() that makes a buffer from a numpy array
* remove extra tabs
Co-authored-by: phillip <phillip_bement@reedbement.com>
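A minimal sketch of the broadcasting rule described above, plus the backward-pass reduction it implies (the "skip summing over axes that are already size 1" detail); broadcast_shape and unbroadcast are illustrative helpers, not the adBC or buff functions from these commits.

```python
import numpy as np

def broadcast_shape(s1, s2):
    # pad the shorter shape with 1s, then apply the match-or-1 rule per axis
    s1, s2 = list(s1), list(s2)
    while len(s1) < len(s2): s1.insert(0, 1)
    while len(s2) < len(s1): s2.insert(0, 1)
    out = []
    for a, b in zip(s1, s2):
        assert a == b or a == 1 or b == 1, "shapes are not broadcastable"
        out.append(max(a, b))
    return tuple(out)

def unbroadcast(grad, shape):
    # sum the gradient back down to the original shape, skipping axes already 1
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    for i, s in enumerate(shape):
        if s == 1 and grad.shape[i] != 1:
            grad = grad.sum(axis=i, keepdims=True)
    return grad

print(broadcast_shape((3, 1, 5), (4, 5)))             # (3, 4, 5)
print(unbroadcast(np.ones((3, 4, 5)), (4, 1)).shape)  # (4, 1)
```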
* Pad2d backward pass on GPU
* Faster Pad2D GPU backward pass (no zeroing needed)
* Fix out of bounds error
* Don't save prg
* Let compiler optimize division by 1
* More generic broadcasting (1s at the start)
* Bug fix
* Add comment
* Try to fix flaky test with other method
* Add mixed broadcast support
* single kernel
* Separate broadcast tests
Co-authored-by: holonomicjl <58403584+holonomicjl@users.noreply.github.com>
* no trailing whitespace
* GPU MaxPool2D.backward(); TinyConvNet train passes!
* Fix GPU avgpool.forward() init_val
Doesn’t change result but is simpler.
* Fix MaxPool GPU init_val
Tests only cover random non-negative inputs. This fixes issues if negative inputs are fed to GPU MaxPool2D. Test update to follow.
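A NumPy sketch (not the GPU kernel) of why that init_val matters: seeding the running max with 0 clamps all-negative windows, while seeding with -inf gives the true maximum.

```python
import numpy as np

window = np.array([-3.0, -1.0, -2.0])

bad = 0.0        # wrong init_val: result is 0, not the true max
good = -np.inf   # correct init_val
for v in window:
    bad = max(bad, v)
    good = max(good, v)

print(bad)   # 0.0  (wrong)
print(good)  # -1.0 (correct)
```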
* to make it work locally
* definitely not working
* Conv2D GPU passes some of the tests
* Conv2D GPU passes more of the tests
* passes some tests and mnist
* removed unnecessary code
* Conv2D backward pass works
* wrong test_ops.py
* white space + test backward
* erased useless code
* removed default argument
* long lines
Strided CPU Pooling was introduced assuming a small kernel size
(<=(10,10)), but efficientnet.py feeds kernel_size=(112,112).
This causes a huge array buffer allocation in stack_for_pool() that
hangs inference for a long time or until system OOM.
Revert CPU Pooling for now, and re-introduce #74 later with a new
global-average-pooling op that can be used instead of avgpool2d with
large kernel size for efficientnet inference.
Co-authored-by: Ryan Neph <ryanneph@google.com>
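A NumPy sketch of the global-average-pooling idea from the revert note above: a mean over the spatial axes matches avgpool2d with a kernel covering the whole feature map, without building the huge stacked buffer.

```python
import numpy as np

x = np.random.randn(1, 32, 112, 112)  # NCHW feature map, efficientnet-sized

global_avg = x.mean(axis=(2, 3), keepdims=True)  # shape (1, 32, 1, 1)
print(global_avg.shape)
```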
* copy tensors to and from gpu
* add on GPU
* adding works
* we stick shapes in
* works on cpu and gpu
* test changes, not passing yet
* something else
* op tests pass
* add, mean, and sum have working forward/backward
* mul ops test
* no gpu support, no problem
* test pass, clean up later
* gpu cleanup
* cleanup test ops, don't let div fail
* revert more
* simpler dispatcher
* clean up grad
* GPU and
* grad is a Tensor now
* gate test on GPU
* cleanups
* late loading gpu
* GPU as input option
* last cleanups