* Update all devices to be tested
ANE, CPU and OCL all now support all tests.
However, tests are not currently passing on GPU, and I cannot test on CPU.
The failing GPU tests are not an issue caused by this update; they have
not been passing because the required "six" package was missing.
OpenCL tests have not been run since commit 1a1c63a08b.
Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this setup allows for keyword argument
defaults: `device=DeviceType.CPU`; see the sketch below.)
All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.
Refactored the conversion code to allow for any-device-to-any-device
conversion.
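A minimal sketch of the idea, assuming the enum is spelled `DeviceTypes` (a later commit below renames it to `Device`); the `Tensor` constructor here is a hypothetical stand-in, not the real one:

```python
from enum import Enum
import numpy as np

class DeviceTypes(Enum):
    # The three backends referenced above; the values are arbitrary.
    CPU = 0
    GPU = 1
    ANE = 2

class Tensor:
    # Hypothetical constructor: an enum member works as a keyword-argument
    # default, which Tensor.CPU/GPU/ANE cannot, since the Tensor class does
    # not exist yet while its own methods are being defined.
    def __init__(self, data, device=DeviceTypes.CPU):
        self.data = np.asarray(data, dtype=np.float32)
        self.device = device

t = Tensor([1.0, 2.0, 3.0])                          # defaults to DeviceTypes.CPU
g = Tensor([1.0, 2.0, 3.0], device=DeviceTypes.GPU)  # explicit device
```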
* Add six dependency in requirements.txt
* Resolve failure to run tests
Move six into the GPU required installs and remove it from the standard
installation.
* Remove repeated data conversion
* Refactor method names
Also reduce code with `.to` and `.to_` (see the sketch below).
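A hedged sketch of the `.to`/`.to_` pairing: an out-of-place conversion plus an in-place variant. `convert` is a placeholder for the any-device-to-any-device conversion from the first commit above, and none of these bodies are the real implementations:

```python
from enum import Enum

class DeviceTypes(Enum):
    CPU = 0
    GPU = 1
    ANE = 2

def convert(data, src, dst):
    # Placeholder for the real any-device-to-any-device conversion.
    return data

class Tensor:
    def __init__(self, data, device=DeviceTypes.CPU):
        self.data, self.device = data, device

    def to(self, device):
        # Out-of-place: returns a new Tensor on the target device.
        return Tensor(convert(self.data, self.device, device), device)

    def to_(self, device):
        # In-place: converts this Tensor's buffer and retags its device.
        self.data = convert(self.data, self.device, device)
        self.device = device
```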
* Dynamic device handlers
* Refactor DeviceTypes -> Device
* Add mem copy profiling back
* test_backward_pass_diamond_model passing
* Resolve Sum issue on GPU
* Revert batchnorm2d tests
* Update README with updated API
* ANE testing with
* Last minute line gains
* 2serious
* load/save
* fixing GPU
* added DEBUG
* needs BatchNorm or doesn't learn anything
* old file not needed
* added conv biases
* added extra/training.py and checkpoint
* assert in test only
* save
* padding
* num_classes
* checkpoint
* checkpoints for padding
* training was broken
* merge
* rotation augmentation
* more aug
* needs testing
* streamline augment; augment is fast, thus bicubic
* tidying up
* 🎉 effort to generate mnist data with tinygrad.
* dropout added
* working gan
* minor bug fixes
* more bug fixes
* todo reg l2
* detach
* logsoftmax twice
* allow for general broadcasting of binary operations. can handle any situation where corresponding dimensions between the tensors match, or at least one of them is of size 1. if a tensor has fewer dimensions than the other, its shape is padded with 1s until both have the same number of dimensions (see the sketch below). also refactored buffer_zeros() by creating a function buff() that makes a buffer from a numpy array
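A small sketch of the broadcasting rule described above, written against plain shape tuples rather than the actual GPU buffers; the helper name is made up:

```python
def broadcast_shape(shape_x, shape_y):
    # Left-pad the shorter shape with 1s until both have the same rank.
    ndim = max(len(shape_x), len(shape_y))
    x = (1,) * (ndim - len(shape_x)) + tuple(shape_x)
    y = (1,) * (ndim - len(shape_y)) + tuple(shape_y)
    out = []
    for dx, dy in zip(x, y):
        # Corresponding dims must match, or at least one of them must be 1.
        assert dx == dy or dx == 1 or dy == 1, f"incompatible dims {dx} and {dy}"
        out.append(max(dx, dy))
    return tuple(out)

print(broadcast_shape((3, 1, 5), (4, 5)))   # -> (3, 4, 5)
```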
* remove extra tabs
* messy loop unrolling
* fix loop unrolling bugs
* revert loop unrolling changes, new plan here
* binary_op(): avoid having a loop in the GPU C code; instead compute indices with nested expressions. simple broadcasts should have a similar level of performance to the simple-broadcast-specific code that was there before. broke out codegen and compilation into get_binop_prg(), which has a larger cache and depends only on the operation type and complist (this avoids doing a bunch of python string ops every time we want to compile something we've already compiled). the larger cache is needed since there will end up being quite a few possible types of broadcasts (sum_i^N 3**i is a loose upper bound, N being the maximum number of dimensions). I assumed 5 kinds of binary operations when sizing the cache here: +, -, *, /, and **. more may be needed in the future.
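A hedged sketch of the caching idea only: the generated program is keyed on the operation and the broadcast description ("complist"), so the Python string work happens once per distinct pattern. `functools.lru_cache`, the function name, and the kernel text below are stand-ins, not the real `get_binop_prg()`:

```python
import functools

@functools.lru_cache(maxsize=2048)  # roomy: ~5 op kinds times many broadcast patterns
def get_binop_prg_source(code, complist):
    # complist must be hashable (e.g. a tuple) because it is part of the cache key.
    # In the real codegen it drives the nested index expressions that replace the
    # loop; here the indexing is left as the trivial same-shape case.
    return (
        "__kernel void binop(__global const float *a_g,"
        " __global const float *b_g, __global float *res_g) {\n"
        "  int gid = get_global_id(0);\n"
        "  float a = a_g[gid]; float b = b_g[gid];\n"
        f"  res_g[gid] = {code};\n"
        "}\n"
    )

# Repeated calls with the same (code, complist) pair reuse the cached source.
src = get_binop_prg_source("a+b", ((False, True), (True, False)))
```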
* add .cl to binop arguments
* solved edge case where len(dimlist)==0. still problems when len(dimlist) > CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
* pyopencl can't handle more than 3 gids, so we just use 1 gid and compute the indices into the returned tensor in the kernel. this means more computation for the individual indices, but less for the index into the flattened tensor (last line of kernel), since it's just gid0
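The index arithmetic described above, shown in Python rather than in the generated OpenCL: the single flat gid is decomposed into one index per output dimension, from which the (possibly broadcast) input offsets are then built, while the output offset stays just the gid itself:

```python
def unflatten(gid, shape):
    # Decompose a flat output index into per-dimension indices, as the kernel
    # does from the single global id it is launched with.
    idxs = []
    for dim in reversed(shape):
        idxs.append(gid % dim)
        gid //= dim
    return tuple(reversed(idxs))

print(unflatten(17, (3, 4, 5)))   # -> (0, 3, 2), since 17 = 0*20 + 3*5 + 2
```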
* trim some lines
Co-authored-by: phillip <phillip_bement@reedbement.com>
* number of categories for efficientnet
* need layer_init_uniform
* merge fail
* merge fail
* batchnorms
* needs work
* needs work: how to determine training
* pow
* needs work
* reshape was needed
* sum with axis
* sum with axis and tests
* broken
* works again
* clean up
* Update test_ops.py
* using sum
* don't always update running_stats
* space
* self
* default return running_stats
* passes test
* need to use mean
* merge
* testing
* fixing pow
* test_ops had a line dropped
* undo pow
* rebase
* Consistent GPU classes
Convert the existing GPU classes into one standard format.
Remove duplicated functions in `test_mnist` and create a TestMNISTGPU
class. This reduces line count and ensures consistency.
Use `@unittest.skipUnless(GPU, "Requires GPU")` instead of `if GPU:` to
skip GPU testing. This will ensure that skipped tests are displayed
accordingly in the pytest output.
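For reference, the pattern being adopted (`unittest.skipUnless` is the real stdlib API; the `GPU` flag and class below are stand-ins for the actual test code):

```python
import os
import unittest

GPU = os.getenv("GPU") is not None  # assumption: flag derived from the environment

@unittest.skipUnless(GPU, "Requires GPU")
class TestMNISTGPU(unittest.TestCase):
    def test_mnist_gpu(self):
        ...  # runs only when GPU is truthy; otherwise pytest reports it as skipped
```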
* Optim Testing now supports GPU
* Tensor testing now supports GPU
jacobian and gradcheck are auto-skipped until GPU float64 support is added.
* GPU support for custom constructor methods
* Remove GPU flag from Model constructors
It was requested that the `gpu` kwarg be removed from the model
constructor. GPU conversion is now handled in the train function.
This also required the conversion of Optimizer parameters, as they are
constructed prior to execution of the `train` function and are dependent
on the model GPU state.
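A minimal sketch of the shape this takes, assuming the `Device` enum from the first sketch above (after its rename) and the `get_parameters` helper sketched under the commit below; the signature is an assumption, not the one in the tree:

```python
def train(model, optim, steps, gpu=False):
    # GPU conversion now happens here instead of via a `gpu` kwarg on the model
    # constructor. The optimizer was built from the model's parameter tensors
    # before train() runs, so moving those tensors in place keeps the
    # optimizer's references pointing at the same (now GPU-resident) buffers.
    if gpu:
        for p in get_parameters(model) + get_parameters(optim):
            p.to_(Device.GPU)   # in-place move; see the .to_ sketch above
    for _ in range(steps):
        ...  # sample a batch, forward pass, loss, backward, optim.step()
```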
* Fix typo: float32->float64
* Clean `get_parameters` utility
Just a quick refactor w/ the new support for optimizers.
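A hedged sketch of what a `get_parameters`-style utility typically does: recursively walk an object's attributes and containers and collect every Tensor, so the same call works for a model and for an optimizer that holds parameters. The `Tensor` class here is a stand-in:

```python
class Tensor:
    # Stand-in for tinygrad's Tensor, just so the sketch is self-contained.
    def __init__(self, data):
        self.data = data

def get_parameters(obj):
    # Collect Tensors from attributes, lists, and tuples, recursively.
    params = []
    if isinstance(obj, Tensor):
        params.append(obj)
    elif isinstance(obj, (list, tuple)):
        for x in obj:
            params.extend(get_parameters(x))
    elif hasattr(obj, "__dict__"):
        for v in obj.__dict__.values():
            params.extend(get_parameters(v))
    return params
```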
* Remove GPU kwarg from TinyNet
Remove the `gpu` kwarg from TinyNet to match the test_mnist `train` function.