* conv2d is an hlop
* shorter conv
* KOPT=-1
* alt imp
* MULACC
* smarter mulacc
* pop conv
* 7x7 -> 5x5
* didn't fix, that's not going to work
* this is faster and matches old behavior
* oh, non lazy just won't work with mulacc
* mulacc in torch
* bool types were creeping in
* optimizer is actually better with hlop conv
* fix pushing permutes issue
* refactor einsum_mulacc
* fix up readme
* update readme
* _image_conv2d
* fix bias addition location
* pushing permutes gets back to 200 kernels
* conv cleanup
* disable hlop conv
* don't hide that in helpers
* Rename Normalize and move to nn
* Fix comparison to None error
* Add test for GroupNorm
* Rename test case
* Flip parameters to match PyTorch
* Increase error tolerance
* Fix elementwise_affine on channels
* Match arguments with PyTorch
* Initialize weight and bias only when affine is true
* Is this it?
* A bit cleaner
* Handle case where weight or bias is None
* added resnets
* fix minor
* fix minor
* resnet in models
* added resnet test
* added resnet train test
* added linear, conv2d nn tests
* fix minor in extra/training
* resnet in models
* fix minor
* fix tolerance for linear in nn test
* fix eval, this causes cpu and gpu UT failing
* revert transformer test
* fix minor for CPU test
* improved model get_params for sequential layer
* fix minor for params counting
* commented broken ops tests
* improved train for resnet
* Some progress on yolov3
* Removed some debugging comments… Also, the forward pass eats all RAM for some reason
* forward pass almost runs
* forward pass runs almost
* forward pass runs, now we gotta load the weights
* loading weights works
* fetches config and weights
* everything kind of works, postprocessing of output still needs to be implemented, temp_process_results kind of works, but its kind of terrible, and not how things should be done
* some changes
* fixed some bugs in the forward pass and load_weights function, now outputs more correct values, however some values are still loaded incorrectly
* Something is wrong with the forward pass, Conv2d tests added
* forward pass almost outputs correct values, gotta fix one more thign
* yolo works
* some final changes
* reverting changes
* removed dataloader
* fixed some indentation
* comment out failing test, somehow it fails CI even though it passes on my computer…
* fixed wrong probabilities
* added webcam option to YOLO, now just need to add bounding boxes and speed it up
* some progress towards adding bounding boxes
* trying to speed up yolo layer on GPU, still faster on CPU but with 30GB ram usage
* Faster inference times, bounding boxes added correctly, webcam works, but is slow, and there is a memory leak when running on CPU... Also added tinygrads output on the classic dog image
* removed some debugging print statements
* updated result image
* something weird is going on, mean op on GPU tensor randomly faults, copying a tensor from GPU->CPU takes 10+ seconds…
* Split tests
Split tests into "Test CPU" and "Test GPU".
Add test flag "TEST_DEVICES" which is a comma separated list of devices:
CPU,GPU,ANE
* Run tests based on provided TEST_DEVICES flag
By default will run all "CPU,GPU,ANE"
* fix bad quote
* Revert changes and use GPU=1
This is done through setting the default Tensor Device to Device.CPU of
GPU=1 is set.
Run GPU tests: GPU=1 pytest -s -v
* Update all devices to be tested
ANE, CPU and OCL all now support all tests.
However tests are not currently passing on GPU and I cannot test on CPU.
Failing GPU test are not an issue caused by this update. Tests have not
been passing due to a missing "six" required installation.
OpenCL Tests have not been run since commit: 1a1c63a08b
devices have 3 types and are handle by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`)
All references to Tensor.GPU/CPU/ANE as been converted to the
corresponding `DeviceTypes` enum.
Refactor of the conversion code to allow for any device to any device
conversion.
* Add six dependency in requirements.txt
* Resolve failure to run tests
Move six into gpu required installs. Remove six from standard
installation.
* Remove repeated data conversion
* Refactor method names
Also reduce code with .to and .to_
* Dynamic device handlers
* Refactor DeviceTypes -> Device
* Add mem copy profiling back
* test_backward_pass_diamond_model passing
* Resolve Sum issue on GPU
* Revert batchnorm2d tests
* Update README with upadated API
* ANE testing with
* Last minute line gains