* the universe is flat as a 2D tensor
* try this
* TESTS
* fewer lines in test
* don't change all_int since other places use it
* add tests and del noqa by making non-aesthetic spacing LOOOOOL
* some reordering
* fixed empty list and add tests
* more tests
* add list bool tensors
* clearer with the fewest lines added
* added bool
* oops
* more tests
* improved tests
* oops
* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
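  A minimal sketch of the "reduce empty set default values" behavior the commits above converge on, using numpy as the reference semantics (the tinygrad internals are not shown here):

  ```python
  import numpy as np

  # Reducing over a zero-size axis falls back to the reduction's identity
  # element instead of raising.
  x = np.zeros((0, 4), dtype=np.float32)     # tensor with a zero in its shape

  print(x.sum(axis=0))                       # [0. 0. 0. 0.] -- identity of + is 0
  print(x.prod(axis=0))                      # [1. 1. 1. 1.] -- identity of * is 1
  print(np.max(x, axis=0, initial=-np.inf))  # [-inf ...]    -- identity of max
  print(x.sum(axis=0, keepdims=True).shape)  # (1, 4): keepdim keeps a size-1 axis
  ```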
* compare with same input
* pad test case
* cat test case
* torch does not support that?
* use correct dtype in Tensor when data is an ndarray
* attempt 2
* add assert to be consistent
* Add test case for ndarray
* Add test case for list
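  A sketch of what the ndarray dtype fix and its test cases above check; the import path and constructor behavior are assumptions based on the commit messages:

  ```python
  import numpy as np
  from tinygrad.tensor import Tensor  # import path assumed for this era of the repo

  # A Tensor built from an ndarray should adopt the array's dtype instead of
  # silently defaulting to float32; a plain list still takes the default path.
  arr = np.arange(6, dtype=np.int32).reshape(2, 3)
  assert Tensor(arr).numpy().dtype == np.int32           # ndarray keeps its dtype
  assert Tensor([1.0, 2.0]).numpy().dtype == np.float32  # list uses the default
  ```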
* remove whitespace
* models matrix
* fix typo and install gpu deps
* install llvm deps if needed
* fix
* testops with cuda
* remove pip cache since it doesn't work
* cuda env
* install cuda deps
* maybe it will work now
* i can't read
* all tests in matrix
* trim down more
* opencl stuff in matrix
* opencl pip cache
* test split
* change cuda test exclusion
* test
* fix cuda maybe
* add models
* add more n=auto
* third thing
* fix bug
* cache pip more
* change name
* update tests
* try again cause why not
* balance
* try again...
* try apt cache for cuda
* try on gpu:
* try cuda again
* update packages step
* replace libz-dev with zlib1g-dev
* only cache cuda
* why error
* fix gpuocelot bug
* apt cache err
* apt cache too slow?
* opt and image in single runner
* add a couple n=autos
* remove test matrix
* try cuda apt cache again
* libz-dev -> zlib1g-dev
* remove -s since not supported by xdist
* the cache takes too long and doesn't work
* combine webgpu and metal tests
* combine imagenet to c and cpu tests
* torch tests with linters
* torch back by itself
* small windows clang test with torch tests
* fix a goofy windows bug
* im dumb
* bro
* clang with linters
* fix pylint error
* linter doesn't work on windows
* try with clang again
* clang and imagenet?
* install deps
* fix
* fix quote
* clang by itself (windows too slow)
* env vars for imagenet
* cache pip for metal and webgpu tests
* try torch with metal and webgpu
* doesn't work, too long
* remove -v
* try -n=logical
* don't use logical
* revert accidental thing
* remove some prints unless CI
* fix print unless CI
* ignore speed tests for slow tests
* clang windows in matrix (ubuntu being tested in imagenet->c test)
* try manual pip cache
* fix windows pip cache path
* all manual pip cache
* fix pip cache dir for macos
* print_ci function in helpers
* CI as variable, no print_ci
* missed one
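  The helpers change above ("CI as variable, no print_ci") likely amounts to something like this sketch; only the CI flag itself is named in the log, the rest is illustrative:

  ```python
  import os

  # helpers.py-style sketch: a module-level CI flag replaces a print_ci() wrapper.
  CI = os.getenv("CI", "") != ""

  # call sites then gate their output directly, e.g. keeping some prints CI-only:
  if CI:
      print("running under CI")
  ```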
* cuda tests with docker image
* remove setup-python action for cuda
* python->python3?
* remove -s -v
* try fix pip cache
* maybe fix
* try to fix pip cache
* is this the path?
* maybe cache pip
* try again
* create wheels dir
* ?
* cuda pip deps in dockerfile
* disable pip cache for clang
* image from ghcr instead of docker hub
* why is clang like this
* fast deps
* try use different caches
* remove the fast thing
* try with lighter image
* remove setup python for cuda
* small docker and cuda fast deps
* ignore a few more tests
* cool docker thing (maybe)
* oops
* quotes
* fix docker command
* fix bug
* ignore train efficientnet test
* remove dockerfile (docker stuff takes too long)
* remove docker stuff and normal cuda
* oops
* ignore the tests for cuda
* does this work
* ignore test_train on slow backends
* add space
* llvm ignore same tests as cuda
* nvm
* ignore lr scheduler tests
* get some stats
* fix ignore bug
* remove extra '
* remove and
* ignore test for llvm
* change ignored tests and durations on all backends
* fix
* and -> or
* ignore some more cuda tests
* finally?
* does this fix it
* remove durations=0
* add some more tests to llvm
* make last pytest more readable
* fix
* don't train efficientnet on cpu
* try w/out pip cache
* pip cache seems to be generally better
* pytest file markers
* try apt fast for cuda
* use quick install for apt-fast
* apt-fast not worth it
* apt-get to apt
* fix typo
* suppress warnings
* register markers
* disable debug on fuzz tests
* change marker names
* apt update and apt install in one command
* update marker names in test.yml
* webgpu pytest marker
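  Registering markers (so pytest stops warning about unknown marks) and tagging tests can be sketched like this; the marker names come from the commit messages, the rest is illustrative:

  ```python
  import pytest

  # conftest.py: register custom markers so strict-marker checks stay clean
  def pytest_configure(config):
      config.addinivalue_line("markers", "slow: long-running tests")
      config.addinivalue_line("markers", "webgpu: tests that need a WebGPU device")

  # a test file then opts in per test:
  @pytest.mark.webgpu
  def test_webgpu_add():
      ...
  ```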
* add and reorganize test_slice_* tests
* refactor Tensor.__getitem__()
* preliminary tests for 1) 0D tensors and 2) varargs for Tensor.zeros and Tensor.ones
* always compare shapes of the numpy arrays obtained from tinygrad and torch tensors
* add more tests for 0D support
* remove test_tensor.test_slicing(); all slicing tests now live in test/test_ops.py
* add zero-dim support
* make test_end2end.py consistent with 0dim support
* add test for tensor with zero in shape
* don't simplify ones if shape is ()
* skip tests that need zero-size tensor support.
- zero-size tensor support is not related to 0dim tensors.
* add tests for __getitem__() supporting strides >= 1
* refactor __getitem__: support for strides >= 1
* minor refactors and add comments to __getitem__
* add tests for slices with negative steps
* add support for slices with negative strides
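  The slicing work above boils down to parity with numpy indexing; a sketch of the kind of check those tests make (the Tensor import path and full `__getitem__` support are assumptions):

  ```python
  import numpy as np
  from tinygrad.tensor import Tensor  # assumed import path

  a = np.arange(24, dtype=np.float32).reshape(4, 6)
  t = Tensor(a)

  # strides >= 1 and negative steps should match numpy exactly
  np.testing.assert_equal(t[::2, 1::2].numpy(), a[::2, 1::2])
  np.testing.assert_equal(t[::-1].numpy(), a[::-1])
  np.testing.assert_equal(t[:, 5:0:-2].numpy(), a[:, 5:0:-2])
  ```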
* Added a few missing return type hints for tensor.py
* added test for empty tensor for Tensor.numel()
* fixed missing numel call in test_numel
---------
Co-authored-by: deefi <dee7ine@gmail.com>
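  The numel() test above reduces to the product-of-shape rule; a self-contained sketch (`numel` here is a stand-in, not the actual Tensor method):

  ```python
  import math

  def numel(shape):              # stand-in for Tensor.numel()
      return math.prod(shape)    # the empty product is 1

  assert numel(()) == 1          # a 0-D tensor holds exactly one scalar
  assert numel((0,)) == 0        # an empty tensor holds nothing
  assert numel((2, 3, 4)) == 24
  ```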
* use tensor dtype for zeros_like()
* add tests for zeros_like dtype
* iterate over dtypes
* remove space
* remove print
* fix test, iterate over a list
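  What the zeros_like dtype tests above verify, shown against numpy's reference behavior (the tinygrad test presumably does the same through Tensor.zeros_like):

  ```python
  import numpy as np

  # zeros_like must inherit the source dtype rather than defaulting to float32
  for dt in [np.float32, np.int32, np.int8, np.bool_]:
      src = np.ones((2, 2), dtype=dt)
      assert np.zeros_like(src).dtype == dt
  ```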
* simple convnext implementation
* shorter function names
* need to realize the random functions now
* creating an optimizer realizes all params
* assign contiguous
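  "creating an optimizer realizes all params" suggests a constructor along these lines; realize() is tinygrad's call to force a lazy buffer, while the optimizer class itself is purely illustrative:

  ```python
  class SGD:  # illustrative shape, not the repo's actual optimizer
      def __init__(self, params, lr=0.001):
          self.params, self.lr = params, lr
          for p in self.params:
              p.realize()  # force each lazy parameter buffer once, up front
  ```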
* fix lazy lazy
* why was i doing that...add convnext to tests
* LazyNumpyArray
* enable assert + comment
* no two tiny
* Rewrote Tensor.__getitem__ to fix negative indices and add support for np.newaxis/None
* Fixed pad2d
* mypy doesn't know about mlops methods
* normal python behavior for out-of-bounds slicing
* type: ignore
* inlined idxfix
* added comment for __getitem__
* Better comments, better tests, and fixed bug in np.newaxis
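  Reference behavior for the None/np.newaxis and negative-index support described above, expressed against numpy:

  ```python
  import numpy as np

  a = np.arange(6).reshape(2, 3)

  # each None (np.newaxis) inserts a size-1 axis at that position
  assert a[None].shape == (1, 2, 3)
  assert a[:, None, :].shape == (2, 1, 3)

  # negative indices count from the end of the axis
  assert a[-1, -2] == a[1, 1]
  ```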
* Add dropout test
* Remove condition where training is false
* Skip dropout test when on GPU
* Revert changes to tensor.py and fix test case
* Revert change on whitespace
* Convert Tensor to cpu for testing
* Fix whitespace in tensor.py
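  A compact numpy reference for what the dropout test checks: inverted dropout zeroes a fraction p of activations and rescales the survivors so the expected value is preserved (this is the standard formulation, not the repo's code):

  ```python
  import numpy as np

  def dropout_ref(x, p=0.5, rng=np.random.default_rng(0)):
      mask = rng.random(x.shape) >= p   # keep each element with probability 1-p
      return x * mask / (1 - p)         # rescale so E[output] == x

  x = np.ones((1000,), dtype=np.float32)
  assert abs(dropout_ref(x).mean() - 1.0) < 0.1  # mean is preserved on average
  ```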
* Split tests
Split tests into "Test CPU" and "Test GPU".
Add a test flag "TEST_DEVICES", a comma-separated list of devices:
CPU,GPU,ANE
* Run tests based on provided TEST_DEVICES flag
By default it will run all of "CPU,GPU,ANE".
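  The TEST_DEVICES flag described above parses naturally as an environment variable; a sketch (the helper name is made up):

  ```python
  import os

  # comma-separated device list, defaulting to everything
  DEVICES = os.getenv("TEST_DEVICES", "CPU,GPU,ANE").split(",")

  def enabled(device: str) -> bool:   # hypothetical helper for the test harness
      return device in DEVICES
  ```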
* fix bad quote
* Revert changes and use GPU=1
This is done by setting the default Tensor device to Device.CPU unless
GPU=1 is set.
Run GPU tests: GPU=1 pytest -s -v
* Update all devices to be tested
ANE, CPU and OCL all now support all tests.
However, tests are not currently passing on GPU, and I cannot test on CPU.
The failing GPU tests are not caused by this update; they have not been
passing due to a missing required "six" installation.
OpenCL Tests have not been run since commit: 1a1c63a08b
Devices have 3 types and are handled by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but the current setup allows for keyword
argument defaults: `device=DeviceTypes.CPU`.)
All references to Tensor.GPU/CPU/ANE have been converted to the
corresponding `DeviceTypes` enum.
The conversion code was refactored to allow any-device-to-any-device
conversion.
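  The DeviceTypes enum and any-to-any conversion described above could look roughly like this; the registry pattern is an illustration, and only the enum members come from the commit text:

  ```python
  from enum import Enum, auto

  class DeviceTypes(Enum):
      CPU = auto()
      GPU = auto()
      ANE = auto()

  _movers = {}  # DeviceTypes -> function placing raw data on that device

  def register(device):
      def deco(fn):
          _movers[device] = fn
          return fn
      return deco

  @register(DeviceTypes.CPU)
  def _to_cpu(data):
      return data  # already host memory; GPU/ANE would register similarly

  def to(data, device: DeviceTypes = DeviceTypes.CPU):
      return _movers[device](data)
  ```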
* Add six dependency in requirements.txt
* Resolve failure to run tests
Move six into the GPU required installs and remove it from the standard
installation.
* Remove repeated data conversion
* Refactor method names
Also reduce code with .to and .to_
* Dynamic device handlers
* Refactor DeviceTypes -> Device
* Add mem copy profiling back
* test_backward_pass_diamond_model passing
* Resolve Sum issue on GPU
* Revert batchnorm2d tests
* Update README with updated API
* ANE testing with
* Last minute line gains