* remove pytest marks
* test more stuff
* fine revert some
* add that mark back
* skip that
* hmm LLVM does not work on ubuntu
* too slow on CUDA CI
* dup test
* ops_gpu is go
* fix size 0
* fix image, and add more tests
* nerf openpilot test, doesn't test thneed
* run the schedule
* better
* oops, new inputs
* delete pyopencl
* Update ops_gpu.py
* cpu tests pass
* torch works
* works
* metal works
* fix ops_disk
* metal jit works
* fix openpilot
* llvm and clang work
* fix webgpu
* docs are rly broken
* LRU works on metal
* delete comment
* revert name to ._buf. LRU only on Compiled
* changes
* allocator
* allocator, getting closer
* lru alloc
* LRUAllocator
* all pass
* metal
* cuda
* test examples
* linearizer
* test fixes
* fix custom + clean realize
* fix hip
* skip tests
* fix tests
* fix size=0
* fix MOCKHIP
* fix thneed
* copy better
* simple
* old style metal copy
* fix thneed
* np reshape
* give cuda a device
* rebalance
* balance
* parallel apt-get for all
* .local/lib/python3.11/site-packages
* what is user doing
* is that path right
* Update test.yml
* okay where are you
* site-packages
* image support weird loads
* umm, that was always wrong
* openpilot compile fails with a weird error
* image test passes
* we have valids now
* clean that up
* no more required opts
* add fastvits test, fix bug
* minor cleanups
* hip amd compilation
* gate the test properly
* cleanup unused import
* remove superfluous numpy conversion
* add SpeedyNet tests (f32 [passes] & f16 [fails])
* make CI verbose (error log from hip compiler)
* test the real ops_hip
* Merge branch 'tinygrad:master' into ci/hip-compilation
* fix CI
* cleanup
* really fix CI
* Fix CI Three: the refixening
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* hip amd compilation
* gate the test properly
* cleanup unused import
* remove superfluous numpy conversion
* add SpeedyNet tests (f32 [passes] & f16 [fails])
* make CI verbose (error log from hip compiler)
* test the real ops_hip
* Merge branch 'tinygrad:master' into ci/hip-compilation
* fix CI
* cleanup
* really fix CI
* add name support
* use fetch in gpt2
* remove requests from main lib, networkx also optional
* umm, keep that assert
* updates to fetch
* i love the walrus so much
* stop bundling mnist with tinygrad
* err, https
* download cache names
* add DOWNLOAD_CACHE_VERSION
* need env.
* ugh, wrong path
* replace get_child
* force rebuild of ocelot
* SzymonOzog gpuocelot
* delete that
* downgrade that
* non parallel
* force rebuild
* use llvm
* nauto
* less mem maybe
* print test
* helper_test_exception skip CUDACPU
* helper_test_exception
* shippable
* very close
* remove comment
* negative strides working
* almost everything passes
* calculate offset with list comprehension
* some cleanup
* got disk load working
* review suggestions
* fix after merge
* overlap working
* did it
* clean
* fixed disk load
* lint
* mypy
* removed as_strided
* trying without simplify
* added back simplify
* make sure expanding to smaller shape
* cleanup
* removed comment
* removed env file
* trying whisper test again
* onnx test sqlite issue
* working on test
* finished test
* eliminate unnecessary shrink-then-pad
* don't shrink buffer
* added strides check
* added to ci under linters
* switch issue
* allow symbolic stride
* removed .env
* isinstance
* adjust strides for double expand
* cleanup
* needed to add type hint for mypy
* set pythonpath
* Enable Multi-Output Export
* Add test
* Update examples and lint
* fix padding
* test ops
* dummy commit to rerun test
* revert cuda lint
* Enforce tuple/list of tensors
* subscripted generics
* put back webgpu test
* Re-enable WebGPU Efficientnet test
* wmma: refactor tensor cores using existing local dims
* optimizer: fix bad rebase and break after one late local
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>