* move view pushing to codegen, try 2
* fix up some linearizer tests
* fix test search
* fix test schedule
* delete that test
* fix test arange
* fix a few tests
* update tests
* push views
* ebs cleanup
* fix local/reg
* test and lint
* fix more tests
* test cleanups
* skipped that one
* noop
* fix noop
* store cat is NOOP
* store dtype is void
* stores aren't passed through anymore
* meh, skip those for ptx
* correct ptx skip
* hl runs
* kernel.py no longer permutes reduce axis [pr]
* delete tests that handcode uops
* regen of sops is broken...
* put import back
* just remove that
* disable those tests
* refactor count_float4 to take uops as input instead of kernel
* remove some calls to linearize in test_linearizer
* remove some more calls
* remove one more call
* minor cleanup on test_tensor_core_opts tests
Tests now notify when skipped
Before, they silently skipped if the backend didn't have half precision and
accumulation
Also cleaned up atol and rtol setup
* refactor test_tensor_core_opts_group
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* work
* no itertools + top down pass
* clean viz
* python can do that
* webgpu
* gbarrier of gbarrier is gbarrier
* device can be tuple
* bug in toposort
* failing test for gated toposort
* contiguous of gbarrier is gbarrier
* check for binops
* Revert "check for binops"
This reverts commit 53e3cdf720.
* viz + match on gbarrier, self exists by default
* alt
* green now
* cleanup
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
* hotfix amd and amd_llvm
* bf16 not supported in ci
* hotfix amd_llvm is not a device
* remove default
* don't gate on ci and amd_llvm
* minor cleanup
* skip bf16 tc test for amd_llvm
* fix helper_tc_allclose
* cleanup
* hotfix
* cleanup
* cleanup
* check real buffer and add cast for bf16
* cleanup
* fix padded for ops_python
* avoid assert on amd emulated tc
* swap dimensions
* revert, should have nothing to do with padded
* revert fix, should not go in this pr
* remove skip
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason