* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
* remove del spam from CI
* more
* preconstruct default buffer spec
* ignore those errors
* check exception
* more exception check
* skip stuff
* smaller tests mean faster tests
* a few more
* skip a few slow tests
* use a venv for python packages
* create venv
* no user, it's in venv
* ignore venv
* venv
* new cache key
* try that
* this
* version the python cache
* First version, caught a bug?
* Nicely print failure to reproduce
* Remove that
* Put the assert back
* Change fuzzing to use testing_unit so it has z3
* Test key to match
* Add rule
* Add test
* Add test for edge case 0
* Merge patterns
* update comment
* consistent whitespace
* whitespace
* add condition
* add test
* update comment
* use Variable
* fuzzer using z3_renderer
* Cleaned up printing and debugging
* working new fuzzer
* change some comments and printing
* more formatting
* fuzz failures in seperate file
* fix fstring
* more tests
* naming
* remove added line
* remove comment
* print number of skipped expressions
* use self.assertEqual
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
* amd mockgpu graph support
For testing remote graph stuff (prompted by #10371) in ci
* Remote fixedvars
Somehow none of existing tests failed when fixedvars were added, looking
what to add as an regression test for this
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Not strictly required for anything but soon there will be like 4 new
properties and having it be a huge json just seems like a bad taste.
It also seems right to not have a separate endpoint for this, just
`GetProperties` request that returns a repr of this similar to how
requests are sent in `BatchRequest`.
This will also make a switch to anything other than http much simpler
if it will be required for any reason, like just a tcp stream of
`BatchRequest`s
* Basic remote multi support
Simplest thing to be able to use remote with multiple gpus, very slow
because no transfers (copyin copyout for cross-device copies)
* tests
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* Less messy broken graph on paravirtualized metal workaround
GitHub CI macOS runners use paravirtualized metal which is broken with
graph (some comments say that ICB in particular is broken but in my
testing it was fine sometimes, but other times hitting an assert inside
metal's code related to resouces, so not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like utm)
that can create macOS VMs with apple's own virtualization framework.
* unused import
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
* init
* add expected failure to correctly track progres
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
A lot more work is required to enable all of them and move into osxtests
matrix, for now i created a separate runner for them (copied from WebGPU)
Will add test/test_graph.py to those tests in #9876
* set pad t 3 for amd padded tc test
* change pad for amd regardless CI
* test tc padded uops and correctness separately
* add test_tensor_cores_padded_uops test to ci
* remove redundant chack for amd device
* cleanup
* FastPatternMatcher
* works without that
* fix test pickle
* strict len
* compile match function
* dynamic compile
* fast
* faster
* compile
* track
* a lot faster
* clean up
* dup or
* faster and simpler
* fast match doesn't support store
* plane
* minor refactor
* real speed
* don't imply return None
* upat
* fix test
* heard you wanted more speed
* no generator
* split cf
* early fixup
* fxn fixup
* reconstruct_function
* Revert "reconstruct_function"
This reverts commit 37dac010ab.
* simpler stuff
* too big
* upat compile error
* cleanups
* don't cache that
* cleanups
* 10 -> 15
Had to autogen newer uapi headers for #9746 (dmabuf export ioctl missing),
submitting just the fix without updating to newer headers as they are only
needed for infiniband stuff
* fast idiv with tests and fuzzer
* Add todo comment
* Add env variable to toggle fast_idiv
* Move env check
* Add fuzz fast_idiv to ci
---------
Co-authored-by: chenyu <chenyu@fastmail.com>