* First version, caught a bug?
* Nicely print failure to reproduce
* Remove that
* Put the assert back
* Change fuzzing to use testing_unit so it has z3
* Test key to match
* Add rule
* Add test
* Add test for edge case 0
* Merge patterns
* update comment
* consistent whitespace
* whitespace
* add condition
* add test
* update comment
* use Variable
* fuzzer using z3_renderer
* Cleaned up printing and debugging
* working new fuzzer
* change some comments and printing
* more formatting
* fuzz failures in separate file
* fix fstring
* more tests
* naming
* remove added line
* remove comment
* print number of skipped expressions
* use self.assertEqual
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
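The stub behaviour described above can be sketched as a mock that records register writes but otherwise ignores everything and hands back a fixed zero-filled trace. The class and method names here are hypothetical illustrations, not mockgpu's actual API:

```python
# Hypothetical sketch of the mockgpu SQTT behaviour described above: ignore
# basically everything, return a 0x1000-size trace full of zeroes, but keep
# enough state (the register writes) that things like register rename are
# still testable.
class MockSQTT:
  TRACE_SIZE = 0x1000  # trace size mentioned in the commit message

  def __init__(self):
    self.writes = []  # recorded (register, value) pairs for rename checks

  def write_register(self, reg: int, value: int):
    # record the write so a test can assert on renaming, then do nothing
    self.writes.append((reg, value))

  def collect_trace(self) -> bytes:
    # every trace comes back as 0x1000 zero bytes, regardless of input
    return bytes(self.TRACE_SIZE)
```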
* amd mockgpu graph support
For testing remote graph stuff (prompted by #10371) in ci
* Remote fixedvars
Somehow none of the existing tests failed when fixedvars were added; looking
into what to add as a regression test for this
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Not strictly required for anything, but soon there will be like 4 new
properties and having it be a huge json just seems like bad taste.
It also seems right not to have a separate endpoint for this, just a
`GetProperties` request that returns a repr of this, similar to how
requests are sent in `BatchRequest`.
This will also make a switch to anything other than http much simpler
if it is ever required for any reason, like just a tcp stream of
`BatchRequest`s
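A minimal sketch of what a repr-serialized properties response could look like; the `RemoteProperties` class and its fields are illustrative assumptions, not the actual tinygrad remote protocol:

```python
from dataclasses import dataclass

# Illustrative sketch of repr-based serialization in the spirit of the
# GetProperties idea above. The class and field names are hypothetical.
@dataclass(frozen=True)
class RemoteProperties:
  real_device: str
  renderer: str
  offset_alignment: int

def serialize(props: RemoteProperties) -> str:
  # same trick as sending requests as text: put the repr on the wire,
  # so adding a new property is just adding a dataclass field
  return repr(props)

def deserialize(data: str) -> RemoteProperties:
  # the repr of a dataclass round-trips via eval as long as every field
  # has an eval-able repr (strings and ints do)
  return eval(data, {"RemoteProperties": RemoteProperties})
```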
* Basic remote multi support
Simplest thing to be able to use remote with multiple gpus; very slow
because there are no real transfers (cross-device copies go through
copyin/copyout)
* tests
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* Less messy broken graph on paravirtualized metal workaround
GitHub CI macOS runners use paravirtualized metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it was fine sometimes and other times hit an assert inside
metal's code related to resources, so not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own virtualization framework.
* unused import
* propagate use_tensor_cores
* add use_tensor_core to arg in test and search
* bugfix
* get TC val from ContextVar in search
* revert minor space change
* add tc emulation test to ci and benchmark
* revert
* revert whitespace change
* remove test for ptx
* add comment and remove llvm test run
* init
* add expected failure to correctly track progress
* hotfix
* skip for amd_llvm as well
* add skip
* add pr number
* move comment to amd test
* change reason
A lot more work is required to enable all of them and move them into the
osxtests matrix; for now I created a separate runner for them (copied from
WebGPU). Will add test/test_graph.py to those tests in #9876
* set pad to 3 for amd padded tc test
* change pad for amd regardless of CI
* test tc padded uops and correctness separately
* add test_tensor_cores_padded_uops test to ci
* remove redundant check for amd device
* cleanup
* FastPatternMatcher
* works without that
* fix test pickle
* strict len
* compile match function
* dynamic compile
* fast
* faster
* compile
* track
* a lot faster
* clean up
* dup or
* faster and simpler
* fast match doesn't support store
* plane
* minor refactor
* real speed
* don't imply return None
* upat
* fix test
* heard you wanted more speed
* no generator
* split cf
* early fixup
* fxn fixup
* reconstruct_function
* Revert "reconstruct_function"
This reverts commit 37dac010ab.
* simpler stuff
* too big
* upat compile error
* cleanups
* don't cache that
* cleanups
* 10 -> 15
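The "compile match function" / "dynamic compile" steps above point at a standard speedup: instead of interpreting a pattern tree on every match, generate specialized Python source per pattern once and `exec` it. A toy sketch of the idea (`Node` and `compile_pattern` are illustrative, not tinygrad's actual UPat/PatternMatcher machinery):

```python
# Toy version of compiling a pattern into a specialized match function.
# A pattern here is just (op_name, n_sources); real UPat patterns are richer.
def compile_pattern(op: str, n_src: int):
  # build specialized source once, with the checks inlined as constants,
  # so matching is a single fast function call instead of tree-walking
  src = (
    f"def _match(node):\n"
    f"  return node.op == {op!r} and len(node.src) == {n_src}\n"
  )
  ns = {}
  exec(src, ns)  # dynamic compile: exec the generated source
  return ns["_match"]

class Node:
  def __init__(self, op, src=()):
    self.op, self.src = op, tuple(src)

# a matcher specialized for two-source ADD nodes
match_add2 = compile_pattern("ADD", 2)
```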
Had to autogen newer uapi headers for #9746 (dmabuf export ioctl missing),
submitting just the fix without updating to newer headers as they are only
needed for infiniband stuff
* fast idiv with tests and fuzzer
* Add todo comment
* Add env variable to toggle fast_idiv
* Move env check
* Add fuzz fast_idiv to ci
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
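The fast_idiv work above is presumably the classic multiply-and-shift replacement for unsigned division by a constant; a minimal sketch with its own little fuzzer, using the textbook round-up magic-number construction (this is the general technique, not tinygrad's actual implementation):

```python
import random

def magic(d: int, bits: int = 32):
  """Return (m, s) such that x // d == (x * m) >> s for all 0 <= x < 2**bits."""
  assert d > 0
  l = (d - 1).bit_length()        # ceil(log2(d))
  s = bits + l
  m = ((1 << s) + d - 1) // d     # ceil(2**s / d)
  # correctness: with m = (2**s + e)/d and e < d <= 2**l, the rounding error
  # x*e/2**s stays below 1 for every x < 2**bits, so the floors agree
  return m, s

def fast_idiv(x: int, d: int, bits: int = 32) -> int:
  # one multiply and one shift instead of a division
  m, s = magic(d, bits)
  return (x * m) >> s

if __name__ == "__main__":
  # tiny fuzzer: random dividends and divisors must match true division
  rng = random.Random(0)
  for _ in range(2000):
    d = rng.randrange(1, 1 << 16)
    x = rng.randrange(0, 1 << 32)
    assert fast_idiv(x, d) == x // d
```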
* add default gate in index
* assert store
* add TestRendererFailures
- move test_gated_store_with_alu to new TestRenderFailures class for
tests that fail on multiple renderers
- add test_renderer_failures.py run on python CI
* add test for gated index in 2d
* test TestRenderFailures
* fix some tests in test_ops for torch backend (171 failing)
* fix more tests (135 failures)
* fix tests (126 failing)
* handle transposed convs (109 tests failing)
* fix slice
* fix lshift & rshift and more tests (87 tests failing)
* revert accidental change
* remove unnecessary changes (82 failures)
* fix backward for avg_pool2d (78 failures)
* fix backward for avg_pool2d (78 failures)
* fix replication backpass
* fix reflection pad back pass (71 failures)
* cummax with indices, aten.mv and move out methods (67 failures)
* extract avg_pool2d and avg_pool3d to separate functions (62 failures)
* revert changes for cat_out
* rewrite avg_pool and pad without repetition
* remove duplicates from decomps
* slice rewrite and add slice_backward (59 failures)
* add dtype fixup from https://github.com/tinygrad/tinygrad/pull/9297
* fix linter error and remove Tensor.pad (48 failures)
* add select_backward and index_put (40 failures)
* fix some more tests (36 failures)
* fix more tests (12 failures)
* some cleanups and fix couple more tests (10 failures)
* cleaner way to write upsample
* some more upsample cleanups
* use lambda for upsample
* add autowrapper for upsample forward
* cumsum and max_dim without aten functions
* revert _log_softmax
* fix more tests (1 failure)
* make linter happy
* move import to appropriate func
* make linter happy
* add codes for noqa
* some more refactors
* remove comment
* remove dependency on aten function for conv backward
* some more refactors
* add returns
* revert a change from merge
* some cleanups
* remove whitespace
* remove ruff change
* revert upsample
* add masked_fill_.Tensor and scatter.src_out
* add todo
* fix test_biased_conv2d
* fix test_var_one_in_axis & test_std_one_in_axis but break test_biased_conv2d :(
* revert torch_debug
* revert torch_debug
* skip test_gather_failure for the tiny backend
* make padding registration more concise
* add nonzero
* remove scatter_add since we already have the out
* fix scatter
* remove some repetition
* make upsample backward registrations more concise
* remove select.int
* use Tensor.cumsum
* realize conv2d outputs before backward to fix test_biased_conv2d
* add a todo for realize (1 failure)
* add new_empty and new_empty_strided
* make test_pad_circular_mode forward only and remove redundant stuff
* fix linter errors
* remove expect failure
* just tb
* slice is a view_op
* contiguous only when lazydata.is_realized
* fix backward for test_pad_circular_mode
* revert torch.nn.functional.pad override
* add transpose.int and make constant_pad_nd contiguous
* slice_backwards has no kwargs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>