chenyu
06ef8a26b7
add a test case that triggers CALL passthrough_multi ( #14887 )
2026-02-19 10:45:40 -05:00
Kartik Vashishta
9a9c7648e9
system: fix pci_scan_bus vendor filter ( #14885 )
...
* system: fix pci_scan_bus vendor filter
* fix: formatting
2026-02-19 17:23:32 +03:00
George Hotz
f6c1cf343c
new symbolic rule from prealloc_bufs ( #14883 )
...
* new symbolic rule from prealloc_bufs
* optim
2026-02-19 20:57:30 +08:00
qazal
911399bee5
assembly/amd: move the kernel capture stuff out of helpers ( #14881 )
2026-02-19 16:28:48 +09:00
George Hotz
2f0f8b5776
more test relaxations from prealloc_bufs ( #14880 )
2026-02-19 14:23:28 +08:00
George Hotz
ab61c16730
fixes and test relaxations from prealloc_bufs ( #14875 )
...
* fixes and test relaxations from prealloc_bufs
* fix error type and guard _mop
* revert that
* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
0c85b93938
support shrink sharded and non-sharded axes ( #14874 )
...
simpler to just support it
2026-02-18 20:54:10 -05:00
chenyu
e8252e6e4f
use official gguf in test ( #14872 )
...
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
chenyu
8c830c5b44
test_full_like_shrink_on_shard_axis ( #14870 )
...
* test_full_like_shrink_on_shard_axis
add a test case that triggers non-copy branch in mstack_early_shrink
* 0
2026-02-18 19:23:44 -05:00
Ananta Ranganathan
4005e9db6d
Mxfp4 fix ( #14866 )
...
* double e2m1 values for mxfp4
* check if assert equal works in ci
* Revert "check if assert equal works in ci"
This reverts commit 8cf902ce0d.
* remove unnecessary whitespace change
* add test case that fails for old implementation but passes for new
* add note that the previous test is bad
* clarification on the methodology for the test
* fix the indent problem that happened to skip this test
* for now update mxfp4 block test to similarly use allclose (bad)
* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
chenyu
f771de6738
gc.collect() to get the correct GlobalCounters.mem_used in tests ( #14868 )
...
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
chenyu
f84a11bb9f
delete uneven shard tests and mentions ( #14867 )
2026-02-18 14:10:33 -05:00
chenyu
5746a605ce
UOp.axis raises for invalid reshape ( #14863 )
...
reshape is lazy now, so it is better to raise from the .axis call than to have the caller handle the invalid case
2026-02-18 11:28:56 -05:00
George Hotz
af839b2bd1
remove all the outerworld stuff, it was too complex ( #14852 )
2026-02-18 17:44:11 +08:00
George Hotz
d5636fba90
assign after copy shouldn't contig ( #14847 )
...
* assign after copy shouldn't contig
* fix assign copy
2026-02-18 12:23:49 +08:00
George Hotz
ab55e8c6b9
assign should be used as output buffer ( #14845 )
...
* assign should be used as buffer
* late removed
* the fix
* better fix
* backward slice
2026-02-18 09:37:46 +08:00
chenyu
e3c120c8e1
exclude 100 in test_assign_add ( #14846 )
...
this can crash, not sure why. skip 100 to see if it's better
2026-02-17 19:12:47 -05:00
chenyu
72cf603805
removed if self.buffer.is_allocated() in realized ( #14836 )
...
automatically fixes is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
aec8a6c85b
Revert "one run_schedule for assign realize ( #14835 )" ( #14837 )
...
This reverts commit df7c37f611.
2026-02-17 14:34:26 -05:00
chenyu
df7c37f611
one run_schedule for assign realize ( #14835 )
...
concat schedules. separate out the execution part
2026-02-17 14:01:55 -05:00
chenyu
61867c2f35
TestRealizeIsRealized ( #14834 )
...
test that after calling .realize(), uop.is_realized is True. currently not working for empty (and thus disk tensor) or const
2026-02-17 13:30:35 -05:00
chenyu
f147791105
update test to reset and test kernel_count directly ( #14832 )
2026-02-17 11:48:46 -05:00
chenyu
9d4937ab5e
remove assign test @unittest.skip("this test is crashing!") ( #14831 )
2026-02-17 10:30:58 -05:00
nimlgen
dda5ccf63b
hcq: fix usb<->cpu mappings ( #14827 )
...
* hcq: fix usb<->cpu mappings
* non cpu
* um
2026-02-17 18:04:18 +03:00
chenyu
f2f039cc0f
fix chained full-buffer assign ( #14828 )
...
this shows the issue that pm_remove_bufferize drops tags, will fix in bufferize next. this also fixed rand being different in jit vs no-jit
2026-02-17 09:11:04 -05:00
chenyu
58fa82eef5
stronger test_assign_add ( #14826 )
...
also test self add 10 and 100 times
2026-02-17 08:36:09 -05:00
George Hotz
ff60dab622
Revert "big sink is on base ( #14819 )" ( #14825 )
...
This reverts commit 5fc3d8109f.
2026-02-17 19:18:06 +08:00
George Hotz
5fc3d8109f
big sink is on base ( #14819 )
...
* big sink is on base
* contiguous fixes tests
2026-02-17 18:32:56 +08:00
qazal
f590564bf7
gemm multiple is only for cdna4 asm ( #14814 )
...
* gemm multiple is only for cdna4 asm
* move to backend
* and arch
* path
2026-02-17 14:00:02 +09:00
George Hotz
f081f154ae
parameterize the CDNA asm gemm ( #14813 )
...
* parameterize the CDNA asm gemm
* fix llama test
* fix
* add more gemm tests
* confirm all match
* test these asm gemms
2026-02-17 11:35:18 +08:00
George Hotz
bc3487d607
VIZ display cleanups ( #14811 )
...
* exclude reshape/expand broadcasts from viz
* limit src lines
2026-02-17 10:03:08 +08:00
chenyu
5bca5be2d2
test slice assign twice retains the buffer ( #14807 )
2026-02-16 20:01:47 -05:00
chenyu
9b44fbe0b8
update test_assign_add_twice ( #14806 )
...
failed test case to show that `+=1` twice returns a different buffer
2026-02-16 17:52:11 -05:00
chenyu
f290af6c7d
test_schedule always test with SPLIT_REDUCEOP=0 ( #14802 )
...
* test_schedule always test with SPLIT_REDUCEOP=0
except tests that test SPLIT_REDUCEOP=1
* like that
2026-02-16 15:30:26 -05:00
kevvz
e41da0c396
use relative address for MOCKGPU rdna4 tracing ( #14801 )
...
* rdna3/4 trace separation
* remove comments
2026-02-16 22:59:46 +03:00
nimlgen
9f8afb518c
viz: sdma gb/s in graph ( #14798 )
...
* viz: sdma gb/s in graph
* f
2026-02-16 16:45:06 +03:00
qazal
db3db476ff
viz: add GB/s to SDMA ( #14795 )
...
* work
* better
* fix that
* no decimal
2026-02-16 20:09:20 +09:00
George Hotz
47d39a6b8b
add sqtt support to the emulator ( #14791 )
...
* add sqtt support to the emulator
* more sqtt
* cleanup
* cleanups
* simpler tests
* some decent tests
* test branch
2026-02-16 16:48:26 +08:00
wozeparrot
45aebe1572
hipkittens fa backward ( #14723 )
2026-02-16 00:38:44 -08:00
Nicolas Pinto
20b658b786
fuse MULACC after MUL->SHL ( #14788 )
...
* decompositions: fuse (x << n) + c to MULACC
MUL→SHL converts x*(2^n) to x<<n before MULACC can fuse (x*c)+y.
Add pattern to also fuse (x<<n)+c → MULACC(x, 2^n, c) for backends
that support both MULACC and SHL.
* test: add test_mulacc_shl for SHL->MULACC fusion
* test: relax test_mulacc_unrolled to >= 4
SHL->MULACC fusion now also catches power-of-2 address calculations,
increasing MULACC count from 4 to 6 on PTX. the test's intent is that
each unrolled multiply is individually fused (not grouped), so >= 4
is the correct assertion.
---------
Co-authored-by: Prithvish <deformercoding@gmail.com>
Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: Nicolas Pinto <npinto@mbp23.local>
2026-02-16 16:26:44 +08:00
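The identity behind the SHL->MULACC fusion in the commit above, (x<<n)+c == x*(2^n)+c, can be checked with a minimal sketch in plain Python. This is not the tinygrad rewrite rule itself: `mulacc`, `shl_add`, and `fuse_shl_add` are hypothetical helper names used only for illustration, and Python's unbounded integers stand in for the backend's integer type.

```python
def mulacc(a: int, b: int, c: int) -> int:
    """Reference multiply-accumulate: a * b + c (hypothetical helper)."""
    return a * b + c

def shl_add(x: int, n: int, c: int) -> int:
    """The unfused form that remains after MUL has been lowered to SHL."""
    return (x << n) + c

def fuse_shl_add(x: int, n: int, c: int) -> int:
    """The fused form: rewrite (x << n) + c as MULACC(x, 2**n, c)."""
    return mulacc(x, 1 << n, c)

# Exhaustively compare both forms over a small range of inputs.
for x in range(-8, 9):
    for n in range(0, 6):
        for c in range(-4, 5):
            assert shl_add(x, n, c) == fuse_shl_add(x, n, c)
```

On fixed-width integers the same identity holds modulo the word size, which is presumably why power-of-2 address calculations are now also matched, as the commit message notes.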
qazal
ac62d28ddc
viz: amdgpu arch cleanup ( #14790 )
...
* viz: amdgpu arch cleanup
* don't do that
* simpler sqttmap
* work
* self.arch
2026-02-16 16:48:12 +09:00
George Hotz
401095e3e7
emulator barrier tests ( #14789 )
2026-02-16 15:31:01 +08:00
Bautista Garcia
0f1ca8eb43
torch_load: fix shared storage slicing ( #14771 )
...
* faster zip_extract + usage in torch load
* clean zip in torch load
* working zipextract in torchload
* tar_extract in tar path
* faster tar path
* tests passing, cleanup needed
* faster tar with 1MB buffer
* comments
* unify storage_source with all paths
* use bufferedreader in zip path
* fix ruff
* clean
* removed unnecessary string conversion
* fix for tensors that share storage
* less hacky
* shared storage test
* test comment
* linter
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-16 14:30:13 +08:00
George Hotz
dff9cf35c2
amd asm emulator fixes + run it in CI ( #14786 )
...
* amd asm fix, try 2
* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0
cdna4 asm_gemm tests in CI on the null backend ( #14785 )
...
* cdna4 asm_gemm tests in CI on the null backend
* no .numpy() in null
* better
* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
qazal
c2be31e75b
move Estimates to rewrite rules [pr] ( #14782 )
...
* move Estimates to rewrite rules [pr]
* don't need this cached_property
* tuple
* return
2026-02-16 12:59:42 +09:00
George Hotz
0abcb9aac2
move more to mixins ( #14780 )
...
* move more to mixins
* revert
* move some
* do not change
* more
* fix tests
* Revert "more"
This reverts commit d942d59fa4.
* go
* work
* more
* work
* guard
* base
2026-02-16 11:35:00 +08:00
qazal
8e7c5f5b09
remove Tensor.training = True in test_arange ( #14781 )
2026-02-16 11:19:42 +09:00
kevvz
33b2ade8cd
Rdna4 emulator test_ops, dtypes pass ( #14773 )
...
* test_ops, test_dtypes pass
* merge cdna4
* ruff + more tests
* reorganize
* /backend
* again
* again...
* add rdna4
2026-02-16 10:13:39 +08:00
qazal
156b6cb7e4
native bf16 cast in cdna4 ( #14574 )
...
* native bf16 cast in cdna4
* don't need contig backward
* simpler
* contig bw still wins in those cases
2026-02-16 10:51:32 +09:00