Commit Graph

5131 Commits

Author SHA1 Message Date
chenyu
06ef8a26b7 add a test case that triggers CALL passthrough_multi (#14887) 2026-02-19 10:45:40 -05:00
Kartik Vashishta
9a9c7648e9 system: fix pci_scan_bus vendor filter (#14885)
* system: fix pci_scan_bus vendor filter

* fix: formatting
2026-02-19 17:23:32 +03:00
George Hotz
f6c1cf343c new symbolic rule from prealloc_bufs (#14883)
* new symbolic rule from prealloc_bufs

* optim
2026-02-19 20:57:30 +08:00
qazal
911399bee5 assembly/amd: move the kernel capture stuff out of helpers (#14881) 2026-02-19 16:28:48 +09:00
George Hotz
2f0f8b5776 more test relaxations from prealloc_bufs (#14880) 2026-02-19 14:23:28 +08:00
George Hotz
ab61c16730 fixes and test relaxations from prealloc_bufs (#14875)
* fixes and test relaxations from prealloc_bufs

* fix error type and guard _mop

* revert that

* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
0c85b93938 support shrink sharded and non-sharded axes (#14874)
simpler to just support it
2026-02-18 20:54:10 -05:00
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted bad test_load_sample_mxfp4, added some hard-coded simple tests
2026-02-18 19:55:09 -05:00
chenyu
8c830c5b44 test_full_like_shrink_on_shard_axis (#14870)
* test_full_like_shrink_on_shard_axis

add a test case that triggers non-copy branch in mstack_early_shrink

* 0
2026-02-18 19:23:44 -05:00
Ananta Ranganathan
4005e9db6d Mxfp4 fix (#14866)
* double e2m1 values for mxfp4

* check if assert equal works in ci

* Revert "check if assert equal works in ci"

This reverts commit 8cf902ce0d.

* remove unnecessary whitespace change

* add test case that fails for old implementation but passes for new

* add note that the previous test is bad

* clarification on the methodology for the test

* fix the indent problem that happened to skip this test

* for now update mxfp4 block test to similarly use allclose (bad)

* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
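For reference, here is a minimal sketch of how MXFP4 values decode under the OCP Microscaling (MX) spec. Each 4-bit element is FP4 E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), and every element in a block shares one E8M0 power-of-two scale. This is not tinygrad's or GGUF's implementation. The function names are hypothetical, and the sketch takes already-unpacked 4-bit codes, so it makes no assumption about nibble order inside a packed byte.

```python
import math


def decode_e2m1(code: int) -> float:
    """Decode one FP4 E2M1 code: 1 sign, 2 exponent, 1 mantissa bit, bias 1."""
    assert 0 <= code < 16, "E2M1 codes are 4 bits"
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # subnormal: 2^(1-1) * (0 + man/2), i.e. 0.0 or 0.5
        mag = man * 0.5
    else:
        # normal: 2^(exp-1) * (1 + man/2)
        mag = 2.0 ** (exp - 1) * (1.0 + man / 2)
    return sign * mag


def decode_e8m0(scale: int) -> float:
    """Decode an E8M0 shared scale: 2^(scale-127), with 0xFF reserved for NaN."""
    return math.nan if scale == 0xFF else 2.0 ** (scale - 127)


def decode_mxfp4_block(scale: int, codes: list[int]) -> list[float]:
    """Dequantize one MX block: every element is multiplied by the same shared scale."""
    s = decode_e8m0(scale)
    return [s * decode_e2m1(c) for c in codes]


# the eight non-negative E2M1 magnitudes
print([decode_e2m1(c) for c in range(8)])  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

The largest E2M1 magnitude is 6.0, so the shared scale carries the block's dynamic range.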
chenyu
f771de6738 gc.collect() to get the correct GlobalCounters.mem_used in tests (#14868)
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
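The flakiness described above comes from reference cycles. CPython frees most objects as soon as their refcount drops to zero, but objects kept alive by a cycle are only freed when the cyclic garbage collector runs, and that can be triggered by an allocation at an unpredictable point. The minimal sketch below uses a hypothetical counter that is decremented in a finalizer. It is a stand-in for illustration, not tinygrad's actual GlobalCounters bookkeeping, and it shows why calling gc.collect() before reading the counter makes the value deterministic.

```python
import gc


class Counters:
    # hypothetical stand-in for a global allocation counter
    mem_used = 0


class FakeBuffer:
    """Hypothetical buffer that counts its bytes in Counters.mem_used while alive."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes
        Counters.mem_used += nbytes
        # a self-reference cycle: refcounting alone can never free this object
        self.self_ref = self

    def __del__(self):
        Counters.mem_used -= self.nbytes


def make_garbage() -> None:
    # the buffer is unreachable after this call but stays alive through its cycle
    FakeBuffer(1024)


gc.disable()                 # keep automatic collection out of the demo
make_garbage()
before = Counters.mem_used   # 1024: the cycle has not been collected yet
gc.collect()                 # find the unreachable cycle and run its finalizer
after = Counters.mem_used    # 0
gc.enable()

print(before, after)  # 1024 0
```

Without the explicit gc.collect(), the reading depends on whether an automatic collection happened to run first.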
chenyu
f84a11bb9f delete uneven shard tests and mentions (#14867) 2026-02-18 14:10:33 -05:00
chenyu
5746a605ce UOp.axis raises for invalid reshape (#14863)
reshape is lazy now, so it's better to raise from the .axis call than to make the caller handle the invalid case
2026-02-18 11:28:56 -05:00
George Hotz
af839b2bd1 remove all the outerworld stuff, it was too complex (#14852) 2026-02-18 17:44:11 +08:00
George Hotz
d5636fba90 assign after copy shouldn't contig (#14847)
* assign after copy shouldn't contig

* fix assign copy
2026-02-18 12:23:49 +08:00
George Hotz
ab55e8c6b9 assign should be used as output buffer (#14845)
* assign should be used as buffer

* late removed

* the fix

* better fix

* backward slice
2026-02-18 09:37:46 +08:00
chenyu
e3c120c8e1 exclude 100 in test_assign_add (#14846)
this can crash, not sure why. skip 100 to see if it's better
2026-02-17 19:12:47 -05:00
chenyu
72cf603805 removed if self.buffer.is_allocated() in realized (#14836)
automatically fixes is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
aec8a6c85b Revert "one run_schedule for assign realize (#14835)" (#14837)
This reverts commit df7c37f611.
2026-02-17 14:34:26 -05:00
chenyu
df7c37f611 one run_schedule for assign realize (#14835)
concat schedules. separate out the execution part
2026-02-17 14:01:55 -05:00
chenyu
61867c2f35 TestRealizeIsRealized (#14834)
test that after calling .realize(), uop.is_realized is True. currently not working for empty (and thus disk tensors) or const
2026-02-17 13:30:35 -05:00
chenyu
f147791105 update test to reset and test kernel_count directly (#14832) 2026-02-17 11:48:46 -05:00
chenyu
9d4937ab5e remove assign test @unittest.skip("this test is crashing!") (#14831) 2026-02-17 10:30:58 -05:00
nimlgen
dda5ccf63b hcq: fix usb<->cpu mappings (#14827)
* hcq: fix usb<->cpu mappings

* non cpu

* um
2026-02-17 18:04:18 +03:00
chenyu
f2f039cc0f fix chained full-buffer assign (#14828)
this shows the issue that pm_remove_bufferize drops tags, which will be fixed in bufferize next. this also fixed rand being different in jit vs no-jit
2026-02-17 09:11:04 -05:00
chenyu
58fa82eef5 stronger test_assign_add (#14826)
also test self add 10 and 100 times
2026-02-17 08:36:09 -05:00
George Hotz
ff60dab622 Revert "big sink is on base (#14819)" (#14825)
This reverts commit 5fc3d8109f.
2026-02-17 19:18:06 +08:00
George Hotz
5fc3d8109f big sink is on base (#14819)
* big sink is on base

* contiguous fixes tests
2026-02-17 18:32:56 +08:00
qazal
f590564bf7 gemm multiple is only for cdna4 asm (#14814)
* gemm multiple is only for cdna4 asm

* move to backend

* and arch

* path
2026-02-17 14:00:02 +09:00
George Hotz
f081f154ae parameterize the CDNA asm gemm (#14813)
* parameterize the CDNA asm gemm

* fix llama test

* fix

* add more gemm tests

* confirm all match

* test these asm gemms
2026-02-17 11:35:18 +08:00
George Hotz
bc3487d607 VIZ display cleanups (#14811)
* exclude reshape/expand broadcasts from viz

* limit src lines
2026-02-17 10:03:08 +08:00
chenyu
5bca5be2d2 test slice assign twice retains the buffer (#14807) 2026-02-16 20:01:47 -05:00
chenyu
9b44fbe0b8 update test_assign_add_twice (#14806)
failing test case showing that `+=1` twice returns a different buffer
2026-02-16 17:52:11 -05:00
chenyu
f290af6c7d test_schedule always test with SPLIT_REDUCEOP=0 (#14802)
* test_schedule always test with SPLIT_REDUCEOP=0

except tests that test SPLIT_REDUCEOP=1

* like that
2026-02-16 15:30:26 -05:00
kevvz
e41da0c396 use relative address for MOCKGPU rdna4 tracing (#14801)
* rdna3/4 trace separation

* remove comments
2026-02-16 22:59:46 +03:00
nimlgen
9f8afb518c viz: sdma gb/s in graph (#14798)
* viz: sdma gb/s in graph

* f
2026-02-16 16:45:06 +03:00
qazal
db3db476ff viz: add GB/s to SDMA (#14795)
* work

* better

* fix that

* no decimal
2026-02-16 20:09:20 +09:00
George Hotz
47d39a6b8b add sqtt support to the emulator (#14791)
* add sqtt support to the emulator

* more sqtt

* cleanup

* cleanups

* simpler tests

* some decent tests

* test branch
2026-02-16 16:48:26 +08:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
Nicolas Pinto
20b658b786 fuse MULACC after MUL->SHL (#14788)
* decompositions: fuse (x << n) + c to MULACC

MUL→SHL converts x*(2^n) to x<<n before MULACC can fuse (x*c)+y.
Add pattern to also fuse (x<<n)+c → MULACC(x, 2^n, c) for backends
that support both MULACC and SHL.

* test: add test_mulacc_shl for SHL->MULACC fusion

* test: relax test_mulacc_unrolled to >= 4

SHL->MULACC fusion now also catches power-of-2 address calculations,
increasing MULACC count from 4 to 6 on PTX. the test's intent is that
each unrolled multiply is individually fused (not grouped), so >= 4
is the correct assertion.

---------

Co-authored-by: Prithvish <deformercoding@gmail.com>
Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: Nicolas Pinto <npinto@mbp23.local>
2026-02-16 16:26:44 +08:00
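The identity behind this change is that for integers, x << n equals x * 2^n, so (x << n) + c computes the same value as MULACC(x, 2^n, c). The toy sketch below applies that rewrite bottom-up to a hypothetical expression tree and checks that the fused form evaluates identically. Const, Var, Op and fuse_shl_add are illustrative names only, not tinygrad's UOp or PatternMatcher API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    val: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    op: str      # "ADD", "MUL", "SHL" or "MULACC"
    src: tuple   # operand nodes


Node = Union[Const, Var, Op]


def evaluate(n: Node, env: dict) -> int:
    """Interpret the toy tree; MULACC(a, b, c) means a * b + c."""
    if isinstance(n, Const):
        return n.val
    if isinstance(n, Var):
        return env[n.name]
    v = [evaluate(s, env) for s in n.src]
    if n.op == "ADD":
        return v[0] + v[1]
    if n.op == "MUL":
        return v[0] * v[1]
    if n.op == "SHL":
        return v[0] << v[1]
    if n.op == "MULACC":
        return v[0] * v[1] + v[2]
    raise ValueError(f"unknown op {n.op}")


def fuse_shl_add(n: Node) -> Node:
    """Rewrite ADD(SHL(x, Const(k)), c) into MULACC(x, Const(2**k), c), bottom-up."""
    if not isinstance(n, Op):
        return n
    n = Op(n.op, tuple(fuse_shl_add(s) for s in n.src))
    if n.op == "ADD":
        # ADD is commutative, so try the SHL operand in either position
        for a, c in (n.src, n.src[::-1]):
            if isinstance(a, Op) and a.op == "SHL" and isinstance(a.src[1], Const):
                x, k = a.src[0], a.src[1].val
                return Op("MULACC", (x, Const(1 << k), c))
    return n


expr = Op("ADD", (Op("SHL", (Var("x"), Const(3))), Var("y")))
fused = fuse_shl_add(expr)
env = {"x": 5, "y": 7}
print(evaluate(expr, env), evaluate(fused, env))  # 47 47
```

Matching only a constant shift amount matters: MULACC needs a concrete multiplier 2^k, which a variable shift cannot supply.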
qazal
ac62d28ddc viz: amdgpu arch cleanup (#14790)
* viz: amdgpu arch cleanup

* don't do that

* simpler sqttmap

* work

* self.arch
2026-02-16 16:48:12 +09:00
George Hotz
401095e3e7 emulator barrier tests (#14789) 2026-02-16 15:31:01 +08:00
Bautista Garcia
0f1ca8eb43 torch_load: fix shared storage slicing (#14771)
* faster zip_extract + usage in torch load

* clean zip in torch load

* working zipextract in torchload

* tar_extract in tar path

* faster tar path

* tests passing, cleanup needed

* faster tar with 1MB buffer

* comments

* unify storage_source with all paths

* use bufferedreader in zip path

* fix ruff

* clean

* removed unnecessary string conversion

* fix for tensors that share storage

* less hacky

* shared storage test

* test comment

* linter

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-16 14:30:13 +08:00
George Hotz
dff9cf35c2 amd asm emulator fixes + run it in CI (#14786)
* amd asm fix, try 2

* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0 cdna4 asm_gemm tests in CI on the null backend (#14785)
* cdna4 asm_gemm tests in CI on the null backend

* no .numpy() in null

* better

* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
qazal
c2be31e75b move Estimates to rewrite rules [pr] (#14782)
* move Estimates to rewrite rules [pr]

* don't need this cached_property

* tuple

* return
2026-02-16 12:59:42 +09:00
George Hotz
0abcb9aac2 move more to mixins (#14780)
* move more to mixins

* revert

* move some

* do not change

* more

* fix tests

* Revert "more"

This reverts commit d942d59fa4.

* go

* work

* more

* work

* guard

* base
2026-02-16 11:35:00 +08:00
qazal
8e7c5f5b09 remove Tensor.training = True in test_arange (#14781) 2026-02-16 11:19:42 +09:00
kevvz
33b2ade8cd Rdna4 emulator test_ops, dtypes pass (#14773)
* test_ops, test_dtypes pass

* merge cdna4

* ruff + more tests

* reorganize

* /backend

* again

* again...

* add rdna4
2026-02-16 10:13:39 +08:00
qazal
156b6cb7e4 native bf16 cast in cdna4 (#14574)
* native bf16 cast in cdna4

* don't need contig backward

* simpler

* contig bw still wins in those cases
2026-02-16 10:51:32 +09:00