Commit Graph

5101 Commits

Author SHA1 Message Date
George Hotz
bc3487d607 VIZ display cleanups (#14811)
* exclude reshape/expand broadcasts from viz

* limit src lines
2026-02-17 10:03:08 +08:00
chenyu
5bca5be2d2 test slice assign twice retains the buffer (#14807) 2026-02-16 20:01:47 -05:00
chenyu
9b44fbe0b8 update test_assign_add_twice (#14806)
failed test case to show that `+=1` twice returns a different buffer
2026-02-16 17:52:11 -05:00
chenyu
f290af6c7d test_schedule always test with SPLIT_REDUCEOP=0 (#14802)
* test_schedule always test with SPLIT_REDUCEOP=0

except tests that tests SPLIT_REDUCEOP=1

* like that
2026-02-16 15:30:26 -05:00
kevvz
e41da0c396 use relative address for MOCKGPU rdna4 tracing (#14801)
* rdna3/4 trace separation

* remove comments
2026-02-16 22:59:46 +03:00
nimlgen
9f8afb518c viz: sdma gb/s in graph (#14798)
* viz: sdma gb/s in graph

* f
2026-02-16 16:45:06 +03:00
qazal
db3db476ff viz: add GB/s to SDMA (#14795)
* work

* better

* fix that

* no decimal
2026-02-16 20:09:20 +09:00
George Hotz
47d39a6b8b add sqtt support to the emulator (#14791)
* add sqtt support to the emulator

* more sqtt

* cleanup

* cleanups

* simpler tests

* some decent tests

* test branch
2026-02-16 16:48:26 +08:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
Nicolas Pinto
20b658b786 fuse MULACC after MUL->SHL (#14788)
* decompositions: fuse (x << n) + c to MULACC

MUL→SHL converts x*(2^n) to x<<n before MULACC can fuse (x*c)+y.
Add pattern to also fuse (x<<n)+c → MULACC(x, 2^n, c) for backends
that support both MULACC and SHL.

* test: add test_mulacc_shl for SHL->MULACC fusion

* test: relax test_mulacc_unrolled to >= 4

SHL->MULACC fusion now also catches power-of-2 address calculations,
increasing MULACC count from 4 to 6 on PTX. the test's intent is that
each unrolled multiply is individually fused (not grouped), so >= 4
is the correct assertion.

---------

Co-authored-by: Prithvish <deformercoding@gmail.com>
Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: Nicolas Pinto <npinto@mbp23.local>
2026-02-16 16:26:44 +08:00
qazal
ac62d28ddc viz: amdgpu arch cleanup (#14790)
* viz: amdgpu arch cleanup

* don't do that

* simpler sqttmap

* work

* self.arch
2026-02-16 16:48:12 +09:00
George Hotz
401095e3e7 emulator barrier tests (#14789) 2026-02-16 15:31:01 +08:00
Bautista Garcia
0f1ca8eb43 torch_load: fix shared storage slicing (#14771)
* faster zip_extract + usage in torch load

* clean zip in torch load

* working zipextract in torchload

* tar_extract in tar path

* faster tar path

* tests passing, cleanup needed

* faster tar with 1MB buffer

* comments

* unify storage_source with all paths

* use bufferedreader in zip path

* fix ruff

* clean

* removed unnecessary string conversion

* fix for tensors that share storage

* less hacky

* shared storage test

* test comment

* linter

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-16 14:30:13 +08:00
George Hotz
dff9cf35c2 amd asm emulator fixes + run it in CI (#14786)
* amd asm fix, try 2

* fix tests
2026-02-16 13:24:21 +08:00
qazal
55a4dfa2e0 cdna4 asm_gemm tests in CI on the null backend (#14785)
* cdna4 asm_gemm tests in CI on the null backend

* no .numpy() in null

* better

* gemm/asm: device comes from renderer
2026-02-16 14:06:23 +09:00
qazal
c2be31e75b move Estimates to rewrite rules [pr] (#14782)
* move Estimates to rewrite rules [pr]

* don't need this cached_property

* tuple

* return
2026-02-16 12:59:42 +09:00
George Hotz
0abcb9aac2 move more to mixins (#14780)
* move more to mixins

* revert

* move some

* do not change

* more

* fix tests

* Revert "more"

This reverts commit d942d59fa4.

* go

* work

* more

* work

* guard

* base
2026-02-16 11:35:00 +08:00
qazal
8e7c5f5b09 remove Tensor.training = True in test_arange (#14781) 2026-02-16 11:19:42 +09:00
kevvz
33b2ade8cd Rdna4 emulator test_ops, dtypes pass (#14773)
* test_ops, test_dtypes pass

* merge cdna4

* ruff + more tests

* reorganize

* /backend

* again

* again...

* add rdna4
2026-02-16 10:13:39 +08:00
qazal
156b6cb7e4 native bf16 cast in cdna4 (#14574)
* native bf16 cast in cdna4

* don't need contig backward

* simpler

* contig bw still wins in those cases
2026-02-16 10:51:32 +09:00
qazal
33b31d9cd6 tinykittens flash attention dtype fix, add CI (#14770)
* don't hardcdoe amd device

* add failing tests, ci too

* fix: fix for dtype mixin

* bump to rocm 7.1

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-02-16 01:15:11 +09:00
chenyu
352845d8cc update cast to uint tests (#14768)
result in valid range should work, add intermediate cast to NIRRenderer since it's UB for [128, 256)
2026-02-15 10:55:13 -05:00
qazal
ceccc8eb86 unskip now passing multi tests [pr] (#14759) 2026-02-15 20:30:00 +09:00
qazal
9da7f5e733 disable process replay for AMD emulator renderer [pr] (#14766)
* disable process replay for AMD emulator renderer [pr]

* line

* skip
2026-02-15 18:52:37 +09:00
George Hotz
9759fd6193 dtype mixin (#14763)
* dtype mixin

* dtype mixin methods
2026-02-15 16:03:48 +08:00
qazal
42b6bf0b7a fix sdpa causal failing test on multi (#14762)
* simple failing test

* device is from xq
2026-02-15 16:54:33 +09:00
George Hotz
0e215c433d remove hack from cast (#14760)
* remove hack from cast

* skip tests

* linters to 3.12, another skip

* fix rand

* m_
2026-02-15 13:56:38 +08:00
George Hotz
d176af6269 start outerworld call test, fix gate (#14758) 2026-02-15 12:35:01 +08:00
chenyu
ca68037f26 lazy basic setitem to unrealized Tensor (#14756)
undo the view and make it a mask, this fuses the setitem with any pending compute too.

one behavior change is that for target not backed by a buffer (const and arange), rangeify makes output contiguous under the hood.
this is stricter better than raise and ask user to call contiguous, as that would no longer be fuse-able.
2026-02-14 20:27:03 -05:00
George Hotz
32980c74d1 hotfix: skip flaky tests, looped many times on tinymac3 2026-02-15 07:46:29 +08:00
chenyu
902dc7c09c fix test_numpy_parity_and_backward_2d (#14755)
test setup issue, test failed locally with `RUN_SLOW=1`
2026-02-14 17:59:00 -05:00
chenyu
043f5dbfa0 fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
chenyu
d79c63a0ff test_multi_step_assign_read_write_same_buffer (#14752)
pattern in LAMB that can be off subtly
2026-02-14 16:39:08 -05:00
chenyu
95f4c7e90a fix limit_bufs to not limit index (#14751)
index is not real buffer. also made MAX_KERNEL_BUFFERS a ContextVar
2026-02-14 16:00:03 -05:00
chenyu
0ce4a55dad clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros as scheduler handles empty differently now
2026-02-14 14:29:16 -05:00
chenyu
8f6772fd8c more setitem kernel mem tests (#14749)
* more setitem kernel mem tests

test only the slice is accessed

* update
2026-02-14 11:01:03 -05:00
chenyu
446909fb7a more setitem kernel tests (#14748)
check where realize happened
2026-02-14 09:57:46 -05:00
nimlgen
e1a18dadae fix devices for copies (#14747)
* fix devices for copies

* add test
2026-02-14 17:39:41 +03:00
Christopher Milan
eaa9506a00 disallow subnormals in emulated test_dtype (#14744) 2026-02-14 00:11:57 -05:00
qazal
c88bb075f0 hotfix: correct way to get renderer arch (#14743) 2026-02-14 12:38:20 +08:00
qazal
6dc7ea58fd make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 (#14742)
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4

* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
e8bd432bf6 move amd emulator out of tree (#14740)
* move amd emulator out of tree

* move the readme too
2026-02-14 10:32:00 +08:00
chenyu
dca7819f76 more setitem into unrealized tests (#14737)
* more setitem into unrealized tests

into empty, const with alu, and arange

* typo
2026-02-13 20:28:51 -05:00
chenyu
8b205a007e lazy setitem for realized target (#14735) 2026-02-13 12:20:14 -05:00
nimlgen
3bee6638e3 external_test_hive_reset (#14729)
* external_test_hive_reset

* add fault
2026-02-13 19:08:36 +03:00
George Hotz
c0fe78f73b BUG: metadata is lost with partial assign (#14732) 2026-02-13 21:35:21 +08:00
George Hotz
5289b4e882 renderer/amd: add cdna emulator (#14721)
* renderer/amd: add cdna emulator

* fixes

* no predecode

* no early

* REMU_PATH

* delete that

* round

* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
Christopher Milan
08a555c875 skip test_expand_buffer_before_cast on WEBGPU metal (#14724) 2026-02-13 00:01:05 -05:00
chenyu
50cb40be88 clean up test/null/test_indexing.py (#14720) 2026-02-12 22:36:53 -05:00
qazal
5b624b5e93 viz: better error message for out of range timestamps (#14722)
* test_timestamp_out_of_range

* rel_ts helper

* linter
2026-02-13 12:13:40 +09:00