chenyu
ca68037f26
lazy basic setitem to unrealized Tensor ( #14756 )
...
undo the view and make it a mask, this fuses the setitem with any pending compute too.
one behavior change is that for a target not backed by a buffer (const and arange), rangeify makes the output contiguous under the hood.
this is strictly better than raising and asking the user to call contiguous, as that would no longer be fuse-able.
2026-02-14 20:27:03 -05:00
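A minimal sketch of the "undo the view and make it a mask" idea from the commit above, in plain Python lists rather than tinygrad. The function names `setitem_view` and `setitem_mask` are hypothetical and only illustrate the equivalence; this is not the code from the commit. It assumes a 1-D target and `0 <= start <= stop <= len(target)`.

```python
def setitem_view(target, start, stop, value):
    """Eager form: write the value through a view of the slice."""
    out = list(target)
    out[start:stop] = [value] * (stop - start)
    return out

def setitem_mask(target, start, stop, value):
    """Masked form: an elementwise select over the full shape.

    The mask is True exactly where the index falls inside the slice, so the
    result is one elementwise expression over the whole target.
    """
    mask = [start <= i < stop for i in range(len(target))]
    return [value if m else t for m, t in zip(mask, target)]

# "pending compute": the target is itself the result of an elementwise op
pending = [x * 2 for x in range(5)]
assert setitem_view(pending, 1, 3, 9) == setitem_mask(pending, 1, 3, 9)
```

Because the masked form is a single elementwise expression, it can in principle be combined with whatever elementwise work produced the target, which is the fusion the commit message describes.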
George Hotz
32980c74d1
hotfix: skip flaky tests, looped many times on tinymac3
2026-02-15 07:46:29 +08:00
chenyu
902dc7c09c
fix test_numpy_parity_and_backward_2d ( #14755 )
...
test setup issue, test failed locally with `RUN_SLOW=1`
2026-02-14 17:59:00 -05:00
chenyu
043f5dbfa0
fix write-after-read tracking ( #14754 )
...
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
chenyu
d79c63a0ff
test_multi_step_assign_read_write_same_buffer ( #14752 )
...
pattern in LAMB that can be subtly off
2026-02-14 16:39:08 -05:00
chenyu
95f4c7e90a
fix limit_bufs to not limit index ( #14751 )
...
index is not a real buffer. also made MAX_KERNEL_BUFFERS a ContextVar
2026-02-14 16:00:03 -05:00
chenyu
0ce4a55dad
clean up test_setitem_slice ( #14750 )
...
moved to test_setitem_schedule, and use contiguous zeros as the scheduler handles empty differently now
2026-02-14 14:29:16 -05:00
chenyu
8f6772fd8c
more setitem kernel mem tests ( #14749 )
...
* more setitem kernel mem tests
test only the slice is accessed
* update
2026-02-14 11:01:03 -05:00
chenyu
446909fb7a
more setitem kernel tests ( #14748 )
...
check where realize happened
2026-02-14 09:57:46 -05:00
nimlgen
e1a18dadae
fix devices for copies ( #14747 )
...
* fix devices for copies
* add test
2026-02-14 17:39:41 +03:00
Christopher Milan
eaa9506a00
disallow subnormals in emulated test_dtype ( #14744 )
2026-02-14 00:11:57 -05:00
qazal
c88bb075f0
hotfix: correct way to get renderer arch ( #14743 )
2026-02-14 12:38:20 +08:00
qazal
6dc7ea58fd
make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 ( #14742 )
...
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4
* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
e8bd432bf6
move amd emulator out of tree ( #14740 )
...
* move amd emulator out of tree
* move the readme too
2026-02-14 10:32:00 +08:00
chenyu
dca7819f76
more setitem into unrealized tests ( #14737 )
...
* more setitem into unrealized tests
into empty, const with alu, and arange
* typo
2026-02-13 20:28:51 -05:00
chenyu
8b205a007e
lazy setitem for realized target ( #14735 )
2026-02-13 12:20:14 -05:00
nimlgen
3bee6638e3
external_test_hive_reset ( #14729 )
...
* external_test_hive_reset
* add fault
2026-02-13 19:08:36 +03:00
George Hotz
c0fe78f73b
BUG: metadata is lost with partial assign ( #14732 )
2026-02-13 21:35:21 +08:00
George Hotz
5289b4e882
renderer/amd: add cdna emulator ( #14721 )
...
* renderer/amd: add cdna emulator
* fixes
* no predecode
* no early
* REMU_PATH
* delete that
* round
* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
Christopher Milan
08a555c875
skip test_expand_buffer_before_cast on WEBGPU metal ( #14724 )
2026-02-13 00:01:05 -05:00
chenyu
50cb40be88
clean up test/null/test_indexing.py ( #14720 )
2026-02-12 22:36:53 -05:00
qazal
5b624b5e93
viz: better error message for out of range timestamps ( #14722 )
...
* test_timestamp_out_of_range
* rel_ts helper
* linter
2026-02-13 12:13:40 +09:00
George Hotz
4088d686b2
remove llvm requirement from amd ( #14717 )
...
* remove llvm requirement from amd
* tests pass
* test
* sink kernarg_size
* move stuff
* amd_asm_matmul to new style
* default type
* fix tests, simpler
* cu mode is faster and simpler
* darken
2026-02-13 10:50:12 +08:00
George Hotz
d3adb8428e
Revert "hotfix: skip test/amd in macpytest" ( #14704 )
...
* Revert "hotfix: skip test/amd in macpytest"
This reverts commit b7dade2adf.
* no llvm subprocess
* simpler
* sys.exec
* cleanup
* process safe
* diag
* arm ftz support
* 5 sec
* this one
2026-02-13 08:00:24 +08:00
Christopher Milan
c30bb0f006
fix WEBGPU isnan check ( #14711 )
2026-02-12 17:01:18 -05:00
chenyu
787998fac3
fix getitem tensor indexing detection ( #14712 )
...
issue with sint
2026-02-12 16:04:37 -05:00
chenyu
86352988d8
update test_uops_stats for setitem ( #14710 )
...
realizing both the full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2
fix Estimate.from_uops for sliced access ( #14695 )
...
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial loads. accumulate the accessed size from INDEX, then cap at the full size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
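A rough sketch of the "accumulate from INDEX, then cap at full size" rule described in the commit above. The function `estimate_mem` and its inputs are hypothetical stand-ins, not tinygrad's `Estimate.from_uops`; sizes are plain byte counts, and the cap is assumed to apply per buffer.

```python
def estimate_mem(buffer_sizes, accesses):
    """Estimate memory traffic from per-access sizes, capped per buffer.

    buffer_sizes: dict mapping buffer name -> full size in bytes
    accesses:     iterable of (buffer name, bytes touched by one access)
    """
    accessed = {}
    for buf, nbytes in accesses:
        # accumulate what each indexed access touches
        accessed[buf] = accessed.get(buf, 0) + nbytes
    # a buffer can never contribute more than its full size
    return sum(min(n, buffer_sizes[buf]) for buf, n in accessed.items())

sizes = {"a": 100, "b": 40}
# "a" is only partially loaded; "b" is read twice in full
print(estimate_mem(sizes, [("a", 30), ("a", 30), ("b", 40), ("b", 40)]))
```

Under these assumptions a sliced access contributes only the bytes it touches, while repeated reads of the same buffer stop counting once the whole buffer is covered.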
chenyu
8551fa50d3
support bitcast in sym_infer ( #14708 )
...
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00
chenyu
557134e1c7
model/test fix that failed with WEBGPU=1 DEBUG=2 ( #14706 )
2026-02-12 09:08:16 -05:00
nimlgen
b376bd7a21
jit: fix raw in same kernel ( #14699 )
...
* jit: fix raw in same kernel
* fix
* ugh
* x
* simpler
2026-02-12 15:33:32 +03:00
George Hotz
19e68a1833
skip AMD on not AMD ( #14703 )
2026-02-12 18:56:54 +08:00
George Hotz
4680247e35
renderer/amd: move in tree ( #14702 )
...
* renderer/amd: move in tree
* fix paths in tests
* 24000 lines
* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
095a064ba8
test.yml explicitly says backend ( #14700 )
...
* test.yml explicitly says backend
* 1e-5
2026-02-12 16:03:44 +08:00
nimlgen
14a1991da6
viz: sort tracks in timeline ( #14591 )
...
* viz: sort devices in timeline
* fix
* rev
* upd
* skip
2026-02-12 10:51:41 +03:00
George Hotz
befc1e800c
assembly/amd: disasm is test only ( #14694 )
...
* assembly/amd: disasm is test only
* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201
move tests to test/backend ( #14691 )
...
* move tests to test/backend
* fix imports
* fix CI
* revert that one
* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
chenyu
0c63f63ee4
recursive resolve assign dependency ( #14688 )
...
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
chenyu
cbbc2fdea5
update test_assign_slice_then_read ( #14687 )
...
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0
handle setitem target in rangeify ( #14685 )
2026-02-11 11:38:59 -05:00
chenyu
0d215b962e
few setitem test cases diff from numpy ( #14684 )
...
had claude fuzz the frontend and found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5
add real self assign test ( #14683 )
...
* self assign fix
* no
2026-02-11 12:41:53 +03:00
George Hotz
4565958792
some lil speedups ( #14679 )
2026-02-11 10:01:58 +08:00
George Hotz
2d4ad9e739
add a waitlist for graph rewrite ( #14678 )
...
* add a waitlist for graph rewrite
* cleaner
* one context on spec check
2026-02-11 09:30:13 +08:00
Christopher Milan
389e2eeda1
Revert "transcendental works with long decomp" ( #14676 )
2026-02-10 19:46:34 -05:00
Christopher Milan
0662c8037d
transcendental works with long decomp ( #14672 )
2026-02-10 19:30:24 -05:00
chenyu
ebef63dba0
update test_self_assign_same_device_copy ( #14673 )
...
that test would have passed without the optimization because of the .to shortcut
2026-02-10 17:23:43 -05:00
nimlgen
aafa9dcb5b
eliminate same-device copy self-assigns ( #14671 )
...
* eliminate same-device copy self-assigns
* ugh
2026-02-10 22:54:51 +03:00
chenyu
494eec2694
test_setitem_const_fused ( #14668 )
...
did not realize #14640 also fixed #10690, so added a test for it
2026-02-10 08:33:02 -05:00
George Hotz
8dc46dde07
everything has dtype.long now ( #14661 )
...
* everything has dtype.long now
* int64/uint64 are everywhere now
* that doesn't work
2026-02-10 15:08:50 +08:00