12208 Commits

Author SHA1 Message Date
George Hotz
8091661df3 move more to mixins (#14761) 2026-02-15 15:18:37 +08:00
George Hotz
0e215c433d remove hack from cast (#14760)
* remove hack from cast

* skip tests

* linters to 3.12, another skip

* fix rand

* m_
2026-02-15 13:56:38 +08:00
George Hotz
d176af6269 start outerworld call test, fix gate (#14758) 2026-02-15 12:35:01 +08:00
qazal
9bb6014900 keep existing profile trace in viz cli (#14757) 2026-02-15 13:16:32 +09:00
chenyu
ca68037f26 lazy basic setitem to unrealized Tensor (#14756)
undo the view and make it a mask, this fuses the setitem with any pending compute too.

one behavior change is that for a target not backed by a buffer (const and arange), rangeify makes the output contiguous under the hood.
this is strictly better than raising and asking the user to call contiguous, as that would no longer be fuse-able.
2026-02-14 20:27:03 -05:00
George Hotz
32980c74d1 hotfix: skip flaky tests, looped many times on tinymac3 2026-02-15 07:46:29 +08:00
chenyu
902dc7c09c fix test_numpy_parity_and_backward_2d (#14755)
test setup issue, test failed locally with `RUN_SLOW=1`
2026-02-14 17:59:00 -05:00
chenyu
043f5dbfa0 fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
chenyu
d79c63a0ff test_multi_step_assign_read_write_same_buffer (#14752)
pattern in LAMB that can be off subtly
2026-02-14 16:39:08 -05:00
chenyu
95f4c7e90a fix limit_bufs to not limit index (#14751)
index is not a real buffer. also made MAX_KERNEL_BUFFERS a ContextVar
2026-02-14 16:00:03 -05:00
chenyu
0ce4a55dad clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros as scheduler handles empty differently now
2026-02-14 14:29:16 -05:00
chenyu
8f6772fd8c more setitem kernel mem tests (#14749)
* more setitem kernel mem tests

test only the slice is accessed

* update
2026-02-14 11:01:03 -05:00
chenyu
446909fb7a more setitem kernel tests (#14748)
check where realize happened
2026-02-14 09:57:46 -05:00
nimlgen
4ab51b55bd stream pma decoder (#14746) 2026-02-14 17:40:18 +03:00
nimlgen
e1a18dadae fix devices for copies (#14747)
* fix devices for copies

* add test
2026-02-14 17:39:41 +03:00
George Hotz
e35bd960e8 Revert "use zip_extract and tar_extract in torch load (#14734)" (#14745)
This reverts commit 9d9ef81608.
2026-02-14 13:24:01 +08:00
Christopher Milan
eaa9506a00 disallow subnormals in emulated test_dtype (#14744) 2026-02-14 00:11:57 -05:00
Bautista Garcia
9d9ef81608 use zip_extract and tar_extract in torch load (#14734)
* faster zip_extract + usage in torch load

* clean zip in torch load

* working zipextract in torchload

* tar_extract in tar path

* faster tar path

* tests passing, cleanup needed

* faster tar with 1MB buffer

* comments

* unify storage_source with all paths

* use bufferedreader in zip path

* fix ruff

* clean

* removed unnecessary string conversion
2026-02-14 12:57:28 +08:00
qazal
c88bb075f0 hotfix: correct way to get renderer arch (#14743) 2026-02-14 12:38:20 +08:00
George Hotz
f9d2eca91a clean up amd/elf.py (#14741) 2026-02-14 12:09:05 +08:00
qazal
6dc7ea58fd make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 (#14742)
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4

* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
e8bd432bf6 move amd emulator out of tree (#14740)
* move amd emulator out of tree

* move the readme too
2026-02-14 10:32:00 +08:00
chenyu
dca7819f76 more setitem into unrealized tests (#14737)
* more setitem into unrealized tests

into empty, const with alu, and arange

* typo
2026-02-13 20:28:51 -05:00
chenyu
9f607cf84f disk setitem does not need realize either (#14736)
disk base is a COPY and is_realized is always False for now, disk assign is still eager
2026-02-13 12:57:58 -05:00
chenyu
8b205a007e lazy setitem for realized target (#14735) 2026-02-13 12:20:14 -05:00
nimlgen
3bee6638e3 external_test_hive_reset (#14729)
* external_test_hive_reset

* add fault
2026-02-13 19:08:36 +03:00
nimlgen
7d88626068 nv: fix pma_bytes to be system memory (#14733) 2026-02-13 17:55:46 +03:00
George Hotz
c0fe78f73b BUG: metadata is lost with partial assign (#14732) 2026-02-13 21:35:21 +08:00
qazal
d0543063dd viz: wave color is locally scoped (#14728) 2026-02-13 18:22:20 +09:00
nimlgen
ba67425680 am: reset mi300 with pm4 (#14727) 2026-02-13 11:22:32 +03:00
George Hotz
c0de4f75b1 improve mmapeak, print names with sqtt (#14726) 2026-02-13 16:07:06 +08:00
George Hotz
5289b4e882 renderer/amd: add cdna emulator (#14721)
* renderer/amd: add cdna emulator

* fixes

* no predecode

* no early

* REMU_PATH

* delete that

* round

* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
Christopher Milan
08a555c875 skip test_expand_buffer_before_cast on WEBGPU metal (#14724) 2026-02-13 00:01:05 -05:00
Christopher Milan
7993f3a277 autogen: use snapshot.debian.org for linux src (#14718) 2026-02-12 23:36:38 -05:00
wozeparrot
0613c0ac0c hipkittens fa forward (#14692) 2026-02-12 20:16:43 -08:00
chenyu
50cb40be88 clean up test/null/test_indexing.py (#14720) 2026-02-12 22:36:53 -05:00
qazal
5b624b5e93 viz: better error message for out of range timestamps (#14722)
* test_timestamp_out_of_range

* rel_ts helper

* linter
2026-02-13 12:13:40 +09:00
George Hotz
4088d686b2 remove llvm requirement from amd (#14717)
* remove llvm requirement from amd

* tests pass

* test

* sink kernarg_size

* move stuff

* amd_asm_matmul to new style

* default type

* fix tests, simpler

* cu mode is faster and simpler

* darken
2026-02-13 10:50:12 +08:00
chenyu
9e33a08adb use more pad_to and shrink_to in tensor.py (#14719)
good wins
2026-02-12 20:10:57 -05:00
George Hotz
d3adb8428e Revert "hotfix: skip test/amd in macpytest" (#14704)
* Revert "hotfix: skip test/amd in macpytest"

This reverts commit b7dade2adf.

* no llvm subprocess

* simpler

* sys.exec

* cleanup

* process safe

* diag

* arm ftz support

* 5 sec

* this one
2026-02-13 08:00:24 +08:00
Christopher Milan
d4bc5ab609 autogen: download linux sources (#14714) 2026-02-12 18:50:50 -05:00
Christopher Milan
084d0d0103 cleanup macos webgpu tests (#14715) 2026-02-12 17:56:34 -05:00
Christopher Milan
c30bb0f006 fix WEBGPU isnan check (#14711) 2026-02-12 17:01:18 -05:00
chenyu
9b3b597423 minor getitem cleanups (#14713) 2026-02-12 16:54:54 -05:00
chenyu
787998fac3 fix getitem tensor indexing detection (#14712)
issue with sint
2026-02-12 16:04:37 -05:00
chenyu
86352988d8 update test_uops_stats for setitem (#14710)
realize both full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2 fix Estimate.from_uops for sliced access (#14695)
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial load. accumulate the accessed size from INDEX, then cap at full size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
chenyu
8551fa50d3 support bitcast in sym_infer (#14708)
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00
chenyu
212789e31e fix long_decomp with None tag (#14707)
fixed `DEBUG=2 WEBGPU=1 python -m pytest test/null/test_tensor.py::TestIdxUpcast::test_int64_unsupported_overflow_sym`
2026-02-12 09:31:52 -05:00
chenyu
557134e1c7 model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706) 2026-02-12 09:08:16 -05:00