Commit Graph

12188 Commits

Author SHA1 Message Date
qazal
6dc7ea58fd make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4 (#14742)
* make flash attention tests run on DEV=NULL EMULATE=AMD_CDNA4

* no if CI, this is just the arch
2026-02-14 12:24:37 +09:00
George Hotz
e8bd432bf6 move amd emulator out of tree (#14740)
* move amd emulator out of tree

* move the readme too
2026-02-14 10:32:00 +08:00
chenyu
dca7819f76 more setitem into unrealized tests (#14737)
* more setitem into unrealized tests

into empty, const with alu, and arange

* typo
2026-02-13 20:28:51 -05:00
chenyu
9f607cf84f disk setitem does not need realize either (#14736)
disk base is a COPY and is_realized is always False for now, disk assign is still eager
2026-02-13 12:57:58 -05:00
chenyu
8b205a007e lazy setitem for realized target (#14735) 2026-02-13 12:20:14 -05:00
nimlgen
3bee6638e3 external_test_hive_reset (#14729)
* external_test_hive_reset

* add fault
2026-02-13 19:08:36 +03:00
nimlgen
7d88626068 nv: fix pma_bytes to be system memory (#14733) 2026-02-13 17:55:46 +03:00
George Hotz
c0fe78f73b BUG: metadata is lost with partial assign (#14732) 2026-02-13 21:35:21 +08:00
qazal
d0543063dd viz: wave color is locally scoped (#14728) 2026-02-13 18:22:20 +09:00
nimlgen
ba67425680 am: reset mi300 with pm4 (#14727) 2026-02-13 11:22:32 +03:00
George Hotz
c0de4f75b1 improve mmapeak, print names with sqtt (#14726) 2026-02-13 16:07:06 +08:00
George Hotz
5289b4e882 renderer/amd: add cdna emulator (#14721)
* renderer/amd: add cdna emulator

* fixes

* no predecode

* no early

* REMU_PATH

* delete that

* round

* Fix cache invalidation check in _compile_smem
2026-02-13 16:06:58 +08:00
Christopher Milan
08a555c875 skip test_expand_buffer_before_cast on WEBGPU metal (#14724) 2026-02-13 00:01:05 -05:00
Christopher Milan
7993f3a277 autogen: use snapshot.debian.org for linux src (#14718) 2026-02-12 23:36:38 -05:00
wozeparrot
0613c0ac0c hipkittens fa forward (#14692) 2026-02-12 20:16:43 -08:00
chenyu
50cb40be88 clean up test/null/test_indexing.py (#14720) 2026-02-12 22:36:53 -05:00
qazal
5b624b5e93 viz: better error message for out of range timestamps (#14722)
* test_timestamp_out_of_range

* rel_ts helper

* linter
2026-02-13 12:13:40 +09:00
George Hotz
4088d686b2 remove llvm requirement from amd (#14717)
* remove llvm requirement from amd

* tests pass

* test

* sink kernarg_size

* move stuff

* amd_asm_matmul to new style

* default type

* fix tests, simpler

* cu mode is faster and simpler

* darken
2026-02-13 10:50:12 +08:00
chenyu
9e33a08adb use more pad_to and shrink_to in tensor.py (#14719)
good wins
2026-02-12 20:10:57 -05:00
George Hotz
d3adb8428e Revert "hotfix: skip test/amd in macpytest" (#14704)
* Revert "hotfix: skip test/amd in macpytest"

This reverts commit b7dade2adf.

* no llvm subprocess

* simpler

* sys.exec

* cleanup

* process safe

* diag

* arm ftz support

* 5 sec

* this one
2026-02-13 08:00:24 +08:00
Christopher Milan
d4bc5ab609 autogen: download linux sources (#14714) 2026-02-12 18:50:50 -05:00
Christopher Milan
084d0d0103 cleanup macos webgpu tests (#14715) 2026-02-12 17:56:34 -05:00
Christopher Milan
c30bb0f006 fix WEBGPU isnan check (#14711) 2026-02-12 17:01:18 -05:00
chenyu
9b3b597423 minor getitem cleanups (#14713) 2026-02-12 16:54:54 -05:00
chenyu
787998fac3 fix getitem tensor indexing detection (#14712)
issue with sint
2026-02-12 16:04:37 -05:00
chenyu
86352988d8 update test_uops_stats for setitem (#14710)
realize both full tensor and the slice should not add to global_mem
2026-02-12 12:26:13 -05:00
chenyu
56caf6a3a2 fix Estimate.from_uops for sliced access (#14695)
"assume all DEFINE_GLOBAL memory is accessed" is wrong for partial load. get accessed accumulated from INDEX, then cap at full size. now mem_est never exceeds lds_est
2026-02-12 11:18:07 -05:00
chenyu
8551fa50d3 support bitcast in sym_infer (#14708)
fixed `DEBUG=2 DEV=WEBGPU python -m pytest test/backend/test_tensor_variable.py::TestTensorVariable::test_symbolic_pad`
2026-02-12 10:21:05 -05:00
chenyu
212789e31e fix long_decomp with None tag (#14707)
fixed `DEBUG=2 WEBGPU=1 python -m pytest test/null/test_tensor.py::TestIdxUpcast::test_int64_unsupported_overflow_sym`
2026-02-12 09:31:52 -05:00
chenyu
557134e1c7 model/test fix that failed with WEBGPU=1 DEBUG=2 (#14706) 2026-02-12 09:08:16 -05:00
nimlgen
10c94d2c2d amd: print more info about device hang (#14705) 2026-02-12 15:34:08 +03:00
nimlgen
b376bd7a21 jit: fix raw in same kernel (#14699)
* jit: fix raw in same kernel

* fix

* ugh

* x

* simpler
2026-02-12 15:33:32 +03:00
George Hotz
19e68a1833 skip AMD on not AMD (#14703) 2026-02-12 18:56:54 +08:00
George Hotz
b7dade2adf hotfix: skip test/amd in macpytest 2026-02-12 18:16:04 +08:00
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
d5fc3ea1ba assembly/amd: mypy+ruff passes (#14701)
* assembly/amd: mypy+ruff passes

* touchups
2026-02-12 16:59:42 +08:00
George Hotz
095a064ba8 test.yml explicitly says backend (#14700)
* test.yml explicitly says backend

* 1e-5
2026-02-12 16:03:44 +08:00
nimlgen
14a1991da6 viz: sort tracks in timeline (#14591)
* viz: sort devices in timeline

* fix

* rev

* upd

* skip
2026-02-12 10:51:41 +03:00
George Hotz
025049c521 clean up sqtt / update src formatting in viz (#14696)
* update src formatting in viz

* rename to RDNA3/RDNA4 in sqtt

* wrap

* move sqttmap

* update readme

* why did that change?

* cdna

* that's just for test
2026-02-12 14:27:14 +08:00
Christopher Milan
b1a3876492 IMAGE=1 supports FLOAT16=1 (#14693)
requires 2d indexing to be actually fast
2026-02-12 00:30:55 -05:00
George Hotz
befc1e800c assembly/amd: disasm is test only (#14694)
* assembly/amd: disasm is test only

* viz uses str
2026-02-12 12:33:46 +08:00
George Hotz
c331798201 move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
chenyu
0c63f63ee4 recursive resolve assign dependency (#14688)
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
nimlgen
869083e373 nv: pciiface pma (#14686)
* x

* w

* z

* clean

* o

* r

* x

* c

* r

* list

* deanon

* b
2026-02-11 23:29:07 +03:00
chenyu
cbbc2fdea5 update test_assign_slice_then_read (#14687)
passes locally now
2026-02-11 15:02:44 -05:00
chenyu
7465b22ba0 handle setitem target in rangeify (#14685) 2026-02-11 11:38:59 -05:00
chenyu
0d215b962e few setitem test cases diff from numpy (#14684)
have claude fuzzed frontend and found some real bugs
2026-02-11 08:41:03 -05:00
nimlgen
df8b21eeb5 add real self assign test (#14683)
* self assign fix

* no
2026-02-11 12:41:53 +03:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00