chenyu
4f2e7aed24
fix multiple REDUCE on same RANGE ( #14504 )
...
each RANGE maps to one END, but reduce_to_acc is local and would not know this
2026-02-02 20:42:09 -05:00
chenyu
93c41a78fa
clean up NOOP [pr] ( #14503 )
...
should not be used as a COPY, started with removing from ALWAYS_RUN_OPS
2026-02-02 19:46:45 -05:00
chenyu
66d2b02f11
delete files that depends on extra.optimization.helpers ( #14499 )
2026-02-02 13:33:33 -05:00
George Hotz
ec0398fceb
test amd gpu crashes ( #14459 )
...
* test amd gpu crashes
* cleanup
* less sketch tests
2026-02-02 18:57:47 +03:00
nimlgen
6e4238c016
amd: recovery ( #14461 )
...
* rec
* ?
* rv
* cleaner
* post merge
* not used
* um
* clnr
* x
* x
* d
* move
2026-02-02 18:57:35 +03:00
chenyu
61ca19ff24
after with empty src is self [pr] ( #14496 )
2026-02-02 10:19:05 -05:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator ( #14341 )
...
* start new rdna4
* work
* plus works
* more pass
* rdna4
* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp
* stale
* rev
* rr
* rdna4 emu tests
* cleanup
* cleanup
* simp
* works
* better factorization
* hacks
* fix mockgpu
* guard both
* cleaner
* gate
* bug fix and a few tests
* all test_tiny
2026-02-02 21:35:59 +08:00
chenyu
a908f447d5
remove disk special case in mstack_early_shrink [pr] ( #14494 )
2026-02-02 08:34:45 -05:00
qazal
965940dd00
sqtt: update examples after event field change ( #14493 )
...
* regen sqtt examples
* cdna
* rdna4
* packet counts for rdna3
* sqttmap work
2026-02-02 21:39:48 +09:00
George Hotz
965149a46d
assembly/amd: add ds perm instructions ( #14486 )
...
* assembly/amd: add ds perm instructions
* NO SKIP
* fix preexisting RDNA3 issues
* pcode
* assert
* asserts
* unify
* simp
* good fix
2026-02-02 16:02:00 +08:00
qazal
1746d1f997
remove SPEC=0 context in custom_kernel tests, pyrender always skips it ( #14489 )
2026-02-02 16:32:01 +09:00
George Hotz
d4007f36e0
remove DEFINE_GLOBAL (it is PARAM now) ( #14488 )
2026-02-02 14:56:37 +08:00
qazal
6c487656f9
viz: kernel metadata from rodata entry ( #14487 )
2026-02-02 15:41:42 +09:00
Robbe Derks
d75a1b0d5a
usbgpu: use BOT interface for patch.py ( #13644 )
...
* BOT usage
* cleanup
* fix lint
* fix ruff
* fix -7?
2026-02-02 11:54:46 +08:00
Christopher Milan
2931b52875
skip autogen if MTLCompiler is loaded ( #14466 )
2026-02-01 22:12:27 -05:00
George Hotz
9a32d6e090
add depth limit for SPEC=2 ( #14485 )
...
* make SPEC=2 work for everything
* that's a horrible fix
* add depth limit
2026-02-02 10:43:28 +08:00
George Hotz
368a692e1a
make SPEC=2 work for everything ( #14476 )
...
* make SPEC=2 work for everything
* that's a horrible fix
2026-02-02 10:30:56 +08:00
chenyu
ea1f1d2b9d
test_assign_to_bitcast_view ( #14483 )
...
currently disk allows assign same size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
6deeccc192
fix RING with single dest ( #14482 )
2026-02-01 12:14:46 -05:00
chenyu
3ff390159b
don't implicitly change dtype in assign ( #14481 )
...
broadcast shape is fine, but implicitly cast dtype is hard to find
2026-02-01 11:48:54 -05:00
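The rule in the commit above (shape may broadcast, dtype must match) can be illustrated with a small standalone check. This is a hypothetical sketch, not tinygrad's code: the function name `check_assign`, the use of plain tuples and strings for shapes and dtypes, and the choice of `RuntimeError` are all assumptions made for illustration.

```python
def check_assign(dst_shape, dst_dtype, src_shape, src_dtype):
    """Hypothetical validator: allow broadcastable shapes, reject dtype mismatches."""
    # A dtype mismatch is rejected outright instead of being cast silently.
    if src_dtype != dst_dtype:
        raise RuntimeError(f"assign dtype mismatch: {dst_dtype} != {src_dtype}")
    # The source may not have more dimensions than the destination.
    if len(src_shape) > len(dst_shape):
        raise RuntimeError(f"cannot broadcast {src_shape} to {dst_shape}")
    # Right-aligned broadcast: each source dim must be 1 or equal the target dim.
    for d, s in zip(reversed(dst_shape), reversed(src_shape)):
        if s != 1 and s != d:
            raise RuntimeError(f"cannot broadcast {src_shape} to {dst_shape}")


# Broadcasting a (3,) source into a (4, 3) destination passes the check.
check_assign((4, 3), "float32", (3,), "float32")
```

Under this sketch the caller has to cast the source explicitly before assigning, which keeps the dtype change visible at the call site.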
imaolo
2111762a48
failed test case for RING output device ( #14191 )
...
* Add enable/disable scheduler cache ContextVar
* add allreduce ring and naive to() tests
* clearer test comparing native vs ring allreduce
* split tests, add helper
* removing trailing whitespace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-01 11:48:43 -05:00
chenyu
02afae04f4
atol in test_call_gemm ( #14480 )
...
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f
assign cleanup [pr] ( #14479 )
...
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00
chenyu
da500dbe06
simplify late_buffer_view [pr] ( #14478 )
...
check the only allowed Ops in the chain, and offset cannot be negative
2026-01-31 22:38:40 -05:00
chenyu
b4f96301e0
remove unused rules [pr] ( #14477 )
2026-01-31 21:29:30 -05:00
qazal
54e78dbec8
viz: remove hardcoded strings in cfg tests ( #14462 )
2026-02-01 09:30:43 +09:00
chenyu
5d38db9da6
generic bitcast assign ( #14474 )
...
a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))
2026-01-31 17:29:20 -05:00
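The rewrite described in the commit above, `a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))`, rests on the fact that writing through a reinterpreted view of a buffer stores the same bytes as reinterpreting the source first and writing it directly. The sketch below demonstrates that equivalence with the Python standard library only. It is not tinygrad code, and both helper names are hypothetical.

```python
from array import array


def assign_through_bitcast_view(dst: array, view_fmt: str, src: list) -> None:
    """Write src into dst via a view that reinterprets dst's bytes as view_fmt."""
    # memoryview.cast only goes to or from bytes, so cast in two steps.
    view = memoryview(dst).cast("B").cast(view_fmt)
    for i, value in enumerate(src):
        view[i] = value
    view.release()


def assign_with_source_bitcast(dst: array, src_fmt: str, src: list) -> None:
    """Reinterpret src's bytes as dst's element type, then assign directly."""
    raw = array(src_fmt, src).tobytes()
    casted = array(dst.typecode)
    casted.frombytes(raw)
    dst[:] = casted


# 0x3F800000 and 0x40000000 are the IEEE 754 bit patterns of 1.0 and 2.0.
bits = [0x3F800000, 0x40000000]
a = array("f", [0.0, 0.0])
b = array("f", [0.0, 0.0])
assign_through_bitcast_view(a, "i", bits)
assign_with_source_bitcast(b, "i", bits)
```

Both paths leave the same bytes in the destination, so `a` and `b` end up equal. The example assumes `'i'` and `'f'` are both 4 bytes, which holds on common platforms.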
chenyu
b38fc43b07
assert assign dtype mismatch for disk [pr] ( #14473 )
...
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c
failed test case for assign into bitcast ( #14469 )
...
* failed test case for assign into bitcast
DISK assign has custom hack for this. need to fix before we can unify assign
* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
81eee5b30a
unused spec [pr] ( #14468 )
...
no BUFFER_VIEW in tensor, and no CONTIGUOUS in KERNEL
2026-01-31 13:53:16 -05:00
nimlgen
f873c7b6c5
amd: fetch_name is file_name ( #14465 )
2026-01-31 20:11:07 +03:00
chenyu
c765641215
remove unused allow_any_len [pr] ( #14464 )
...
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb
move tests to unit ( #14463 )
...
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
qazal
616e9c1483
CDNA assembly gemm in tensor.py with flag ( #14310 )
...
* work
* work
* the assembly
* remove the old one
* remove ws bufs, assert splitk
* notes cleanup
* work
* gemm args
* gemm in mixins would be nice
* add gemm gradient
* print counters
* the realize is for DEBUG=2 aesthetics
* dedup
* rewrite to python dsl, no list copies
* leave that
* add B, M, N, K to gemm name
* it's M0 not NULL
* fp16 support
* test cleanup + more gemms
* work from viz
* more work
* gemm batch_size
* xccg path work
* tiny comments on the label naming
* s_waitcnt
2026-01-31 22:34:14 +09:00
chenyu
55f806b713
tighter late_buffer_view match [pr] ( #14456 )
...
src must be len 2 at that point
2026-01-31 07:28:26 -05:00
qazal
d69bc5aa1a
make DEV=NULL EMULATE=AMD amd_asm_matmul run ( #14460 )
2026-01-31 20:45:24 +09:00
qazal
4976544bf9
multi ram usage tests on the NULL device ( #14457 )
2026-01-31 14:14:53 +09:00
chenyu
99b44121bc
failed test case for non-consecutive disk read ( #14455 )
...
silently fail now
2026-01-30 23:44:04 -05:00
George Hotz
b705c9143c
assembly/amd: test more instructions ( #14365 )
...
* assembly/amd: test more instructions
* more
* passing
* revert
* no const fold
* remove junk
* cleaner
2026-01-31 12:40:22 +08:00
George Hotz
c9a3ddb341
benchmark llama walltime script ( #14454 )
...
* benchmark llama walltime script
* adj layers
2026-01-31 10:21:54 +08:00
George Hotz
f5346d6a1a
fix USE_ATOMICS for non float dtypes and make it the default ( #14444 )
...
* embedded multistep test
* complex test
* with jit
* fix dtypes and reenable USE_ATOMICS
* that test didn't catch anything
2026-01-31 09:44:16 +08:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
chenyu
3204f94454
correct var_vals schedule filter ( #14451 )
...
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5
test schedule with multiple AFTER ( #14449 )
2026-01-30 15:59:00 -05:00
nimlgen
486d53d646
device: call free for external_ptr ( #14448 )
...
* device: call free for external_ptr
* lin
2026-01-30 23:53:17 +03:00
nimlgen
e0978498dc
amd: read_ptr/write_ptr/doorbells are not lists ( #14445 )
2026-01-30 23:11:57 +03:00
Christopher Milan
1803ee939d
EMULATED_DTYPES=long works with CPU_LLVM ( #14446 )
2026-01-30 13:54:43 -05:00
chenyu
03613e83ad
update TestTensorMetadata ( #14443 )
...
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
George Hotz
cbb1eed57b
hotfix: partial revert of 9eb449f88, caused llama NaN
2026-01-30 17:19:27 +00:00
chenyu
26f5c00265
move TestTensorMetadata to unit ( #14442 )
2026-01-30 12:14:21 -05:00