qazal
f9cfb64cd9
test asm_gemm in CI ( #14551 )
...
* test asm_gemm in CI
* default float16
* use a smaller shape for multi
* smaller size
* smaller for CI
* smaller for ci
* need half
2026-02-05 13:32:22 +09:00
chenyu
03d0fa9c3f
merge as_buf into buf_uop [pr] ( #14541 )
2026-02-04 16:32:23 -05:00
chenyu
d57d24c7d4
Buffer.as_buffer -> Buffer.as_memoryview [pr] ( #14535 )
...
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
chenyu
67f91e897b
UOp.is_contiguous -> UOp.has_buffer_identity [pr] ( #14530 )
...
one more confusing buffer related method, but it's definitely not is_contiguous
2026-02-04 09:21:26 -05:00
Christopher Milan
8c3c026d86
decomp float16 to float32 ( #14417 )
...
* decomp float16 to float32
* denormals arent zero
* add test
* denormals are zero
* fix
* oops
* bitcast works
* fix LOADs
* test_dtype passing
* cleanup
* mypy
* debug print
* only emulate if EMULATED
* very ugly, but passes spec
* add test_dtype_alu tests
* Revert "very ugly, but passes spec"
This reverts commit fdc3999b65.
* bottom up decompositions
* that should have symbolic
* simplify a bit
* SPEC really works
* run with DEBUG
* debug=4
* rm debug
2026-02-04 01:37:47 -05:00
chenyu
9c2fc118ef
relax setitem target check ( #14526 )
...
old check was too conservative
2026-02-03 22:32:49 -05:00
nimlgen
2f55005ad9
qcom: sync cpu cache when from_blob ( #14518 )
...
* um
* fx
* d
* x
* x
* x
* x
* f
* ren
2026-02-03 21:51:03 +03:00
George Hotz
d59e6e7a37
move more tests to test/null, split some existing ones ( #14512 )
...
* move more tests to test/null, split some existing ones
* null work
* null work
* move more
* fixes
* move PIL
* PIL in CLIP
* don't move that
2026-02-03 20:20:20 +08:00
qazal
5c1d21349e
viz: profiler command line tool ( #14515 )
2026-02-03 19:51:25 +09:00
George Hotz
dd2de4f838
rename all DEFINE_GLOBAL to PARAM ( #14511 )
2026-02-03 15:09:38 +08:00
George Hotz
dc77b3318b
move files that pass with NULL=1 to test/null ( #14508 )
...
* move files that pass with NULL=1 to test/null
* fix windows
* cpu 0
* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
888819ee09
call autodiff gradient ( #14510 )
2026-02-03 13:51:02 +08:00
wozeparrot
bbcd3d67a3
fa: faster ( #14453 )
2026-02-02 21:34:17 -08:00
chenyu
3c5845e8a5
remove cut_store_range ( #14505 )
...
special scheduling for CPU
2026-02-02 21:58:36 -05:00
chenyu
4f2e7aed24
fix multiple REDUCE on same RANGE ( #14504 )
...
each RANGE maps to one END, but reduce_to_acc is local and would not know this
2026-02-02 20:42:09 -05:00
chenyu
66d2b02f11
delete files that depend on extra.optimization.helpers ( #14499 )
2026-02-02 13:33:33 -05:00
George Hotz
ec0398fceb
test amd gpu crashes ( #14459 )
...
* test amd gpu crashes
* cleanup
* less sketch tests
2026-02-02 18:57:47 +03:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator ( #14341 )
...
* start new rdna4
* work
* plus works
* more pass
* rdna4
* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp
* stale
* rev
* rr
* rdna4 emu tests
* cleanup
* cleanup
* simp
* works
* better factorization
* hacks
* fix mockgpu
* guard both
* cleaner
* gate
* bug fix and a few tests
* all test_tiny
2026-02-02 21:35:59 +08:00
qazal
1746d1f997
remove SPEC=0 context in custom_kernel tests, pyrender always skips it ( #14489 )
2026-02-02 16:32:01 +09:00
George Hotz
d4007f36e0
remove DEFINE_GLOBAL (it is PARAM now) ( #14488 )
2026-02-02 14:56:37 +08:00
Christopher Milan
2931b52875
skip autogen if MTLCompiler is loaded ( #14466 )
2026-02-01 22:12:27 -05:00
chenyu
ea1f1d2b9d
test_assign_to_bitcast_view ( #14483 )
...
currently disk allows assign same size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
6deeccc192
fix RING with single dest ( #14482 )
2026-02-01 12:14:46 -05:00
chenyu
3ff390159b
don't implicitly change dtype in assign ( #14481 )
...
broadcast shape is fine, but implicitly cast dtype is hard to find
2026-02-01 11:48:54 -05:00
imaolo
2111762a48
failed test case for RING output device ( #14191 )
...
* Add enable/disable scheduler cache ContextVar
* add allreduce ring and naive to() tests
* clearer test comparing native vs ring allreduce
* split tests, add helper
* removing trailing whitespace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-01 11:48:43 -05:00
chenyu
02afae04f4
atol in test_call_gemm ( #14480 )
...
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f
assign cleanup [pr] ( #14479 )
...
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00
chenyu
b4f96301e0
remove unused rules [pr] ( #14477 )
2026-01-31 21:29:30 -05:00
qazal
54e78dbec8
viz: remove hardcoded strings in cfg tests ( #14462 )
2026-02-01 09:30:43 +09:00
chenyu
5d38db9da6
generic bitcast assign ( #14474 )
...
a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))
2026-01-31 17:29:20 -05:00
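The rewrite in the commit above, `a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))`, can be illustrated outside tinygrad with standard-library buffers. This is only a sketch of why the two forms are equivalent, not the library's code: the helper names are hypothetical, and it assumes `float` (`"f"`) and `unsigned int` (`"I"`) are both 4 bytes on the platform.

```python
from array import array

def assign_through_bitcast_view(dst: array, src: array) -> None:
    """Left-hand form: reinterpret dst as src's type, then write src into that view."""
    view = memoryview(dst).cast("B").cast(src.typecode)
    view[:] = src

def assign_bitcast_source(dst: array, src: array) -> None:
    """Right-hand form: reinterpret src as dst's type, then write it into dst."""
    memoryview(dst)[:] = memoryview(src).cast("B").cast(dst.typecode)

# IEEE-754 single-precision bit patterns of 1.0 and 2.0 (assumes 4-byte "I" and "f").
bits = array("I", [0x3F800000, 0x40000000])
a = array("f", [0.0, 0.0])
b = array("f", [0.0, 0.0])

assign_through_bitcast_view(a, bits)
assign_bitcast_source(b, bits)
```

Both helpers copy the same bytes into the destination, so moving the bitcast from the target to the source does not change the result when the element sizes match.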
chenyu
b38fc43b07
assert assign dtype mismatch for disk [pr] ( #14473 )
...
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c
failed test case for assign into bitcast ( #14469 )
...
* failed test case for assign into bitcast
DISK assign has custom hack for this. need to fix before we can unify assign
* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
c765641215
remove unused allow_any_len [pr] ( #14464 )
...
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb
move tests to unit ( #14463 )
...
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
qazal
4976544bf9
multi ram usage tests on the NULL device ( #14457 )
2026-01-31 14:14:53 +09:00
chenyu
99b44121bc
failed test case for non-consecutive disk read ( #14455 )
...
silently fail now
2026-01-30 23:44:04 -05:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
chenyu
3204f94454
correct var_vals schedule filter ( #14451 )
...
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5
test schedule with multiple AFTER ( #14449 )
2026-01-30 15:59:00 -05:00
chenyu
03613e83ad
update TestTensorMetadata ( #14443 )
...
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
chenyu
26f5c00265
move TestTensorMetadata to unit ( #14442 )
2026-01-30 12:14:21 -05:00
George Hotz
838cd078bc
use atomics for embedding backward ( #14400 )
...
* embedding is slow
* failing
* float is fine
* null
* it fails
* simplify embedding with broadcasting
* ATOMIC_ADD incoming
* min change
* simpler test
* better test
* fix test
* real test
* simpler
* cleanups
* types and names
* _zero_kernel
* grad multi
* hack
* none
* multi unshard
* more for call
* don't tag in call
* good
* call_multi
* call_multi wow claude is useless
* embedding backward multi test
* test passes
* fix as_param
* shape_to_shape_arg
* add clip
* before cast
* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
George Hotz
7a9dee4e50
add call/param UOps ( #14433 )
...
* add call/param UOps
* resolve call
* skip that for now
* grad on call
* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016
viz: sqtt work from cdna gemm ( #14434 )
...
* it's the tag
* initialize rows based on the disasm
* test_cfg with Ops.BINARY
* pyremu wants s_code_end?
* test_diamond
* diff cleanup
2026-01-30 14:00:56 +09:00
chenyu
86a204d22a
allow Tensor setitem input to be list/tuple ( #14432 )
...
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
ddc041854b
failed test case for disk setitem ( #14426 )
...
strided setitem is wrong
2026-01-29 14:54:19 -05:00
nimlgen
230d08ec70
test for am recovery and faults handling ( #14421 )
...
* test for am recovery and faults handling
* linter
2026-01-29 17:11:24 +03:00
chenyu
2b5e99ccc1
minor type cleanups [pr] ( #14408 )
...
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
7b9bc1d8cf
_MockMemoryviewMeta for mockgpu ( #14405 )
...
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
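One general way to make an `isinstance` check accept a wrapper object, as the commit above describes for `TrackedMemoryView`, is a metaclass that defines `__instancecheck__`. The sketch below shows that Python mechanism only; the class names `_InstanceCheckMeta`, `TrackedView`, and `MemoryviewLike` are hypothetical, and it is an assumption that the actual `_MockMemoryviewMeta` works along these lines.

```python
class TrackedView:
    """Hypothetical wrapper that holds a memoryview without subclassing it."""
    def __init__(self, data: bytes):
        self._mv = memoryview(data)

    def __getitem__(self, index):
        return self._mv[index]


class _InstanceCheckMeta(type):
    """Metaclass that lets isinstance() accept real memoryviews and the wrapper."""
    def __instancecheck__(cls, obj) -> bool:
        return isinstance(obj, (memoryview, TrackedView))


class MemoryviewLike(metaclass=_InstanceCheckMeta):
    """Stand-in type to check against; never instantiated in this sketch."""


tracked = TrackedView(b"ab")
wrapper_ok = isinstance(tracked, MemoryviewLike)
real_ok = isinstance(memoryview(b"ab"), MemoryviewLike)
other_ok = isinstance(b"ab", MemoryviewLike)
```

The built-in `memoryview` type cannot be given a custom metaclass, so a sketch like this only helps when the code under test checks against a replaceable name such as `MemoryviewLike`.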
chenyu
a9b44070a8
fix webgpu runtime types ( #14402 )
...
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00