Commit Graph

4988 Commits

Author SHA1 Message Date
qazal
f9cfb64cd9 test asm_gemm in CI (#14551)
* test asm_gemm in CI

* default float16

* use a smaller shape for multi

* smaller size

* smaller for CI

* smaller for ci

* need half
2026-02-05 13:32:22 +09:00
chenyu
03d0fa9c3f merge as_buf into buf_uop [pr] (#14541) 2026-02-04 16:32:23 -05:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
chenyu
67f91e897b UOp.is_contiguous -> UOp.has_buffer_identity [pr] (#14530)
one more confusing buffer related method, but it's definitely not is_contiguous
2026-02-04 09:21:26 -05:00
Christopher Milan
8c3c026d86 decomp float16 to float32 (#14417)
* decomp float16 to float32

* denormals arent zero

* add test

* denormals are zero

* fix

* oops

* bitcast works

* fix LOADs

* test_dtype passing

* cleanup

* mypy

* debug print

* only emulate if EMULATED

* very ugly, but passes spec

* add test_dtype_alu tests

* Revert "very ugly, but passes spec"

This reverts commit fdc3999b65.

* bottom up decompositions

* that should have symbolic

* simplify a bit

* SPEC really works

* run with DEBUG

* debug=4

* rm debug
2026-02-04 01:37:47 -05:00
chenyu
9c2fc118ef relax setitem target check (#14526)
old check was too conservative
2026-02-03 22:32:49 -05:00
nimlgen
2f55005ad9 qcom: sync cpu cache when from_blob (#14518)
* um

* fx

* d

* x

* x

* x

* x

* f

* ren
2026-02-03 21:51:03 +03:00
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
qazal
5c1d21349e viz: profiler command line tool (#14515) 2026-02-03 19:51:25 +09:00
George Hotz
dd2de4f838 rename all DEFINE_GLOBAL to PARAM (#14511) 2026-02-03 15:09:38 +08:00
George Hotz
dc77b3318b move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
888819ee09 call autodiff gradient (#14510) 2026-02-03 13:51:02 +08:00
wozeparrot
bbcd3d67a3 fa: faster (#14453) 2026-02-02 21:34:17 -08:00
chenyu
3c5845e8a5 remove cut_store_range (#14505)
special scheduling for CPU
2026-02-02 21:58:36 -05:00
chenyu
4f2e7aed24 fix multiple REDUCE on same RANGE (#14504)
each RANGE maps to one END, but reduce_to_acc is local and would not know this
2026-02-02 20:42:09 -05:00
chenyu
66d2b02f11 delete files that depends on extra.optimization.helpers (#14499) 2026-02-02 13:33:33 -05:00
George Hotz
ec0398fceb test amd gpu crashes (#14459)
* test amd gpu crashes

* cleanup

* less sketch tests
2026-02-02 18:57:47 +03:00
George Hotz
6e958dbfd4 assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorizaion

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
qazal
1746d1f997 remove SPEC=0 context in custom_kernel tests, pyrender always skips it (#14489) 2026-02-02 16:32:01 +09:00
George Hotz
d4007f36e0 remove DEFINE_GLOBAL (it is PARAM now) (#14488) 2026-02-02 14:56:37 +08:00
Christopher Milan
2931b52875 skip autogen if MTLCompiler is loaded (#14466) 2026-02-01 22:12:27 -05:00
chenyu
ea1f1d2b9d test_assign_to_bitcast_view (#14483)
currently disk allows assign same size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
6deeccc192 fix RING with single dest (#14482) 2026-02-01 12:14:46 -05:00
chenyu
3ff390159b don't implicitly change dtype in assign (#14481)
broadcast shape is fine, but implicitly cast dtype is hard to find
2026-02-01 11:48:54 -05:00
imaolo
2111762a48 failed test case for RING output device (#14191)
* Add enable/disable scheduler cache ContextVar

* add allreduce ring and naive to() tests

* clearer test comparing native vs ring allreduce

* split tests, add helper

* removing trailing whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-01 11:48:43 -05:00
chenyu
02afae04f4 atol in test_call_gemm (#14480)
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f assign cleanup [pr] (#14479)
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00
chenyu
b4f96301e0 remove unused rules [pr] (#14477) 2026-01-31 21:29:30 -05:00
qazal
54e78dbec8 viz: remove hardcoded strings in cfg tests (#14462) 2026-02-01 09:30:43 +09:00
chenyu
5d38db9da6 generic bitcast assign (#14474)
a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))
2026-01-31 17:29:20 -05:00
chenyu
b38fc43b07 assert assign dtype mismatch for disk [pr] (#14473)
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c failed test case for assign into bitcast (#14469)
* failed test case for assign into bitcast

DISK assign has custom hack for this. need to fix before we can unify assign

* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
c765641215 remove unused allow_any_len [pr] (#14464)
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb move tests to unit (#14463)
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
qazal
4976544bf9 multi ram usage tests on the NULL device (#14457) 2026-01-31 14:14:53 +09:00
chenyu
99b44121bc failed test case for non-consecutive disk read (#14455)
silently fail now
2026-01-30 23:44:04 -05:00
Christopher Milan
e575dd8275 prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
chenyu
3204f94454 correct var_vals schedule filter (#14451)
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5 test schedule with multiple AFTER (#14449) 2026-01-30 15:59:00 -05:00
chenyu
03613e83ad update TestTensorMetadata (#14443)
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
chenyu
26f5c00265 move TestTensorMetadata to unit (#14442) 2026-01-30 12:14:21 -05:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward mutli test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
George Hotz
7a9dee4e50 add call/param UOps (#14433)
* add call/param UOps

* resolve call

* skip that for now

* grad on call

* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016 viz: sqtt work from cdna gemm (#14434)
* it's the tag

* initialize rows based on the disasm

* test_cfg with Ops.BINARY

* pyremu wants s_code_end?

* test_diamond

* diff cleanup
2026-01-30 14:00:56 +09:00
chenyu
86a204d22a allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
ddc041854b failed test case for disk setitem (#14426)
strided setitem is wrong
2026-01-29 14:54:19 -05:00
nimlgen
230d08ec70 test for am recovery and faults handling (#14421)
* test for am recovery and faults handling

* linter
2026-01-29 17:11:24 +03:00
chenyu
2b5e99ccc1 minor type cleanups [pr] (#14408)
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
7b9bc1d8cf _MockMemoryviewMeta for mockgpu (#14405)
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`. basically make `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
chenyu
a9b44070a8 fix webgpu runtime types (#14402)
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed, also skip tests that failed locally
2026-01-28 10:37:25 -05:00