1050 Commits

Author SHA1 Message Date
chenyu
ca68037f26 lazy basic setitem to unrealized Tensor (#14756)
undo the view and make it a mask, this fuses the setitem with any pending compute too.

one behavior change is that for target not backed by a buffer (const and arange), rangeify makes output contiguous under the hood.
this is strictly better than raising and asking the user to call contiguous, as that would no longer be fusable.
2026-02-14 20:27:03 -05:00
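The "make it a mask" idea in the commit above can be pictured with plain Python lists: instead of writing through a view of the slice, every output element is selected from either the new value or the old target. This is only a sketch of the concept, not tinygrad's implementation; `setitem_as_mask` and its parameters are hypothetical names.

```python
def setitem_as_mask(target, start, stop, value):
    """Return a copy of target with target[start:stop] replaced by value,
    expressed as an elementwise select over the full index range."""
    out = []
    for i, old in enumerate(target):
        in_slice = start <= i < stop  # the mask: True only inside the slice
        out.append(value[i - start] if in_slice else old)
    return out

base = [0, 1, 2, 3, 4, 5]
result = setitem_as_mask(base, 2, 4, [10, 20])
```

Because the result is a pure elementwise expression over the target, it can in principle be combined with whatever computation produces the target, which is the fusion benefit the commit message describes.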
chenyu
902dc7c09c fix test_numpy_parity_and_backward_2d (#14755)
test setup issue, test failed locally with `RUN_SLOW=1`
2026-02-14 17:59:00 -05:00
chenyu
043f5dbfa0 fix write-after-read tracking (#14754)
AFTER-AFTER was silently dropped, which breaks write-after-read
2026-02-14 17:23:05 -05:00
chenyu
d79c63a0ff test_multi_step_assign_read_write_same_buffer (#14752)
a pattern in LAMB that can be subtly off
2026-02-14 16:39:08 -05:00
chenyu
0ce4a55dad clean up test_setitem_slice (#14750)
moved to test_setitem_schedule, and use contiguous zeros as scheduler handles empty differently now
2026-02-14 14:29:16 -05:00
chenyu
8f6772fd8c more setitem kernel mem tests (#14749)
* more setitem kernel mem tests

test only the slice is accessed

* update
2026-02-14 11:01:03 -05:00
chenyu
787998fac3 fix getitem tensor indexing detection (#14712)
issue with sint
2026-02-12 16:04:37 -05:00
chenyu
0c63f63ee4 recursive resolve assign dependency (#14688)
remove the .realize in llm.py
2026-02-11 17:41:05 -05:00
chenyu
cbbc2fdea5 update test_assign_slice_then_read (#14687)
passes locally now
2026-02-11 15:02:44 -05:00
George Hotz
8dc46dde07 everything has dtype.long now (#14661)
* everything has dtype.long now

* int64/uint64 are everywhere now

* that doesn't work
2026-02-10 15:08:50 +08:00
chenyu
9e3f24db9f assign realize fix (#14649)
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
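A minimal sketch of the bookkeeping described in the commit above: assigns are recorded per buffer and replayed in order before the buffer is read. The class `PendingAssigns` and its methods are hypothetical and only illustrate the ordering; they are not tinygrad's API.

```python
from collections import defaultdict

class PendingAssigns:
    """Toy tracker: buffers are named values, assigns are deferred functions."""
    def __init__(self):
        self.buffers = {}
        self.pending = defaultdict(list)  # buffer name -> assigns in issue order

    def assign(self, name, fn):
        # Record the assign without running it.
        self.pending[name].append(fn)

    def realize(self, name):
        # Run every pending assign for this buffer, oldest first, then read it.
        for fn in self.pending.pop(name, []):
            self.buffers[name] = fn(self.buffers.get(name))
        return self.buffers.get(name)

tracker = PendingAssigns()
tracker.buffers["w"] = 1
tracker.assign("w", lambda v: v + 1)
tracker.assign("w", lambda v: v * 10)
value = tracker.realize("w")
```

Replaying in issue order matters here: the two assigns give `(1 + 1) * 10`, while the reverse order would give `1 * 10 + 1`.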
chenyu
e9f40f49d4 explicitly check advanced setitem (#14644)
advanced setitem on DISK would fail in rangeify with a bad error; now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
7d193a6e26 fix wgsl bitcast (#14600)
was wrong for signed int
2026-02-06 16:57:36 -05:00
George Hotz
03af2404e2 small changes and test fixes from kernel is call (#14586) 2026-02-06 17:08:33 +08:00
George Hotz
3c26ce29b2 make disk tensor tests process safe (#14584) 2026-02-06 15:39:55 +08:00
chenyu
15d3344d9e use int inputs in test_assign (#14580)
int is less flaky
2026-02-06 00:07:31 -05:00
chenyu
b09dc646f5 revert some late_buffer_view change (#14578)
revert #14478 which breaks tinyfs
2026-02-05 22:51:40 -05:00
chenyu
03d0fa9c3f merge as_buf into buf_uop [pr] (#14541) 2026-02-04 16:32:23 -05:00
George Hotz
d59e6e7a37 move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
George Hotz
dc77b3318b move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
888819ee09 call autodiff gradient (#14510) 2026-02-03 13:51:02 +08:00
qazal
1746d1f997 remove SPEC=0 context in custom_kernel tests, pyrender always skips it (#14489) 2026-02-02 16:32:01 +09:00
Christopher Milan
2931b52875 skip autogen if MTLCompiler is loaded (#14466) 2026-02-01 22:12:27 -05:00
chenyu
ea1f1d2b9d test_assign_to_bitcast_view (#14483)
currently disk allows assigning a same-size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
3ff390159b don't implicitly change dtype in assign (#14481)
broadcasting the shape is fine, but an implicit dtype cast is hard to find
2026-02-01 11:48:54 -05:00
chenyu
02afae04f4 atol in test_call_gemm (#14480)
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f assign cleanup [pr] (#14479)
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00
chenyu
b4f96301e0 remove unused rules [pr] (#14477) 2026-01-31 21:29:30 -05:00
chenyu
5d38db9da6 generic bitcast assign (#14474)
a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))
2026-01-31 17:29:20 -05:00
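The rewrite rule in the commit above says that writing through a bitcast view of `a` is equivalent to bitcasting the source to `a`'s dtype and assigning normally. The sketch below checks that equivalence on raw bytes using only the Python standard library; it assumes 4-byte `float` and `int` types (true on common platforms) and does not use tinygrad.

```python
import struct
from array import array

# int32 bit patterns of the float32 values 1.0, 2.0, 0.0 and 3.0
src = [0x3F800000, 0x40000000, 0, 0x40400000]

# Left-hand side: write the ints through an int32 view of a float32 buffer.
a = array("f", [0.0] * 4)
view = memoryview(a).cast("B").cast("i")
for i, v in enumerate(src):
    view[i] = v

# Right-hand side: reinterpret the source as float32, then assign normally.
b = array("f", struct.unpack("4f", struct.pack("4i", *src)))
```

Both buffers end up holding the same bytes, so the bitcast can be moved from the destination to the source without changing the result.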
chenyu
b38fc43b07 assert assign dtype mismatch for disk [pr] (#14473)
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c failed test case for assign into bitcast (#14469)
* failed test case for assign into bitcast

DISK assign has custom hack for this. need to fix before we can unify assign

* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
c765641215 remove unused allow_any_len [pr] (#14464)
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb move tests to unit (#14463)
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
chenyu
99b44121bc failed test case for non-consecutive disk read (#14455)
silently fails now
2026-01-30 23:44:04 -05:00
chenyu
03613e83ad update TestTensorMetadata (#14443)
run with SCACHE=0, plus some more TODOs
2026-01-30 12:39:01 -05:00
chenyu
26f5c00265 move TestTensorMetadata to unit (#14442) 2026-01-30 12:14:21 -05:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward multi test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
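For context on the commit above: the backward pass of an embedding lookup is a scatter-add, where each output-gradient row is accumulated into the weight-gradient row selected by its index. When an index repeats, several rows add into the same location, which is presumably why a parallel kernel needs atomic adds. The sequential sketch below shows only the accumulation semantics; `embedding_backward` is a hypothetical name, not tinygrad's function.

```python
def embedding_backward(indices, grad_out, vocab_size):
    """Accumulate grad_out[k] into row indices[k] of the weight gradient."""
    dim = len(grad_out[0])
    grad_weight = [[0.0] * dim for _ in range(vocab_size)]
    for idx, row in zip(indices, grad_out):
        for j, g in enumerate(row):
            # In a parallel kernel this += is the update that must be atomic.
            grad_weight[idx][j] += g
    return grad_weight

# Index 1 appears twice, so its two gradient rows are summed.
grad_w = embedding_backward([1, 0, 1], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], vocab_size=3)
```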
George Hotz
7a9dee4e50 add call/param UOps (#14433)
* add call/param UOps

* resolve call

* skip that for now

* grad on call

* fix tests
2026-01-30 14:51:45 +08:00
chenyu
86a204d22a allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
chenyu
ddc041854b failed test case for disk setitem (#14426)
strided setitem is wrong
2026-01-29 14:54:19 -05:00
Christopher Milan
5e36482314 decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
George Hotz
065b95cfb0 Revert "add retry to fetch (#14370)" (#14385)
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55 add retry to fetch (#14370) 2026-01-27 14:04:25 -08:00
Christopher Milan
289a3e415e also skip test_nonoverlapping_shrink_assignment (#14382) 2026-01-27 16:26:26 -05:00
chenyu
db010a31be IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4 also skip test_overlapping_shrink_assignment_reverse (#14375)
crashing
2026-01-27 12:20:39 -05:00
George Hotz
0ced258726 HOTFIX: skip crashing assign test 2026-01-27 20:35:17 +08:00
imaolo
14574c68fa Add ContextVar to disable the scheduler cache (#14257)
* add scheduler cache ContextVar

* test scheduler cache context var

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00
Christopher Milan
2e72625652 Revert "decompose dtypes.long to ints where unsupported (#14261)" (#14362) 2026-01-27 02:04:59 -05:00
Christopher Milan
0793319929 decompose dtypes.long to ints where unsupported (#14261)
* add works

* use carry not overflow

* bitwise ops

* use tag instead of vec

* cleaner

* mul somewhat works

* mul actually works

* SUB and NEG work

* SHL/SHR

* ulong support

* this should work?

* oops

* fix indexing

* all ALU mostly works

* refactor

* test_dtype passing

* signed division works

* format

* clean

* some tests

* ruff
2026-01-26 18:34:13 -05:00
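The "use carry not overflow" step in the commit above can be illustrated with a 64-bit addition built from 32-bit halves. This is a generic sketch of the technique under the assumption of unsigned 32-bit words; the helper names `add64`, `split` and `join` are hypothetical and are not taken from the commit.

```python
MASK32 = 0xFFFFFFFF

def split(x):
    """Split a 64-bit value into (low, high) unsigned 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32

def join(lo, hi):
    """Recombine (low, high) 32-bit words into one 64-bit value."""
    return (hi << 32) | lo

def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values using only 32-bit-wide results."""
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0  # the low word wrapped around iff it got smaller
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi

total = join(*add64(*split(0x00000001FFFFFFFF), *split(1)))
wrapped = join(*add64(*split(0xFFFFFFFFFFFFFFFF), *split(1)))
```

The carry is derived by comparing the wrapped low word with one of its inputs, so no wider integer type or overflow flag is required.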