chenyu
03d0fa9c3f
merge as_buf into buf_uop [pr] (#14541)
2026-02-04 16:32:23 -05:00
chenyu
d57d24c7d4
Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
...
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
chenyu
67f91e897b
UOp.is_contiguous -> UOp.has_buffer_identity [pr] (#14530)
...
one more confusing buffer related method, but it's definitely not is_contiguous
2026-02-04 09:21:26 -05:00
George Hotz
d59e6e7a37
move more tests to test/null, split some existing ones (#14512)
...
* move more tests to test/null, split some existing ones
* null work
* null work
* move more
* fixes
* move PIL
* PIL in CLIP
* don't move that
2026-02-03 20:20:20 +08:00
chenyu
4f2e7aed24
fix multiple REDUCE on same RANGE (#14504)
...
each RANGE maps to one END, but reduce_to_acc is local and would not know this
2026-02-02 20:42:09 -05:00
chenyu
3204f94454
correct var_vals schedule filter (#14451)
...
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
05fcb57696
also return index in Tensor.cummax (#14117)
...
* also return index in Tensor.cummax
* fix
2026-01-12 22:42:10 -05:00
chenyu
92246ea731
update tests, `WEBGPU=1 pytest .` passes (#14089)
...
* update tests, `WEBGPU=1 pytest .` passes
* minor update
2026-01-10 00:03:02 -05:00
George Hotz
744af193f0
remove ScheduleItem and merge it with ExecItem (#13759)
...
* remove ExecItem and merge it with ScheduleItem
* less diff
* fix issues
* min diff
* don't change bufs in _lower
* min diff
* update
* revert
* fixes
* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1
mark slow tests as slow instead of as CI (#13736)
...
* mark slow tests as slow instead of as CI
* CI shouldn't have different behavior
* more skips / CI
* slow
2025-12-17 10:29:57 -04:00
chenyu
89f9e1dcd5
add SGD to beautiful_mnist (#13571)
2025-12-04 12:17:29 -05:00
George Hotz
a4c4e48385
add LUNIQUE op (#13554)
2025-12-03 14:34:34 -08:00
George Hotz
e4cd649ff0
remove kernelize to prepare for refactors (#13463)
...
* remove kernelize to prepare for refactors
* less kernelize
* last test
2025-11-26 14:18:50 -08:00
George Hotz
ffb9e8396f
fix indexing bug with convs
...
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00
George Hotz
962d980919
fuse hasn't worked since rangeify, remove it (#13057)
2025-11-02 14:01:52 +08:00
Sieds Lykles
885b6dea9e
multiple reduce range arange folding (#13047)
...
* multi reduce arange folding
* add test
* cvar to var
* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
ecb8565f67
Revert "Better cleanup of arange bufferize (#13046)" (#13048)
...
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a
Better cleanup of arange bufferize (#13046)
...
* check for reduce and index instead of cast
* add test
2025-11-01 16:16:31 +01:00
George Hotz
b791d70725
support custom UOp kernels (#13028)
...
* support custom UOp kernels
* no number
* multioutput works
* backward kernel runs
* move kernel class
* grad later
* work
* no tags in kernel graph
* test arange
* arange + contig
* delete comment
2025-10-31 15:51:39 +08:00
George Hotz
2da02f1ae1
add loads at the end (#12988)
...
* add loads at the end
* simpler
* late load
* tests passing
* fix matvec
* spec test passes
* fix where on load
* fix abs2
* fix more tests
2025-10-30 10:42:19 +08:00
George Hotz
b147e7e8e6
flatten bufferize (#12984)
...
* flatten bufferize
* simpler
* tests pass
* flat
* not flat
2025-10-29 11:23:43 +08:00
George Hotz
b0da173f2f
add unique to const, fix longstanding bug (#12965)
...
* add unique to const, fix longstanding bug
* _force_unique=True
* fix tests
* fix more tests
2025-10-28 15:11:37 +08:00
George Hotz
804133cffd
rename RECIP to RECIPROCAL (#12939)
2025-10-27 16:53:13 +08:00
Sieds Lykles
7f798a9630
Cleanup const buffers (#12829)
...
* split pm_cleanups
* update test_schedule
* shrink when we remove bufferize
* dont do shrink if shape is empty
* update tests
* remove *1 from metadata
* deal with the noop bufferize
* only noop on cvar
* cleanup
* fix if
* rename
2025-10-21 14:53:49 +02:00
chenyu
fcdf4ab37e
remove a contiguous in LARS (#12770)
2025-10-17 17:07:30 -04:00
George Hotz
062a6d68d7
test flash attention backward (#12762)
...
* test flash attention backward
* TODO: fix pcontig
* end ranges
* render colors
* very big
* multiout at every level
* reset ending ranges
* fix tests
* ugh
2025-10-17 23:15:59 +08:00
chenyu
9561803cb0
fix assert in test_schedule (#12745)
...
* fix assert in test_schedule
updated kernel counts and some old tests
* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64
delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
...
does nothing now
2025-10-16 14:11:33 -04:00
George Hotz
592e86f6f5
remove UOp.st (#12716)
...
* remove UOp.st
* fix tests
* torch backend disable
2025-10-16 14:44:09 +08:00
George Hotz
612e3d6143
replace mop arg with vectorized index (#12695)
...
* replace mop arg with vectorized index
* tests passing
* better viz
* no compile4
2025-10-15 20:50:06 +08:00
George Hotz
cab034b863
improve typing (#12611)
...
* improve typing and bump to 3.11
* no need for Self yet
* improve typing
* binop also
2025-10-11 16:20:23 +08:00
chenyu
f2c3a72b0c
remove RANGEIFY flag [pr] (#12577)
2025-10-09 21:52:54 -04:00
qazal
b86ad6053a
test_schedule independent of RANGEIFY flag (#12568)
...
* test_schedule independent of RANGEIFY flag
* comment for expectedFailure + test_cast_padded_view
* test_cast_padded_const works
* don't use full_shape it's fine
* add todos for the rest
2025-10-09 20:00:50 +03:00
chenyu
ae51bdd06a
remove trivial use of RANGEIFY flag (#12550)
...
some tests need update still
2025-10-09 02:29:38 -04:00
qazal
bb5671a837
some more ops.py cleanups (#12525)
...
* remove GroupOp.Meta and st_arg
* inline axis_arg
* only allow .buffer on reshapes (or the buffer)
* gate is the other way
* still want can_pad?
* use op_in_backward_slice_with_self
* .buffer is recursive
* lint
* pathlib there
2025-10-09 06:06:44 +03:00
chenyu
c4732a18bd
update tests that depend on SPLIT_REDUCEOP (#12534)
2025-10-08 21:53:30 -04:00
chenyu
28edea5d67
delete FUSE_CONV_BW (#12527)
2025-10-08 10:41:38 -04:00
qazal
b6835f4134
remove Ops.VIEW and related UOp methods (#12522)
...
* remove Ops.VIEW and related UOp methods
* update abstractions2.py
* no ShapeTrackers in abstractions2.py
* it's a size 1
2025-10-08 14:47:02 +03:00
George Hotz
3b0b3a2e64
fast RANGEIFY (#12504)
...
* rtoposort is fast, can replace rangeify with this
* fast rangeify
* work
* fast rangeify works for mnist
* should work
* progress
* pad fix
* FAST
* tests passing
* don't delete those shape ops
* put in rangeify map
* ending ranges fix
* tests
* mstack/mselect no hacks
* move to indexing.py
* touch up tests + add comments
* disable failing test
* actually make the file readable
* failing
* error
2025-10-08 19:38:06 +08:00
qazal
6f26603f06
delete swizzler.py (#12518)
...
* delete swizzler
* remove merge_views tests
* don't need rewrites_for_views
* apply_rewrites
2025-10-08 13:02:34 +03:00
qazal
7e0b14243e
delete grouper and kernelize (#12517)
...
* delete grouper and kernelize
* +sys.setrecursionlimit
2025-10-08 12:27:26 +03:00
chenyu
e701106a64
remove FUSE_ARANGE (#12511)
...
it was the default already
2025-10-08 04:54:07 -04:00
qazal
60b6dca5ba
update some tests instead of expect_rangeify_fails (#12500)
...
* update test_clone_doesnt_dedup to use base
* new_flat_buffer passes
* fix test_reorder_expand
* remove the view stuff
* remove that test, we don't want this view const behavior
* test_setitem_becomes_subbuffer is good
2025-10-08 07:42:31 +03:00
qazal
84597ed53c
early assert for device mismatched asts in rangeify (#12499)
...
* early assert for device mismatched asts in rangeify
* alt also passes
2025-10-08 07:19:36 +03:00
qazal
76e8a3250c
rangeify: late zero folding (#12464)
...
* rangeify: late zero folding
* early
* not kernels
* none
* multi
* linter
* mstack is sink comment
* more comment
2025-10-06 12:52:33 +03:00
qazal
1b1978b9c0
early copy fixup (#12463)
...
* simple failing test
* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
c1e85f699c
multi test case for sharded ring allreduce (#12462)
...
* multi test case for sharded ring allreduce
triggers `children not making progress` with RANGEIFY
* expect_rangeify_fails
2025-10-05 23:18:24 -04:00
George Hotz
46e8ea15c1
split pm_substitute_recurse (#12460)
2025-10-05 21:35:50 -04:00
qazal
6ad9a688ed
add failing test after "pend substitutes for speed" (#12457)
...
* add failing substitute test
* expect_rangeify_fails
2025-10-05 16:10:04 +03:00
qazal
13a25b2e67
rangeify: don't shape INDEX on kernelize (#12417)
2025-10-02 09:45:37 +03:00