chenyu
89f9e1dcd5
add SGD to beautiful_mnist ( #13571 )
2025-12-04 12:17:29 -05:00
George Hotz
a4c4e48385
add LUNIQUE op ( #13554 )
2025-12-03 14:34:34 -08:00
George Hotz
e4cd649ff0
remove kernelize to prepare for refactors ( #13463 )
* remove kernelize to prepare for refactors
* less kernelize
* last test
2025-11-26 14:18:50 -08:00
George Hotz
ffb9e8396f
fix indexing bug with convs
* minimal difference for ONE_POOL=1
* fix indexing bug
* improve indexing debugger
* more debugger improvements
* always for reshape
2025-11-07 16:45:19 -08:00
George Hotz
962d980919
fuse hasn't worked since rangeify, remove it ( #13057 )
2025-11-02 14:01:52 +08:00
Sieds Lykles
885b6dea9e
multiple reduce range arange folding ( #13047 )
* multi reduce arange folding
* add test
* cvar to var
* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
ecb8565f67
Revert "Better cleanup of arange bufferize ( #13046 )" ( #13048 )
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a
Better cleanup of arange bufferize ( #13046 )
* check for reduce and index instead of cast
* add test
2025-11-01 16:16:31 +01:00
George Hotz
b791d70725
support custom UOp kernels ( #13028 )
* support custom UOp kernels
* no number
* multioutput works
* backward kernel runs
* move kernel class
* grad later
* work
* no tags in kernel graph
* test arange
* arange + contig
* delete comment
2025-10-31 15:51:39 +08:00
George Hotz
2da02f1ae1
add loads at the end ( #12988 )
* add loads at the end
* simpler
* late load
* tests passing
* fix matvec
* spec test passes
* fix where on load
* fix abs2
* fix more tests
2025-10-30 10:42:19 +08:00
George Hotz
b147e7e8e6
flatten bufferize ( #12984 )
* flatten bufferize
* simpler
* tests pass
* flat
* not flat
2025-10-29 11:23:43 +08:00
George Hotz
b0da173f2f
add unique to const, fix longstanding bug ( #12965 )
* add unique to const, fix longstanding bug
* _force_unique=True
* fix tests
* fix more tests
2025-10-28 15:11:37 +08:00
George Hotz
804133cffd
rename RECIP to RECIPROCAL ( #12939 )
2025-10-27 16:53:13 +08:00
Sieds Lykles
7f798a9630
Cleanup const buffers ( #12829 )
* split pm_cleanups
* update test_schedule
* shrink when we remove bufferize
* dont do shrink if shape is empty
* update tests
* remove *1 from metadata
* deal with the noop bufferize
* only noop on cvar
* cleanup
* fix if
* rename
2025-10-21 14:53:49 +02:00
chenyu
fcdf4ab37e
remove a contiguous in LARS ( #12770 )
2025-10-17 17:07:30 -04:00
George Hotz
062a6d68d7
test flash attention backward ( #12762 )
* test flash attention backward
* TODO: fix pcontig
* end ranges
* render colors
* very big
* multiout at every level
* reset ending ranges
* fix tests
* ugh
2025-10-17 23:15:59 +08:00
chenyu
9561803cb0
fix assert in test_schedule ( #12745 )
* fix assert in test_schedule
updated kernel counts and some old tests
* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64
delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES ( #12744 )
does nothing now
2025-10-16 14:11:33 -04:00
George Hotz
592e86f6f5
remove UOp.st ( #12716 )
* remove UOp.st
* fix tests
* torch backend disable
2025-10-16 14:44:09 +08:00
George Hotz
612e3d6143
replace mop arg with vectorized index ( #12695 )
* replace mop arg with vectorized index
* tests passing
* better viz
* no compile4
2025-10-15 20:50:06 +08:00
George Hotz
cab034b863
improve typing ( #12611 )
* improve typing and bump to 3.11
* no need for Self yet
* improve typing
* binop also
2025-10-11 16:20:23 +08:00
chenyu
f2c3a72b0c
remove RANGEIFY flag [pr] ( #12577 )
2025-10-09 21:52:54 -04:00
qazal
b86ad6053a
test_schedule independent of RANGEIFY flag ( #12568 )
* test_schedule independent of RANGEIFY flag
* comment for expectedFailure + test_cast_padded_view
* test_cast_padded_const works
* don't use full_shape it's fine
* add todos for the rest
2025-10-09 20:00:50 +03:00
chenyu
ae51bdd06a
remove trivial use of RANGEIFY flag ( #12550 )
some tests need update still
2025-10-09 02:29:38 -04:00
qazal
bb5671a837
some more ops.py cleanups ( #12525 )
* remove GroupOp.Meta and st_arg
* inline axis_arg
* only allow .buffer on reshapes (or the buffer)
* gate is the other way
* still want can_pad?
* use op_in_backward_slice_with_self
* .buffer is recursive
* lint
* pathlib there
2025-10-09 06:06:44 +03:00
chenyu
c4732a18bd
update tests that depend on SPLIT_REDUCEOP ( #12534 )
2025-10-08 21:53:30 -04:00
chenyu
28edea5d67
delete FUSE_CONV_BW ( #12527 )
2025-10-08 10:41:38 -04:00
qazal
b6835f4134
remove Ops.VIEW and related UOp methods ( #12522 )
* remove Ops.VIEW and related UOp methods
* update abstractions2.py
* no ShapeTrackers in abstractions2.py
* it's a size 1
2025-10-08 14:47:02 +03:00
George Hotz
3b0b3a2e64
fast RANGEIFY ( #12504 )
* rtoposort is fast, can replace rangeify with this
* fast rangeify
* work
* fast rangeify works for mnist
* should work
* progress
* pad fix
* FAST
* tests passing
* don't delete those shape ops
* put in rangeify map
* ending ranges fix
* tests
* mstack/mselect no hacks
* move to indexing.py
* touch up tests + add comments
* disable failing test
* actually make the file readable
* failing
* error
2025-10-08 19:38:06 +08:00
qazal
6f26603f06
delete swizzler.py ( #12518 )
* delete swizzler
* remove merge_views tests
* don't need rewrites_for_views
* apply_rewrites
2025-10-08 13:02:34 +03:00
qazal
7e0b14243e
delete grouper and kernelize ( #12517 )
* delete grouper and kernelize
* +sys.setrecursionlimit
2025-10-08 12:27:26 +03:00
chenyu
e701106a64
remove FUSE_ARANGE ( #12511 )
it was the default already
2025-10-08 04:54:07 -04:00
qazal
60b6dca5ba
update some tests instead of expect_rangeify_fails ( #12500 )
* update test_clone_doesnt_dedup to use base
* new_flat_buffer passes
* fix test_reorder_expand
* remove the view stuff
* remove that test, we don't want this view const behavior
* test_setitem_becomes_subbuffer is good
2025-10-08 07:42:31 +03:00
qazal
84597ed53c
early assert for device mismatched asts in rangeify ( #12499 )
* early assert for device mismatched asts in rangeify
* alt also passes
2025-10-08 07:19:36 +03:00
qazal
76e8a3250c
rangeify: late zero folding ( #12464 )
* rangeify: late zero folding
* early
* not kernels
* none
* multi
* linter
* mstack is sink comment
* more comment
2025-10-06 12:52:33 +03:00
qazal
1b1978b9c0
early copy fixup ( #12463 )
* simple failing test
* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
c1e85f699c
multi test case for sharded ring allreduce ( #12462 )
* multi test case for sharded ring allreduce
triggers `children not making progress` with RANGEIFY
* expect_rangeify_fails
2025-10-05 23:18:24 -04:00
George Hotz
46e8ea15c1
split pm_substitute_recurse ( #12460 )
2025-10-05 21:35:50 -04:00
qazal
6ad9a688ed
add failing test after "pend substitutes for speed" ( #12457 )
* add failing substitute test
* expect_rangeify_fails
2025-10-05 16:10:04 +03:00
qazal
13a25b2e67
rangeify: don't shape INDEX on kernelize ( #12417 )
2025-10-02 09:45:37 +03:00
qazal
6fc6b51b59
fix limit_bufs with kernelize ( #12415 )
2025-10-02 07:49:11 +03:00
George Hotz
89bed28716
split reduceop ( #12404 )
* some rangeify tests fixed
* bring split reduceop to rangeify
* fix tests
2025-10-01 18:45:16 +08:00
qazal
90b1c0dd96
rangeify: test_where_fold kernel count ( #12379 )
* rangeify: test_where_fold kernel count
* get these from the index
* replace ranges
* fine
* movement ops
* diff
* better
2025-10-01 09:35:12 +03:00
nimlgen
2c397eb2a2
rangeify: buf limit ( #12336 )
* limit bufs
* g
* fix buffer limit
* um?
* fix
* only these?
* typo
* f
* cleaner
2025-09-30 14:59:47 +03:00
qazal
4ff7f20b9d
rangeify: fix kernelize ( #12357 )
2025-09-30 10:10:08 +03:00
George Hotz
ab6b0d3a21
enable cleanup_dead_axes ( #12351 )
* enable cleanup_dead_axes
* don't mess with user contig
* correct tag behavior
* double reshape isn't correct
* block on assign too
* skip messing with symbolic
* Fix tests
* disable RANGEIFY=2
* test w rangeify
2025-09-30 14:09:39 +08:00
wozeparrot
2a0caa09c2
push copy to disk ( #12348 )
2025-09-29 21:55:05 -07:00
qazal
250cb10e8f
rangeify permuted assign ( #12299 )
...
* enable RANGEIFY=1 test_assign
* work
* rangeify=0 asserts this ast
* remove that
* beta test, it's correct though
* skip multi
* matches torch/np output
* memcopy without memcopy
* can remove this
* rangeify isn't silently wrong anymore
* diff cleanup
* use UOp toposort instead of global tags
* actual assert TestRangeifyAssign
* step
* work
* this isn't optimizing away now
* some todos
* test fusion schedule
* typo
* dedup idxs
* cleaner
* pre
* work
* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583
Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" ( #12318 )
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs ( #12308 )
* bufferize children early
* cleaner
* fix types
* lower number of reduceops
* test openpilot
2025-09-27 04:17:15 +02:00