442 Commits

Author SHA1 Message Date
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
chenyu
89f9e1dcd5 add SGD to beautiful_mnist (#13571) 2025-12-04 12:17:29 -05:00
George Hotz
a4c4e48385 add LUNIQUE op (#13554) 2025-12-03 14:34:34 -08:00
George Hotz
e4cd649ff0 remove kernelize to prepare for refactors (#13463)
* remove kernelize to prepare for refactors

* less kernelize

* last test
2025-11-26 14:18:50 -08:00
George Hotz
ffb9e8396f fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00
George Hotz
962d980919 fuse hasn't worked since rangeify, remove it (#13057) 2025-11-02 14:01:52 +08:00
Sieds Lykles
885b6dea9e multiple reduce range arange folding (#13047)
* multi reduce arange folding

* add test

* cvar to var

* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
ecb8565f67 Revert "Better cleanup of arange bufferize (#13046)" (#13048)
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a Better cleanup of arange bufferize (#13046)
* check for reduce and index instead of cast

* add test
2025-11-01 16:16:31 +01:00
George Hotz
b791d70725 support custom UOp kernels (#13028)
* support custom UOp kernels

* no number

* multioutput works

* backward kernel runs

* move kernel class

* grad later

* work

* no tags in kernel graph

* test arange

* arange + contig

* delete comment
2025-10-31 15:51:39 +08:00
George Hotz
2da02f1ae1 add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
George Hotz
b147e7e8e6 flatten bufferize (#12984)
* flatten bufferize

* simpler

* tests pass

* flat

* not flat
2025-10-29 11:23:43 +08:00
George Hotz
b0da173f2f add unique to const, fix longstanding bug (#12965)
* add unique to const, fix longstanding bug

* _force_unique=True

* fix tests

* fix more tests
2025-10-28 15:11:37 +08:00
George Hotz
804133cffd rename RECIP to RECIPROCAL (#12939) 2025-10-27 16:53:13 +08:00
Sieds Lykles
7f798a9630 Cleanup const buffers (#12829)
* split pm_cleanups

* update test_schedule

* shrink when we remove bufferize

* dont do shrink if shape is empty

* update tests

* remove *1 from metadata

* deal with the noop bufferize

* only noop on cvar

* cleanup

* fix if

* rename
2025-10-21 14:53:49 +02:00
chenyu
fcdf4ab37e remove a contiguous in LARS (#12770) 2025-10-17 17:07:30 -04:00
George Hotz
062a6d68d7 test flash attention backward (#12762)
* test flash attention backward

* TODO: fix pcontig

* end ranges

* render colors

* very big

* multiout at every level

* reset ending ranges

* fix tests

* ugh
2025-10-17 23:15:59 +08:00
chenyu
9561803cb0 fix assert in test_schedule (#12745)
* fix assert in test_schedule

updated kernel counts and some old tests

* fix
2025-10-16 15:39:50 -04:00
chenyu
285534ce64 delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
George Hotz
cab034b863 improve typing (#12611)
* improve typing and bump to 3.11

* no need for Self yet

* improve typing

* binop also
2025-10-11 16:20:23 +08:00
chenyu
f2c3a72b0c remove RANGEIFY flag [pr] (#12577) 2025-10-09 21:52:54 -04:00
qazal
b86ad6053a test_schedule independent of RANGEIFY flag (#12568)
* test_schedule independent of RANGEIFY flag

* comment for expectedFailure + test_cast_padded_view

* test_cast_padded_const works

* don't use full_shape it's fine

* add todos for the rest
2025-10-09 20:00:50 +03:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
qazal
bb5671a837 some more ops.py cleanups (#12525)
* remove GroupOp.Meta and st_arg

* inline axis_arg

* only allow .buffer on reshapes (or the buffer)

* gate is the other way

* still want can_pad?

* use op_in_backward_slice_with_self

* .buffer is recursive

* lint

* pathlib there
2025-10-09 06:06:44 +03:00
chenyu
c4732a18bd update tests that depend on SPLIT_REDUCEOP (#12534) 2025-10-08 21:53:30 -04:00
chenyu
28edea5d67 delete FUSE_CONV_BW (#12527) 2025-10-08 10:41:38 -04:00
qazal
b6835f4134 remove Ops.VIEW and related UOp methods (#12522)
* remove Ops.VIEW and related UOp methods

* update abstractions2.py

* no ShapeTrackers in abstractions2.py

* it's a size 1
2025-10-08 14:47:02 +03:00
George Hotz
3b0b3a2e64 fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
qazal
6f26603f06 delete swizzler.py (#12518)
* delete swizzler

* remove merge_views tests

* don't need rewrites_for_views

* apply_rewrites
2025-10-08 13:02:34 +03:00
qazal
7e0b14243e delete grouper and kernelize (#12517)
* delete grouper and kernelize

* +sys.setrecursionlimit
2025-10-08 12:27:26 +03:00
chenyu
e701106a64 remove FUSE_ARANGE (#12511)
it was the default already
2025-10-08 04:54:07 -04:00
qazal
60b6dca5ba update some tests instead of expect_rangeify_fails (#12500)
* update test_clone_doesnt_dedup to use base

* new_flat_buffer passes

* fix test_reorder_expand

* remove the view stuff

* remove that test, we don't want this view const behavior

* test_setitem_becomes_subbuffer is good
2025-10-08 07:42:31 +03:00
qazal
84597ed53c early assert for device mismatched asts in rangeify (#12499)
* early assert for device mismatched asts in rangeify

* alt also passes
2025-10-08 07:19:36 +03:00
qazal
76e8a3250c rangeify: late zero folding (#12464)
* rangeify: late zero folding

* early

* not kernels

* none

* multi

* linter

* mstack is sink comment

* more comment
2025-10-06 12:52:33 +03:00
qazal
1b1978b9c0 early copy fixup (#12463)
* simple failing test

* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
c1e85f699c multi test case for sharded ring allreduce (#12462)
* multi test case for sharded ring allreduce

triggers `children not making progress` with RANGEIFY

* expect_rangeify_fails
2025-10-05 23:18:24 -04:00
George Hotz
46e8ea15c1 split pm_substitute_recurse (#12460) 2025-10-05 21:35:50 -04:00
qazal
6ad9a688ed add failing test after "pend substitutes for speed" (#12457)
* add failing substitute test

* expect_rangeify_fails
2025-10-05 16:10:04 +03:00
qazal
13a25b2e67 rangeify: don't shape INDEX on kernelize (#12417) 2025-10-02 09:45:37 +03:00
qazal
6fc6b51b59 fix limit_bufs with kernelize (#12415) 2025-10-02 07:49:11 +03:00
George Hotz
89bed28716 split reduceop (#12404)
* some rangeify tests fixed

* bring split reduceop to rangeify

* fix tests
2025-10-01 18:45:16 +08:00
qazal
90b1c0dd96 rangeify: test_where_fold kernel count (#12379)
* rangeify: test_where_fold kernel count

* get these from the index

* replace ranges

* fine

* movement ops

* diff

* better
2025-10-01 09:35:12 +03:00
nimlgen
2c397eb2a2 rangeify: buf limit (#12336)
* limit bufs

* g

* fix buffer limit

* um?

* fix

* only these?

* typo

* f

* cleaner
2025-09-30 14:59:47 +03:00
qazal
4ff7f20b9d rangeify: fix kernelize (#12357) 2025-09-30 10:10:08 +03:00
George Hotz
ab6b0d3a21 enable cleanup_dead_axes (#12351)
* enable cleanup_dead_axes

* don't mess with user contig

* correct tag behavior

* double reshape isn't correct

* block on assign too

* skip messing with symbolic

* Fix tests

* disable RANGEIFY=2

* test w rangeify
2025-09-30 14:09:39 +08:00
wozeparrot
2a0caa09c2 push copy to disk (#12348) 2025-09-29 21:55:05 -07:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00