Commit Graph

10417 Commits

Author SHA1 Message Date
qazal
05275c9ec3 rangeify: enable assign to mstack target (#12345) 2025-09-30 06:27:57 +03:00
chenyu
8e508a9927 reduce const folding (#12344) 2025-09-29 23:08:56 -04:00
chenyu
3a480b858f use more getitem in gpt2 (#12343) 2025-09-29 23:08:03 -04:00
qazal
32d69d07d7 rangeify: enable multitensor TestBatchNorm (#12342) 2025-09-30 06:05:00 +03:00
Sieds Lykles
d55d829635 Lower index dtype spec fix (#12337)
* new pm_lower_index_dtype

* load_store_indexing after index lowering

* shorten line

* separate rule for long removal

* fix test

* fix index_to_concrete_int

* minor fixes

* add sink there

* update types in linearizer test
2025-09-30 04:26:50 +02:00
Sieds Lykles
c38f6ce140 unified_rewrite: use deque and dont add nodes to the stack multiple times (#12320)
* use deque instead of list

* increase ctx.progress and max stack_len

* add openpilot

* prevent placing uops on stack many times

* revert increasing ctx.progress and stack length limit

* don't block adding to the stack there

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
George Hotz
cdfa0f29fd add rendering to index (#12338) 2025-09-30 09:18:05 +08:00
George Hotz
baf3b60cfb fix gpt2 on rangeify (#12335) 2025-09-29 19:16:44 +08:00
qazal
9513f025c5 apply multi before rangeify (#12298)
* it doesn't realize it when I reshape

* cleaner graph

* map out

* REDUCE_AXIS also gives the wrong answer

* maybe

* work

* back here

* try

* more

* refactor tests

* check MultiBuffer

* or copy

* fine with this

* don't need graph_rewrite_map in rangeify
2025-09-29 14:16:31 +03:00
George Hotz
b899392f30 fix llm app with rangeify (#12334)
* fix llm app with rangeify

* add gpt2 contiguous also
2025-09-29 18:42:44 +08:00
wozeparrot
7ae6898e31 better late bufferview (#12333) 2025-09-29 03:08:34 -07:00
George Hotz
3291e00df7 fix efficientnet slowness on rangeify (#12332) 2025-09-29 18:01:01 +08:00
chenyu
9d2f2b8e34 skip test_mean_half_precision_overflow (#12331)
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
qazal
9915bcf2b4 remove no-op contiguous from rand (#12329) 2025-09-29 11:53:16 +03:00
chenyu
76c87d81b3 delete test_backward_sum_acc_dtype (#12330)
this test tests the wrong thing; it was only working because of the expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353 failing rng test (#12328)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign

* failing rangeify test

* simpler

* otherwise contig

* more tolerance because rng seed changed
2025-09-29 16:06:45 +08:00
George Hotz
29469577e8 tighten spec: fixup devectorizer types / rangeify (#12327)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign
2025-09-29 15:41:11 +08:00
wozeparrot
a982480512 feat: late to_bufferview (#12271) 2025-09-29 00:29:43 -07:00
qazal
e01a3eb59a rangeify whitespace cleanups [pr] (#12326)
* rangeify whitespace cleanups

* this is a noop
2025-09-29 10:04:51 +03:00
George Hotz
cf925d1ac5 remove metadata for rangeify codegen (#12325) 2025-09-29 14:29:28 +08:00
George Hotz
b252f890da add support for SPEC=1 (#12322)
* add support for SPEC=1

* cleaner place for it

* non rangeify spec

* split non rangeify
2025-09-29 12:55:01 +08:00
qazal
292cb6ae26 viz: 404 if the requested rewrite doesn't exist (#12323) 2025-09-29 07:51:10 +03:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583 Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
29f0886395 skip test_softmax_fusion tests if RANGEIFY==1 (#12310) 2025-09-27 05:57:40 +02:00
Sieds Lykles
b98f1881ef dsp opt test has different axis number on rangeify (#12309) 2025-09-27 05:06:11 +02:00
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
qazal
0104b16b9b rangeify: fix empty tags in reshapes (#12307) 2025-09-26 16:32:48 +03:00
nimlgen
f5eb46a3d9 fix limit buf metal on non rangeify (#12303)
* add failure test for limit buf on non rangeify

* correct metal

* correct

* hm
2025-09-26 11:06:28 +03:00
qazal
8b2e0930d7 rangeify: enable passing multi test (#12301) 2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
wozeparrot
d2cd269e28 fix: try close mmap (#12306) 2025-09-25 20:54:27 -07:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96 test_qcom: update (#12293) 2025-09-24 21:45:58 +03:00
qazal
38ecefaacb RANGEIFY=1 allreduce (#12260)
* ci

* extract mops

* work

* assert early

* port this?

* can realize shard

* allreduce passing

* notes

* better handling of shard

* err

* outerworld allreduce twice

* work

* don't tag movement ops

* don't tag movement ops

* delete old logic

* 19 failing + ram

* cleanup

* reset stuff

* simplest failing test

* diff

* test_ones

* allreduce work

* allreduce more work

* down to 22 failing tests

* port _device_num

* replace creates a new UOp here

* pour symbolic everywhere

* 7 failing

* focus on allreduce

* work

* cleanup

* more ci

* fix test_schedule_ring

* post index const shape

* much better

* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
0e778296be rangeify: refactor const folding (#12291)
* rangeify: refactor const folding [pr]

* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41 rangeify: simplify noop copy (#12289) 2025-09-24 17:01:23 +03:00
qazal
1400ce105f rangeify: fix sharding (#12288) 2025-09-24 14:33:56 +03:00
qazal
154c865966 rangeify: fix ram usage in multi (#12286) 2025-09-24 13:48:58 +03:00
Sieds Lykles
e8945c74de fix infinite symbolic loop with VCONST (#12285) 2025-09-24 07:06:22 +02:00
Sieds Lykles
45c7252aed Better div nesting 2 (#11812)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* cleanup

* update tests

* ALLOWED_GATED_READ_IMAGE from 16 -> 12

* only remove the call to simplify

* add option to simplify with factor_remainder

* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81 lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* separate rule for lowering invalid gate

* don't unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
qazal
ad7c8c21ea rangeify: INDEX doesn't passthrough MSELECT (#12279) 2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48 rangeify: fix test_setitem (#12269)
* rangeify: fix test_setitem

* um?

* better?

* simple where folding

* f

* revert

* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0 rangeify: fix contiguous multi (#12278)
* rangeify: fix contiguous multi

* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c rangeify: passes now (#12277) 2025-09-23 18:46:49 +03:00
qazal
680ce54dd4 add types to replace_dnum (#12276) 2025-09-23 14:43:04 +03:00
chenyu
fffce0a6b4 use more no_range in simplify [pr] (#12275) 2025-09-23 02:33:56 -04:00
chenyu
51b88b2265 process replay tests in rangeify (#12274) 2025-09-23 01:30:06 -04:00