Commit Graph

10490 Commits

Author SHA1 Message Date
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
qazal
0104b16b9b rangeify: fix empty tags in reshapes (#12307) 2025-09-26 16:32:48 +03:00
nimlgen
f5eb46a3d9 fix limit buf metal on non rangeify (#12303)
* add failure test for limit buf on non rangeify

* correct metal

* correct

* hm
2025-09-26 11:06:28 +03:00
qazal
8b2e0930d7 rangeify: enable passing multi test (#12301) 2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
wozeparrot
d2cd269e28 fix: try close mmap (#12306) 2025-09-25 20:54:27 -07:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96 test_qcom: update (#12293) 2025-09-24 21:45:58 +03:00
qazal
38ecefaacb RANGEIFY=1 allreduce (#12260)
* ci

* extract mops

* work

* assert early

* port this?

* can realize shard

* allreduce passing

* notes

* better handling of shard

* err

* outerworld allreduce twice

* work

* don't tag movement ops

* don't tag movement ops

* delete old logic

* 19 failing + ram

* cleanup

* reset stuff

* simplest failing test

* diff

* test_ones

* allreduce work

* allreduce more work

* down to 22 failing tests

* port _device_num

* replace creates a new UOp here

* pour symbolic everywhere

* 7 failing

* focus on allreduce

* work

* cleanup

* more ci

* fix test_schedule_ring

* post index const shape

* much better

* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
0e778296be rangeify: refactor const folding (#12291)
* rangeify: refactor const folding [pr]

* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41 rangeify: simplify noop copy (#12289) 2025-09-24 17:01:23 +03:00
qazal
1400ce105f rangeify: fix sharding (#12288) 2025-09-24 14:33:56 +03:00
qazal
154c865966 rangeify: fix ram usage in multi (#12286) 2025-09-24 13:48:58 +03:00
Sieds Lykles
e8945c74de fix infinite symbolic loop with VCONST (#12285) 2025-09-24 07:06:22 +02:00
Sieds Lykles
45c7252aed Better div nesting 2 (#11812)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* cleanup

* update tests

* ALLOWED_GATED_READ_IMAGE from 16 -> 12

* only remove the call to simplify

* add option to simplify with factor_remainder

* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81 lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* separate rule for lowering invalid gate

* don't unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
qazal
ad7c8c21ea rangeify: INDEX doesn't passthrough MSELECT (#12279) 2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48 rangeify: fix test_setitem (#12269)
* rangeify: fix test_setitem

* um?

* better?

* simple where folding

* f

* revert

* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0 rangeify: fix contiguous multi (#12278)
* rangeify: fix contiguous multi

* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c rangeify: passes now (#12277) 2025-09-23 18:46:49 +03:00
qazal
680ce54dd4 add types to replace_dnum (#12276) 2025-09-23 14:43:04 +03:00
chenyu
fffce0a6b4 use more no_range in simplify [pr] (#12275) 2025-09-23 02:33:56 -04:00
chenyu
51b88b2265 process replay tests in rangeify (#12274) 2025-09-23 01:30:06 -04:00
chenyu
b54cb272d0 move test_qcom to test/device (#12272) 2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617 enable test_sum_twice (#12270)
* remove skip

* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b Check for group inside another reduce (#12268)
* add check

* get the ranges correctly

* add test

* comment and better check
2025-09-23 00:32:41 +02:00
qazal
a6fd96f620 rangeify: don't tag movement ops (#12267)
* don't tag movement ops

* delete old logic
2025-09-22 16:40:17 +03:00
chenyu
b03ceb806e move test_sample to test_randomness (#12266) 2025-09-21 21:11:32 -04:00
qazal
25e0b725d1 cleanup section 0 rangeify (#12264) 2025-09-22 00:30:44 +03:00
qazal
1aba668a37 cleanup buffer_view matcher (#12263) 2025-09-21 23:45:48 +03:00
nimlgen
b53a266254 rangeify: fix test_optim (#12262)
* rangeify: fix test_optim

* add to cl?

* these are good now
2025-09-21 18:08:35 +03:00
qazal
461e9becec srender UOp in movement op arg (#12261) 2025-09-21 13:55:45 +03:00
Sieds Lykles
9569fdfa36 use str for AxisType and AddrSpace __repr__ (#12252) 2025-09-21 05:24:41 +02:00
qazal
8365c28cd5 viz: put a limit of brightness scale (#12259) 2025-09-20 18:52:55 +03:00
nimlgen
4762a24022 test_free_intermediates force buffers (#12255)
* test_free_intermediates force buffers

* f

* fix for rangeify

* xx
2025-09-20 18:14:39 +03:00
qazal
57c7e0a8f8 RANGEIFY=1 test_jit (#12254)
* RANGEIFY=1 test_jit

* don't do any of that

* disk

* simple disk tensor

* more work

* run more tests

* it also doesn't copy every time

* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c test case to sum twice in different order (#12253)
* test case to sum twice in different order

fixed by #12251

* try metal
2025-09-20 10:11:57 -04:00
qazal
4756971c88 skip test_bf16_disk_write_read on CL=1 (#12256) 2025-09-20 17:11:06 +03:00
chenyu
5e794be8af tighter spec for RANGE (#12250) 2025-09-20 07:59:50 -04:00
Sieds Lykles
73c8dae60d add missing remove_blockend case (#12251)
* add missing remove_blockend case

* remove expectedFailure

* better comment
2025-09-20 06:29:19 +02:00
wozeparrot
dc4dd898b7 fix: close mmap (#12249) 2025-09-19 14:09:12 -07:00
Sieds Lykles
bb1f376ae6 profile z3 (#12248) 2025-09-19 22:52:06 +02:00
Sieds Lykles
7e06d3ebba enable test_symbolic_jit (#12245)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-09-19 20:23:42 +02:00
qazal
bb59eed82f rangeify: don't tag consts, they are global (#12247)
* rangeify: don't tag consts, they are global

* don't map movement ops

* sym failing test

* remove that

* update comment

* simpler test

* work
2025-09-19 15:25:03 +03:00
Sieds Lykles
cc038b31b6 Shrink instead of reshape to unregister symbolic (#12241)
* Slice to unbind symbolic

* use vmax for now

* assert shape in reshape is valid

* update test_symbolic_ops to use shrink instead of reshape

* remove infer_with_bound_values for now

* symbolic output doesn't have symbolic strides

* symbolic jit tests use shrink to unregister symbolic

* update test

* update more tests

* wrap vmax in int()

* only create a new st if the store is not an assign

* unwrap st

* comments
2025-09-19 06:04:35 +02:00
chenyu
a531a649fb test_resize_upsample_scales_cubic_align_corners_cpu is fixed (#12244) 2025-09-18 20:55:26 -04:00
Sieds Lykles
8d703a6369 z3 xor doesnt use bitcast (#12243) 2025-09-19 00:31:44 +02:00
chenyu
0dad6cc518 good RANGEIFY kernel counts in external_test_opt (#12242)
No push-permute changes. For the model tests it is less clear whether the new counts are better; some got slower.
2025-09-18 17:58:54 -04:00
chenyu
cff1065f5e test CL=1 RANGEIFY=1 onnx (#12240)
all except test_resize_upsample_scales_cubic_align_corners_cpu run
2025-09-18 16:49:46 -04:00
Sieds Lykles
ef05178855 fix 0//0 infinite rewrite in rangeify onnx (#12239) 2025-09-18 21:59:50 +02:00