Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs ( #12308 )
...
* bufferize children early
* cleaner
* fix types
* lower number of reduceops
* test openpilot
2025-09-27 04:17:15 +02:00
qazal
0104b16b9b
rangeify: fix empty tags in reshapes ( #12307 )
2025-09-26 16:32:48 +03:00
nimlgen
f5eb46a3d9
fix limit buf metal on non rangeify ( #12303 )
...
* add failure test for limit buf on non rangeify
* correct metal
* correct
* hm
2025-09-26 11:06:28 +03:00
qazal
8b2e0930d7
rangeify: enable passing multi test ( #12301 )
2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc
Rangeify IMAGE ( #12304 )
...
* add imagedtype to rangeify
* enable some image tests
* move the tests
* image upcast before locals
* add if statement
* rangeify image_dtype test
* decrease read_image count
2025-09-26 07:21:02 +02:00
wozeparrot
d2cd269e28
fix: try close mmap ( #12306 )
2025-09-25 20:54:27 -07:00
chenyu
17cec8d645
RANGEIFY winograd test ( #12297 )
...
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96
test_qcom: update ( #12293 )
2025-09-24 21:45:58 +03:00
qazal
38ecefaacb
RANGEIFY=1 allreduce ( #12260 )
...
* ci
* extract mops
* work
* assert early
* port this?
* can realize shard
* allreduce passing
* notes
* better handling of shard
* err
* outerworld allreduce twice
* work
* don't tag movement ops
* don't tag movement ops
* delete old logic
* 19 failing + ram
* cleanup
* reset stuff
* simplest failing test
* diff
* test_ones
* allreduce work
* allreduce more work
* down to 22 failing tests
* port _device_num
* replace creates a new UOp here
* pour symbolic everywhere
* 7 failing
* focus on allreduce
* work
* cleanup
* more ci
* fix test_schedule_ring
* post index const shape
* much better
* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
0e778296be
rangeify: refactor const folding ( #12291 )
...
* rangeify: refactor const folding [pr]
* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41
rangeify: simplify noop copy ( #12289 )
2025-09-24 17:01:23 +03:00
qazal
1400ce105f
rangeify: fix sharding ( #12288 )
2025-09-24 14:33:56 +03:00
qazal
154c865966
rangeify: fix ram usage in multi ( #12286 )
2025-09-24 13:48:58 +03:00
Sieds Lykles
e8945c74de
fix infinite symbolic loop with VCONST ( #12285 )
2025-09-24 07:06:22 +02:00
Sieds Lykles
45c7252aed
Better div nesting 2 ( #11812 )
...
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* cleanup
* update tests
* ALLOWED_GATED_READ_IMAGE from 16 -> 12
* only remove the call to simplify
* add option to simplify with factor_remainder
* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81
lower the invalid gate last ( #12164 )
...
* lowering invalid gate is part of lower_index_dtype
* update test
* remove import
* put that back
* reduce_collapse uses invalid
* fix that pattern to use invalid_pat
* valid creates the right dtype count
* separate rule for lowering invalid gate
* dont unvectorize Invalid gate
* image_fixup uses Invalid
* update tests
* cleanup
* update split_load_store
* add .scalar() there
2025-09-24 04:27:35 +02:00
qazal
ad7c8c21ea
rangeify: INDEX doesn't passthrough MSELECT ( #12279 )
2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48
rangeify: fix test_setitem ( #12269 )
...
* rangeify: fix test_setitem
* um?
* better?
* simple where folding
* f
* revert
* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0
rangeify: fix contiguous multi ( #12278 )
...
* rangeify: fix contiguous multi
* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c
rangeify: passes now ( #12277 )
2025-09-23 18:46:49 +03:00
qazal
680ce54dd4
add types to replace_dnum ( #12276 )
2025-09-23 14:43:04 +03:00
chenyu
fffce0a6b4
use more no_range in simplify [pr] ( #12275 )
2025-09-23 02:33:56 -04:00
chenyu
51b88b2265
process replay tests in rangeify ( #12274 )
2025-09-23 01:30:06 -04:00
chenyu
b54cb272d0
move test_qcom to test/device ( #12272 )
2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617
enable test_sum_twice ( #12270 )
...
* remove skip
* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b
Check for group inside another reduce ( #12268 )
...
* add check
* get the ranges correctly
* add test
* comment and better check
2025-09-23 00:32:41 +02:00
qazal
a6fd96f620
rangeify: don't tag movement ops ( #12267 )
...
* don't tag movement ops
* delete old logic
2025-09-22 16:40:17 +03:00
chenyu
b03ceb806e
move test_sample to test_randomness ( #12266 )
2025-09-21 21:11:32 -04:00
qazal
25e0b725d1
cleanup section 0 rangeify ( #12264 )
2025-09-22 00:30:44 +03:00
qazal
1aba668a37
cleanup buffer_view matcher ( #12263 )
2025-09-21 23:45:48 +03:00
nimlgen
b53a266254
rangeify: fix test_optim ( #12262 )
...
* rangeify: fix test_optim
* add to cl?
* these are good now
2025-09-21 18:08:35 +03:00
qazal
461e9becec
srender UOp in movement op arg ( #12261 )
2025-09-21 13:55:45 +03:00
Sieds Lykles
9569fdfa36
use str for AxisType and AddrSpace __repr__ ( #12252 )
2025-09-21 05:24:41 +02:00
qazal
8365c28cd5
viz: put a limit of brightness scale ( #12259 )
2025-09-20 18:52:55 +03:00
nimlgen
4762a24022
test_free_intermediates force buffers ( #12255 )
...
* test_free_intermediates force buffers
* f
* fix for rangeify
* xx
2025-09-20 18:14:39 +03:00
qazal
57c7e0a8f8
RANGEIFY=1 test_jit ( #12254 )
...
* RANGEIFY=1 test_jit
* don't do any of that
* disk
* simple disk tensor
* more work
* run more tests
* it also doesn't copy everytime
* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c
test case to sum twice in different order ( #12253 )
...
* test case to sum twice in different order
fixed by #12251
* try metal
2025-09-20 10:11:57 -04:00
qazal
4756971c88
skip test_bf16_disk_write_read on CL=1 ( #12256 )
2025-09-20 17:11:06 +03:00
chenyu
5e794be8af
tighter spec for RANGE ( #12250 )
2025-09-20 07:59:50 -04:00
Sieds Lykles
73c8dae60d
add missing remove_blockend case ( #12251 )
...
* add missing remove_blockend case
* remove expectedFailure
* better comment
2025-09-20 06:29:19 +02:00
wozeparrot
dc4dd898b7
fix: close mmap ( #12249 )
2025-09-19 14:09:12 -07:00
Sieds Lykles
bb1f376ae6
profile z3 ( #12248 )
2025-09-19 22:52:06 +02:00
Sieds Lykles
7e06d3ebba
enable test_symbolic_jit ( #12245 )
...
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-09-19 20:23:42 +02:00
qazal
bb59eed82f
rangeify: don't tag consts, they are global ( #12247 )
...
* rangeify: don't tag consts, they are global
* don't map movement ops
* sym failing test
* remove that
* update comment
* simpler test
* work
2025-09-19 15:25:03 +03:00
Sieds Lykles
cc038b31b6
Shrink instead of reshape to unregister symbolic ( #12241 )
...
* Slice to unbind symbolic
* use vmax for now
* assert shape in reshape is valid
* update test_symbolic_ops to use shrink instead of reshape
* remove infer_with_bound_values for now
* symbolic output doesnt have symbolic strides
* symbolic jit tests use shrink to unregister symbolic
* update test
* update more tests
* wrap vmax in int()
* only create a new st if the store is not an assign
* unwrap st
* comments
2025-09-19 06:04:35 +02:00
chenyu
a531a649fb
test_resize_upsample_scales_cubic_align_corners_cpu is fixed ( #12244 )
2025-09-18 20:55:26 -04:00
Sieds Lykles
8d703a6369
z3 xor doesnt use bitcast ( #12243 )
2025-09-19 00:31:44 +02:00
chenyu
0dad6cc518
good RANGEIFY kernel counts in external_test_opt ( #12242 )
...
no push permute stuff. the model ones are less clear if it's good, some got slower
2025-09-18 17:58:54 -04:00
chenyu
cff1065f5e
test CL=1 RANGEIFY=1 onnx ( #12240 )
...
all except test_resize_upsample_scales_cubic_align_corners_cpu runs
2025-09-18 16:49:46 -04:00
Sieds Lykles
ef05178855
fix 0//0 infinite rewrite in rangeify onnx ( #12239 )
2025-09-18 21:59:50 +02:00