George Hotz
e7c7fdb47b
now that needs rangeify 2 also
2025-09-30 19:39:23 +08:00
George Hotz
17a1777823
Merge branch 'master' into fix_rangeify_tests
2025-09-30 19:29:50 +08:00
George Hotz
49dc879e8d
fix some rangeify tests
2025-09-30 19:23:42 +08:00
qazal
a95159d579
remove TestShapeSpec, it relies on ShapeTracker [pr] ( #12369 )
2025-09-30 14:20:35 +03:00
George Hotz
7eee206177
fix uop gc
2025-09-30 19:08:35 +08:00
George Hotz
d8bb679a3a
Merge branch 'master' into fix_rng_merge
2025-09-30 18:59:08 +08:00
qazal
de1d562b69
rangeify: update test_pickle asserts ( #12366 )
...
* realized exists on the base
* use is_realized
2025-09-30 13:27:41 +03:00
George Hotz
dc11a23775
fix bad range merges
2025-09-30 18:26:57 +08:00
qazal
e8c595c29e
remu: add new instructions introduced in RANGEIFY ( #12363 )
...
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d
* run amd test_ops
* skip test_masked_select
2025-09-30 12:36:29 +03:00
qazal
109c63b904
update Tensor unit tests for RANGEIFY ( #12359 )
...
* update test_kernelize for RANGEIFY
* also kernelizes user contiguous
* skip that test
* tensor uop repr
* 4 kernels, still realizes a float
2025-09-30 11:17:21 +03:00
George Hotz
7129419500
fix cifar training in RANGEIFY ( #12355 )
...
* fix cifar training in RANGEIFY
* even more wino fuse
* bugfix
* test to show issue
2025-09-30 15:59:19 +08:00
qazal
4ff7f20b9d
rangeify: fix kernelize ( #12357 )
2025-09-30 10:10:08 +03:00
chenyu
86c5c969ea
linalg cosmetic change ( #12356 )
2025-09-30 03:00:59 -04:00
qazal
6a56d3c859
rangeify: only test correctness in multi ( #12339 )
...
* work
* more work
* back here
* skip tests
* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21
enable cleanup_dead_axes ( #12351 )
...
* enable cleanup_dead_axes
* don't mess with user contig
* correct tag behavior
* double reshape isn't correct
* block on assign too
* skip messing with symbolic
* Fix tests
* disable RANGEIFY=2
* test w rangeify
2025-09-30 14:09:39 +08:00
Sieds Lykles
73b25bf47d
z3 fix loaded mask ( #12353 )
...
* z3 fix loaded mask
* indentation
2025-09-30 06:55:50 +02:00
wozeparrot
2a0caa09c2
push copy to disk ( #12348 )
2025-09-29 21:55:05 -07:00
hooved
39aae679e4
Support bfloat16 on NULL backend ( #12340 )
...
* add failing test
* move test
* only run test with NULL default
* add skip reason
* add fix
2025-09-30 00:02:30 -04:00
George Hotz
f522e83a02
fix rangeify elu fusion for openpilot ( #12341 )
...
* fix rangeify elu fusion for openpilot
* flip the metadata
* copy over permuted contiguous support
* this is correct
* update that
2025-09-30 11:41:52 +08:00
Sieds Lykles
d55d829635
Lower index dtype spec fix ( #12337 )
...
* new pm_lower_index_dtype
* load_store_indexing after index lowering
* shorten line
* separate rule for long removal
* fix test
* fix index_to_concrete_int
* minor fixes
* add sink there
* update types in linearizer test
2025-09-30 04:26:50 +02:00
hooved
c2689c505e
Clip model updates for Stable Diffusion mlperf training ( #12313 )
...
* stable diffusion mlperf clip changes
* add clip tests
* set gelu as attribute
* add more tests
* factor out GPUS
* rerun CI
* add imports to if blocks
* remove unneeded axis
* add clip tests to CI
* move clip tests
* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
George Hotz
cdfa0f29fd
add rendering to index ( #12338 )
2025-09-30 09:18:05 +08:00
qazal
9513f025c5
apply multi before rangeify ( #12298 )
...
* it doesn't realize it when i reshape
* cleaner graph
* map out
* REDUCE_AXIS also gives the wrong answer
* maybe
* work
* back here
* try
* more
* refactor tests
* check MultiBuffer
* or copy
* fine with this
* don't need graph_rewrite_map in rangeify
2025-09-29 14:16:31 +03:00
George Hotz
3291e00df7
fix efficientnet slowness on rangeify ( #12332 )
2025-09-29 18:01:01 +08:00
chenyu
9d2f2b8e34
skip test_mean_half_precision_overflow ( #12331 )
...
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
chenyu
76c87d81b3
delete test_backward_sum_acc_dtype ( #12330 )
...
this test tests the wrong thing, it was only working because expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353
failing rng test ( #12328 )
...
* tighten spec: fixup devectorizer types / rangeify
* tighten assign
* failing rangeify test
* simpler
* otherwise contig
* more tolerance cause rng seed changed
2025-09-29 16:06:45 +08:00
qazal
250cb10e8f
rangeify permuted assign ( #12299 )
...
* enable RANGEIFY=1 test_assign
* work
* rangeify=0 asserts this ast
* remove that
* beta test, it's correct though
* skip multi
* matches torch/np output
* memcopy without memcopy
* can remove this
* rangeify isn't silently wrong anymore
* diff cleanup
* use UOp toposort instead of global tags
* actual assert TestRangeifyAssign
* step
* work
* this isn't optimizing away now
* some todos
* test fusion schedule
* typo
* dedup idxs
* cleaner
* pre
* work
* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583
Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" ( #12318 )
...
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
29f0886395
skip test_softmax_fusion tests if RANGEIFY==1 ( #12310 )
2025-09-27 05:57:40 +02:00
Sieds Lykles
b98f1881ef
dsp opt test has different axis number on rangeify ( #12309 )
2025-09-27 05:06:11 +02:00
Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs ( #12308 )
...
* bufferize children early
* cleaner
* fix types
* lower number of reduceops
* test openpilot
2025-09-27 04:17:15 +02:00
nimlgen
f5eb46a3d9
fix limit buf metal on non rangeify ( #12303 )
...
* add failure test for limit buf on non rangeify
* correct metal
* correct
* hm
2025-09-26 11:06:28 +03:00
Sieds Lykles
74411984fc
Rangeify IMAGE ( #12304 )
...
* add imagedtype to rangeify
* enable some image tests
* move the tests
* image upcast before locals
* add if statement
* rangeify image_dtype test
* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645
RANGEIFY winograd test ( #12297 )
...
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96
test_qcom: update ( #12293 )
2025-09-24 21:45:58 +03:00
qazal
0e778296be
rangeify: refactor const folding ( #12291 )
...
* rangeify: refactor const folding [pr]
* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41
rangeify: simplify noop copy ( #12289 )
2025-09-24 17:01:23 +03:00
Sieds Lykles
45c7252aed
Better div nesting 2 ( #11812 )
...
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* cleanup
* update tests
* ALLOWED_GATED_READ_IMAGE from 16 -> 12
* only remove the call to simplify
* add option to simplify with factor_remainder
* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81
lower the invalid gate last ( #12164 )
...
* lowering invalid gate is part of lower_index_dtype
* update test
* remove import
* put that back
* reduce_collapse uses invalid
* fix that pattern to use invalid_pat
* valid creates the right dtype count
* separate rule for lowering invalid gate
* dont unvectorize Invalid gate
* image_fixup uses Invalid
* update tests
* cleanup
* update split_load_store
* add .scalar() there
2025-09-24 04:27:35 +02:00
nimlgen
02a7b7fe48
rangeify: fix test_setitem ( #12269 )
...
* rangeify: fix test_setitem
* um?
* better?
* simple where folding
* f
* revert
* x
2025-09-23 20:42:36 +03:00
chenyu
b54cb272d0
move test_qcom to test/device ( #12272 )
2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617
enable test_sum_twice ( #12270 )
...
* remove skip
* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b
Check for group inside another reduce ( #12268 )
...
* add check
* get the ranges correctly
* add test
* comment and better check
2025-09-23 00:32:41 +02:00
chenyu
b03ceb806e
move test_sample to test_randomness ( #12266 )
2025-09-21 21:11:32 -04:00
nimlgen
b53a266254
rangeify: fix test_optim ( #12262 )
...
* rangeify: fix test_optim
* add to cl?
* these are good now
2025-09-21 18:08:35 +03:00
nimlgen
4762a24022
test_free_intermediates force buffers ( #12255 )
...
* test_free_intermediates force buffers
* f
* fix for rangeify
* xx
2025-09-20 18:14:39 +03:00
qazal
57c7e0a8f8
RANGEIFY=1 test_jit ( #12254 )
...
* RANGEIFY=1 test_jit
* don't do any of that
* disk
* simple disk tensor
* more work
* run more tests
* it also doesn't copy every time
* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c
test case to sum twice in different order ( #12253 )
...
* test case to sum twice in different order
fixed by #12251
* try metal
2025-09-20 10:11:57 -04:00
qazal
4756971c88
skip test_bf16_disk_write_read on CL=1 ( #12256 )
2025-09-20 17:11:06 +03:00