chenyu
76c87d81b3
delete test_backward_sum_acc_dtype ( #12330 )
this test tests the wrong thing; it was only working because of the expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353
failing rng test ( #12328 )
* tighten spec: fixup devectorizer types / rangeify
* tighten assign
* failing rangeify test
* simpler
* otherwise contig
* more tolerance cause rng seed changed
2025-09-29 16:06:45 +08:00
qazal
250cb10e8f
rangeify permuted assign ( #12299 )
* enable RANGEIFY=1 test_assign
* work
* rangeify=0 asserts this ast
* remove that
* beta test, it's correct though
* skip multi
* matches torch/np output
* memcopy without memcopy
* can remove this
* rangeify isn't silently wrong anymore
* diff cleanup
* use UOp toposort instead of global tags
* actual assert TestRangeifyAssign
* step
* work
* this isn't optimizing away now
* some todos
* test fusion schedule
* typo
* dedup idxs
* cleaner
* pre
* work
* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583
Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" ( #12318 )
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
29f0886395
skip test_softmax_fusion tests if RANGEIFY==1 ( #12310 )
2025-09-27 05:57:40 +02:00
Sieds Lykles
b98f1881ef
dsp opt test has different axis number on rangeify ( #12309 )
2025-09-27 05:06:11 +02:00
Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs ( #12308 )
* bufferize children early
* cleaner
* fix types
* lower number of reduceops
* test openpilot
2025-09-27 04:17:15 +02:00
nimlgen
f5eb46a3d9
fix limit buf metal on non rangeify ( #12303 )
* add failure test for limit buf on non rangeify
* correct metal
* correct
* hm
2025-09-26 11:06:28 +03:00
Sieds Lykles
74411984fc
Rangeify IMAGE ( #12304 )
* add imagedtype to rangeify
* enable some image tests
* move the tests
* image upcast before locals
* add if statement
* rangeify image_dtype test
* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645
RANGEIFY winograd test ( #12297 )
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96
test_qcom: update ( #12293 )
2025-09-24 21:45:58 +03:00
qazal
0e778296be
rangeify: refactor const folding ( #12291 )
* rangeify: refactor const folding [pr]
* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41
rangeify: simplify noop copy ( #12289 )
2025-09-24 17:01:23 +03:00
Sieds Lykles
45c7252aed
Better div nesting 2 ( #11812 )
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* cleanup
* update tests
* ALLOWED_GATED_READ_IMAGE from 16 -> 12
* only remove the call to simplify
* add option to simplify with factor_remainder
* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81
lower the invalid gate last ( #12164 )
* lowering invalid gate is part of lower_index_dtype
* update test
* remove import
* put that back
* reduce_collapse uses invalid
* fix that pattern to use invalid_pat
* valid creates the right dtype count
* separate rule for lowering invalid gate
* don't unvectorize Invalid gate
* image_fixup uses Invalid
* update tests
* cleanup
* update split_load_store
* add .scalar() there
2025-09-24 04:27:35 +02:00
nimlgen
02a7b7fe48
rangeify: fix test_setitem ( #12269 )
* rangeify: fix test_setitem
* um?
* better?
* simple where folding
* f
* revert
* x
2025-09-23 20:42:36 +03:00
chenyu
b54cb272d0
move test_qcom to test/device ( #12272 )
2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617
enable test_sum_twice ( #12270 )
* remove skip
* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b
Check for group inside another reduce ( #12268 )
* add check
* get the ranges correctly
* add test
* comment and better check
2025-09-23 00:32:41 +02:00
chenyu
b03ceb806e
move test_sample to test_randomness ( #12266 )
2025-09-21 21:11:32 -04:00
nimlgen
b53a266254
rangeify: fix test_optim ( #12262 )
* rangeify: fix test_optim
* add to cl?
* these are good now
2025-09-21 18:08:35 +03:00
nimlgen
4762a24022
test_free_intermediates force buffers ( #12255 )
* test_free_intermediates force buffers
* f
* fix for rangeify
* xx
2025-09-20 18:14:39 +03:00
qazal
57c7e0a8f8
RANGEIFY=1 test_jit ( #12254 )
* RANGEIFY=1 test_jit
* don't do any of that
* disk
* simple disk tensor
* more work
* run more tests
* it also doesn't copy every time
* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c
test case to sum twice in different order ( #12253 )
* test case to sum twice in different order
fixed by #12251
* try metal
2025-09-20 10:11:57 -04:00
qazal
4756971c88
skip test_bf16_disk_write_read on CL=1 ( #12256 )
2025-09-20 17:11:06 +03:00
Sieds Lykles
73c8dae60d
add missing remove_blockend case ( #12251 )
* add missing remove_blockend case
* remove expectedFailure
* better comment
2025-09-20 06:29:19 +02:00
qazal
bb59eed82f
rangeify: don't tag consts, they are global ( #12247 )
* rangeify: don't tag consts, they are global
* don't map movement ops
* sym failing test
* remove that
* update comment
* simpler test
* work
2025-09-19 15:25:03 +03:00
Sieds Lykles
cc038b31b6
Shrink instead of reshape to unregister symbolic ( #12241 )
* Slice to unbind symbolic
* use vmax for now
* assert shape in reshape is valid
* update test_symbolic_ops to use shrink instead of reshape
* remove infer_with_bound_values for now
* symbolic output doesn't have symbolic strides
* symbolic jit tests use shrink to unregister symbolic
* update test
* update more tests
* wrap vmax in int()
* only create a new st if the store is not an assign
* unwrap st
* comments
2025-09-19 06:04:35 +02:00
Sieds Lykles
8d703a6369
z3 xor doesn't use bitcast ( #12243 )
2025-09-19 00:31:44 +02:00
chenyu
0dad6cc518
good RANGEIFY kernel counts in external_test_opt ( #12242 )
no push permute stuff. for the model ones it's less clear whether this is good; some got slower
2025-09-18 17:58:54 -04:00
qazal
825f148469
rangeify: fix copy size mismatch errs ( #12232 )
* rangeify: fix copy size mismatch errs
* const folding can happen in sym
assert it
* shippable
* rangeify copy is completely wrong
* pre_bufferize
* tag bufferize
* pre back
2025-09-18 18:23:32 +03:00
chenyu
f82b16a0e9
RANGEIFY test_tensor ( #12235 )
2025-09-18 10:35:43 -04:00
chenyu
7487c13b61
truncate_fp16 -> float_to_fp16 ( #12234 )
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
b1tg
54c15d74a4
python float8 support ( #11960 )
* basic support
* alu
* nan in exec_alu
* rand_for_dtype
* inf + 0.0
* finfo
* revert rand_for_dtype
* clean
* truncate fp8s inf
* spec ok
* float_to_fp8 nan/inf
* least_upper_dtype
* clean up
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
qazal
dbbc261075
rangeify: fix COPY simplifier ( #12233 )
2025-09-18 14:35:33 +03:00
qazal
525f80e0d2
rangeify: enable putting consts back in the tensor graph ( #12225 )
* rangeify: enable putting consts back in the tensor graph
* work
* sym in ci
2025-09-17 19:45:04 +03:00
chenyu
edffc246ed
MUL in reduce_unparented ( #12223 )
* MUL in reduce_unparented
* some test
2025-09-17 11:56:39 -04:00
qazal
7733c217c5
remove spam comments in test_schedule ( #12224 )
2025-09-17 18:24:55 +03:00
qazal
d917895569
map out rangeify errors in test_schedule ( #12211 )
* map out rangeify errors in test_schedule
* skip that
* add to ci
2025-09-17 09:10:28 +03:00
Sieds Lykles
158506b91e
Upgrade some divmod folding for symbolic divs ( #12216 )
* use const_factor() instead of arg
* add test
* change div min_max
* add tests
* add divide_by_symbolic_gcd
* add tests
* one more test
* Slice to unbind symbolic
* deal with const factor properly
* minor cleanup
* divide_by_symbolic_gcd becomes UOp.gcd and UOp.divide_exact
* add tests
* add gcd_without_const
* fix divide_exact bug
* add factor_remainder
* add tests
* fix imports
* elif -> if
* remove expectedFailure
* add more tests
* add more unwrap
* fix signature of pop_const
* remove that
* remove that
2025-09-17 03:00:50 +02:00
chenyu
5b12764b83
add arange cat arange test ( #12217 )
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
chenyu
6b808c5fe6
update TestSymbolicJit.test_plus1_pad ( #12214 )
was failing because movement was not captured
2025-09-16 15:57:50 -04:00
Shun Usami
2a72b00679
Add test for 2D tensor indexing in setitem ( #12193 )
* Add test for 2D tensor indexing in setitem
* Fix _masked_setitem to handle multi dim indexing correctly
* Fix indent
* Add fuzz test for 3D tensor indexing in setitem
* Skip indexing fuzz test (slow)
2025-09-16 14:57:25 -04:00
chenyu
84d2d047ea
Tensor.pad_to and Tensor.shrink_to ( #12210 )
most of the time I want this instead of spelling out the args
also add more input validation to shrink
2025-09-16 12:24:55 -04:00
qazal
122a50fe8c
assert kernel count ( #12205 )
2025-09-16 14:24:39 +03:00
chenyu
e555748807
test rangeify const folding ( #12200 )
* test rangeify const folding
reduce I know how to fix; multi and test_cast_padded are tbd
* test_instancenorm_3d is very slow
2025-09-15 20:03:48 -04:00
chenyu
f732f66709
rangeify test_nn almost pass ( #12198 )
* rangeify test_nn almost pass
* issue with jit
* flaky
2025-09-15 17:49:20 -04:00
qazal
a388d2cb1a
remove PROFILE=1 option, it's just VIZ=1 [pr] ( #12176 )
* remove PROFILE=1 option, it's just VIZ=1 [pr]
* sqtt
* sqtt 2
* return last
* rename
2025-09-15 12:51:50 +03:00
chenyu
bdb3afd566
failed test case for symbolic pad ( #12179 )
2025-09-15 00:25:21 -04:00
chenyu
15b166ce6d
bump test_module_runs to 30 seconds ( #12174 )
25 seconds sometimes
2025-09-14 16:48:40 -04:00