4667 Commits

Author SHA1 Message Date
chenyu
76c87d81b3 delete test_backward_sum_acc_dtype (#12330)
this test tests the wrong thing; it was only working because of the expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353 failing rng test (#12328)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign

* failing rangeify test

* simpler

* otherwise contig

* more tolerance because rng seed changed
2025-09-29 16:06:45 +08:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583 Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
29f0886395 skip test_softmax_fusion tests if RANGEIFY==1 (#12310)
2025-09-27 05:57:40 +02:00
Sieds Lykles
b98f1881ef dsp opt test has different axis number on rangeify (#12309)
2025-09-27 05:06:11 +02:00
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
nimlgen
f5eb46a3d9 fix limit buf metal on non rangeify (#12303)
* add failure test for limit buf on non rangeify

* correct metal

* correct

* hm
2025-09-26 11:06:28 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96 test_qcom: update (#12293)
2025-09-24 21:45:58 +03:00
qazal
0e778296be rangeify: refactor const folding (#12291)
* rangeify: refactor const folding [pr]

* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41 rangeify: simplify noop copy (#12289)
2025-09-24 17:01:23 +03:00
Sieds Lykles
45c7252aed Better div nesting 2 (#11812)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* cleanup

* update tests

* ALLOWED_GATED_READ_IMAGE from 16 -> 12

* only remove the call to simplify

* add option to simplify with factor_remainder

* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81 lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* separate rule for lowering invalid gate

* don't unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
nimlgen
02a7b7fe48 rangeify: fix test_setitem (#12269)
* rangeify: fix test_setitem

* um?

* better?

* simple where folding

* f

* revert

* x
2025-09-23 20:42:36 +03:00
chenyu
b54cb272d0 move test_qcom to test/device (#12272)
2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617 enable test_sum_twice (#12270)
* remove skip

* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b Check for group inside another reduce (#12268)
* add check

* get the ranges correctly

* add test

* comment and better check
2025-09-23 00:32:41 +02:00
chenyu
b03ceb806e move test_sample to test_randomness (#12266)
2025-09-21 21:11:32 -04:00
nimlgen
b53a266254 rangeify: fix test_optim (#12262)
* rangeify: fix test_optim

* add to cl?

* these are good now
2025-09-21 18:08:35 +03:00
nimlgen
4762a24022 test_free_intermediates force buffers (#12255)
* test_free_intermediates force buffers

* f

* fix for rangeify

* xx
2025-09-20 18:14:39 +03:00
qazal
57c7e0a8f8 RANGEIFY=1 test_jit (#12254)
* RANGEIFY=1 test_jit

* don't do any of that

* disk

* simple disk tensor

* more work

* run more tests

* it also doesn't copy every time

* skip tests that hang everything
2025-09-20 17:34:32 +03:00
chenyu
393c6b236c test case to sum twice in different order (#12253)
* test case to sum twice in different order

fixed by #12251

* try metal
2025-09-20 10:11:57 -04:00
qazal
4756971c88 skip test_bf16_disk_write_read on CL=1 (#12256)
2025-09-20 17:11:06 +03:00
Sieds Lykles
73c8dae60d add missing remove_blockend case (#12251)
* add missing remove_blockend case

* remove expectedFailure

* better comment
2025-09-20 06:29:19 +02:00
qazal
bb59eed82f rangeify: don't tag consts, they are global (#12247)
* rangeify: don't tag consts, they are global

* don't map movement ops

* sym failing test

* remove that

* update comment

* simpler test

* work
2025-09-19 15:25:03 +03:00
Sieds Lykles
cc038b31b6 Shrink instead of reshape to unregister symbolic (#12241)
* Slice to unbind symbolic

* use vmax for now

* assert shape in reshape is valid

* update test_symbolic_ops to use shrink instead of reshape

* remove infer_with_bound_values for now

* symbolic output doesn't have symbolic strides

* symbolic jit tests use shrink to unregister symbolic

* update test

* update more tests

* wrap vmax in int()

* only create a new st if the store is not an assign

* unwrap st

* comments
2025-09-19 06:04:35 +02:00
Sieds Lykles
8d703a6369 z3 xor doesn't use bitcast (#12243)
2025-09-19 00:31:44 +02:00
chenyu
0dad6cc518 good RANGEIFY kernel counts in external_test_opt (#12242)
no push permute stuff. for the model ones it's less clear whether it's good; some got slower
2025-09-18 17:58:54 -04:00
qazal
825f148469 rangeify: fix copy size mismatch errs (#12232)
* rangeify: fix copy size mismatch errs

* const folding can happen in sym

assert it

* shippable

* rangeify copy is completely wrong

* pre_bufferize

* tag bufferize

* pre back
2025-09-18 18:23:32 +03:00
chenyu
f82b16a0e9 RANGEIFY test_tensor (#12235)
2025-09-18 10:35:43 -04:00
chenyu
7487c13b61 truncate_fp16 -> float_to_fp16 (#12234)
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
b1tg
54c15d74a4 python float8 support (#11960)
* basic support

* alu

* nan in exec_alu

* rand_for_dtype

* inf + 0.0

* finfo

* revert rand_for_dtype

* clean

* truncate fp8s inf

* spec ok

* float_to_fp8 nan/inf

* least_upper_dtype

* clean up

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
qazal
dbbc261075 rangeify: fix COPY simplifier (#12233)
2025-09-18 14:35:33 +03:00
qazal
525f80e0d2 rangeify: enable putting consts back in the tensor graph (#12225)
* rangeify: enable putting consts back in the tensor graph

* work

* sym in ci
2025-09-17 19:45:04 +03:00
chenyu
edffc246ed MUL in reduce_unparented (#12223)
* MUL in reduce_unparented

* some test
2025-09-17 11:56:39 -04:00
qazal
7733c217c5 remove spam comments in test_schedule (#12224)
2025-09-17 18:24:55 +03:00
qazal
d917895569 map out rangeify errors in test_schedule (#12211)
* map out rangeify errors in test_schedule

* skip that

* add to ci
2025-09-17 09:10:28 +03:00
Sieds Lykles
158506b91e Upgrade some divmod folding for symbolic divs (#12216)
* use const_factor() instead of arg

* add test

* change div min_max

* add tests

* add divide_by_symbolic_gcd

* add tests

* one more test

* Slice to unbind symbolic

* deal with const factor properly

* minor cleanup

* divide_by_symbolic_gcd becomes UOp.gcd and UOp.divide_exact

* add tests

* add gcd_without_const

* fix divide_exact bug

* add factor_remainder

* add tests

* fix imports

* elif -> if

* remove expectedFailure

* add more tests

* add more unwrap

* fix signature of pop_const

* remove that

* remove that
2025-09-17 03:00:50 +02:00
chenyu
5b12764b83 add arange cat arange test (#12217)
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
chenyu
6b808c5fe6 update TestSymbolicJit.test_plus1_pad (#12214)
was failing because movement was not captured
2025-09-16 15:57:50 -04:00
Shun Usami
2a72b00679 Add test for 2D tensor indexing in setitem (#12193)
* Add test for 2D tensor indexing in setitem

* Fix _masked_setitem to handle multi dim indexing correctly

* Fix indent

* Add fuzz test for 3D tensor indexing in setitem

* Skip indexing fuzz test (slow)
2025-09-16 14:57:25 -04:00
chenyu
84d2d047ea Tensor.pad_to and Tensor.shrink_to (#12210)
most of the time i want this instead of spelling out the args

also add more input validation to shrink
2025-09-16 12:24:55 -04:00
qazal
122a50fe8c assert kernel count (#12205)
2025-09-16 14:24:39 +03:00
chenyu
e555748807 test rangeify const folding (#12200)
* test rangeify const folding

reduce i know how to fix, multi and test_cast_padded tbd

* test_instancenorm_3d is very slow
2025-09-15 20:03:48 -04:00
chenyu
f732f66709 rangeify test_nn almost pass (#12198)
* rangeify test_nn almost pass

* issue with jit

* flaky
2025-09-15 17:49:20 -04:00
qazal
a388d2cb1a remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176)
* remove PROFILE=1 option, it's just VIZ=1 [pr]

* sqtt

* sqtt 2

* return last

* rename
2025-09-15 12:51:50 +03:00
chenyu
bdb3afd566 failed test case for symbolic pad (#12179)
2025-09-15 00:25:21 -04:00
chenyu
15b166ce6d bump test_module_runs to 30 seconds (#12174)
25 seconds sometimes
2025-09-14 16:48:40 -04:00