915 Commits

Author SHA1 Message Date
chenyu
03ef5197fc move get_contraction to helpers [pr] (#12594) 2025-10-10 04:28:57 -04:00
chenyu
af90dc00de remove some View add logic [pr] (#12584)
no longer simplify the case of v0+v1 where v0 has a mask
2025-10-10 03:47:56 -04:00
chenyu
c8dfd10257 ShapeTracker.real_strides -> is_expanded [pr] (#12579)
only keep the used part
2025-10-09 22:52:45 -04:00
chenyu
678f83e41b delete ShapeTracker to_valid_uop and substitute [pr] (#12563) 2025-10-09 05:06:10 -04:00
chenyu
cf8232ec6a clean up more RANGEIFY flag (#12556) 2025-10-09 03:06:48 -04:00
chenyu
250f05a776 run some hashing test only on METAL (#12554)
quite slow on CPU
2025-10-09 02:39:49 -04:00
chenyu
ae51bdd06a remove trivial use of RANGEIFY flag (#12550)
some tests need update still
2025-10-09 02:29:38 -04:00
chenyu
43bce1f39f delete View minify [pr] (#12538) 2025-10-08 23:25:53 -04:00
chenyu
20d98b19c3 delete more unused ShapeTracker stuff (#12536) 2025-10-08 23:09:44 -04:00
George Hotz
0774575442 delete the old rangeify path and all the children stuff (#12524)
* delete the old rangeify path and all the children stuff

* remove the on_stack stuff and any retries

* don't use the p word

* Revert "remove the on_stack stuff and any retries"

This reverts commit 49a2b328b9.
2025-10-08 21:24:04 +08:00
qazal
b6835f4134 remove Ops.VIEW and related UOp methods (#12522)
* remove Ops.VIEW and related UOp methods

* update abstractions2.py

* no ShapeTrackers in abstractions2.py

* it's a size 1
2025-10-08 14:47:02 +03:00
George Hotz
3b0b3a2e64 fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
chenyu
ee0382ad99 remove ShapeTracker.invert (#12520) 2025-10-08 18:37:34 +08:00
chenyu
d5058427ea remove ShapeTracker.real_size (#12519) 2025-10-08 06:15:29 -04:00
qazal
2e19354c1c viz: reorder timeline graphs (#12498)
* viz: reorder timeline graphs

* update test_viz with the new order
2025-10-08 07:10:23 +03:00
Sieds Lykles
b465c17b56 Revert "UOp.factor and add chain sorting (#12413)" (#12492)
This reverts commit e74be4a140.
2025-10-08 03:20:23 +02:00
George Hotz
945cc46475 delete children tracking from uop (#12491)
* delete children tracking from uop

* uop children no longer exists

* no tracked children

* that test is flaky too
2025-10-08 09:04:14 +08:00
chenyu
a5484b767e remove skipping cast in simplify_valid [pr] (#12472)
* remove skipping cast in simplify_valid [pr]

unsupported statements are handled in uop_given_valid already. the test failed because (100%x) somehow got simplified

* better test
2025-10-07 00:10:04 -04:00
qazal
f664bcc8bd use recursive_property in UOp tracing (#12469)
* test

* simple passing
2025-10-06 21:10:52 +03:00
Sieds Lykles
e74be4a140 UOp.factor and add chain sorting (#12413)
* add ordering

* fix some tests

* fix more tests

* shorten comment

* update test

* add rule and test

* add rule and test

* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* add function to un-nest the div

* add UOp.factor

* test UOp.factor

* uop_given_valid tries to factor simplex expression

* shorten line

* symbolic_flat is back

* change that back

* fix those new tests

* new rule for ordering

* factor multiple factors

* no symbolic_flat

* symbolic_flat to there

* move that back

* fix imports

* merge correctly

* linter happy

* add rule

* add a test

* cleanup

* revert that for now

* UOp.factor returns self instead of None

* try all_candidates

* remove or_else

* post index symbolic

* add test

* make this closer to the original

* increase mac hlb_cifar min step time

* add some ordering tests

* cleanup

* increase pytest timeout time

* check dtype
2025-10-04 06:05:38 +02:00
George Hotz
c7849ac593 fix test lil model (#12437)
* fix test lil model

* 4 not 3
2025-10-03 02:28:37 -04:00
chenyu
bf99de7b1e update a few more tests for RANGEIFY (#12434) 2025-10-03 00:16:58 -04:00
Sieds Lykles
16a65b4fd0 fix test_symbolic_gcd_div hang (#12427) 2025-10-03 04:21:16 +02:00
chenyu
7b3912d8e4 relax atol for some tests (#12422) 2025-10-02 05:04:44 -04:00
George Hotz
583553f467 split ranges (#12411)
* split ranges

* simpler

* split ranges

* range str

* fix test

* oops

* faster

* no group 2

* tests

* dont_sub_ranges_for_image

* revert that
2025-10-02 12:57:22 +08:00
nimlgen
3e0e0290ce increase timeout in test_module_runs (#12408) 2025-10-01 22:01:44 +03:00
b1tg
57ad46c6e4 rangeify: increase atol for test_two_binops_no_rerun passing on real windows machine (#12389)
CPU_LLVM=1
2025-10-01 00:56:45 -04:00
chenyu
0662946fac atol in test_two_binops_no_rerun (#12387)
for RANGEIFY LLVM
2025-10-01 00:05:47 -04:00
wozeparrot
4204edc60b feat: skip test_long (#12383) 2025-09-30 20:07:39 -07:00
George Hotz
4c9a930de2 rangeify attn tests (#12377) 2025-10-01 09:59:19 +08:00
qazal
109c63b904 update Tensor unit tests for RANGEIFY (#12359)
* update test_kernelize for RANGEIFY

* also kernelizes user contiguous

* skip that test

* tensor uop repr

* 4 kernels, still realizes a float
2025-09-30 11:17:21 +03:00
George Hotz
7129419500 fix cifar training in RANGEIFY (#12355)
* fix cifar training in RANGEIFY

* even more wino fuse

* bugfix

* test to show issue
2025-09-30 15:59:19 +08:00
chenyu
86c5c969ea linalg cosmetic change (#12356) 2025-09-30 03:00:59 -04:00
George Hotz
f522e83a02 fix rangeify elu fusion for openpilot (#12341)
* fix rangeify elu fusion for openpilot

* flip the metadata

* copy over permuted contiguous support

* this is correct

* update that
2025-09-30 11:41:52 +08:00
George Hotz
cdfa0f29fd add rendering to index (#12338) 2025-09-30 09:18:05 +08:00
chenyu
9d2f2b8e34 skip test_mean_half_precision_overflow (#12331)
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
chenyu
76c87d81b3 delete test_backward_sum_acc_dtype (#12330)
this test tests the wrong thing, it was only working because expand realize rule
2025-09-29 04:46:17 -04:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
Sieds Lykles
45c7252aed Better div nesting 2 (#11812)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* cleanup

* update tests

* ALLOWED_GATED_READ_IMAGE from 16 -> 12

* only remove the call to simplify

* add option to simplify with factor_remainder

* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81 lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* separate rule for lowering invalid gate

* dont unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
chenyu
b54cb272d0 move test_qcom to test/device (#12272) 2025-09-22 21:07:10 -04:00
qazal
4756971c88 skip test_bf16_disk_write_read on CL=1 (#12256) 2025-09-20 17:11:06 +03:00
Sieds Lykles
cc038b31b6 Shrink instead of reshape to unregister symbolic (#12241)
* Slice to unbind symbolic

* use vmax for now

* assert shape in reshape is valid

* update test_symbolic_ops to use shrink instead of reshape

* remove infer_with_bound_values for now

* symbolic output doesnt have symbolic strides

* symbolic jit tests use shrink to unregister symbolic

* update test

* update more tests

* wrap vmax in int()

* only create a new st if the store is not an assign

* unwrap st

* comments
2025-09-19 06:04:35 +02:00
Sieds Lykles
8d703a6369 z3 xor doesnt use bitcast (#12243) 2025-09-19 00:31:44 +02:00
chenyu
7487c13b61 truncate_fp16 -> float_to_fp16 (#12234)
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
b1tg
54c15d74a4 python float8 support (#11960)
* basic support

* alu

* nan in exec_alu

* rand_for_dtype

* inf + 0.0

* finfo

* revert rand_for_dtype

* clean

* truncate fp8s inf

* spec ok

* float_to_fp8 nan/inf

* least_upper_dtype

* clean up

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
Sieds Lykles
158506b91e Upgrade some divmod folding for symbolic divs (#12216)
* use const_factor() instead of arg

* add test

* change div min_max

* add tests

* add divide_by_symbolic_gcd

* add tests

* one more test

* Slice to unbind symbolic

* deal with const factor properly

* minor cleanup

* divide_by_symbolic_gcd becomes UOp.gcd and UOp.divide_exact

* add tests

* add gcd_without_const

* fix divide_exact bug

* add factor_remainder

* add tests

* fix imports

* elif -> if

* remove expectedFailure

* add more tests

* add more unwrap

* fix signature of pop_const

* remove that

* remove that
2025-09-17 03:00:50 +02:00
qazal
a388d2cb1a remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176)
* remove PROFILE=1 option, it's just VIZ=1 [pr]

* sqtt

* sqtt 2

* return last

* rename
2025-09-15 12:51:50 +03:00
chenyu
15b166ce6d bump test_module_runs to 30 seconds (#12174)
25 seconds sometimes
2025-09-14 16:48:40 -04:00
chenyu
d09c0f28c5 increase test_module_runs (#12173)
timed out on ci windows llvm
2025-09-14 15:19:21 -04:00