4547 Commits

Author SHA1 Message Date
b1tg
42748ccb92 rangeify: fix test_prequant_conv2d_1x1 (#12391) 2025-10-01 02:33:47 -04:00
Sieds Lykles
05e91a248d load alt value with cast (#12384)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render
2025-10-01 07:14:26 +02:00
b1tg
57ad46c6e4 rangeify: increase atol for test_two_binops_no_rerun passing on real windows machine (#12389)
CPU_LLVM=1
2025-10-01 00:56:45 -04:00
chenyu
0662946fac atol in test_two_binops_no_rerun (#12387)
for RANGEIFY LLVM
2025-10-01 00:05:47 -04:00
wozeparrot
4204edc60b feat: skip test_long (#12383) 2025-09-30 20:07:39 -07:00
George Hotz
4c9a930de2 rangeify attn tests (#12377) 2025-10-01 09:59:19 +08:00
hooved
969a1b35ca LR scheduler for Stable Diffusion mlperf training (#12201)
* add lr scheduler for stable diffusion training

* add lr scheduler test

* rerun ci

* rerun CI

* use np for testing

* move test to CI path

* remove unneeded copy
2025-09-30 21:21:08 -04:00
George Hotz
9ef319f349 bad conv in rangeify (#12373)
* bad conv with broken rangeify

* no maxpool needed

* add empty_like

* typo

* no self

* issue remains for test
2025-10-01 08:56:22 +08:00
George Hotz
44558a37f7 fix some rangeify tests (#12370)
* fix bad range merges

* fix rng

* fix uop gc

* fix some rangeify tests

* now that needs rangeify 2 also
2025-09-30 20:12:08 +08:00
nimlgen
2c397eb2a2 rangeify: buf limit (#12336)
* limit bufs

* g

* fix buffer limit

* um?

* fix

* only these?

* typo

* f

* cleaner
2025-09-30 14:59:47 +03:00
George Hotz
a83f219253 fix bad range merges (#12368)
* fix bad range merges

* fix rng

* fix uop gc
2025-09-30 19:30:21 +08:00
qazal
a95159d579 remove TestShapeSpec, it relies on ShapeTracker [pr] (#12369) 2025-09-30 14:20:35 +03:00
qazal
de1d562b69 rangeify: update test_pickle asserts (#12366)
* realized exists on the base

* use is_realized
2025-09-30 13:27:41 +03:00
qazal
e8c595c29e remu: add new instructions introduced in RANGEIFY (#12363)
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d

* run amd test_ops

* skip test_masked_select
2025-09-30 12:36:29 +03:00
qazal
109c63b904 update Tensor unit tests for RANGEIFY (#12359)
* update test_kernelize for RANGEIFY

* also kernelizes user contiguous

* skip that test

* tensor uop repr

* 4 kernels, still realizes a float
2025-09-30 11:17:21 +03:00
George Hotz
7129419500 fix cifar training in RANGEIFY (#12355)
* fix cifar training in RANGEIFY

* even more wino fuse

* bugfix

* test to show issue
2025-09-30 15:59:19 +08:00
qazal
4ff7f20b9d rangeify: fix kernelize (#12357) 2025-09-30 10:10:08 +03:00
chenyu
86c5c969ea linalg cosmetic change (#12356) 2025-09-30 03:00:59 -04:00
qazal
6a56d3c859 rangeify: only test correctness in multi (#12339)
* work

* more work

* back here

* skip tests

* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21 enable cleanup_dead_axes (#12351)
* enable cleanup_dead_axes

* don't mess with user contig

* correct tag behavior

* double reshape isn't correct

* block on assign too

* skip messing with symbolic

* Fix tests

* disable RANGEIFY=2

* test w rangeify
2025-09-30 14:09:39 +08:00
Sieds Lykles
73b25bf47d z3 fix loaded mask (#12353)
* z3 fix loaded mask

* indentation
2025-09-30 06:55:50 +02:00
wozeparrot
2a0caa09c2 push copy to disk (#12348) 2025-09-29 21:55:05 -07:00
hooved
39aae679e4 Support bfloat16 on NULL backend (#12340)
* add failing test

* move test

* only run test with NULL default

* add skip reason

* add fix
2025-09-30 00:02:30 -04:00
George Hotz
f522e83a02 fix rangeify elu fusion for openpilot (#12341)
* fix rangeify elu fusion for openpilot

* flip the metadata

* copy over permuted contiguous support

* this is correct

* update that
2025-09-30 11:41:52 +08:00
Sieds Lykles
d55d829635 Lower index dtype spec fix (#12337)
* new pm_lower_index_dtype

* load_store_indexing after index lowering

* shorten line

* separate rule for long removal

* fix test

* fix index_to_concrete_int

* minor fixes

* add sink there

* update types in linearizer test
2025-09-30 04:26:50 +02:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
George Hotz
cdfa0f29fd add rendering to index (#12338) 2025-09-30 09:18:05 +08:00
qazal
9513f025c5 apply multi before rangeify (#12298)
* it doesn't realize it when i reshape

* cleaner graph

* map out

* REDUCE_AXIS also gives the wrong answer

* maybe

* work

* back here

* try

* more

* refactor tests

* check MultiBuffer

* or copy

* fine with this

* don't need graph_rewrite_map in rangeify
2025-09-29 14:16:31 +03:00
George Hotz
3291e00df7 fix efficientnet slowness on rangeify (#12332) 2025-09-29 18:01:01 +08:00
chenyu
9d2f2b8e34 skip test_mean_half_precision_overflow (#12331)
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
chenyu
76c87d81b3 delete test_backward_sum_acc_dtype (#12330)
this test tests the wrong thing, it was only working because expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353 failing rng test (#12328)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign

* failing rangeify test

* simpler

* otherwise contig

* more tolerance cause rng seed changed
2025-09-29 16:06:45 +08:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583 Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
29f0886395 skip test_softmax_fusion tests if RANGEIFY==1 (#12310) 2025-09-27 05:57:40 +02:00
Sieds Lykles
b98f1881ef dsp opt test has different axis number on rangeify (#12309) 2025-09-27 05:06:11 +02:00
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
nimlgen
f5eb46a3d9 fix limit buf metal on non rangeify (#12303)
* add failure test for limit buf on non rangeify

* correct metal

* correct

* hm
2025-09-26 11:06:28 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96 test_qcom: update (#12293) 2025-09-24 21:45:58 +03:00
qazal
0e778296be rangeify: refactor const folding (#12291)
* rangeify: refactor const folding [pr]

* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41 rangeify: simplify noop copy (#12289) 2025-09-24 17:01:23 +03:00
Sieds Lykles
45c7252aed Better div nesting 2 (#11812)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* cleanup

* update tests

* ALLOWED_GATED_READ_IMAGE from 16 -> 12

* only remove the call to simplify

* add option to simplify with factor_remainder

* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81 lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* separate rule for lowering invalid gate

* dont unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
nimlgen
02a7b7fe48 rangeify: fix test_setitem (#12269)
* rangeify: fix test_setitem

* um?

* better?

* simple where folding

* f

* revert

* x
2025-09-23 20:42:36 +03:00
chenyu
b54cb272d0 move test_qcom to test/device (#12272) 2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617 enable test_sum_twice (#12270)
* remove skip

* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b Check for group inside another reduce (#12268)
* add check

* get the ranges correctly

* add test

* comment and better check
2025-09-23 00:32:41 +02:00
chenyu
b03ceb806e move test_sample to test_randomness (#12266) 2025-09-21 21:11:32 -04:00