Commit Graph

10347 Commits

Author SHA1 Message Date
George Hotz
e7c7fdb47b now that needs rangeify 2 also 2025-09-30 19:39:23 +08:00
George Hotz
ed5592b858 Merge branch 'master' into fix_rangeify_tests 2025-09-30 19:30:33 +08:00
George Hotz
a83f219253 fix bad range merges (#12368)
* fix bad range merges

* fix rng

* fix uop gc
2025-09-30 19:30:21 +08:00
George Hotz
17a1777823 Merge branch 'master' into fix_rangeify_tests 2025-09-30 19:29:50 +08:00
George Hotz
49dc879e8d fix some rangeify tests 2025-09-30 19:23:42 +08:00
qazal
a95159d579 remove TestShapeSpec, it relies on ShapeTracker [pr] (#12369) 2025-09-30 14:20:35 +03:00
George Hotz
7eee206177 fix uop gc 2025-09-30 19:08:35 +08:00
George Hotz
d8bb679a3a Merge branch 'master' into fix_rng_merge 2025-09-30 18:59:08 +08:00
George Hotz
9cf5e66899 minimal rangeify stable diffusion fix (#12367)
* minimal rangeify stable diffusion fix

* more minimal
2025-09-30 18:48:35 +08:00
George Hotz
b1f7ebd9f7 fix rng 2025-09-30 18:36:27 +08:00
chenyu
b4a4817c9c fix rangeify test_linalg (#12365) 2025-09-30 06:28:35 -04:00
qazal
de1d562b69 rangeify: update test_pickle asserts (#12366)
* realized exists on the base

* use is_realized
2025-09-30 13:27:41 +03:00
George Hotz
dc11a23775 fix bad range merges 2025-09-30 18:26:57 +08:00
b1tg
c9ef5d8fe5 rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) (#12362)
* rangeify: fix test_tensor_index_overflow (CPU_LLVM=1)

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-30 05:55:15 -04:00
qazal
e8c595c29e remu: add new instructions introduced in RANGEIFY (#12363)
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d

* run amd test_ops

* skip test_masked_select
2025-09-30 12:36:29 +03:00
George Hotz
360980f1a3 work on rangeify cost function heuristics (#12360)
* work on rangeify cost function heuristics

* dedup

* better cost function
2025-09-30 16:44:29 +08:00
qazal
109c63b904 update Tensor unit tests for RANGEIFY (#12359)
* update test_kernelize for RANGEIFY

* also kernelizes user contiguous

* skip that test

* tensor uop repr

* 4 kernels, still realizes a float
2025-09-30 11:17:21 +03:00
George Hotz
7129419500 fix cifar training in RANGEIFY (#12355)
* fix cifar training in RANGEIFY

* even more wino fuse

* bugfix

* test to show issue
2025-09-30 15:59:19 +08:00
qazal
4ff7f20b9d rangeify: fix kernelize (#12357) 2025-09-30 10:10:08 +03:00
chenyu
86c5c969ea linalg cosmetic change (#12356) 2025-09-30 03:00:59 -04:00
qazal
6a56d3c859 rangeify: only test correctness in multi (#12339)
* work

* more work

* back here

* skip tests

* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21 enable cleanup_dead_axes (#12351)
* enable cleanup_dead_axes

* don't mess with user contig

* correct tag behavior

* double reshape isn't correct

* block on assign too

* skip messing with symbolic

* Fix tests

* disable RANGEIFY=2

* test w rangeify
2025-09-30 14:09:39 +08:00
qazal
2a7310ab59 rangeify: fix remaining multi correctness issue (#12354) 2025-09-30 08:08:27 +03:00
Sieds Lykles
73b25bf47d z3 fix loaded mask (#12353)
* z3 fix loaded mask

* indentation
2025-09-30 06:55:50 +02:00
wozeparrot
2a0caa09c2 push copy to disk (#12348) 2025-09-29 21:55:05 -07:00
chenyu
881709cd33 don't skip rangeify test_instancenorm_3d (#12350)
seems fine now
2025-09-30 00:05:59 -04:00
hooved
39aae679e4 Support bfloat16 on NULL backend (#12340)
* add failing test

* move test

* only run test with NULL default

* add skip reason

* add fix
2025-09-30 00:02:30 -04:00
chenyu
af935e7d32 Revert "reduce const folding (#12344)" (#12349)
This reverts commit 8e508a9927.
2025-09-29 23:45:30 -04:00
George Hotz
f522e83a02 fix rangeify elu fusion for openpilot (#12341)
* fix rangeify elu fusion for openpilot

* flip the metadata

* copy over permuted contiguous support

* this is correct

* update that
2025-09-30 11:41:52 +08:00
qazal
d95d018bb5 add name to multi rewrite [pr] (#12346) 2025-09-30 06:34:58 +03:00
qazal
05275c9ec3 rangeify: enable assign to mstack target (#12345) 2025-09-30 06:27:57 +03:00
chenyu
8e508a9927 reduce const folding (#12344) 2025-09-29 23:08:56 -04:00
chenyu
3a480b858f use more getitem in gpt2 (#12343) 2025-09-29 23:08:03 -04:00
qazal
32d69d07d7 rangeify: enable multitensor TestBatchNorm (#12342) 2025-09-30 06:05:00 +03:00
Sieds Lykles
d55d829635 Lower index dtype spec fix (#12337)
* new pm_lower_index_dtype

* load_store_indexing after index lowering

* shorten line

* separate rule for long removal

* fix test

* fix index_to_concrete_int

* minor fixes

* add sink there

* update types in linearizer test
2025-09-30 04:26:50 +02:00
Sieds Lykles
c38f6ce140 unified_rewrite: use deque and dont add nodes to the stack multiple times (#12320)
* use deque instead of list

* increase ctx.progress and max stack_len

* add openpilot

* prevent placing uops on stack many times

* revert increasing ctx.progress and stack length limit

* dont block adding to the stack there

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
George Hotz
cdfa0f29fd add rendering to index (#12338) 2025-09-30 09:18:05 +08:00
George Hotz
baf3b60cfb fix gpt2 on rangeify (#12335) 2025-09-29 19:16:44 +08:00
qazal
9513f025c5 apply multi before rangeify (#12298)
* it doesn't realize it when i reshape

* cleaner graph

* map out

* REDUCE_AXIS also gives the wrong answer

* maybe

* work

* back here

* try

* more

* refactor tests

* check MultiBuffer

* or copy

* fine with this

* don't need graph_rewrite_map in rangeify
2025-09-29 14:16:31 +03:00
George Hotz
b899392f30 fix llm app with rangeify (#12334)
* fix llm app with rangeify

* add gpt2 contiguous also
2025-09-29 18:42:44 +08:00
wozeparrot
7ae6898e31 better late bufferview (#12333) 2025-09-29 03:08:34 -07:00
George Hotz
3291e00df7 fix efficientnet slowness on rangeify (#12332) 2025-09-29 18:01:01 +08:00
chenyu
9d2f2b8e34 skip test_mean_half_precision_overflow (#12331)
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
qazal
9915bcf2b4 remove no-op contiguous from rand (#12329) 2025-09-29 11:53:16 +03:00
chenyu
76c87d81b3 delete test_backward_sum_acc_dtype (#12330)
this test tests the wrong thing, it was only working because expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353 failing rng test (#12328)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign

* failing rangeify test

* simpler

* otherwise contig

* more tolerance cause rng seed changed
2025-09-29 16:06:45 +08:00
George Hotz
29469577e8 tighten spec: fixup devectorizer types / rangeify (#12327)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign
2025-09-29 15:41:11 +08:00
wozeparrot
a982480512 feat: late to_bufferview (#12271) 2025-09-29 00:29:43 -07:00
qazal
e01a3eb59a rangeify whitespace cleanups [pr] (#12326)
* rangeify whitespace cleanups

* this is a noop
2025-09-29 10:04:51 +03:00