George Hotz
e7c7fdb47b
now that needs rangeify 2 also
2025-09-30 19:39:23 +08:00
George Hotz
ed5592b858
Merge branch 'master' into fix_rangeify_tests
2025-09-30 19:30:33 +08:00
George Hotz
a83f219253
fix bad range merges ( #12368 )
...
* fix bad range merges
* fix rng
* fix uop gc
2025-09-30 19:30:21 +08:00
George Hotz
17a1777823
Merge branch 'master' into fix_rangeify_tests
2025-09-30 19:29:50 +08:00
George Hotz
49dc879e8d
fix some rangeify tests
2025-09-30 19:23:42 +08:00
qazal
a95159d579
remove TestShapeSpec, it relies on ShapeTracker [pr] ( #12369 )
2025-09-30 14:20:35 +03:00
George Hotz
7eee206177
fix uop gc
2025-09-30 19:08:35 +08:00
George Hotz
d8bb679a3a
Merge branch 'master' into fix_rng_merge
2025-09-30 18:59:08 +08:00
George Hotz
9cf5e66899
minimal rangeify stable diffusion fix ( #12367 )
...
* minimal rangeify stable diffusion fix
* more minimal
2025-09-30 18:48:35 +08:00
George Hotz
b1f7ebd9f7
fix rng
2025-09-30 18:36:27 +08:00
chenyu
b4a4817c9c
fix rangeify test_linalg ( #12365 )
2025-09-30 06:28:35 -04:00
qazal
de1d562b69
rangeify: update test_pickle asserts ( #12366 )
...
* realized exists on the base
* use is_realized
2025-09-30 13:27:41 +03:00
George Hotz
dc11a23775
fix bad range merges
2025-09-30 18:26:57 +08:00
b1tg
c9ef5d8fe5
rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) ( #12362 )
...
* rangeify: fix test_tensor_index_overflow (CPU_LLVM=1)
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-30 05:55:15 -04:00
qazal
e8c595c29e
remu: add new instructions introduced in RANGEIFY ( #12363 )
...
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d
* run amd test_ops
* skip test_masked_select
2025-09-30 12:36:29 +03:00
George Hotz
360980f1a3
work on rangeify cost function heuristics ( #12360 )
...
* work on rangeify cost function heuristics
* dedup
* better cost function
2025-09-30 16:44:29 +08:00
qazal
109c63b904
update Tensor unit tests for RANGEIFY ( #12359 )
...
* update test_kernelize for RANGEIFY
* also kernelizes user contiguous
* skip that test
* tensor uop repr
* 4 kernels, still realizes a float
2025-09-30 11:17:21 +03:00
George Hotz
7129419500
fix cifar training in RANGEIFY ( #12355 )
...
* fix cifar training in RANGEIFY
* even more wino fuse
* bugfix
* test to show issue
2025-09-30 15:59:19 +08:00
qazal
4ff7f20b9d
rangeify: fix kernelize ( #12357 )
2025-09-30 10:10:08 +03:00
chenyu
86c5c969ea
linalg cosmetic change ( #12356 )
2025-09-30 03:00:59 -04:00
qazal
6a56d3c859
rangeify: only test correctness in multi ( #12339 )
...
* work
* more work
* back here
* skip tests
* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21
enable cleanup_dead_axes ( #12351 )
...
* enable cleanup_dead_axes
* don't mess with user contig
* correct tag behavior
* double reshape isn't correct
* block on assign too
* skip messing with symbolic
* Fix tests
* disable RANGEIFY=2
* test w rangeify
2025-09-30 14:09:39 +08:00
qazal
2a7310ab59
rangeify: fix remaining multi correctness issue ( #12354 )
2025-09-30 08:08:27 +03:00
Sieds Lykles
73b25bf47d
z3 fix loaded mask ( #12353 )
...
* z3 fix loaded mask
* indentation
2025-09-30 06:55:50 +02:00
wozeparrot
2a0caa09c2
push copy to disk ( #12348 )
2025-09-29 21:55:05 -07:00
chenyu
881709cd33
don't skip rangeify test_instancenorm_3d ( #12350 )
...
seems fine now
2025-09-30 00:05:59 -04:00
hooved
39aae679e4
Support bfloat16 on NULL backend ( #12340 )
...
* add failing test
* move test
* only run test with NULL default
* add skip reason
* add fix
2025-09-30 00:02:30 -04:00
chenyu
af935e7d32
Revert "reduce const folding ( #12344 )" ( #12349 )
...
This reverts commit 8e508a9927.
2025-09-29 23:45:30 -04:00
George Hotz
f522e83a02
fix rangeify elu fusion for openpilot ( #12341 )
...
* fix rangeify elu fusion for openpilot
* flip the metadata
* copy over permuted contiguous support
* this is correct
* update that
2025-09-30 11:41:52 +08:00
qazal
d95d018bb5
add name to multi rewrite [pr] ( #12346 )
2025-09-30 06:34:58 +03:00
qazal
05275c9ec3
rangeify: enable assign to mstack target ( #12345 )
2025-09-30 06:27:57 +03:00
chenyu
8e508a9927
reduce const folding ( #12344 )
2025-09-29 23:08:56 -04:00
chenyu
3a480b858f
use more getitem in gpt2 ( #12343 )
2025-09-29 23:08:03 -04:00
qazal
32d69d07d7
rangeify: enable multitensor TestBatchNorm ( #12342 )
2025-09-30 06:05:00 +03:00
Sieds Lykles
d55d829635
Lower index dtype spec fix ( #12337 )
...
* new pm_lower_index_dtype
* load_store_indexing after index lowering
* shorten line
* separate rule for long removal
* fix test
* fix index_to_concrete_int
* minor fixes
* add sink there
* update types in linearizer test
2025-09-30 04:26:50 +02:00
Sieds Lykles
c38f6ce140
unified_rewrite: use deque and dont add nodes to the stack multiple times ( #12320 )
...
* use deque instead of list
* increase ctx.progress and max stack_len
* add openpilot
* prevent placing uops on stack many times
* revert increasing ctx.progress and stack length limit
* dont block adding to the stack there
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e
Clip model updates for Stable Diffusion mlperf training ( #12313 )
...
* stable diffusion mlperf clip changes
* add clip tests
* set gelu as attribute
* add more tests
* factor out GPUS
* rerun CI
* add imports to if blocks
* remove unneeded axis
* add clip tests to CI
* move clip tests
* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
George Hotz
cdfa0f29fd
add rendering to index ( #12338 )
2025-09-30 09:18:05 +08:00
George Hotz
baf3b60cfb
fix gpt2 on rangeify ( #12335 )
2025-09-29 19:16:44 +08:00
qazal
9513f025c5
apply multi before rangeify ( #12298 )
...
* it doesn't realize it when i reshape
* cleaner graph
* map out
* REDUCE_AXIS also gives the wrong answer
* maybe
* work
* back here
* try
* more
* refactor tests
* check MultiBuffer
* or copy
* fine with this
* don't need graph_rewrite_map in rangeify
2025-09-29 14:16:31 +03:00
George Hotz
b899392f30
fix llm app with rangeify ( #12334 )
...
* fix llm app with rangeify
* add gpt2 contiguous also
2025-09-29 18:42:44 +08:00
wozeparrot
7ae6898e31
better late bufferview ( #12333 )
2025-09-29 03:08:34 -07:00
George Hotz
3291e00df7
fix efficientnet slowness on rangeify ( #12332 )
2025-09-29 18:01:01 +08:00
chenyu
9d2f2b8e34
skip test_mean_half_precision_overflow ( #12331 )
...
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
qazal
9915bcf2b4
remove no-op contiguous from rand ( #12329 )
2025-09-29 11:53:16 +03:00
chenyu
76c87d81b3
delete test_backward_sum_acc_dtype ( #12330 )
...
this test tests the wrong thing, it was only working because expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353
failing rng test ( #12328 )
...
* tighten spec: fixup devectorizer types / rangeify
* tighten assign
* failing rangeify test
* simpler
* otherwise contig
* more tolerance cause rng seed changed
2025-09-29 16:06:45 +08:00
George Hotz
29469577e8
tighten spec: fixup devectorizer types / rangeify ( #12327 )
...
* tighten spec: fixup devectorizer types / rangeify
* tighten assign
2025-09-29 15:41:11 +08:00
wozeparrot
a982480512
feat: late to_bufferview ( #12271 )
2025-09-29 00:29:43 -07:00
qazal
e01a3eb59a
rangeify whitespace cleanups [pr] ( #12326 )
...
* rangeify whitespace cleanups
* this is a noop
2025-09-29 10:04:51 +03:00