chenyu
e701106a64
remove FUSE_ARANGE ( #12511 )
...
it was the default already
2025-10-08 04:54:07 -04:00
qazal
ad49f8148b
switch process_replay to rangeify ( #12509 )
2025-10-08 11:26:43 +03:00
nimlgen
4a756a37d8
amd: support rocm7 ( #12502 )
...
* amd: support rocm7
* mock
2025-10-08 14:30:39 +08:00
qazal
60b6dca5ba
update some tests instead of expect_rangeify_fails ( #12500 )
...
* update test_clone_doesnt_dedup to use base
* new_flat_buffer passes
* fix test_reorder_expand
* remove the view stuff
* remove that test, we don't want this view const behavior
* test_setitem_becomes_subbuffer is good
2025-10-08 07:42:31 +03:00
qazal
84597ed53c
early assert for device mismatched asts in rangeify ( #12499 )
...
* early assert for device mismatched asts in rangeify
* alt also passes
2025-10-08 07:19:36 +03:00
qazal
2e19354c1c
viz: reorder timeline graphs ( #12498 )
...
* viz: reorder timeline graphs
* update test_viz with the new order
2025-10-08 07:10:23 +03:00
qazal
a7cb80bfab
use recursive_property in UOp device ( #12477 )
...
* simple failing test with RecursionError
* switch to @recursive_property
* merge 2
* diff
2025-10-08 06:15:05 +03:00
Sieds Lykles
b465c17b56
Revert "UOp.factor and add chain sorting ( #12413 )" ( #12492 )
...
This reverts commit e74be4a140.
2025-10-08 03:20:23 +02:00
George Hotz
945cc46475
delete children tracking from uop ( #12491 )
...
* delete children tracking from uop
* uop children no longer exists
* no tracked children
* that test is flaky too
2025-10-08 09:04:14 +08:00
George Hotz
12c4963489
add more rangeify pm tests ( #12488 )
2025-10-07 05:45:38 -04:00
George Hotz
403fdfcfd4
check spec in test, cleanup vectorize render ( #12484 )
2025-10-07 17:05:50 +08:00
qazal
22674798df
assert correctness in test_permuted_assignment [pr] ( #12483 )
2025-10-07 11:42:22 +03:00
George Hotz
75ce11593c
test_reshape_match should match ( #12479 )
2025-10-07 16:07:21 +08:00
George Hotz
ea7672931f
fix test_matmul_relu_cat ( #12478 )
2025-10-07 02:32:23 -04:00
chenyu
7b48f3cc45
failed test case repro for openpilot model ( #12475 )
...
* failed test case repro for openpilot model
* assertEqual
2025-10-07 13:46:43 +08:00
chenyu
a5484b767e
remove skipping cast in simplify_valid [pr] ( #12472 )
...
* remove skipping cast in simplify_valid [pr]
unsupported statements are handled in uop_given_valid already. the test failed because (100%x) somehow got simplified
* better test
2025-10-07 00:10:04 -04:00
George Hotz
0f25b4b289
move frontend dir to nn [pr] ( #12470 )
2025-10-07 10:42:22 +08:00
qazal
f664bcc8bd
use recursive_property in UOp tracing ( #12469 )
...
* test
* simple passing
2025-10-06 21:10:52 +03:00
qazal
76e8a3250c
rangeify: late zero folding ( #12464 )
...
* rangeify: late zero folding
* early
* not kernels
* none
* multi
* linter
* mstack is sink comment
* more comment
2025-10-06 12:52:33 +03:00
chenyu
a1881b0c17
update test_chicken ( #12466 )
...
logits are close, just numerical
2025-10-06 03:58:44 -04:00
qazal
1b1978b9c0
early copy fixup ( #12463 )
...
* simple failing test
* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
c1e85f699c
multi test case for sharded ring allreduce ( #12462 )
...
* multi test case for sharded ring allreduce
triggers `children not making progress` with RANGEIFY
* expect_rangeify_fails
2025-10-05 23:18:24 -04:00
George Hotz
46e8ea15c1
split pm_substitute_recurse ( #12460 )
2025-10-05 21:35:50 -04:00
qazal
6ad9a688ed
add failing test after "pend substitutes for speed" ( #12457 )
...
* add failing substitute test
* expect_rangeify_fails
2025-10-05 16:10:04 +03:00
qazal
4b60121498
fix bmnist torch with RANGEIFY=1 ( #12442 )
...
* fix bmnist torch with RANGEIFY=1
* alt
* test and comment
* this was always wrong
* simple failing test for rangeify
* simple upat to match the old behavior
2025-10-05 12:34:27 +03:00
George Hotz
b5f31d7505
earlier seen children ( #12451 )
2025-10-05 15:55:13 +08:00
qazal
865d5796f8
add a test for untested Tensor.assign behavior ( #12448 )
...
* add a test for untested Tensor.assign behavior
* better
2025-10-04 12:44:56 +03:00
Sieds Lykles
e74be4a140
UOp.factor and add chain sorting ( #12413 )
...
* add ordering
* fix some tests
* fix more tests
* shorten comment
* update test
* add rule and test
* add rule and test
* remove check
* use fold_divmod_congruence instead of simplify
* adjust tests
* shorten line
* new algo
* add test
* add function to un-nest the div
* add UOp.factor
* test UOp.factor
* uop_given_valid tries to factor simplex expression
* shorten line
* symbolic_flat is back
* change that back
* fix those new tests
* new rule for ordering
* factor multiple factors
* no symbolic_flat
* symbolic_flat to there
* move that back
* fix imports
* merge correctly
* linter happy
* add rule
* add a test
* cleanup
* revert that for now
* UOp.factor returns self instead of None
* try all_candidates
* remove or_else
* post index symbolic
* add test
* make this closer to the original
* increase mac hlb_cifar min step time
* add some ordering tests
* cleanup
* increase pytest timeout time
* check dtype
2025-10-04 06:05:38 +02:00
Sieds Lykles
394dc24110
post index symbolic ( #12446 )
...
* post index symbolic
* add test
2025-10-03 23:23:03 +02:00
chenyu
9f2b69b870
enable few tests for PTX test_dtype ( #12445 )
2025-10-03 08:56:30 -04:00
chenyu
b087663c35
RANGEIFY test_bert uses more ram somehow ( #12443 )
2025-10-03 04:38:53 -04:00
chenyu
940a8d5ba9
default IGNORE_OOB=1 ( #12441 )
...
* default IGNORE_OOB=1
z3 can get very slow with RANGEIFY, also update some kernel numbers to what it is
* add to test
2025-10-03 04:16:19 -04:00
hooved
1e8945a28c
Training loop for Stable Diffusion mlperf ( #12315 )
...
* add diff
* fix edit error
* match master
* point reference to specific commit
* simplify wandb logging
* remove lr test, dehardcode device
* increase stack size limit
2025-10-03 02:45:38 -04:00
George Hotz
c7849ac593
fix test lil model ( #12437 )
...
* fix test lil model
* 4 not 3
2025-10-03 02:28:37 -04:00
Sieds Lykles
0047bcc535
undo loaded comparison swap ( #12436 )
...
* add rule
* add a test
2025-10-03 06:57:29 +02:00
chenyu
f203d8b221
update RANGEIFY kernel count and test_masked_select ( #12435 )
2025-10-03 00:41:34 -04:00
wozeparrot
a6dd5a224b
skip webgpu tests ( #12433 )
2025-10-02 21:31:07 -07:00
chenyu
bf99de7b1e
update a few more tests for RANGEIFY ( #12434 )
2025-10-03 00:16:58 -04:00
Sieds Lykles
16a65b4fd0
fix test_symbolic_gcd_div hang ( #12427 )
2025-10-03 04:21:16 +02:00
chenyu
7b3912d8e4
relax atol for some tests ( #12422 )
2025-10-02 05:04:44 -04:00
chenyu
98163832e4
update RANGEIFY test_cast_padded ( #12421 )
...
* update RANGEIFY test_cast_padded
* update test
2025-10-02 04:37:35 -04:00
qazal
f21851b099
ops: n^2 .device property fix ( #12419 )
...
* test case for a long rand chain
currently failing with RANGEIFY because device propagates too deep
* skip
* ops: n^2 .device property fix
* unskip
---------
Co-authored-by: Chen-Yu Yang <chenyu@fastmail.com>
2025-10-02 03:28:12 -04:00
qazal
13a25b2e67
rangeify: don't shape INDEX on kernelize ( #12417 )
2025-10-02 09:45:37 +03:00
hooved
5d9035f5a6
Eval for Stable Diffusion mlperf ( #12316 )
...
* add diff
* rerun ci
* refactor beam workaround, add test
* fix conflict
* linting
2025-10-02 02:35:38 -04:00
hooved
0f804c9a83
Stable Diffusion model init for mlperf ( #12314 )
...
* include clip pr diff
* updated unet and sd init
* dehardcode default device
* revert beam hang workaround
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-02 02:28:41 -04:00
George Hotz
583553f467
split ranges ( #12411 )
...
* split ranges
* simpler
* split ranges
* range str
* fix test
* oops
* faster
* no group 2
* tests
* dont_sub_ranges_for_image
* revert that
2025-10-02 12:57:22 +08:00
qazal
6fc6b51b59
fix limit_bufs with kernelize ( #12415 )
2025-10-02 07:49:11 +03:00
qazal
2fcd55583f
allow less kernels in external_test_opt ( #12412 )
...
* allow less kernels in external_test_opt
* this was always 2
2025-10-02 05:05:42 +03:00
qazal
8b48e19ce2
skip more multi remote tests ( #12410 )
2025-10-02 04:50:46 +03:00
Sieds Lykles
9a64fc0d28
Load alt value with cast try 2 ( #12407 )
...
* add or_casted
* add tests and fix old tests
* cast load
* move that to pm_render
* add allow_any_len to gated load patterns in renderers
* slice [:2]
2025-10-02 00:55:29 +02:00