Commit Graph

4439 Commits

Author SHA1 Message Date
George Hotz
0f25b4b289 move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
qazal
f664bcc8bd use recursive_property in UOp tracing (#12469)
* test

* simple passing
2025-10-06 21:10:52 +03:00
qazal
76e8a3250c rangeify: late zero folding (#12464)
* rangeify: late zero folding

* early

* not kernels

* none

* multi

* linter

* mstack is sink comment

* more comment
2025-10-06 12:52:33 +03:00
chenyu
a1881b0c17 update test_chicken (#12466)
logits are close, just numerical
2025-10-06 03:58:44 -04:00
qazal
1b1978b9c0 early copy fixup (#12463)
* simple failing test

* early copy fixup
2025-10-06 06:38:29 +03:00
chenyu
c1e85f699c multi test case for sharded ring allreduce (#12462)
* multi test case for sharded ring allreduce

triggers `children not making progress` with RANGEIFY

* expect_rangeify_fails
2025-10-05 23:18:24 -04:00
George Hotz
46e8ea15c1 split pm_substitute_recurse (#12460) 2025-10-05 21:35:50 -04:00
qazal
6ad9a688ed add failing test after "pend substitutes for speed" (#12457)
* add failing substitute test

* expect_rangeify_fails
2025-10-05 16:10:04 +03:00
qazal
4b60121498 fix bmnist torch with RANGEIFY=1 (#12442)
* fix bmnist torch with RANGEIFY=1

* alt

* test and comment

* this was always wrong

* simple failing test for rangeify

* simple upat to match the old behavior
2025-10-05 12:34:27 +03:00
George Hotz
b5f31d7505 earlier seen children (#12451) 2025-10-05 15:55:13 +08:00
qazal
865d5796f8 add a test for untested Tensor.assign behavior (#12448)
* add a test for untested Tensor.assign behavior

* better
2025-10-04 12:44:56 +03:00
Sieds Lykles
e74be4a140 UOp.factor and add chain sorting (#12413)
* add ordering

* fix some tests

* fix more tests

* shorten comment

* update test

* add rule and test

* add rule and test

* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* add function to un-nest the div

* add UOp.factor

* test UOp.factor

* uop_given_valid tries to factor simplex expression

* shorten line

* symbolic_flat is back

* change that back

* fix those new tests

* new rule for ordering

* factor multiple factors

* no symbolic_flat

* symbolic_flat to there

* move that back

* fix imports

* merge correctly

* linter happy

* add rule

* add a test

* cleanup

* revert that for now

* UOp.factor returns self instead of None

* try all_candidates

* remove or_else

* post index symbolic

* add test

* make this closer to the original

* increase mac hlb_cifar min step time

* add some ordering tests

* cleanup

* increase pytest timeout time

* check dtype
2025-10-04 06:05:38 +02:00
Sieds Lykles
394dc24110 post index symbolic (#12446)
* post index symbolic

* add test
2025-10-03 23:23:03 +02:00
chenyu
9f2b69b870 enable few tests for PTX test_dtype (#12445) 2025-10-03 08:56:30 -04:00
chenyu
b087663c35 RANGEIFY test_bert uses more ram somehow (#12443) 2025-10-03 04:38:53 -04:00
chenyu
940a8d5ba9 default IGNORE_OOB=1 (#12441)
* default IGNORE_OOB=1

z3 can get very slow with RANGEIFY, also update some kernel numbers to what it is

* add to test
2025-10-03 04:16:19 -04:00
hooved
1e8945a28c Training loop for Stable Diffusion mlperf (#12315)
* add diff

* fix edit error

* match master

* point reference to specific commit

* simplify wandb logging

* remove lr test, dehardcode device

* increase stack size limit
2025-10-03 02:45:38 -04:00
George Hotz
c7849ac593 fix test lil model (#12437)
* fix test lil model

* 4 not 3
2025-10-03 02:28:37 -04:00
Sieds Lykles
0047bcc535 undo loaded comparison swap (#12436)
* add rule

* add a test
2025-10-03 06:57:29 +02:00
chenyu
f203d8b221 update RANGEIFY kernel count and test_masked_select (#12435) 2025-10-03 00:41:34 -04:00
wozeparrot
a6dd5a224b skip webgpu tests (#12433) 2025-10-02 21:31:07 -07:00
chenyu
bf99de7b1e update a few more tests for RANGEIFY (#12434) 2025-10-03 00:16:58 -04:00
Sieds Lykles
16a65b4fd0 fix test_symbolic_gcd_div hang (#12427) 2025-10-03 04:21:16 +02:00
chenyu
7b3912d8e4 relax atol for some tests (#12422) 2025-10-02 05:04:44 -04:00
chenyu
98163832e4 update RANGEIFY test_cast_padded (#12421)
* update RANGEIFY test_cast_padded

* update test
2025-10-02 04:37:35 -04:00
qazal
f21851b099 ops: n^2 .device property fix (#12419)
* test case for a long rand chain

currently failing with RANGEIFY because device propagates too deep

* skip

* ops: n^2 .device property fix

* unskip

---------

Co-authored-by: Chen-Yu Yang <chenyu@fastmail.com>
2025-10-02 03:28:12 -04:00
qazal
13a25b2e67 rangeify: don't shape INDEX on kernelize (#12417) 2025-10-02 09:45:37 +03:00
hooved
5d9035f5a6 Eval for Stable Diffusion mlperf (#12316)
* add diff

* rerun ci

* refactor beam workaround, add test

* fix conflict

* linting
2025-10-02 02:35:38 -04:00
hooved
0f804c9a83 Stable Diffusion model init for mlperf (#12314)
* include clip pr diff

* updated unet and sd init

* dehardcode default device

* revert beam hang workaround

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-02 02:28:41 -04:00
George Hotz
583553f467 split ranges (#12411)
* split ranges

* simpler

* split ranges

* range str

* fix test

* oops

* faster

* no group 2

* tests

* dont_sub_ranges_for_image

* revert that
2025-10-02 12:57:22 +08:00
qazal
6fc6b51b59 fix limit_bufs with kernelize (#12415) 2025-10-02 07:49:11 +03:00
qazal
2fcd55583f allow less kernels in external_test_opt (#12412)
* allow less kernels in external_test_opt

* this was always 2
2025-10-02 05:05:42 +03:00
qazal
8b48e19ce2 skip more multi remote tests (#12410) 2025-10-02 04:50:46 +03:00
Sieds Lykles
9a64fc0d28 Load alt value with cast try 2 (#12407)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render

* add allow_any_len to gated load patterns in renderers

* slice [:2]
2025-10-02 00:55:29 +02:00
nimlgen
3e0e0290ce increase timeout in test_module_runs (#12408) 2025-10-01 22:01:44 +03:00
George Hotz
89bed28716 split reduceop (#12404)
* some rangeify tests fixed

* bring split reduceop to rangeify

* fix tests
2025-10-01 18:45:16 +08:00
George Hotz
74ee305948 some rangeify tests fixed (#12403) 2025-10-01 18:23:37 +08:00
qazal
f198a9e1ba skip test_multihost_aware_schedule, assign devices mismatch (#12396)
* minimal failing remote test

* this should've never worked?

* skip that test
2025-10-01 13:09:15 +03:00
George Hotz
60e52fbe36 support opts in contig, simpler (#12400) 2025-10-01 17:20:04 +08:00
chenyu
6ba8bf282f skip test_masked_select for RANGEIFY PYTHON (#12395) 2025-10-01 04:13:31 -04:00
chenyu
adc8c3b28f Revert "load alt value with cast (#12384)" (#12392)
This reverts commit 05e91a248d.
2025-10-01 03:20:04 -04:00
qazal
90b1c0dd96 rangeify: test_where_fold kernel count (#12379)
* rangeify: test_where_fold kernel count

* get these from the index

* replace ranges

* fine

* movement ops

* diff

* better
2025-10-01 09:35:12 +03:00
b1tg
42748ccb92 rangeify: fix test_prequant_conv2d_1x1 (#12391) 2025-10-01 02:33:47 -04:00
Sieds Lykles
05e91a248d load alt value with cast (#12384)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render
2025-10-01 07:14:26 +02:00
b1tg
57ad46c6e4 rangeify: increase atol for test_two_binops_no_rerun passing on real windows machine (#12389)
CPU_LLVM=1
2025-10-01 00:56:45 -04:00
chenyu
0662946fac atol in test_two_binops_no_rerun (#12387)
for RANGEIFY LLVM
2025-10-01 00:05:47 -04:00
wozeparrot
4204edc60b feat: skip test_long (#12383) 2025-09-30 20:07:39 -07:00
George Hotz
4c9a930de2 rangeify attn tests (#12377) 2025-10-01 09:59:19 +08:00
hooved
969a1b35ca LR scheduler for Stable Diffusion mlperf training (#12201)
* add lr scheduler for stable diffusion training

* add lr scheduler test

* rerun ci

* rerun CI

* use np for testing

* move test to CI path

* remove unneeded copy
2025-09-30 21:21:08 -04:00
George Hotz
9ef319f349 bad conv in rangeify (#12373)
* bad conv with broken rangeify

* no maxpool needed

* add empty_like

* typo

* no self

* issue remains for test
2025-10-01 08:56:22 +08:00