Commit Graph

10420 Commits

Author SHA1 Message Date
George Hotz
a8dca47fbc fix 2025-10-03 10:52:18 +08:00
George Hotz
769db23df6 that 2025-10-03 10:45:47 +08:00
George Hotz
1cd40941c8 delete junk 2025-10-03 10:35:12 +08:00
George Hotz
9a607e69e1 Merge branch 'master' into support_opts_in_contig 2025-10-03 10:32:08 +08:00
George Hotz
9cd365c12e little changes from double gemm (#12429)
* little changes from double gemm

* split pm_group_for_reduce

* pm_add_buffers_local

* Revert "pm_add_buffers_local"

This reverts commit 4d30a91db2.
2025-10-03 10:31:51 +08:00
Sieds Lykles
16a65b4fd0 fix test_symbolic_gcd_div hang (#12427) 2025-10-03 04:21:16 +02:00
George Hotz
5b05bf4ab4 Merge branch 'master' into support_opts_in_contig 2025-10-03 10:03:14 +08:00
chenyu
2d24af888b REWRITE_STACK_LIMIT (#12426) 2025-10-02 21:51:04 -04:00
hooved
1b58ef0d60 Increase stack size limit in unified_rewrite (#12424)
* increase stack size limit

* rerun CI due to random tqdm test fail
2025-10-03 09:06:47 +08:00
qazal
17d36d0952 don't tag MSTACK/MSELECT on global buffers (#12423)
* don't tag MSTACK/MSELECT

* fix
2025-10-02 13:32:15 +03:00
George Hotz
e59e0aadc1 opts 2025-10-02 18:25:38 +08:00
George Hotz
3688afa513 fix swap 2025-10-02 18:17:03 +08:00
George Hotz
dae164ffb1 warp 2025-10-02 18:04:45 +08:00
George Hotz
3fb3dd4c06 flash attention with two gemms 2025-10-02 17:48:48 +08:00
George Hotz
e5028d58e9 flash attention sort of works 2025-10-02 17:29:23 +08:00
chenyu
7b3912d8e4 relax atol for some tests (#12422) 2025-10-02 05:04:44 -04:00
chenyu
98163832e4 update RANGEIFY test_cast_padded (#12421)
* update RANGEIFY test_cast_padded

* update test
2025-10-02 04:37:35 -04:00
George Hotz
5a602e6c36 double wmma works 2025-10-02 16:06:46 +08:00
chenyu
37beef6de3 add null bert training test in ci (#12420)
fails with RANGEIFY `RuntimeError: children not making progress`
2025-10-02 04:05:19 -04:00
George Hotz
6640514555 demote works on both matmuls 2025-10-02 15:42:22 +08:00
qazal
f21851b099 ops: n^2 .device property fix (#12419)
* test case for a long rand chain

currently failing with RANGEIFY because device propagates too deep

* skip

* ops: n^2 .device property fix

* unskip

---------

Co-authored-by: Chen-Yu Yang <chenyu@fastmail.com>
2025-10-02 03:28:12 -04:00
b1tg
ec177c80c2 rangeify: fix test_where_fold (llvm) (#12416)
* rangeify: fix test_where_fold (AMD_LLVM)

* rm comment
2025-10-02 02:57:49 -04:00
qazal
13a25b2e67 rangeify: don't shape INDEX on kernelize (#12417) 2025-10-02 09:45:37 +03:00
hooved
5d9035f5a6 Eval for Stable Diffusion mlperf (#12316)
* add diff

* rerun ci

* refactor beam workaround, add test

* fix conflict

* linting
2025-10-02 02:35:38 -04:00
hooved
0f804c9a83 Stable Diffusion model init for mlperf (#12314)
* include clip pr diff

* updated unet and sd init

* dehardcode default device

* revert beam hang workaround

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-02 02:28:41 -04:00
George Hotz
2fbd7d21f9 tc passes 2025-10-02 14:27:52 +08:00
George Hotz
0eee93f0c0 hotfix: disable split ranges for non rangeify 2025-10-02 13:15:24 +08:00
George Hotz
f32a497f08 bug 2025-10-02 13:10:07 +08:00
George Hotz
3fd25a425b fix gfr 2025-10-02 13:06:31 +08:00
George Hotz
3da569c20b Merge branch 'master' into support_opts_in_contig 2025-10-02 12:57:43 +08:00
George Hotz
583553f467 split ranges (#12411)
* split ranges

* simpler

* split ranges

* range str

* fix test

* oops

* faster

* no group 2

* tests

* dont_sub_ranges_for_image

* revert that
2025-10-02 12:57:22 +08:00
qazal
6fc6b51b59 fix limit_bufs with kernelize (#12415) 2025-10-02 07:49:11 +03:00
George Hotz
9d5d4b248c Merge branch 'master' into support_opts_in_contig 2025-10-02 12:39:50 +08:00
qazal
d1c868f990 fix limit_bufs with multi (#12414) 2025-10-02 05:51:56 +03:00
qazal
2fcd55583f allow less kernels in external_test_opt (#12412)
* allow less kernels in external_test_opt

* this was always 2
2025-10-02 05:05:42 +03:00
qazal
8b48e19ce2 skip more multi remote tests (#12410) 2025-10-02 04:50:46 +03:00
George Hotz
3770dd9d80 annotate bufferize in viz 2025-10-02 09:20:50 +08:00
qazal
5b649616ff rangeify: detect and assert cycles (#12405)
* rangeify: assert cycles

* rng=2

* any
2025-10-02 03:39:43 +03:00
Sieds Lykles
9a64fc0d28 Load alt value with cast try 2 (#12407)
* add or_casted

* add tests and fix old tests

* cast load

* move that to pm_render

* add allow_any_len to gated load patterns in renderers

* slice [:2]
2025-10-02 00:55:29 +02:00
nimlgen
3e0e0290ce increase timeout in test_module_runs (#12408) 2025-10-01 22:01:44 +03:00
Sieds Lykles
2f8ac77c25 add allow_any_len to gated load patterns in renderers (#12406) 2025-10-01 20:35:32 +02:00
George Hotz
89bed28716 split reduceop (#12404)
* some rangeify tests fixed

* bring split reduceop to rangeify

* fix tests
2025-10-01 18:45:16 +08:00
George Hotz
74ee305948 some rangeify tests fixed (#12403) 2025-10-01 18:23:37 +08:00
qazal
f198a9e1ba skip test_multihost_aware_schedule, assign devices mismatch (#12396)
* minimal failing remote test

* this should've never worked?

* skip that test
2025-10-01 13:09:15 +03:00
b1tg
ac3d457d5e rangeify: TestReduceOpsConstFolding (#12397)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
George Hotz
c449e8eb17 don't change that 2025-10-01 17:47:43 +08:00
George Hotz
3dc1b2e98e broken 2025-10-01 17:27:17 +08:00
George Hotz
8e6126160f Merge branch 'master' into support_opts_in_contig 2025-10-01 17:20:51 +08:00
George Hotz
60e52fbe36 support opts in contig, simpler (#12400) 2025-10-01 17:20:04 +08:00
chenyu
6c95b1f39d explicitly set device for CI unit test (#12399) 2025-10-01 05:16:54 -04:00