Commit Graph

1035 Commits

Author SHA1 Message Date
chenyu
a0cbbc35ad remove LLAMA_LAYERS in ci (#12562) 2025-10-09 04:46:41 -04:00
nimlgen
658c566e22 vars in gated_read_image_count (#12486)
* vars in gated_read_image_count

* nc
2025-10-09 14:54:15 +08:00
chenyu
be05028419 move ASSERT_MIN_STEP_TIME to compile3 (#12535)
threshold is current time +20%
2025-10-08 22:16:59 -04:00
chenyu
5986d656a2 tighter ASSERT_MIN_STEP_TIME (#12531)
set to about 1.2x of actual time now
2025-10-08 21:22:54 -04:00
George Hotz
3b0b3a2e64 fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
chenyu
942022c309 smaller LLAMA_LAYER in Test llama 3 training (#12516)
very slow now
2025-10-08 05:10:51 -04:00
chenyu
e701106a64 remove FUSE_ARANGE (#12511)
it was the default already
2025-10-08 04:54:07 -04:00
chenyu
da1f46ff3f remove RANGEIFY specific test jobs (#12507) 2025-10-08 04:12:04 -04:00
chenyu
eb3bc277b3 remove ASSERT_MIN_STEP_TIME in external_benchmark_openpilot (#12495)
should add for compile3 and compile 3 only
2025-10-07 22:13:42 -04:00
George Hotz
403fdfcfd4 check spec in test, cleanup vectorize render (#12484) 2025-10-07 17:05:50 +08:00
chenyu
fe774a4319 more skip WINO on benchmark (#12482) 2025-10-07 03:43:51 -04:00
chenyu
8ad5f9e74f skip slow benchmarks (#12481)
* skip slow benchmarks

padded tc is already slow, rest are slow with rangeify (correct if run locally)

* relax more
2025-10-07 03:28:56 -04:00
chenyu
1823a5043f don't check MAX_BUFFER_SIZE on NULL (#12461) 2025-10-05 22:09:29 -04:00
chenyu
74b04f7dca test beautiful_mnist_multigpu (#12455)
* test beautiful_mnist_multigpu

another example that fails with RANGEIFY

* now i remember

* MAX_BUFFER_SIZE=0
2025-10-05 08:45:01 -04:00
Sieds Lykles
e74be4a140 UOp.factor and add chain sorting (#12413)
* add ordering

* fix some tests

* fix more tests

* shorten comment

* update test

* add rule and test

* add rule and test

* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* add function to un-nest the div

* add UOp.factor

* test UOp.factor

* uop_given_valid tries to factor simplex expression

* shorten line

* symbolic_flat is back

* change that back

* fix those new tests

* new rule for ordering

* factor multiple factors

* no symbolic_flat

* symbolic_flat to there

* move that back

* fix imports

* merge correctly

* linter happy

* add rule

* add a test

* cleanup

* revert that for now

* UOp.factor returns self instead of None

* try all_candidates

* remove or_else

* post index symbolic

* add test

* make this closer to the original

* increase mac hlb_cifar min step time

* add some ordering tests

* cleanup

* increase pytest timeout time

* check dtype
2025-10-04 06:05:38 +02:00
chenyu
98163832e4 update RANGEIFY test_cast_padded (#12421)
* update RANGEIFY test_cast_padded

* update test
2025-10-02 04:37:35 -04:00
chenyu
37beef6de3 add null bert training test in ci (#12420)
fails with RANGEIFY `RuntimeError: children not making progress`
2025-10-02 04:05:19 -04:00
b1tg
ec177c80c2 rangeify: fix test_where_fold (llvm) (#12416)
* rangeify: fix test_where_fold (AMD_LLVM)

* rm comment
2025-10-02 02:57:49 -04:00
qazal
d1c868f990 fix limit_bufs with multi (#12414) 2025-10-02 05:51:56 +03:00
qazal
5b649616ff rangeify: detect and assert cycles (#12405)
* rangeify: assert cycles

* rng=2

* any
2025-10-02 03:39:43 +03:00
b1tg
ac3d457d5e rangeify: TestReduceOpsConstFolding (#12397)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
chenyu
6c95b1f39d explicitly set device for CI unit test (#12399) 2025-10-01 05:16:54 -04:00
chenyu
689ab9151b more RANGEIFY tests (#12393)
would have caught the load alt regression without adding too many tests
2025-10-01 03:43:58 -04:00
b1tg
154d114364 rangeify: fix abstractions2.py (#12386)
* rangeify: fix abstractions2.py

* tests

* lint

* only abstractions2

* base
2025-10-01 09:58:56 +03:00
b1tg
da52006bde rangeify: fix test_scatter_reduce (#12380)
* rangeify: fix test_scatter_reduce

* ext_vector_type

* set alignment=1 on boolean
2025-09-30 23:26:36 -04:00
chenyu
8def8145e4 ALLOWED_KERNEL_COUNT openpilot 0.9.4 with RANGEIFY (#12381) 2025-09-30 22:58:59 -04:00
qazal
26247573e1 rangeify multi tests on gpu (#12376)
* rangeify multi tests on gpu

* fix limit_bufs
2025-10-01 04:53:04 +03:00
chenyu
b4a4817c9c fix rangeify test_linalg (#12365) 2025-09-30 06:28:35 -04:00
b1tg
c9ef5d8fe5 rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) (#12362)
* rangeify: fix test_tensor_index_overflow (CPU_LLVM=1)

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-30 05:55:15 -04:00
qazal
6a56d3c859 rangeify: only test correctness in multi (#12339)
* work

* more work

* back here

* skip tests

* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21 enable cleanup_dead_axes (#12351)
* enable cleanup_dead_axes

* don't mess with user contig

* correct tag behavior

* double reshape isn't correct

* block on assign too

* skip messing with symbolic

* Fix tests

* disable RANGEIFY=2

* test w rangeify
2025-09-30 14:09:39 +08:00
qazal
2a7310ab59 rangeify: fix remaining multi correctness issue (#12354) 2025-09-30 08:08:27 +03:00
chenyu
881709cd33 don't skip rangeify test_instancenorm_3d (#12350)
seems fine now
2025-09-30 00:05:59 -04:00
hooved
39aae679e4 Support bfloat16 on NULL backend (#12340)
* add failing test

* move test

* only run test with NULL default

* add skip reason

* add fix
2025-09-30 00:02:30 -04:00
chenyu
af935e7d32 Revert "reduce const folding (#12344)" (#12349)
This reverts commit 8e508a9927.
2025-09-29 23:45:30 -04:00
qazal
05275c9ec3 rangeify: enable assign to mstack target (#12345) 2025-09-30 06:27:57 +03:00
chenyu
8e508a9927 reduce const folding (#12344) 2025-09-29 23:08:56 -04:00
qazal
32d69d07d7 rangeify: enable multitensor TestBatchNorm (#12342) 2025-09-30 06:05:00 +03:00
Sieds Lykles
c38f6ce140 unified_rewrite: use deque and dont add nodes to the stack multiple times (#12320)
* use deque instead of list

* increase ctx.progress and max stack_len

* add openpilot

* prevent placing uops on stack many times

* revert increasing ctx.progress and stack length limit

* dont block adding to the stack there

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583 Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
qazal
8b2e0930d7 rangeify: enable passing multi test (#12301) 2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
qazal
38ecefaacb RANGEIFY=1 allreduce (#12260)
* ci

* extract mops

* work

* assert early

* port this?

* can realize shard

* allreduce passing

* notes

* better handling of shard

* err

* outerworld allreduce twice

* work

* don't tag movement ops

* don't tag movement ops

* delete old logic

* 19 failing + ram

* cleanup

* reset stuff

* simplest failing test

* diff

* test_ones

* allreduce work

* allreduce more work

* down to 22 failing tests

* port _device_num

* replace creates a new UOp here

* pour symbolic everywhere

* 7 failing

* focus on allreduce

* work

* cleanup

* more ci

* fix test_schedule_ring

* post index const shape

* much better

* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
1400ce105f rangeify: fix sharding (#12288) 2025-09-24 14:33:56 +03:00
qazal
154c865966 rangeify: fix ram usage in multi (#12286) 2025-09-24 13:48:58 +03:00
qazal
ad7c8c21ea rangeify: INDEX doesn't passthrough MSELECT (#12279) 2025-09-23 21:36:50 +03:00