12330 Commits

Author SHA1 Message Date
chenyu
07d145debd compile3 0.10.1 driving_vision in mac pytest (#14911)
* compile3 0.10.1 driving_vision in mac pytest

* sync before re-executing onetime kernels
2026-02-20 12:23:52 -05:00
chenyu
d895713116 remove temp onnx migration CI job (#14910) 2026-02-20 11:38:44 -05:00
George Hotz
2611907afb start ripping out old scheduler -- no maps (#14909)
* start ripping out old scheduler -- no maps

* no more metadata
2026-02-20 21:05:04 +08:00
nimlgen
1b3b94a72a fix mockam mypy (#14908) 2026-02-20 15:15:05 +03:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
nimlgen
dbf894215a init mockam (#14889)
* mockam

* more tests

* linter

* x
2026-02-20 14:09:11 +03:00
wozeparrot
4b9825c829 make optim _step return update (#14906) 2026-02-20 02:43:56 -08:00
George Hotz
6610255654 add the correct rule for gcd div/mod folding (#14905)
* add the correct rule for that folding

* more tests

* guard c1.arg
2026-02-20 18:11:54 +08:00
George Hotz
a28fc2fba7 hotfix: remove wrong symbolic rule 2026-02-20 17:09:18 +08:00
qazal
28451a5957 viz/sqtt: rdna4 wmma, cleanup inst rows (#14904)
* valu wmma

* viz/sqtt: rdna4 wmma, cleanup inst rows
2026-02-20 17:02:09 +09:00
qazal
16ae96fa58 finish rdna4 sqtt (#14903)
* unskip

* it's a wave pair in rdna4

* work

* that

* hidden archive

* generic s_delay, mystery InstOpRDNA4.UNK_60

* branch failing test

* UNK_60 is OTHER_VMEM_STORE

* rdna4 has both s_delay_alu and s_wait_alu

* real branch failing test

* rdna4 doesn't have JUMP_NO, it's NEXT with a flag for no jump

* make inst_delay skips recursive

* all rdna4 tests pass

* simm16 unwraps

* that has a name
2026-02-20 16:06:13 +09:00
qazal
52b51a0324 test fixes from rdna4 sqtt (#14902) 2026-02-20 14:42:33 +09:00
qazal
32f569b573 viz/sqtt: decoder fixes pre rdna4/cdna4 work (#14900)
* viz/sqtt: decoder fixes pre rdna4/cdna4 work

* fix

* branch_inst + more tests

* smaller
2026-02-20 12:10:15 +09:00
qazal
e9ae3da711 viz: click on CALL node goes to codegen (#14609)
* viz: click on CALL node goes to codegen

* colored name
2026-02-20 11:13:11 +09:00
George Hotz
fc5677c28b resnet dataloader + more test cleanups (#14899)
* resnet dataloader

* tests
2026-02-20 10:05:47 +08:00
chenyu
b9744ab62b one more test_gpudims test (#14898)
failure from the bad simplification attempt
2026-02-19 18:18:44 -05:00
chenyu
9d6cf00be2 fix gpudim bug and test_split_2d_to_3d (#14896) 2026-02-19 16:46:24 -05:00
chenyu
2b31823ef9 update test_gpudims to prove bijectivity (#14895)
* update test_gpudims to prove bijectivity

* one more
2026-02-19 16:18:59 -05:00
chenyu
19ce7a3f7f use z3 to verify gpudims output index (#14894)
found a bug with z3
2026-02-19 15:24:38 -05:00
chenyu
52f727738b move test_grouped_dims to test/null (#14893)
it's a pure helper
2026-02-19 14:50:53 -05:00
chenyu
af997c1ea5 use .expr to access variable expr instead of arg[0] [pr] (#14892)
only apply when it's more readable
2026-02-19 12:24:36 -05:00
chenyu
7400362a86 remove UOp.vars [pr] (#14891) 2026-02-19 12:09:39 -05:00
chenyu
f54a49e733 restructure alu_multi [pr] (#14888) 2026-02-19 11:11:49 -05:00
chenyu
06ef8a26b7 add a test case that triggers CALL passthrough_multi (#14887) 2026-02-19 10:45:40 -05:00
nimlgen
071403f9a1 system: use MAP_FIXED_NOREPLACE (#14884) 2026-02-19 18:32:50 +03:00
nimlgen
041dc0cf85 fix typos (#14886) 2026-02-19 17:37:15 +03:00
Kartik Vashishta
9a9c7648e9 system: fix pci_scan_bus vendor filter (#14885)
* system: fix pci_scan_bus vendor filter

* fix: formatting
2026-02-19 17:23:32 +03:00
chenyu
877a5d4c45 improve types and simplify allgather in multi [pr] (#14878) 2026-02-19 09:02:15 -05:00
wozeparrot
9317e96881 fa: explicitly pass shapes (#14857) 2026-02-19 05:26:16 -08:00
George Hotz
f6c1cf343c new symbolic rule from prealloc_bufs (#14883)
* new symbolic rule from prealloc_bufs

* optim
2026-02-19 20:57:30 +08:00
qazal
658c32864a viz: show event number in track line (#14882) 2026-02-19 20:58:37 +09:00
qazal
911399bee5 assembly/amd: move the kernel capture stuff out of helpers (#14881) 2026-02-19 16:28:48 +09:00
qazal
1f34ba4511 viz: remove global amd targets mapping (#14879)
* viz: remove global amd targets mapping

* rename to amd_counters and nv_counters

* diff
2026-02-19 15:31:12 +09:00
George Hotz
2f0f8b5776 more test relaxations from prealloc_bufs (#14880) 2026-02-19 14:23:28 +08:00
qazal
5bc65ec669 applied_opts/estimates in program spec are aliases for the sink arg (#14860)
* remove applied_opts from programspec

* comment that out

* placement

* update tests

* p.ast.arg

* remove todo comment

* maybe this too

* it can exist as an alias, also for estimates
2026-02-19 13:08:26 +09:00
chenyu
8d8da185ec minor handle_allreduce cleanup [pr] (#14876)
no more lbs, also use a divmod
2026-02-18 22:53:28 -05:00
Christopher Milan
b5588d341b uop_given_valid fixes many gated reads for IMAGE=1 (#14877)
* add replay script

* pkl is arg

* that needs uop_given_valid

* cleanup
2026-02-18 22:49:47 -05:00
George Hotz
ab61c16730 fixes and test relaxations from prealloc_bufs (#14875)
* fixes and test relaxations from prealloc_bufs

* fix error type and guard _mop

* revert that

* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
0c85b93938 support shrink sharded and non-sharded axes (#14874)
simpler to just support it
2026-02-18 20:54:10 -05:00
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
chenyu
8c830c5b44 test_full_like_shrink_on_shard_axis (#14870)
* test_full_like_shrink_on_shard_axis

add a test case that triggers non-copy branch in mstack_early_shrink

* 0
2026-02-18 19:23:44 -05:00
Ananta Ranganathan
4005e9db6d Mxfp4 fix (#14866)
* double e2m1 values for mxfp4

* check if assert equal works in ci

* Revert "check if assert equal works in ci"

This reverts commit 8cf902ce0d.

* remove unnecessary whitespace change

* add test case that fails for old implementation but passes for new

* add note that the previous test is bad

* clarification on the methodology for the test

* fix the indent problem that happened to skip this test

* for now update mxfp4 block test to similarly use allclose (bad)

* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
chenyu
0e4cf21a75 remove handle_allreduce_multirank and group_id [pr] (#14869)
leftovers from ops_remote
2026-02-18 16:13:54 -05:00
chenyu
f771de6738 gc.collect() to get the correct GlobalCounters.mem_used in tests (#14868)
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
chenyu
f84a11bb9f delete uneven shard tests and mentions (#14867) 2026-02-18 14:10:33 -05:00
nimlgen
1c8c17a593 am: aca (#14861) 2026-02-18 21:40:09 +03:00
chenyu
b3cdb61067 clean up expand_multi [pr] (#14865)
remove dead assert, also make it more like a view
2026-02-18 12:21:13 -05:00
chenyu
0260406f49 simplify reshape_multi [pr] (#14864) 2026-02-18 11:46:26 -05:00
chenyu
5746a605ce UOp.axis raises for invalid reshape (#14863)
reshape is lazy now, so better to raise from the .axis call and not have caller to handle invalid case
2026-02-18 11:28:56 -05:00
nimlgen
3b95fa0ed4 am_smi: enable mem usage back (#14858) 2026-02-18 19:27:27 +03:00