qazal
16ae96fa58
finish rdna4 sqtt ( #14903 )
...
* unskip
* it's a wave pair in rdna4
* work
* that
* hidden archive
* generic s_delay, mystery InstOpRDNA4.UNK_60
* branch failing test
* UNK_60 is OTHER_VMEM_STORE
* rdna4 has both s_delay_alu and s_wait_alu
* real branch failing test
* rdna4 doesn't have JUMP_NO, it's NEXT with a flag for no jump
* make inst_delay skips recursive
* all rdna4 tests pass
* simm16 unwraps
* that has a name
2026-02-20 16:06:13 +09:00
qazal
52b51a0324
test fixes from rdna4 sqtt ( #14902 )
2026-02-20 14:42:33 +09:00
qazal
32f569b573
viz/sqtt: decoder fixes pre rdna4/cdna4 work ( #14900 )
...
* viz/sqtt: decoder fixes pre rdna4/cdna4 work
* fix
* branch_inst + more tests
* smaller
2026-02-20 12:10:15 +09:00
qazal
e9ae3da711
viz: click on CALL node goes to codegen ( #14609 )
...
* viz: click on CALL node goes to codegen
* colored name
2026-02-20 11:13:11 +09:00
George Hotz
fc5677c28b
resnet dataloader + more test cleanups ( #14899 )
...
* resnet dataloader
* tests
2026-02-20 10:05:47 +08:00
chenyu
b9744ab62b
one more test_gpudims test ( #14898 )
...
failure from the bad simplification attempt
2026-02-19 18:18:44 -05:00
chenyu
9d6cf00be2
fix gpudim bug and test_split_2d_to_3d ( #14896 )
2026-02-19 16:46:24 -05:00
chenyu
2b31823ef9
update test_gpudims to prove bijectivity ( #14895 )
...
* update test_gpudims to prove bijectivity
* one more
2026-02-19 16:18:59 -05:00
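Note on #14895: "proving bijectivity" here means showing that every launch index maps to exactly one original index and that no original index is missed. A minimal brute-force sketch of that property follows. It is not the repository's test and it does not use z3. The helper names (`unflatten`, `is_bijection`) and the example dims are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import prod

def unflatten(flat, dims):
    """Split a flat row-major index into per-dimension indices using divmod."""
    out = []
    for d in reversed(dims):
        flat, r = divmod(flat, d)
        out.append(r)
    return tuple(reversed(out))

def is_bijection(index_fn, launch_dims, orig_dims):
    """Enumerate every launch index and check the mapping hits each original index exactly once."""
    seen = set()
    for launch_idx in product(*(range(d) for d in launch_dims)):
        idx = index_fn(launch_idx)
        if len(idx) != len(orig_dims):
            return False
        if any(not 0 <= i < d for i, d in zip(idx, orig_dims)):
            return False  # out of bounds
        if idx in seen:
            return False  # two launch indices collide
        seen.add(idx)
    return len(seen) == prod(orig_dims)

# Hypothetical example: original dims (2, 3, 4) launched as (6, 4).
good = lambda l: unflatten(l[0], (2, 3)) + (l[1],)
bad = lambda l: (l[0] // 3, l[0] % 2, l[1])  # wrong modulus, so indices collide

print(is_bijection(good, (6, 4), (2, 3, 4)))
print(is_bijection(bad, (6, 4), (2, 3, 4)))
```

Exhaustive enumeration only scales to small dims; a solver-based check, as the z3 commit below suggests, can cover symbolic sizes.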
chenyu
19ce7a3f7f
use z3 to verify gpudims output index ( #14894 )
...
found a bug with z3
2026-02-19 15:24:38 -05:00
chenyu
52f727738b
move test_grouped_dims to test/null ( #14893 )
...
it's a pure helper
2026-02-19 14:50:53 -05:00
chenyu
af997c1ea5
use .expr to access variable expr instead of arg[0] [pr] ( #14892 )
...
only apply when it's more readable
2026-02-19 12:24:36 -05:00
chenyu
7400362a86
remove UOp.vars [pr] ( #14891 )
2026-02-19 12:09:39 -05:00
chenyu
f54a49e733
restructure alu_multi [pr] ( #14888 )
2026-02-19 11:11:49 -05:00
chenyu
06ef8a26b7
add a test case that triggers CALL passthrough_multi ( #14887 )
2026-02-19 10:45:40 -05:00
nimlgen
071403f9a1
system: use MAP_FIXED_NOREPLACE ( #14884 )
2026-02-19 18:32:50 +03:00
nimlgen
041dc0cf85
fix typos ( #14886 )
2026-02-19 17:37:15 +03:00
Kartik Vashishta
9a9c7648e9
system: fix pci_scan_bus vendor filter ( #14885 )
...
* system: fix pci_scan_bus vendor filter
* fix: formatting
2026-02-19 17:23:32 +03:00
chenyu
877a5d4c45
improve types and simplify allgather in multi [pr] ( #14878 )
2026-02-19 09:02:15 -05:00
wozeparrot
9317e96881
fa: explicitly pass shapes ( #14857 )
2026-02-19 05:26:16 -08:00
George Hotz
f6c1cf343c
new symbolic rule from prealloc_bufs ( #14883 )
...
* new symbolic rule from prealloc_bufs
* optim
2026-02-19 20:57:30 +08:00
qazal
658c32864a
viz: show event number in track line ( #14882 )
2026-02-19 20:58:37 +09:00
qazal
911399bee5
assembly/amd: move the kernel capture stuff out of helpers ( #14881 )
2026-02-19 16:28:48 +09:00
qazal
1f34ba4511
viz: remove global amd targets mapping ( #14879 )
...
* viz: remove global amd targets mapping
* rename to amd_counters and nv_counters
* diff
2026-02-19 15:31:12 +09:00
George Hotz
2f0f8b5776
more test relaxations from prealloc_bufs ( #14880 )
2026-02-19 14:23:28 +08:00
qazal
5bc65ec669
applied_opts/estimates in program spec are aliases for the sink arg ( #14860 )
...
* remove applied_opts from programspec
* comment that out
* placement
* update tests
* p.ast.arg
* remove todo comment
* maybe this too
* it can exist as an alias, also for estimates
2026-02-19 13:08:26 +09:00
chenyu
8d8da185ec
minor handle_allreduce cleanup [pr] ( #14876 )
...
no more lbs, also use a divmod
2026-02-18 22:53:28 -05:00
Christopher Milan
b5588d341b
uop_given_valid fixes many gated reads for IMAGE=1 ( #14877 )
...
* add replay script
* pkl is arg
* that needs uop_given_valid
* cleanup
2026-02-18 22:49:47 -05:00
George Hotz
ab61c16730
fixes and test relaxations from prealloc_bufs ( #14875 )
...
* fixes and test relaxations from prealloc_bufs
* fix error type and guard _mop
* revert that
* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
0c85b93938
support shrink sharded and non-sharded axes ( #14874 )
...
simpler to just support it
2026-02-18 20:54:10 -05:00
chenyu
e8252e6e4f
use official gguf in test ( #14872 )
...
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
chenyu
8c830c5b44
test_full_like_shrink_on_shard_axis ( #14870 )
...
* test_full_like_shrink_on_shard_axis
add a test case that triggers non-copy branch in mstack_early_shrink
* 0
2026-02-18 19:23:44 -05:00
Ananta Ranganathan
4005e9db6d
Mxfp4 fix ( #14866 )
...
* double e2m1 values for mxfp4
* check if assert equal works in ci
* Revert "check if assert equal works in ci"
This reverts commit 8cf902ce0d.
* remove unnecessary whitespace change
* add test case that fails for old implementation but passes for new
* add note that the previous test is bad
* clarification on the methodology for the test
* fix the indent problem that happened to skip this test
* for now update mxfp4 block test to similarly use allclose (bad)
* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
chenyu
0e4cf21a75
remove handle_allreduce_multirank and group_id [pr] ( #14869 )
...
leftovers from ops_remote
2026-02-18 16:13:54 -05:00
chenyu
f771de6738
gc.collect() to get the correct GlobalCounters.mem_used in tests ( #14868 )
...
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
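Note on #14868: the flakiness described above comes from objects in reference cycles, which stay allocated until the cyclic garbage collector runs. A minimal sketch of the effect, assuming a hypothetical `FakeBuffer` class that stands in for a tracked allocation (it is not tinygrad's `GlobalCounters` or buffer type):

```python
import gc

class FakeBuffer:
    """Hypothetical stand-in for a buffer whose size is counted in a global total."""
    mem_used = 0

    def __init__(self, nbytes: int):
        self.nbytes = nbytes
        self.me = self  # reference cycle: refcounting alone cannot free this object
        FakeBuffer.mem_used += nbytes

    def __del__(self):
        FakeBuffer.mem_used -= self.nbytes

def demo():
    """Return (bytes counted before gc.collect(), bytes counted after)."""
    was_enabled = gc.isenabled()
    gc.disable()  # make the timing deterministic for the demonstration
    try:
        gc.collect()
        base = FakeBuffer.mem_used
        buf = FakeBuffer(1024)
        del buf  # the cycle keeps the object alive
        stale = FakeBuffer.mem_used - base
        gc.collect()  # the cycle is found and __del__ runs
        fresh = FakeBuffer.mem_used - base
    finally:
        if was_enabled:
            gc.enable()
    return stale, fresh

print(demo())
```

Calling `gc.collect()` before reading the counter removes the dependence on when the collector happens to run.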
chenyu
f84a11bb9f
delete uneven shard tests and mentions ( #14867 )
2026-02-18 14:10:33 -05:00
nimlgen
1c8c17a593
am: aca ( #14861 )
2026-02-18 21:40:09 +03:00
chenyu
b3cdb61067
clean up expand_multi [pr] ( #14865 )
...
remove dead assert, also make it more like a view
2026-02-18 12:21:13 -05:00
chenyu
0260406f49
simplify reshape_multi [pr] ( #14864 )
2026-02-18 11:46:26 -05:00
chenyu
5746a605ce
UOp.axis raises for invalid reshape ( #14863 )
...
reshape is lazy now, so it is better to raise from the .axis call than to make the caller handle the invalid case
2026-02-18 11:28:56 -05:00
nimlgen
3b95fa0ed4
am_smi: enable mem usage back ( #14858 )
2026-02-18 19:27:27 +03:00
qazal
a212881130
viz: second profiler link goes to source code ( #14855 )
2026-02-18 19:40:34 +09:00
qazal
b0110c4469
viz: simplify shape clicking ( #14853 )
...
* setFocus is the more clear name
* do less
2026-02-18 19:03:26 +09:00
George Hotz
af839b2bd1
remove all the outerworld stuff, it was too complex ( #14852 )
2026-02-18 17:44:11 +08:00
wozeparrot
6d301ad2c4
feat: llama wqkv ( #14841 )
2026-02-17 23:01:33 -08:00
qazal
a3d516c4b5
viz: start displaying pma ( #14848 )
...
* viz: start displaying pma
* s
* work
* colors
* cleaner
* max packets
* fine
* work
* pma
* diff cleanup
2026-02-18 14:22:32 +09:00
George Hotz
d5636fba90
assign after copy shouldn't contig ( #14847 )
...
* assign after copy shouldn't contig
* fix assign copy
2026-02-18 12:23:49 +08:00
George Hotz
ab55e8c6b9
assign should be used as output buffer ( #14845 )
...
* assign should be used as buffer
* late removed
* the fix
* better fix
* backward slice
2026-02-18 09:37:46 +08:00
chenyu
e3c120c8e1
exclude 100 in test_assign_add ( #14846 )
...
this can crash, not sure why. skip 100 to see if it's better
2026-02-17 19:12:47 -05:00
Christopher Milan
7641ed61af
remove doublecast in IMAGE=1 ( #14839 )
2026-02-17 18:22:14 -05:00
Christopher Milan
5b11519d5e
LLVM actually supports ops ( #14843 )
...
LLVM should support, e.g., SHL/SHR, but this was never actually rendered
2026-02-17 18:21:33 -05:00