12305 Commits

Author SHA1 Message Date
nimlgen
041dc0cf85 fix typos (#14886) 2026-02-19 17:37:15 +03:00
Kartik Vashishta
9a9c7648e9 system: fix pci_scan_bus vendor filter (#14885)
* system: fix pci_scan_bus vendor filter

* fix: formatting
2026-02-19 17:23:32 +03:00
chenyu
877a5d4c45 improve types and simplify allgather in multi [pr] (#14878) 2026-02-19 09:02:15 -05:00
wozeparrot
9317e96881 fa: explicitly pass shapes (#14857) 2026-02-19 05:26:16 -08:00
George Hotz
f6c1cf343c new symbolic rule from prealloc_bufs (#14883)
* new symbolic rule from prealloc_bufs

* optim
2026-02-19 20:57:30 +08:00
qazal
658c32864a viz: show event number in track line (#14882) 2026-02-19 20:58:37 +09:00
qazal
911399bee5 assembly/amd: move the kernel capture stuff out of helpers (#14881) 2026-02-19 16:28:48 +09:00
qazal
1f34ba4511 viz: remove global amd targets mapping (#14879)
* viz: remove global amd targets mapping

* rename to amd_counters and nv_counters

* diff
2026-02-19 15:31:12 +09:00
George Hotz
2f0f8b5776 more test relaxations from prealloc_bufs (#14880) 2026-02-19 14:23:28 +08:00
qazal
5bc65ec669 applied_opts/estimates in program spec are aliases for the sink arg (#14860)
* remove applied_opts from programspec

* comment that out

* placement

* update tests

* p.ast.arg

* remove todo comment

* maybe this too

* it can exist as an alias, also for estimates
2026-02-19 13:08:26 +09:00
chenyu
8d8da185ec minor handle_allreduce cleanup [pr] (#14876)
no more lbs, also use a divmod
2026-02-18 22:53:28 -05:00
Christopher Milan
b5588d341b uop_given_valid fixes many gated reads for IMAGE=1 (#14877)
* add replay script

* pkl is arg

* that needs uop_given_valid

* cleanup
2026-02-18 22:49:47 -05:00
George Hotz
ab61c16730 fixes and test relaxations from prealloc_bufs (#14875)
* fixes and test relaxations from prealloc_bufs

* fix error type and guard _mop

* revert that

* contiguous makes extra/torch_backend/test_kernel_fusion.py fail
2026-02-19 11:37:25 +08:00
chenyu
0c85b93938 support shrink sharded and non-sharded axes (#14874)
simpler to just support it
2026-02-18 20:54:10 -05:00
chenyu
e8252e6e4f use official gguf in test (#14872)
also deleted bad test_load_sample_mxfp4, added some hard coded simple tests
2026-02-18 19:55:09 -05:00
chenyu
8c830c5b44 test_full_like_shrink_on_shard_axis (#14870)
* test_full_like_shrink_on_shard_axis

add a test case that triggers non-copy branch in mstack_early_shrink

* 0
2026-02-18 19:23:44 -05:00
Ananta Ranganathan
4005e9db6d Mxfp4 fix (#14866)
* double e2m1 values for mxfp4

* check if assert equal works in ci

* Revert "check if assert equal works in ci"

This reverts commit 8cf902ce0d.

* remove unnecessary whitespace change

* add test case that fails for old implementation but passes for new

* add note that the previous test is bad

* clarification on the methodology for the test

* fix the indent problem that happened to skip this test

* for now update mxfp4 block test to similarly use allclose (bad)

* add gist link and clearer explanation of process for computing test data
2026-02-18 18:50:59 -05:00
chenyu
0e4cf21a75 remove handle_allreduce_multirank and group_id [pr] (#14869)
leftovers from ops_remote
2026-02-18 16:13:54 -05:00
chenyu
f771de6738 gc.collect() to get the correct GlobalCounters.mem_used in tests (#14868)
test can be flaky if gc happens in between
2026-02-18 15:01:23 -05:00
chenyu
f84a11bb9f delete uneven shard tests and mentions (#14867) 2026-02-18 14:10:33 -05:00
nimlgen
1c8c17a593 am: aca (#14861) 2026-02-18 21:40:09 +03:00
chenyu
b3cdb61067 clean up expand_multi [pr] (#14865)
remove dead assert, also make it more like a view
2026-02-18 12:21:13 -05:00
chenyu
0260406f49 simplify reshape_multi [pr] (#14864) 2026-02-18 11:46:26 -05:00
chenyu
5746a605ce UOp.axis raises for invalid reshape (#14863)
reshape is lazy now, so better to raise from the .axis call and not have caller to handle invalid case
2026-02-18 11:28:56 -05:00
nimlgen
3b95fa0ed4 am_smi: enable mem usage back (#14858) 2026-02-18 19:27:27 +03:00
qazal
a212881130 viz: second profiler link goes to source code (#14855) 2026-02-18 19:40:34 +09:00
qazal
b0110c4469 viz: simplify shape clicking (#14853)
* setFocus is the more clear name

* do less
2026-02-18 19:03:26 +09:00
George Hotz
af839b2bd1 remove all the outerworld stuff, it was too complex (#14852) 2026-02-18 17:44:11 +08:00
wozeparrot
6d301ad2c4 feat: llama wqkv (#14841) 2026-02-17 23:01:33 -08:00
qazal
a3d516c4b5 viz: start displaying pma (#14848)
* viz: start displaying pma

* s

* work

* colors

* cleaner

* max packets

* fine

* work

* pma

* diff cleanup
2026-02-18 14:22:32 +09:00
George Hotz
d5636fba90 assign after copy shouldn't contig (#14847)
* assign after copy shouldn't contig

* fix assign copy
2026-02-18 12:23:49 +08:00
George Hotz
ab55e8c6b9 assign should be used as output buffer (#14845)
* assign should be used as buffer

* late removed

* the fix

* better fix

* backward slice
2026-02-18 09:37:46 +08:00
chenyu
e3c120c8e1 exclude 100 in test_assign_add (#14846)
this can crash, not sure why. skip 100 to see if it's better
2026-02-17 19:12:47 -05:00
Christopher Milan
7641ed61af remove doublecast in IMAGE=1 (#14839) 2026-02-17 18:22:14 -05:00
Christopher Milan
5b11519d5e LLVM actually supports ops (#14843)
LLVM should support eg, SHL/SHR, but this was never actually rendered
2026-02-17 18:21:33 -05:00
wozeparrot
95e97ec341 separate llama optim (#14810) 2026-02-17 13:02:35 -08:00
chenyu
72cf603805 removed if self.buffer.is_allocated() in realized (#14836)
automatically fixes is_realized issue for empty
2026-02-17 15:35:56 -05:00
chenyu
aec8a6c85b Revert "one run_schedule for assign realize (#14835)" (#14837)
This reverts commit df7c37f611.
2026-02-17 14:34:26 -05:00
chenyu
df7c37f611 one run_schedule for assign realize (#14835)
concat schedules. separate out the execution part
2026-02-17 14:01:55 -05:00
chenyu
61867c2f35 TestRealizeIsRealized (#14834)
test after calling .realize(), uop.is_realized is True. currently not working for empty (thus disk tensor), and const
2026-02-17 13:30:35 -05:00
chenyu
f147791105 update test to reset and test kernel_count directly (#14832) 2026-02-17 11:48:46 -05:00
chenyu
9d4937ab5e remove assign test @unittest.skip("this test is crashing!") (#14831) 2026-02-17 10:30:58 -05:00
nimlgen
dda5ccf63b hcq: fix usb<->cpu mappings (#14827)
* hcq: fix usb<->cpu mappings

* non cpu

* um
2026-02-17 18:04:18 +03:00
nimlgen
801677cf12 am: GCVM_L2_PROTECTION_FAULT_STATUS prints device (#14830) 2026-02-17 18:03:52 +03:00
chenyu
f07898c68a move assign chain fix to rangeify (#14829) 2026-02-17 09:40:34 -05:00
nimlgen
a2586e4c70 nv: move reset earlier (#14824) 2026-02-17 17:25:49 +03:00
chenyu
f2f039cc0f fix chained full-buffer assign (#14828)
this shows issue that pm_remove_bufferize drops tags, will fix in bufferize next. this also fixed rand being different in jit vs no-jit
2026-02-17 09:11:04 -05:00
chenyu
58fa82eef5 stronger test_assign_add (#14826)
also test self add 10 and 100 times
2026-02-17 08:36:09 -05:00
George Hotz
ff60dab622 Revert "big sink is on base (#14819)" (#14825)
This reverts commit 5fc3d8109f.
2026-02-17 19:18:06 +08:00
qazal
f8e485ee9e nvcc/nvdisasm macos shim (#14822)
* move to backend

* and arch

* setup_nvcc_osx

* blackwell

* min test

* now getting dumb assert is_ptx

* support cubin.

* work

* remove that

* simpler
2026-02-17 20:07:05 +09:00