Commit Graph

12093 Commits

Author SHA1 Message Date
qazal
b10802eb53 use existing VIZ ContextVar instead of getenv (#14610) 2026-02-08 15:37:55 +09:00
chenyu
510b65489e style change rangeify assign [pr] (#14616)
consistent naming, also a standalone function to replace complicated lambda
2026-02-07 15:47:32 -05:00
chenyu
b7afd4471c use arg instead of 3rd op for ASSIGN [pr] (#14613) 2026-02-07 14:17:10 -05:00
nimlgen
88c3022223 amd: kfd iface early exit (#14612)
* amd: kfd iface early exit

* l

* revert
2026-02-07 18:57:10 +03:00
nimlgen
ce7bfc6ce8 nv: use nv_flags for all fields (#14607) 2026-02-07 15:01:38 +03:00
qazal
c2544e2252 viz: remove outdated comment (#14608) 2026-02-07 20:05:24 +09:00
nimlgen
6838b35cff mockgpu: hevc (#14606)
* mockgpu: hevc

* eng
2026-02-07 12:27:55 +03:00
chenyu
884592f6c8 pin z3-solver version (#14605)
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71 Remove Ops.KERNEL, it's all Ops.CALL now (#14603) 2026-02-07 10:21:54 +08:00
George Hotz
ca6604eae2 kernel is call (#14577)
* call is kernel

* closer

* fix bugs

* dedup

* pm_gate_kernel_sink

* better

* Revert "better"

This reverts commit b4c799b810.

* Reapply "better"

This reverts commit e53f094ce7.

* cleanups

* work

* remove junk

* subtle fix

* index

* viz cleanups

* disable assert for now
2026-02-07 10:10:14 +08:00
wozeparrot
d87ae1c84c feat: tinyfs load test in benchmark (#14602) 2026-02-06 18:00:00 -08:00
ttomsa
462b455562 cleanup linearize (#14523) 2026-02-07 08:54:02 +08:00
ttomsa
d5652e4da2 new dtype aliases (#14596) 2026-02-07 08:53:35 +08:00
Christopher Milan
ad9e2f0de7 decompose bf16 (#14601) 2026-02-06 19:24:09 -05:00
Christopher Milan
7bb45e7df0 decompose fp8 to bigger floats [skip_process_replay] (#14554)
* decompose fp8 also

* it works

* cleanup

* no shift required

* default to float

* cleanup

* fixes

* fp8e5m2

* don't rely on behavior comparing nans

* cleanup
2026-02-06 19:05:40 -05:00
chenyu
81f6cdb4ab delete realize_assign [pr] (#14575)
use realize and realize_srcs like COPY and STORE. src[0] always has BUFFER for base
2026-02-06 17:12:33 -05:00
chenyu
7d193a6e26 fix wgsl bitcast (#14600)
was wrong for signed int
2026-02-06 16:57:36 -05:00
chenyu
b9fe8b7591 fix opt in process replay [pr] (#14599) 2026-02-06 16:49:56 -05:00
chenyu
197ebcbbbc log seed with flush=True in fuzz_symbolic (#14597)
* log seed with flush=True in fuzz_symbolic

i think z3 can crash. added reading seed from argv to see if we repro later

* fuzz_symbolic_symbolic_div
2026-02-06 15:03:57 -05:00
nimlgen
fbb67a3f95 am_smi: fix after regen (#14594) 2026-02-06 20:57:41 +03:00
qazal
a80fb4e641 viz: better ordering of device engines in profiler (#14590) 2026-02-06 23:08:09 +09:00
qazal
b7e3fbe07e llama: add VIZ=-1 to dev_run (#14583)
* llama: add VIZ=-1 to dev_run

* readme

* cleaner

* add profile.sh script

* better grouping of options

* add other row

* readme edits

* work
2026-02-06 22:59:22 +09:00
nimlgen
fbeb978170 diff devices for sdma (#14589)
* start

* x

* fix

* sdma

* c

* clean

* x

* hm

* cleaer
2026-02-06 16:39:12 +03:00
George Hotz
7cb996e153 bottom up earliest rewrites (#14587)
* better

* bottom up earliest rewrites

* fix
2026-02-06 18:13:07 +08:00
George Hotz
03af2404e2 small changes and test fixes from kernel is call (#14586) 2026-02-06 17:08:33 +08:00
George Hotz
3c26ce29b2 make disk tensor tests process safe (#14584) 2026-02-06 15:39:55 +08:00
qazal
cf73d7e2a7 hotfix: disable slower asm gemm shape from llama seqlen 8192 (#14582) 2026-02-06 15:05:19 +09:00
qazal
be77873974 llama: contig backward for wk / wv matmul backward (#14581) 2026-02-06 14:54:00 +09:00
chenyu
15d3344d9e use int inputs in test_assign (#14580)
int is less flaky
2026-02-06 00:07:31 -05:00
qazal
50a166a5fa viz: cleanup amdgpu target mapping (#14579)
* viz: cleanup amdgpu target mapping

* linter

* unwraps
2026-02-06 13:51:51 +09:00
chenyu
b09dc646f5 revert some late_buffer_view change (#14578)
revert #14478 which breaks tinyfs
2026-02-05 22:51:40 -05:00
chenyu
d41836f135 remove KERNEL special case in realize_assign [pr] (#14573) 2026-02-05 21:55:44 -05:00
George Hotz
6cbcf98627 KernelInfo is required on get_program (#14571)
* rangeify always adds KernelInfo

* fix tests

* skip flaky test
2026-02-06 10:49:27 +08:00
George Hotz
28c56a783c add CallInfo and viz call toggle (#14570) 2026-02-06 09:30:58 +08:00
wozeparrot
f73468d516 fa: block skipping for fa kv bwd (#14569) 2026-02-05 16:13:53 -08:00
chenyu
b7ef775677 more cleanup in create_schedule [pr] (#14566)
fixed wrong comments and simplified queue building
2026-02-05 16:12:17 -05:00
Garret Castro
cee7ef7ab2 disable threads (#14555) 2026-02-05 16:11:32 -05:00
chenyu
79b7799dba clean up linearize schedule [pr] (#14565)
* clean up linearize schedule [pr]

don't mix ScheduleItem and UOp in schedule queue

* ok
2026-02-05 15:24:09 -05:00
chenyu
41a179f542 fix test_xlm_roberta_large (#14564)
onnxruntime does not allow symlink that's outside model dir. update snapshot_download to use local_dir instead of cache_dir. some ad hoc migration step to copy the existing model too
2026-02-05 14:56:06 -05:00
Christopher Milan
aa9dc50577 dtype decomps don't require bitshifts (#14542)
* dtype decomps don't require bitshifts

* simplify shr/shl

* ruff
2026-02-05 14:42:30 -05:00
Christopher Milan
b47397ab17 list ml_dtypes as dependency for DSP (#14562)
* pin onnxruntime to 1.23.2 for DSP

* list ml_dtypes instead

This reverts commit 84bb2cc0fc.
2026-02-05 14:27:50 -05:00
chenyu
2b47a9a1b5 skip test_xlm_roberta_large (#14563)
symlink model not allowed in latest onnxruntime
2026-02-05 14:00:24 -05:00
chenyu
42c18da88a add Ops asserts in toposort sched_sink [pr] (#14561)
more explicit
2026-02-05 12:40:02 -05:00
nimlgen
483bba4f05 nv: use prof_exec_counter (#14559) 2026-02-05 19:00:14 +03:00
qazal
190042358f llama: faster bf16 matmul / rope backward (#14558) 2026-02-05 23:57:25 +09:00
George Hotz
b398335f62 assembly/amd: fix saturation in python remu (#14557)
* PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp

* fix saturation in PYTHON_REMU

* simpler

* more tests, less lines

---------

Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
2026-02-05 18:35:57 +08:00
wozeparrot
c1ea6687e5 fa: simpler is faster (#14548) 2026-02-05 01:13:17 -08:00
George Hotz
43e7eda4e7 grad_b uses custom gemm (#14550)
* grad_b uses custom gemm

* fix multi backward, acc is in float32

* test_gemm_batched

* square gemm

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: qazal <qazal.software@gmail.com>
2026-02-05 15:22:27 +09:00
qazal
f9cfb64cd9 test asm_gemm in CI (#14551)
* test asm_gemm in CI

* default float16

* use a smaller shape for multi

* smaller size

* smaller for CI

* smaller for ci

* need half
2026-02-05 13:32:22 +09:00
chenyu
c0ca7f9c51 use more UOp.sum and UOp.prod [pr] (#14549) 2026-02-04 22:05:20 -05:00