Commit Graph

11577 Commits

Author SHA1 Message Date
nimlgen
42abb0586c am: fix aid doorbells (#13972) 2026-01-02 15:53:13 +03:00
nimlgen
ebbaad6bfd am: enable all sdma engines (#13970) 2026-01-02 15:25:15 +03:00
qazal
5f52266225 mi350x gemm: use Tensor.custom_kernel in asm test (#13969)
* mi350x gemm: use Tensor.custom_kernel in asm test

* A @ B for baseline
2026-01-02 18:30:50 +09:00
George Hotz
5a1a561e0f assembly/amd: rdna4 autogen (#13967)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__

* assembly/amd: fix autogen for RDNA4
2026-01-01 23:12:18 -05:00
wozeparrot
b27527f05a fix: missed inner tracked range (#13964) 2026-01-01 18:09:57 -08:00
wozeparrot
ecbac8a338 tk: fa cleanups + causal test (#13963) 2026-01-01 18:05:00 -08:00
chenyu
af0392efea only set DiskDevice.size if it opens successfully (#13962) 2026-01-01 19:33:26 -05:00
chenyu
e036d6df89 properly fix DiskDevice reuse (#13961) 2026-01-01 18:08:23 -05:00
George Hotz
dfb813b760 assembly/amd: add pcode ds ops (#13939)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__
2026-01-01 16:24:13 -05:00
chenyu
cb7c76a3bd update test_fuzz_failure to not contruct full UOp (#13960) 2026-01-01 15:09:58 -05:00
chenyu
51398edf9c fix indirect import (#13958)
also deleted old external tests
2026-01-01 14:22:45 -05:00
chenyu
8e416df438 simpler InvalidType [pr] (#13957)
simpler singleton pattern
2026-01-01 13:55:51 -05:00
nimlgen
b8ea0d779c am: remove pipe, queue from setup_ring (#13947) 2026-01-01 21:06:41 +03:00
chenyu
4d5c4d256d update tqdm for edge case (#13956)
1.00kit/s and not 1000it/s for value 999.5
2026-01-01 11:37:26 -05:00
chenyu
ed222070f7 update xlog2 fp16 decomp to not use fp32 (#13955) 2026-01-01 11:18:29 -05:00
chenyu
ce84a23142 remove tee in benchmark (#13954) 2026-01-01 10:55:36 -05:00
b1tg
24723327ac fix tc_up in search (#13438)
* tensor_core is missing from Scheduler

* test upcast max

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-01 10:25:08 -05:00
qazal
9726500de8 enable using assembly in Tensor.custom_kernel (#13895) 2026-01-02 00:12:01 +09:00
qazal
c0f52c9dcb split assembly gemm to per arch directory (#13953) 2026-01-02 00:10:22 +09:00
chenyu
c69470be52 fix test_symbolic_arange_sym_step (#13952) 2026-01-01 09:41:07 -05:00
chenyu
b91b46091c delete test_tensor_uop (#13951)
old test for shape tracker. also update tests that refer shapetracker names
2026-01-01 09:25:05 -05:00
chenyu
17ef4af72c new ceildiv that fixed symbolic conv (#13944)
* new ceildiv that fixed symbolic conv

* smaller test case
2026-01-01 09:02:41 -05:00
qazal
6a5430ab00 correct args order in mi350x gemm (#13949) 2026-01-01 23:01:46 +09:00
chenyu
baff10d32c clean up Tensor.svd slices (#13948) 2026-01-01 08:18:45 -05:00
nimlgen
1c5ed8e8b5 am: remove doorbells from setup_ring (#13946) 2026-01-01 14:39:21 +03:00
haofei
526fd4ec71 Fix SVD rank‑1 Jacobi rotation when tau == 0 (#13945) 2026-01-01 00:30:18 -05:00
haofei
20777f30b9 Fix QR/SVD NaNs on zero/orthogonal inputs (#13943) 2025-12-31 23:40:09 -05:00
chenyu
0ed58c1fcd clean up some functions in helpers [pr] (#13942) 2025-12-31 18:29:16 -05:00
chenyu
e2987001ee unify pre-commit mypy and ci mypy (#13940) 2025-12-31 17:51:51 -05:00
chenyu
8bf7c9c1d2 no-op cleanups for ptx [pr] (#13938) 2025-12-31 17:28:39 -05:00
George Hotz
2bb07d4824 assembly/amd: move Reg out of the psuedocode (#13934)
* assembly/amd: move Reg out of the psuedocode

* remove extra

* fix pcode tests

* simpler pcode

* simpler

* simpler

* cleaner

* fix mypy
2025-12-31 15:34:51 -05:00
chenyu
52acadc160 consolidate IGNORE_OOB=0 tests (#13937)
add a new unit test file and add more cases
2025-12-31 15:24:20 -05:00
chenyu
c0c1c1c8c8 remove unused validate rule (#13936) 2025-12-31 15:02:49 -05:00
chenyu
b6d08f247d assert z3_xor input type (#13933) 2025-12-31 13:37:57 -05:00
George Hotz
f14428090f assembly/amd: speed up emulator (#13932) 2025-12-31 13:32:25 -05:00
Christopher Milan
13973e4dea refactor image pitch (#13928) 2025-12-31 13:22:38 -05:00
chenyu
051fe6c8bc less toposort iteration in oob validate (#13929) 2025-12-31 13:16:34 -05:00
chenyu
a9a7b33404 IGNORE_OOB=0 in CI (#13903) 2025-12-31 12:56:59 -05:00
George Hotz
29402034a1 assembly/amd: cleanups to asm and emu (#13912)
* a bunch of cleanups

* ops are back

* bug fixes

* cleanups

* a lil simpler

* more refactors

* _disasm_vop1

* sops

* more

* continue

* more

* num_srcs

* simpler

* no _is16

* op cleanups

* isinstnace
2025-12-31 12:46:11 -05:00
chenyu
ba9aa5cd6f skip some PTX IGNORE_OOB validation (#13927) 2025-12-31 12:40:21 -05:00
chenyu
4968060ad4 fix IGNORE_OOB=0 for WEBGPU (#13926) 2025-12-31 10:41:28 -05:00
chenyu
35bd39e4ba update mypy and torch version in ci (#13925) 2025-12-31 10:29:28 -05:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
chenyu
404755bafd merge ci ruff tests and update ruff version (#13922) 2025-12-31 09:53:49 -05:00
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simler

* mypy

* fix

* t

* cmnts
2025-12-31 16:38:32 +03:00
nimlgen
f7ee644950 amd: lazy sdma queue allocation (#13920)
* ams: lazy queue

* nv

* linter

* f
2025-12-31 15:17:13 +03:00
nimlgen
b063518ea7 am: several sdmas (#13919)
* am: several sdmas

* fix
2025-12-31 14:19:22 +03:00
qazal
b23f4517ab prep mi350x gemm for python dsl (#13918)
* start by pruning existing asm

* better branch names

* split to template and real instructions
2025-12-31 20:00:57 +09:00
qazal
3f3786ded9 mmapeak: fix compiler import (#13915) 2025-12-31 16:52:23 +09:00
Christopher Milan
a14896fff2 refactor QCOM arg parsing (#13914)
* refactor QCOM arg parsing

* ruff

* mypy
2025-12-30 19:26:02 -05:00