1490 Commits

Author SHA1 Message Date
chenyu
c714881832 don't allow jit input to be const (#14045)
* don't allow jit input to be unbuffered like const

* just const to fix multi

* fix rnnt
2026-01-06 18:15:22 -05:00
wozeparrot
2b3e01e79c tk: support sliced local -> reg load (#14034) 2026-01-06 05:33:24 -05:00
George Hotz
45f7fd073d assembly/amd: pcode bug fixes (#14032)
* bring over pcode parser

* fixes

* pdf test

* delay alu
2026-01-06 00:15:48 -08:00
wozeparrot
21d0f6bb76 tk: flat global -> local load (#14033) 2026-01-05 23:35:53 -08:00
George Hotz
20653d2996 assembly/amd: make pdf.py code shine (#14029)
* assembly/amd: make pdf.py code shine

* no merge

* pdf2 is the future

* something

* regen enums

* test

* work

* remove junk

* write

* pcode extraction

* pdf2 passes all tests

* simplify

* simpler pdf

* late filter

* remove hacks

* simplify pdf2.py

* field type

* remove defaults

* don't export srcenum

* simple pdf.py

* simpler

* cleaner

* less hack in PDF
2026-01-05 18:49:40 -08:00
qazal
ea7b149ca5 viz command line tool (#14030) 2026-01-06 10:19:47 +09:00
nimlgen
70405b4f3c am_smi: mi350 (#14018) 2026-01-05 13:10:56 +03:00
George Hotz
404eed6172 assembly/amd: improve tests for asm (#14007)
* assembly/amd: improve tests for asm

* upd

* skip

* tests

* re bug

* more passing

* cleanups

* cdna fixups

* improve tests, better CDNA parsing

* fix CI

* no defs

* simpler

* all pass

* from pdf

* regen
2026-01-04 15:14:08 -08:00
George Hotz
7ebda28692 assembly/amd: add CDNA support to asm (#13982)
* add CDNA support

* more cdna tests

* something

* fix more stuff

* more work

* simpler

* simplier

* cdna

* disasm

* less skip

* fixes

* simpler
2026-01-04 08:53:56 -08:00
George Hotz
34ea053b26 assembly/amd: clean up pcode, jit pcode instead of static (#14001)
* assembly/amd: clean up pcode

* regen

* lil

* jit the pcode

* sendmsg

* cleanups

* inst prefetch lol
2026-01-03 23:06:15 -08:00
George Hotz
8328511808 assembly/amd: make the emu.py code shine (#13996)
* assembly/amd: make the code shine

* lil clean

* reg back in pcode

* cleanups

* gen fma_mix

* no writelane hacks

* fn cleanup

* dead vgpr_write

* readable

* smem

* cleanup bench_emu

* speedups

* simpler and faster

* direct inst._fn

* split fxn

* Revert "simpler and faster"

This reverts commit e85f6594b3.

* move lds to wavestate

* dispatcher

* pc in dispatch

* literal isn't wavestate

* cleanups + program

* one readlane

* exec_vop3sd in exec_vop

* cleaner exec_vopd

* fully merge VOP3P

* no special paths

* no SliceProxy

* low=0

* no bigint

* failing tests

* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bd55507ee4 RDNA3 fp16 assembly gemm 85 TFLOPS (#13990) 2026-01-03 18:34:23 +09:00
wozeparrot
6242a9d151 tk: no global copy and clear ranges (#13988) 2026-01-02 23:45:15 -08:00
wozeparrot
9f082e8e25 fa: split kv bwd into 2 kernels (#13981) 2026-01-02 18:45:51 -08:00
qazal
2cc64d71b0 simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup

* asm tests work

* simpler asm tests
2026-01-03 11:11:07 +09:00
George Hotz
0e282025ff assembly/amd: split test_emu into hw tests (#13966)
* assmebly/amd: split test_emu into hw tests

* hw tests

* bugfixes

* more tests and fix
2026-01-02 08:04:56 -08:00
chenyu
2e2b5fed12 fix misspellings (#13976) 2026-01-02 10:37:38 -05:00
nietras
f49e4714af Fix spelling errors in README for AMD assembly (#13975) 2026-01-02 10:15:20 -05:00
qazal
5f52266225 mi350x gemm: use Tensor.custom_kernel in asm test (#13969)
* mi350x gemm: use Tensor.custom_kernel in asm test

* A @ B for baseline
2026-01-02 18:30:50 +09:00
George Hotz
5a1a561e0f assembly/amd: rdna4 autogen (#13967)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__

* assembly/amd: fix autogen for RDNA4
2026-01-01 23:12:18 -05:00
wozeparrot
b27527f05a fix: missed inner tracked range (#13964) 2026-01-01 18:09:57 -08:00
wozeparrot
ecbac8a338 tk: fa cleanups + causal test (#13963) 2026-01-01 18:05:00 -08:00
George Hotz
dfb813b760 assembly/amd: add pcode ds ops (#13939)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__
2026-01-01 16:24:13 -05:00
b1tg
24723327ac fix tc_up in search (#13438)
* tensor_core is missing from Scheduler

* test upcast max

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-01 10:25:08 -05:00
qazal
9726500de8 enable using assembly in Tensor.custom_kernel (#13895) 2026-01-02 00:12:01 +09:00
qazal
c0f52c9dcb split assembly gemm to per arch directory (#13953) 2026-01-02 00:10:22 +09:00
qazal
6a5430ab00 correct args order in mi350x gemm (#13949) 2026-01-01 23:01:46 +09:00
George Hotz
2bb07d4824 assembly/amd: move Reg out of the psuedocode (#13934)
* assembly/amd: move Reg out of the psuedocode

* remove extra

* fix pcode tests

* simpler pcode

* simpler

* simpler

* cleaner

* fix mypy
2025-12-31 15:34:51 -05:00
George Hotz
f14428090f assembly/amd: speed up emulator (#13932) 2025-12-31 13:32:25 -05:00
George Hotz
29402034a1 assembly/amd: cleanups to asm and emu (#13912)
* a bunch of cleanups

* ops are back

* bug fixes

* cleanups

* a lil simpler

* more refactors

* _disasm_vop1

* sops

* more

* continue

* more

* num_srcs

* simpler

* no _is16

* op cleanups

* isinstnace
2025-12-31 12:46:11 -05:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
qazal
b23f4517ab prep mi350x gemm for python dsl (#13918)
* start by pruning existing asm

* better branch names

* split to template and real instructions
2025-12-31 20:00:57 +09:00
qazal
3f3786ded9 mmapeak: fix compiler import (#13915) 2025-12-31 16:52:23 +09:00
George Hotz
0221b96761 assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests

* test_ops with smaller sizes

* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55 assembly/amd: more refactors (#13907)
* assembly/amd: more refactors

* more refactors

* more refactors

* simpler emu

* generate.py

* regen all

* cleanups

* more

* work

* more readme

* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6 assembly/amd: refactor asm.py to be simpler (#13900)
* assembly/amd: refactor asm.py

* assembly/amd: refactor asm.py to be simpler

* multiple fxns

* fast

* more tests pass

* regen

* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
7e14cdcb06 assembly/amd: clean up clt/ctz hack (#13901)
* assembly/amd: clean up clt/ctz hack

* add breaks
2025-12-30 11:59:28 -05:00
George Hotz
69cdc8066d assembly/amd: add dtype tests to AMD IDE CI (#13899)
* add dtype tests to AMD IDE CI

* more tests

* add trig preop

* regen done

* split to amd autogen

* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235 assembly/amd: fix v_perm_b32 + PC fixes (#13897)
* assembly/amd: fix v_perm_b32

* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8 assembly/amd: fix AMD_LLVM=1 support in emulator (#13881)
* fix AMD_LLVM=1 support in emulator

* more llvm with dtype

* work

* more fixes

* fix dtype
2025-12-30 09:09:57 -05:00
qazal
b557c46233 assembly gemm clean ups, instructions for cli (#13892) 2025-12-30 16:14:06 +09:00
qazal
d7e1f26e3d command line interface for sqtt viz (#13891)
* command line interface for sqtt viz

* cleanup

* api surface area

* this confuses the llms

* document
2025-12-30 12:33:21 +09:00
George Hotz
94bca91f3e assembly/amd: have asm go through the dsl (#13886)
* assembly/amd: have asm go through the dsl

* lil
2025-12-29 17:39:11 -05:00
George Hotz
7322d9ec4a assembly/amd: add new instruction support to pcode (#13885)
* assembly/amd: add new instruction support

* more

* regen all
2025-12-29 17:30:17 -05:00
George Hotz
0d326f5b9b fix missing instructions in psuedocode (#13884) 2025-12-29 16:11:22 -05:00
George Hotz
9d8397be11 add CDNA3+RDNA4 support (#13882)
* fix CI

* remove junk

* rename lib to dsl

* correct

* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6 rdna3 test cleanups (#13878)
* rdna3 test cleanups

* cleanups

* ugh DONT SKIP
2025-12-29 13:41:59 -05:00
George Hotz
35db73b231 add cdna4 support to parsers (#13877)
* add cdna4 support to parsers

* cdna4
2025-12-29 13:23:43 -05:00
George Hotz
ff856a74cb minor refactoring for rdna3 (#13873)
* minor refactoring for rdna3

* fix div scale stuff

* more bugfixes
2025-12-29 13:20:00 -05:00