George Hotz
34ea053b26
assembly/amd: clean up pcode, jit pcode instead of static ( #14001 )
...
* assembly/amd: clean up pcode
* regen
* lil
* jit the pcode
* sendmsg
* cleanups
* inst prefetch lol
2026-01-03 23:06:15 -08:00
George Hotz
8328511808
assembly/amd: make the emu.py code shine ( #13996 )
...
* assembly/amd: make the code shine
* lil clean
* reg back in pcode
* cleanups
* gen fma_mix
* no writelane hacks
* fn cleanup
* dead vgpr_write
* readable
* smem
* cleanup bench_emu
* speedups
* simpler and faster
* direct inst._fn
* split fxn
* Revert "simpler and faster"
This reverts commit e85f6594b3.
* move lds to wavestate
* dispatcher
* pc in dispatch
* literal isn't wavestate
* cleanups + program
* one readlane
* exec_vop3sd in exec_vop
* cleaner exec_vopd
* fully merge VOP3P
* no special paths
* no SliceProxy
* low=0
* no bigint
* failing tests
* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bd55507ee4
RDNA3 fp16 assembly gemm 85 TFLOPS ( #13990 )
2026-01-03 18:34:23 +09:00
wozeparrot
6242a9d151
tk: no global copy and clear ranges ( #13988 )
2026-01-02 23:45:15 -08:00
wozeparrot
9f082e8e25
fa: split kv bwd into 2 kernels ( #13981 )
2026-01-02 18:45:51 -08:00
qazal
2cc64d71b0
simplify mi350x gemm / viz asm tests ( #13984 )
...
* mi350x gemm cleanup
* asm tests work
* simpler asm tests
2026-01-03 11:11:07 +09:00
George Hotz
0e282025ff
assembly/amd: split test_emu into hw tests ( #13966 )
...
* assembly/amd: split test_emu into hw tests
* hw tests
* bugfixes
* more tests and fix
2026-01-02 08:04:56 -08:00
chenyu
2e2b5fed12
fix misspellings ( #13976 )
2026-01-02 10:37:38 -05:00
nietras
f49e4714af
Fix spelling errors in README for AMD assembly ( #13975 )
2026-01-02 10:15:20 -05:00
qazal
5f52266225
mi350x gemm: use Tensor.custom_kernel in asm test ( #13969 )
...
* mi350x gemm: use Tensor.custom_kernel in asm test
* A @ B for baseline
2026-01-02 18:30:50 +09:00
George Hotz
5a1a561e0f
assembly/amd: rdna4 autogen ( #13967 )
...
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
* assembly/amd: fix autogen for RDNA4
2026-01-01 23:12:18 -05:00
wozeparrot
b27527f05a
fix: missed inner tracked range ( #13964 )
2026-01-01 18:09:57 -08:00
wozeparrot
ecbac8a338
tk: fa cleanups + causal test ( #13963 )
2026-01-01 18:05:00 -08:00
George Hotz
dfb813b760
assembly/amd: add pcode ds ops ( #13939 )
...
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
2026-01-01 16:24:13 -05:00
b1tg
24723327ac
fix tc_up in search ( #13438 )
...
* tensor_core is missing from Scheduler
* test upcast max
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-01 10:25:08 -05:00
qazal
9726500de8
enable using assembly in Tensor.custom_kernel ( #13895 )
2026-01-02 00:12:01 +09:00
qazal
c0f52c9dcb
split assembly gemm to per arch directory ( #13953 )
2026-01-02 00:10:22 +09:00
qazal
6a5430ab00
correct args order in mi350x gemm ( #13949 )
2026-01-01 23:01:46 +09:00
George Hotz
2bb07d4824
assembly/amd: move Reg out of the pseudocode ( #13934 )
...
* assembly/amd: move Reg out of the pseudocode
* remove extra
* fix pcode tests
* simpler pcode
* simpler
* simpler
* cleaner
* fix mypy
2025-12-31 15:34:51 -05:00
George Hotz
f14428090f
assembly/amd: speed up emulator ( #13932 )
2025-12-31 13:32:25 -05:00
George Hotz
29402034a1
assembly/amd: cleanups to asm and emu ( #13912 )
...
* a bunch of cleanups
* ops are back
* bug fixes
* cleanups
* a lil simpler
* more refactors
* _disasm_vop1
* sops
* more
* continue
* more
* num_srcs
* simpler
* no _is16
* op cleanups
* isinstance
2025-12-31 12:46:11 -05:00
George Hotz
b998a80b5d
assembly/amd: split generated stuff into enum/ins ( #13924 )
2025-12-31 10:10:52 -05:00
qazal
b23f4517ab
prep mi350x gemm for python dsl ( #13918 )
...
* start by pruning existing asm
* better branch names
* split to template and real instructions
2025-12-31 20:00:57 +09:00
qazal
3f3786ded9
mmapeak: fix compiler import ( #13915 )
2025-12-31 16:52:23 +09:00
George Hotz
0221b96761
assembly/amd: fix all ops tests ( #13910 )
...
* assembly/amd: fix all ops tests
* test_ops with smaller sizes
* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55
assembly/amd: more refactors ( #13907 )
...
* assembly/amd: more refactors
* more refactors
* more refactors
* simpler emu
* generate.py
* regen all
* cleanups
* more
* work
* more readme
* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6
assembly/amd: refactor asm.py to be simpler ( #13900 )
...
* assembly/amd: refactor asm.py
* assembly/amd: refactor asm.py to be simpler
* multiple fxns
* fast
* more tests pass
* regen
* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
7e14cdcb06
assembly/amd: clean up clt/ctz hack ( #13901 )
...
* assembly/amd: clean up clt/ctz hack
* add breaks
2025-12-30 11:59:28 -05:00
George Hotz
69cdc8066d
assembly/amd: add dtype tests to AMD IDE CI ( #13899 )
...
* add dtype tests to AMD IDE CI
* more tests
* add trig preop
* regen done
* split to amd autogen
* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235
assembly/amd: fix v_perm_b32 + PC fixes ( #13897 )
...
* assembly/amd: fix v_perm_b32
* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8
assembly/amd: fix AMD_LLVM=1 support in emulator ( #13881 )
...
* fix AMD_LLVM=1 support in emulator
* more llvm with dtype
* work
* more fixes
* fix dtype
2025-12-30 09:09:57 -05:00
qazal
b557c46233
assembly gemm clean ups, instructions for cli ( #13892 )
2025-12-30 16:14:06 +09:00
qazal
d7e1f26e3d
command line interface for sqtt viz ( #13891 )
...
* command line interface for sqtt viz
* cleanup
* api surface area
* this confuses the llms
* document
2025-12-30 12:33:21 +09:00
George Hotz
94bca91f3e
assembly/amd: have asm go through the dsl ( #13886 )
...
* assembly/amd: have asm go through the dsl
* lil
2025-12-29 17:39:11 -05:00
George Hotz
7322d9ec4a
assembly/amd: add new instruction support to pcode ( #13885 )
...
* assembly/amd: add new instruction support
* more
* regen all
2025-12-29 17:30:17 -05:00
George Hotz
0d326f5b9b
fix missing instructions in pseudocode ( #13884 )
2025-12-29 16:11:22 -05:00
George Hotz
9d8397be11
add CDNA3+RDNA4 support ( #13882 )
...
* fix CI
* remove junk
* rename lib to dsl
* correct
* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab
rename to extra.assembly.amd ( #13879 )
2025-12-29 14:10:55 -05:00
George Hotz
37f0fa11b6
rdna3 test cleanups ( #13878 )
...
* rdna3 test cleanups
* cleanups
* ugh DONT SKIP
2025-12-29 13:41:59 -05:00
George Hotz
35db73b231
add cdna4 support to parsers ( #13877 )
...
* add cdna4 support to parsers
* cdna4
2025-12-29 13:23:43 -05:00
George Hotz
ff856a74cb
minor refactoring for rdna3 ( #13873 )
...
* minor refactoring for rdna3
* fix div scale stuff
* more bugfixes
2025-12-29 13:20:00 -05:00
George Hotz
f1471a3b99
speed up rdna3 unit tests + add to CI ( #13871 )
...
* speed up rdna3 unit tests
* add test to CI
* faster and simpler
* speedups
* bugfixes
* use helper
* fix CI maybe
* test fixes
* llvm-21 on 24.04
* upd
* llvm-21
* fix test
* bring that back
* merge gen into lib
* test generators
2025-12-29 10:26:48 -05:00
George Hotz
25ef866e89
write python emulator from RDNA3 pseudocode in pdf ( #13841 )
...
* write python emulator from RDNA3 pseudocode in pdf
* emu2
* more emu
* working
* more pseudo
* progress
* cleanups
* delete junk
* delete stale files
* just emu
* work
* emu compare
* bemu
* cleanups and more failures
* revert bench emu
* fix emu cmp
* four tests fail
* bugfixes
* dsl
* ext
* refactor
* dsl
* div scale fix
* test_emu
* fix emu tests
* pcode
* test pcode
* top imports
* fix test_emu to use run_asm
* emu tests on real hardware
* more tests
* more emu tests
* more
* work
* work
* bug fix
* bugfixes
* fix fp16 gemm
* all ops tests pass in emulator
* fix llvm tests
* fix a few more tests
* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
qazal
f541540129
variable N for asm gemm ( #13869 )
...
* variable N for asm gemm
* cleanup spacing
2025-12-29 19:35:50 +09:00
qazal
fc5278746f
mi350x assembly gemm cleanups ( #13867 )
2025-12-29 18:47:23 +09:00
George Hotz
f07c39cfa4
hwtest fixes for rdna3 dsl ( #13865 )
2025-12-28 20:42:29 -05:00
George Hotz
d9603c1bee
improve asm dsl syntax ( #13864 )
...
* improve asm dsl syntax
* improve asm dsl syntax
2025-12-28 20:04:59 -05:00
qazal
066d96c397
print tflops in asm gemm test ( #13859 )
...
* print tflops in asm gemm test
* change order
2025-12-29 02:26:40 +09:00
qazal
2cfbabdc34
mi350x 1tflop bf16 gemm in extra ( #13702 )
2025-12-28 21:45:42 +09:00
qazal
2180eee5e4
use the asm dsl in remu hwtest.py ( #13856 )
...
* remu hw test with the asm dsl
* simpler
* nthreads and exec mask
* cmp/cmpx
* assembler error in s_mov_b32
* vopd in dsl?
2025-12-28 11:32:41 +09:00