Commit Graph

1561 Commits

Author SHA1 Message Date
qazal
bf2d9d138f viz: simplify amdgpu cfg (#14326)
* viz: replace llvm disasm with our disasm

* it starts with more code

* then it becomes less

* simpler, cdna disassembles with decimal simm16

* s_branch is upper case, add test

* simm16s and others
2026-01-25 15:21:45 +09:00
qazal
647e527a7e viz: replace llvm disasm with our disasm (#14325) 2026-01-25 13:56:56 +09:00
chenyu
7e41da1ae8 fix generate_dataset.sh (#14324)
added `set -e` so wrong paths would fail the script, then fixed the path
2026-01-24 16:47:10 -05:00
wozeparrot
d74587f16d fa multi fix 2 (#14314) 2026-01-23 23:35:02 -08:00
qazal
807bc40931 assembly/amd: dsl and disasm cleanup (#14311)
* rdna4 inst helper

* remove dsl aliases
2026-01-24 11:36:12 +09:00
nimlgen
26220a472e no core_id (#14265)
* no core_id

* kwargs

* est

* linters

* ugh

* revert this

* deps

* glb

* should work?

* nn

* line

* fx

* ym

* z

* d

* um?

* revert

* this one?

* first half

* um p2

* all?

* um

* cleaner

* um
2026-01-23 21:30:12 +03:00
qazal
b913c910c5 assembly/amd: rdna4 passing test_roundtrip (#14300)
* test_roundtrip on different archs

* failing tests

* take RDNA4 xml changes from the emu branch

* work

* min diff to disasm flat

* test_add passes, rdna4 first

* correct vgpr field for the multi dword store stuff

* amdllvm

* recompile in roundtrip, get sources from emulator

* amdllvm, 2

* clean clean

* note, don't rely on that os.environ

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2026-01-23 21:33:53 +09:00
qazal
f3b0e42863 remove extra sqtt pickles in gfx1200 (#14302) 2026-01-23 20:13:48 +09:00
George Hotz
d116312b1a get cdna sqtt working (#14301)
* get cdna sqtt working

* cdna parser

* wavestart/waveend

* names

* cdna

* test that
2026-01-23 18:46:15 +08:00
George Hotz
a5c4fa39d1 RDNA4 support in SQTT (#14299)
* table test

* cleanups

* dead file

* delta short

* tests

* delta test

* work

* l4 tests pass

* l0

* cdna

* print

* revert

* wave failure

* wave failure

* test

* encs

* no l0 crap

* L4

* rdna4 sqtt

* notes

* linter
2026-01-23 16:16:45 +08:00
qazal
3b8a7bb8c9 use existing roc.py infra for sqtt tests (#14297)
* add pc, per kernel tracing

* work

* remove those imports

* min diff
2026-01-23 14:07:11 +09:00
nimlgen
8cd22df2dd amd: alive wgps (#14149)
* amd: disabled wgps

* l

* wgp

* uoops

* mockgpu

* drm

* add this

* fi

* reg
2026-01-23 00:08:45 +03:00
qazal
d7afa02085 clean up the extra/sqtt directory (#14284)
* remove legacy test_timing stuff

* remove legacy test_pmc, update active_sqtt_parse
2026-01-22 19:10:59 +09:00
qazal
dff5f361b0 support rendering assembly kernels on the NULL backend (#14283)
* assembly custom kernels in DEV=NULL, use renderer arch

* update mmapeak

* llvm
2026-01-22 15:49:07 +09:00
qazal
dfefeddeed add tflops to cdna gemm custom kernel (#14281) 2026-01-22 12:48:28 +09:00
qazal
18f408a35a custom assembly kernel with variable tests (#14280)
* custom assembly kernel with variable tests

* different threads

* sink

* zeros like / flatten
2026-01-22 11:34:17 +09:00
wozeparrot
76a9242a66 fa: merge kv bwd into one kernel (#14277) 2026-01-21 15:24:41 -08:00
nimlgen
da1fedc3c8 working ioctls (#14272) 2026-01-21 20:29:04 +03:00
qazal
78a28227c6 assembly/amd: cdna4 mfma support (#14206) 2026-01-21 09:12:05 +09:00
George Hotz
1baefed530 assembly/amd: add hw tests from ucode branch (#14259)
* assembly/amd: add hw tests from ucode branch

* fix is per lane
2026-01-21 08:53:54 +09:00
Robbe Derks
c7fbd177d4 USBGPU: debug script for comma chestnut (#14252)
* initial debug script

* improvements
2026-01-20 18:52:25 +03:00
nimlgen
dc82856084 tbgpu: shim binary + remote apl pci dev (#14124)
* shim binary + remote pci dev

* v2

* rip out apl

* cmds

* rename

* clean

* remove

* rm gitignore

* ui

* install

* linter

* um

* cleaner

* assets

* normal install in ui

* cleaner app

* install script

* support fd mmap

* cleaner

* kill server when disconn

* rename + pcidevs

* sign

* install and reinstall

* no sip install

* will trigger update

* nv

* ugh

* this

* fix

* nv

* use nosip sign

* auto install

* remove

* mypy

* upd

* ditto

* print

* simpler

* ditto

* um

* simpler

* upd

* upd

* cleaner

* autogen

* cleaner

* move

* annotations

* server cleaner
2026-01-20 16:15:18 +03:00
qazal
4548fcc1b8 amd/sqtt: add rdna4 and cdna sqtt examples (#14251)
* amd/sqtt: add rdna4 and cdna sqtt examples

* work

* comment out rdna and cdna tests
2026-01-20 21:11:48 +09:00
qazal
2dc281b32a assembly/amd: test helpers for arch to gfx target mapping (#14250) 2026-01-20 20:35:09 +09:00
George Hotz
0243f4a0f1 clear wins from ucode branch (#14243)
* clear wins from ucode branch

* two more

* revert those
2026-01-20 15:11:09 +09:00
wozeparrot
1f89eaf790 tk: fa bert mask fix + some numerical stability improvements (#14214) 2026-01-19 19:18:07 -08:00
nimlgen
7cb7abeeb0 amd: fix scratch_wave64_lane_byte_size (#14223) 2026-01-19 15:21:39 +03:00
qazal
e27a0002c5 viz: only keep the sqtt bytes for pkts (#14203)
* viz: only keep the sqtt bytes for pkts

* better option name

* work

* renames
2026-01-18 17:04:26 +09:00
qazal
d8f87ae2f2 SQTT packets to assembly mapper (#14198)
* disasm + compare to llvm

* start inst trace

* base tests pass

* work

* work

* all kernels

* qol

* refactor

* work

* work

* wave_focus

* simple

* work

* add a lot of asserts

* focus on wave0

* correct handling of IMMEDIATE_MASK

* work

* viz work

* use the metadata infra

* better
2026-01-18 16:32:13 +09:00
George Hotz
a51e0a86db assembly/amd: clean up disasm.py + add CDNA support (#14200)
* assembly/amd: clean up disasm.py

* cleanups

* add missing encodings

* decode is pretty

* cdna

* assert on failure

* cdna roundtrip

* cdna passing

* test

* lil cleanup

* variant cleanups

* cleanups
2026-01-18 14:48:44 +09:00
George Hotz
79c1559f69 amd asm can still be simpler (#14199)
* amd asm can still be simpler

* simpler

* V_LANE_ID

* simpler

* simpler

* compact vgpr
2026-01-17 18:40:10 +09:00
George Hotz
50554115ee fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul (#14196)
* fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul

* immed

* wave override

* restore ALT

* advance sgprs correctly

* no helpers

* decrease to 192 VGPRs
2026-01-17 11:58:34 +09:00
wozeparrot
a879b54234 tk: fa jit fix (#14170) 2026-01-16 16:38:45 -08:00
George Hotz
8a2549d42b improve amd_asm_matmul + minor VIZ PKTS improvements (#14186)
* improve amd_asm_matmul + minor VIZ PKTS improvements

* fix waitcnt issue

* cleanups
2026-01-17 06:56:59 +09:00
George Hotz
7d1d9d4568 assembly/amd: remove IMG instruction support and asm.py (#14163)
* assembly/amd: return IMG instruction supports

* remove asm.py

* op2dsl
2026-01-17 06:21:50 +09:00
nimlgen
e855ec8ee3 tbgpu: refactor dext to support user mappings (#14177) 2026-01-16 15:55:57 +03:00
nimlgen
a0dd9d2146 tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements (#14164)
* tbgpu: correct com.apple.developer.driverkit.transport.pci entitlements

* format
2026-01-15 20:56:39 +03:00
Christopher Milan
0cb024a5bb remove ctypes.Structure (#13651) 2026-01-15 05:06:22 -05:00
George Hotz
255e0573b1 assembly/amd: clean up asm/disasm (#14158)
* assembly/amd: clean up asm/disasm

* update disasm

* revert dumb stuff

* update decode

* use fmt
2026-01-15 17:45:40 +09:00
qazal
b46da603fe codegen/custom_kernel: do not attach KernelInfo to user program (#14160) 2026-01-15 14:01:48 +09:00
George Hotz
fd60626ea1 assembly/amd: refactor to use op_bits/op_regs (#14156)
* assembly/amd: refactor to use op_bits/op_regs

* remove that skip

* remove another hack

* remove another hack

* precompute mask

* more reg, less hasattr
2026-01-15 11:20:21 +09:00
George Hotz
e9ce12028e assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py (#14154)
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP

* remove buf junk

* simplify

* simplify

* lil cleanup

* dead fixes

* strip non pcode extraction from pdf

* merge pdf.py into amdxml.py

* only amdxml
2026-01-15 09:23:19 +09:00
wozeparrot
7e5687f6a3 more fa multi fix (#14152) 2026-01-14 13:57:11 -08:00
chenyu
986e865830 fix TINY_BACKEND=1 cumsum (#14138)
* fix TINY_BACKEND=1 cumsum

old hack was wrong, need to apply contiguous on the input

* test time

* test_linalg_svd is slow
2026-01-14 09:54:49 -05:00
qazal
434dbafab5 optional Estimates in KernelInfo (#14147)
* optional Estimates in KernelInfo

* custom asm test plumbing

* s_code_end

* estimates test

* vaddr arg in global_store

* kernel desc

* Ops.DEVICE name
2026-01-14 22:55:03 +09:00
qazal
76b577ee76 viz: only SIMD name in sqtt timeline rows (#14146) 2026-01-14 20:13:27 +09:00
nimlgen
86708ccac5 hip_ioctl: dump aql (#14142) 2026-01-14 13:15:10 +03:00
wozeparrot
a92778aa0c tk: fa multi fix (#14134) 2026-01-13 17:22:15 -08:00
George Hotz
2ab18ea7e3 assembly/amd: use xml instead of pdf (#14118)
* assembly/amd: use xml instead of pdf

* use amdxml to generate info about op sizes

* fix many tests with invalid instructions

* fix info generation

* chad xml fixes many bugs

* rename to operands

* simplify

* amdxml

* bug fix
2026-01-14 10:03:37 +09:00
qazal
002ea39da7 assembly/amd: use Tensor.custom_kernel to run assembly (#14125)
* assembly/amd: use Tensor.custom_kernel to run assembly

* PRINT_ASM=1 is DEBUG=4
2026-01-14 08:29:25 +09:00