Christopher Milan
232848d086
PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 ( #14546 )
...
* PYTHONREMU: VOP3P integer operations with constants don't cast to fp16
* put that back
* cleaner
* do that once
2026-02-04 20:10:59 -05:00
Christopher Milan
5338ce6b74
test S_PACK in extra/assembly/amd/test/hw ( #14537 )
...
* S_PACK_LL_B32_B16 in test/hw
* add rest of S_PACK instructions
2026-02-04 14:17:16 -05:00
chenyu
d57d24c7d4
Buffer.as_buffer -> Buffer.as_memoryview [pr] ( #14535 )
...
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
Christopher Milan
ecbce5269e
PYTHONREMU properly supports S_PACK_LL_B32_B16 ( #14527 )
...
* PYTHONREMU properly supports S_PACK_LL_B32_B16
* default
2026-02-03 23:45:33 -05:00
George Hotz
dd2de4f838
rename all DEFINE_GLOBAL to PARAM ( #14511 )
2026-02-03 15:09:38 +08:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator ( #14341 )
...
* start new rdna4
* work
* plus works
* more pass
* rdna4
* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp
* stale
* rev
* rr
* rdna4 emu tests
* cleanup
* cleanup
* simp
* works
* better factorizaion
* hacks
* fix mockgpu
* guard both
* cleaner
* gate
* bug fix and a few tests
* all test_tiny
2026-02-02 21:35:59 +08:00
qazal
965940dd00
sqtt: update examples after event field change ( #14493 )
...
* regen sqtt examples
* cdna
* rdna4
* packet counts for rdna3
* sqttmap work
2026-02-02 21:39:48 +09:00
George Hotz
965149a46d
assembly/amd: add ds perm instructions ( #14486 )
...
* assembly/amd: add ds perm instructions
* NO SKIP
* fix preexisting RDNA3 issues
* pcode
* assert
* asserts
* unify
* simp
* good fix
2026-02-02 16:02:00 +08:00
George Hotz
b705c9143c
assembly/amd: test more instructions ( #14365 )
...
* assembly/amd: test more instructions
* more
* passing
* revert
* no const fold
* remove junk
* cleaner
2026-01-31 12:40:22 +08:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
George Hotz
202b74b369
assembly/amd: continue refactors ( #14386 )
...
* simpler
* merge
* flat
* no ctx
* use the correct apis
* dup code
* write clean code
* remove bad helpers
* bits junk remove
* junk remove
* smem test
* fix tests
* correct fix + tests
* Fmt matters it seems
* wmma refactor
* a lil more
* kimi cleanups
* line
2026-01-28 17:33:03 +08:00
George Hotz
88bc5ee212
assembly/amd: rename to better names ( #14384 )
...
* assembly/amd: rename to better names
* might help fuzzing segfault
* emu2 -> emu
2026-01-28 10:00:54 +08:00
chenyu
db010a31be
IGNORE_OOB -> CHECK_OOB [pr] ( #14374 )
...
flip the meaning
2026-01-27 12:20:59 -05:00
George Hotz
e5df7e640b
fix branches in amd_asm_matmul ( #14369 )
2026-01-27 20:48:42 +08:00
George Hotz
bfc88bcfb8
assembly/amd: emu refactors + enable PYTHON_REMU by default ( #14361 )
...
* assembly/amd: start refactors
* cleanups
* those are global
* methods on ctx
* const cleanup
* range helper
* types and imports
* cleanups
* cleanups
* remove stale name
* fix emu2 types
* more typing
* more mypy
* cleanups
* fxns
* scc cleanup
* cleanups
* cleanups
* simpler parse_pcode
* laneid
* no defaults for pcode
* pcode is not optional
* cleanups
* functions cleanup
* splat
* expr_parser functions
* single tok
* invert global loops
* try_eat
* minor
* run parser on all
* no silent 0
* tests
2026-01-27 17:42:24 +08:00
George Hotz
204f51e739
assembly/amd: bug fixes for PYTHON_REMU ( #14347 )
...
* default PYTHON_REMU to 1
* mockgpu
* less size
* normal compile path
* uniqie
* more
* fix clamp
* Change PYTHON_REMU default to 0 in _try_dlopen_remu
2026-01-27 00:48:22 +08:00
George Hotz
3b43d26f10
assembly/amd: emu speed ( #14344 )
...
* assembly/amd: emu speed
* fix spec
* go
* don't do this
* simpler
* no stupid consts
* hack
* simpler
* no index
* no where
* faster linearizer
* fix spec
* no index dtype
2026-01-26 22:21:34 +08:00
George Hotz
774a454bb5
assembly/amd: fix scratch SVE ( #14340 )
...
* assembly/amd: default python REMU
* mem_used
* no lane
* sve
* remove that
* needs s_code_end in tests
2026-01-26 21:03:51 +08:00
George Hotz
be23776ba7
assembly/amd: replace pcode with ucode ( #14002 )
...
* a bunch of todos for my boy claude
* uops have types
* lil cleanups
* simpler ucode
* isNAN
* calls
* move more
* cleanup pcode_parse
* cvt functions
* fix parser bugs
* no void
* minmax
* more pcode parse
* pretty print
* transform
* comments
* move to transform
* assign/declare
* simpler norm
* single PM
* just Uops
* simpler
* more typed
* all rewrite
* less verbose
* work
* spec
* transform
* work
* simpler spec
* less spec
* bitcast
* simpler
* simp ucode
* work
* more in pcode_transform
* remove junk
* more functions
* bug
* no void assign
* load/store
* wave
* fixes
* move denorm
* move more functions
* tests
* cat is shape None
* uop syntax
* move a few more
* program_spec
* cat stuff
* assign fix clear
* unused
* nans
* fp bits
* works with simplify
* remove junk
* special
* meh
* more
* more
* update test pcode parse
* improve parser
* parse some for loops
* merge master
* dead files
* tests pass
* emu2
* better emu2
* test_plus works
* uselessly write more instructions
* use pcode
* something
* something
* bench_emu
* progress
* ds works
* work
* work
* more passing
* run compare
* bench_emu
* more pcode
* a few more
* bugfixes
* bugfix
* test fixes
* tests pass without USE_HW
* all hw tests pass
* add more hw tests
* new hw tests
* bit
* less handcode
* parse more
* consolidate pcode
* fixes
* rsrc
* lane pcode
* cleanups
* simpler
* emu bugs
* one cmp test fails
* fix decode and upd name
* fix name and test harness
* _ftz_f32
* fix denorm
* fix VOPD and use load
* fix carry bug
* no load where / just invalid
* clean
* simpler
* merge sops
* refactoring
* simplifications
* bugfixes
* new tests
* f16 sin fix
* assertion and hw tests
* cvt functions
* one more failure
* bugfixes
* bugfix + regression
* more tests
* fmac
* no manual unrolling
* ordering
* LLVM backend is a lot faster
* compile inst
* more bugs
* f16
* bugfix
* fix regression
* one clang call
* 1M inst
* scratch works
* do scratch correctly
* cleanup
* regression
* cmp
* fmamk fixes
* merge
* fix vcmpx
* unify memory
* remove unused code
* ignore oob for test
* cleanups
* fix mbs
* unify cmp
* test
* minor cleanups
* bump timeout
* fix tests
* revert the CMPLE stuff
* remove opt
* less diff
* simpler
* revert
* support multiple backends
* memset is a lot faster
* split out in bench emu
* improve timing
* timing
* cache that
* cache that
* simpler and faster
* tokenize
* binop table
* simpler
* move to parser
* tok for lambda
* refactor
* expr_parser
* delete emu2_pcode
* import cleanup
* lil
* if parse
* work
* simpler
* no v
* trig preop is faster
* durations for tests
* fix cmp bug
* sdst
* remove scartch_size hack
* null behavior
* _MXCSRContext
* bugfixes
* DEBUG >= 3
* test smem crashes my gpu
* debug
* test
* test smem
* profiler
* full inst
* bugfix
* rtag(1)
* pc is 64-bit and word
* pc is real code now
* dynamic
* more dynamic
* fix oob access
* fix crash, more dyn
* all dyn
* really all dyn
* correct null mask
* lit + format
* 21s on the tests
* 13s on the tests
* canonical name
* simm16
* more dyn
* 14s
* proper saddr dedup
* dyn
* debug 5
* better 5
* revert dynamic stuff
* that can be dyn
* negative offsets
* dyn wmma
* f16 wmma support / ops / dtype / dtype_alu
* symbolic changes not needed
* ConstFloat
* more uop.const
* __eq__
* uop tests
* fix f16
* bf16 tensor cores
* whitespace
* remove cast roundtrip
* Revert "remove cast roundtrip"
This reverts commit c5bb0381c3 .
* just the fix
* remove dead paths
* llvm runs
2026-01-26 18:04:29 +08:00
qazal
92bfe92138
assembly/amd: fix cdna mfma xml ( #14329 )
...
* handwritten failing test
* new amdxml
* more mfma from fixes
* ci
* move arch of test integration
* alt
* amdxml human cleanup
* _TestIntegration rename to IntegrationTestBase
* it's the same problem as _LIT
* better comment
* better variable name
2026-01-26 17:51:26 +09:00
George Hotz
49db266b96
ReprEnum for repr roundtrips ( #14327 )
...
* ReprEnum for repr roundtrips
* dsl
* bugfixes
* vdsty fixes
* cleaner
* fix
* fix cdna fields
* tests all pass
2026-01-25 18:58:31 +08:00
qazal
bf2d9d138f
viz: simplify amdgpu cfg ( #14326 )
...
* viz: replace llvm disasm with our disasm
* it starts with more code
* then it becomes less
* simpler, cdna disassembles with decimal simm16
* s_branch is upper case, add test
* simm16s and others
2026-01-25 15:21:45 +09:00
qazal
807bc40931
assembly/amd: dsl and disasm cleanup ( #14311 )
...
* rdna4 inst helper
* remove dsl aliases
2026-01-24 11:36:12 +09:00
qazal
b913c910c5
assembly/amd: rdna4 passing test_roundtrip ( #14300 )
...
* test_roundtrip on different archs
* failing tests
* take RDNA4 xml changes from the emu branch
* work
* min diff to disasm flat
* test_add passes, rdna4 first
* correct vgpr field for the multi dword store stuff
* amdllvm
* recompile in roundtrip, get sources from emulator
* amdllvm, 2
* clean clean
* note, don't rely on that os.environ
---------
Co-authored-by: George Hotz <geohot@gmail.com >
2026-01-23 21:33:53 +09:00
George Hotz
d116312b1a
get cdna sqtt working ( #14301 )
...
* get cdna sqtt working
* cnd aprser
* wavestart/waveend
* names
* cdna
* test that
2026-01-23 18:46:15 +08:00
George Hotz
a5c4fa39d1
RDNA4 support in SQTT ( #14299 )
...
* table test
* cleanups
* dead file
* delta short
* tests
* delta test
* work
* l4 tests pass
* l0
* cnda
* print
* reverT
* wave failure
* wave failure
* test
* encs
* no l0 crap
* L4
* rdna4 sqtt
* notes
* linter
2026-01-23 16:16:45 +08:00
qazal
3b8a7bb8c9
use existing roc.py infra for sqtt tests ( #14297 )
...
* add pc, per kernel tracing
* work
* remove those imports
* min diff
2026-01-23 14:07:11 +09:00
qazal
dff5f361b0
support rendering assembly kernels on the NULL backend ( #14283 )
...
* assembly custom kernels in DEV=NULL, use renderer arch
* update mmapeak
* llvm
2026-01-22 15:49:07 +09:00
qazal
18f408a35a
custom assembly kernel with variable tests ( #14280 )
...
* custom assembly kernel with variable tests
* different threads
* sink
* zeros like / flatten
2026-01-22 11:34:17 +09:00
qazal
78a28227c6
assembly/amd: cdna4 mfma support ( #14206 )
2026-01-21 09:12:05 +09:00
George Hotz
1baefed530
assembly/amd: add hw tests from ucode branch ( #14259 )
...
* assembly/amd: add hw tests from ucode branch
* fix is per lane
2026-01-21 08:53:54 +09:00
qazal
4548fcc1b8
amd/sqtt: add rdna4 and cdna sqtt examples ( #14251 )
...
* amd/sqtt: add rdna4 and cdna sqtt examples
* work
* comment out rdna and cdna tests
2026-01-20 21:11:48 +09:00
qazal
2dc281b32a
assembly/amd: test helpers for arch to gfx target mapping ( #14250 )
2026-01-20 20:35:09 +09:00
George Hotz
0243f4a0f1
clear wins from ucode branch ( #14243 )
...
* clear wins from ucode branch
* two more
* revert those
2026-01-20 15:11:09 +09:00
qazal
e27a0002c5
viz: only keep the sqtt bytes for pkts ( #14203 )
...
* viz: only keep the sqtt bytes for pkts
* better option name
* work
* renames
2026-01-18 17:04:26 +09:00
qazal
d8f87ae2f2
SQTT packets to assembly mapper ( #14198 )
...
* disasm + compare to llvm
* start inst trace
* base tests pass
* work
* work
* all kernels
* qol
* refactor
* work
* work
* wave_focus
* simple
* work
* add a lot of asserts
* focus on wave0
* correct handling of IMMEDIATE_MASK
* work
* viz work
* use the metadata infra
* better
2026-01-18 16:32:13 +09:00
George Hotz
a51e0a86db
assembly/amd: clean up disasm.py + add CDNA support ( #14200 )
...
* assembly/amd: clean up disasm.py
* cleanups
* add missing encodings
* decode is pretty
* cdna
* assert on failure
* cdna roudtrip
* cdna passing
* test
* lil cleanup
* variant cleanups
* cleanups
2026-01-18 14:48:44 +09:00
George Hotz
50554115ee
fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul ( #14196 )
...
* fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul
* immed
* wave override
* restore ALT
* advance sgprs correctly
* no helpers
* decrease to 192 VGPRs
2026-01-17 11:58:34 +09:00
George Hotz
7d1d9d4568
assembly/amd: remove IMG instruction support and asm.py ( #14163 )
...
* assembly/amd: return IMG instruction supports
* remove asm.py
* op2dsl
2026-01-17 06:21:50 +09:00
George Hotz
255e0573b1
assembly/amd: clean up asm/disasm ( #14158 )
...
* assembly/amd: clean up asm/disasm
* update disasm
* revert dumb stuff
* update decode
* use fmt
2026-01-15 17:45:40 +09:00
qazal
b46da603fe
codegen/custom_kernel: do not attach KernelInfo to user program ( #14160 )
2026-01-15 14:01:48 +09:00
George Hotz
fd60626ea1
assembly/amd: refactor to use op_bits/op_regs ( #14156 )
...
* assembly/amd: refactor to use op_bits/op_regs
* remove that skip
* remove another hack
* remove another hack
* precompute mask
* more reg, less hasattr
2026-01-15 11:20:21 +09:00
George Hotz
e9ce12028e
assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py ( #14154 )
...
* assembly/amd: amdxml cleanups, remove broken SDWA/DPP
* remove buf junk
* simplify
* simplify
* lil cleanup
* dead fixes
* strip non pcode extraction from pdf
* merge pdf.py into amdxml.py
* only amdxml
2026-01-15 09:23:19 +09:00
qazal
434dbafab5
optional Estimates in KernelInfo ( #14147 )
...
* optional Estimates in KernelInfo
* custom asm test plumbing
* s_code_end
* estimates test
* vaddr arg in global_store
* kernel desc
* Ops.DEVICE name
2026-01-14 22:55:03 +09:00
George Hotz
2ab18ea7e3
assembly/amd: use xml instead of pdf ( #14118 )
...
* assembly/amd: use xml instead of pdf
* use amdxml to generate info about op sizes
* fix many tests with invalid instructions
* fix info generation
* chad xml fixes many bugs
* rename to operands
* simplify
* amdxml
* bug fix
2026-01-14 10:03:37 +09:00
George Hotz
a28c8105a5
assembly/amd: 2% faster amd_uop_matmul + SQTT ( #14122 )
...
* assembly/amd: 2% faster amd_uop_matmul
* SQTT_TOKEN_EXCLUDE + SQTT_SIMD_SEL
* sqtt printer
* fix printer
* fast decode
* fast decoder
* test packet counts
* ugh it's not faster
* dead
2026-01-13 19:55:32 +09:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
George Hotz
91bde927ef
assembly/amd: split asm.py into asm.py and disasm.py ( #14101 )
...
* split asm.py into asm.py and disasm.py
* split decoder
* move to pcode
* tests
2026-01-12 07:22:02 +09:00
George Hotz
44135e2e84
assembly/amd: always use v_nop in test for rocprof-trace-decoder ( #14100 )
...
* assembly/amd: always use v_nop in test for rocprof-trace-decoder
* test touchups
2026-01-12 05:31:58 +09:00
George Hotz
8b1b15aec0
assembly/amd: SQTT support ( #14099 )
...
* assembly/amd: SQTT support
* simpler
* cmp wave
* instruction compare
* rocprof decode
* simpler
* no llvm
* no strcmp
2026-01-12 05:07:17 +09:00