George Hotz
4680247e35
renderer/amd: move in tree ( #14702 )
...
* renderer/amd: move in tree
* fix paths in tests
* 24000 lines
* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
d5fc3ea1ba
assembly/amd: mypy+ruff passes ( #14701 )
...
* assembly/amd: mypy+ruff passes
* touchups
2026-02-12 16:59:42 +08:00
George Hotz
b398335f62
assembly/amd: fix saturation in python remu ( #14557 )
...
* PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp
* fix saturation in PYTHON_REMU
* simpler
* more tests, less lines
---------
Co-authored-by: Christopher Milan <chrismilan@ucla.edu >
2026-02-05 18:35:57 +08:00
Christopher Milan
232848d086
PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 ( #14546 )
...
* PYTHONREMU: VOP3P integer operations with constants don't cast to fp16
* put that back
* cleaner
* do that once
2026-02-04 20:10:59 -05:00
chenyu
d57d24c7d4
Buffer.as_buffer -> Buffer.as_memoryview [pr] ( #14535 )
...
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
George Hotz
dd2de4f838
rename all DEFINE_GLOBAL to PARAM ( #14511 )
2026-02-03 15:09:38 +08:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator ( #14341 )
...
* start new rdna4
* work
* plus works
* more pass
* rdna4
* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp
* stale
* rev
* rr
* rdna4 emu tests
* cleanup
* cleanup
* simp
* works
* better factorizaion
* hacks
* fix mockgpu
* guard both
* cleaner
* gate
* bug fix and a few tests
* all test_tiny
2026-02-02 21:35:59 +08:00
George Hotz
965149a46d
assembly/amd: add ds perm instructions ( #14486 )
...
* assembly/amd: add ds perm instructions
* NO SKIP
* fix preexisting RDNA3 issues
* pcode
* assert
* asserts
* unify
* simp
* good fix
2026-02-02 16:02:00 +08:00
George Hotz
b705c9143c
assembly/amd: test more instructions ( #14365 )
...
* assembly/amd: test more instructions
* more
* passing
* revert
* no const fold
* remove junk
* cleaner
2026-01-31 12:40:22 +08:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
George Hotz
202b74b369
assembly/amd: continue refactors ( #14386 )
...
* simpler
* merge
* flat
* no ctx
* use the correct apis
* dup code
* write clean code
* remove bad helpers
* bits junk remove
* junk remove
* smem test
* fix tests
* correct fix + tests
* Fmt matters it seems
* wmma refactor
* a lil more
* kimi cleanups
* line
2026-01-28 17:33:03 +08:00
George Hotz
88bc5ee212
assembly/amd: rename to better names ( #14384 )
...
* assembly/amd: rename to better names
* might help fuzzing segfault
* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
be23776ba7
assembly/amd: replace pcode with ucode ( #14002 )
...
* a bunch of todos for my boy claude
* uops have types
* lil cleanups
* simpler ucode
* isNAN
* calls
* move more
* cleanup pcode_parse
* cvt functions
* fix parser bugs
* no void
* minmax
* more pcode parse
* pretty print
* transform
* comments
* move to transform
* assign/declare
* simpler norm
* single PM
* just Uops
* simpler
* more typed
* all rewrite
* less verbose
* work
* spec
* transform
* work
* simpler spec
* less spec
* bitcast
* simpler
* simp ucode
* work
* more in pcode_transform
* remove junk
* more functions
* bug
* no void assign
* load/store
* wave
* fixes
* move denorm
* move more functions
* tests
* cat is shape None
* uop syntax
* move a few more
* program_spec
* cat stuff
* assign fix clear
* unused
* nans
* fp bits
* works with simplify
* remove junk
* special
* meh
* more
* more
* update test pcode parse
* improve parser
* parse some for loops
* merge master
* dead files
* tests pass
* emu2
* better emu2
* test_plus works
* uselessly write more instructions
* use pcode
* something
* something
* bench_emu
* progress
* ds works
* work
* work
* more passing
* run compare
* bench_emu
* more pcode
* a few more
* bugfixes
* bugfix
* test fixes
* tests pass without USE_HW
* all hw tests pass
* add more hw tests
* new hw tests
* bit
* less handcode
* parse more
* consolidate pcode
* fixes
* rsrc
* lane pcode
* cleanups
* simpler
* emu bugs
* one cmp test fails
* fix decode and upd name
* fix name and test harness
* _ftz_f32
* fix denorm
* fix VOPD and use load
* fix carry bug
* no load where / just invalid
* clean
* simpler
* merge sops
* refactoring
* simplifications
* bugfixes
* new tests
* f16 sin fix
* assertion and hw tests
* cvt functions
* one more failure
* bugfixes
* bugfix + regression
* more tests
* fmac
* no manual unrolling
* ordering
* LLVM backend is a lot faster
* compile inst
* more bugs
* f16
* bugfix
* fix regression
* one clang call
* 1M inst
* scratch works
* do scratch correctly
* cleanup
* regression
* cmp
* fmamk fixes
* merge
* fix vcmpx
* unify memory
* remove unused code
* ignore oob for test
* cleanups
* fix mbs
* unify cmp
* test
* minor cleanups
* bump timeout
* fix tests
* revert the CMPLE stuff
* remove opt
* less diff
* simpler
* revert
* support multiple backends
* memset is a lot faster
* split out in bench emu
* improve timing
* timing
* cache that
* cache that
* simpler and faster
* tokenize
* binop table
* simpler
* move to parser
* tok for lambda
* refactor
* expr_parser
* delete emu2_pcode
* import cleanup
* lil
* if parse
* work
* simpler
* no v
* trig preop is faster
* durations for tests
* fix cmp bug
* sdst
* remove scartch_size hack
* null behavior
* _MXCSRContext
* bugfixes
* DEBUG >= 3
* test smem crashes my gpu
* debug
* test
* test smem
* profiler
* full inst
* bugfix
* rtag(1)
* pc is 64-bit and word
* pc is real code now
* dynamic
* more dynamic
* fix oob access
* fix crash, more dyn
* all dyn
* really all dyn
* correct null mask
* lit + format
* 21s on the tests
* 13s on the tests
* canonical name
* simm16
* more dyn
* 14s
* proper saddr dedup
* dyn
* debug 5
* better 5
* revert dynamic stuff
* that can be dyn
* negative offsets
* dyn wmma
* f16 wmma support / ops / dtype / dtype_alu
* symbolic changes not needed
* ConstFloat
* more uop.const
* __eq__
* uop tests
* fix f16
* bf16 tensor cores
* whitespace
* remove cast roundtrip
* Revert "remove cast roundtrip"
This reverts commit c5bb0381c3 .
* just the fix
* remove dead paths
* llvm runs
2026-01-26 18:04:29 +08:00
George Hotz
49db266b96
ReprEnum for repr roundtrips ( #14327 )
...
* ReprEnum for repr roundtrips
* dsl
* bugfixes
* vdsty fixes
* cleaner
* fix
* fix cdna fields
* tests all pass
2026-01-25 18:58:31 +08:00
George Hotz
a51e0a86db
assembly/amd: clean up disasm.py + add CDNA support ( #14200 )
...
* assembly/amd: clean up disasm.py
* cleanups
* add missing encodings
* decode is pretty
* cdna
* assert on failure
* cdna roudtrip
* cdna passing
* test
* lil cleanup
* variant cleanups
* cleanups
2026-01-18 14:48:44 +09:00
George Hotz
fd60626ea1
assembly/amd: refactor to use op_bits/op_regs ( #14156 )
...
* assembly/amd: refactor to use op_bits/op_regs
* remove that skip
* remove another hack
* remove another hack
* precompute mask
* more reg, less hasattr
2026-01-15 11:20:21 +09:00
George Hotz
330a0b686e
assembly/amd: clean up dsl and make type verification strict ( #14102 )
...
* assembly/amd: start newdsl
* work
* newdsl upd
* Reg is p nice
* cleaner
* work
* getting clean
* all fields
* more BitFields
* redo the pdfs with dsl2 syntax
* no lit
* cleanups
* more defaults
* fix get and remove crap
* aliases
* ugly but kind of works
* NULL, not rawimm
* clean up defaults
* only dsl
* asm fixes
* lit fixup
* more lit
* cleanups
* olddsl
* single pcode dict
* emu sort of works
* trash test
* global is global
* types property
* reg mods
* fix a few tests
* remove monkey patch
* fixes
* less hacks in tests
* less hacks in tests
* 4 test failures
* hw tests all pass
* fix compare emulator
* fix some tests
* 3 more
* fix and shorten sqtt
* handwritten
* fix validation
* test corrections
* all types validate
* fix dsl2 tests
* fix bugs in disasm
* skips on cdna
* work
* repr with reg[]
* fix bitfield tests
* merge pcodes in dsl
* remove override
* disasm uses inst.types
* simpler
2026-01-13 08:52:16 +09:00
George Hotz
91bde927ef
assembly/amd: split asm.py into asm.py and disasm.py ( #14101 )
...
* split asm.py into asm.py and disasm.py
* split decoder
* move to pcode
* tests
2026-01-12 07:22:02 +09:00
George Hotz
45f7fd073d
assembly/amd: pcode bug fixes ( #14032 )
...
* bring over pcode parser
* fixes
* pdf test
* delay alu
2026-01-06 00:15:48 -08:00
George Hotz
20653d2996
assembly/amd: make pdf.py code shine ( #14029 )
...
* assembly/amd: make pdf.py code shine
* no merge
* pdf2 is the future
* something
* regen enums
* test
* work
* remove junk
* write
* pcode extraction
* pdf2 passes all tests
* simplify
* simpler pdf
* late filter
* remove hacks
* simplify pdf2.py
* field type
* remove defaults
* don't export srcenum
* simple pdf.py
* simpler
* cleaner
* less hack in PDF
2026-01-05 18:49:40 -08:00
George Hotz
404eed6172
assembly/amd: improve tests for asm ( #14007 )
...
* assembly/amd: improve tests for asm
* upd
* skip
* tests
* re bug
* more passing
* cleanups
* cdna fixups
* improve tests, better CDNA parsing
* fix CI
* no defs
* simpler
* all pass
* from pdf
* regen
2026-01-04 15:14:08 -08:00
George Hotz
34ea053b26
assembly/amd: clean up pcode, jit pcode instead of static ( #14001 )
...
* assembly/amd: clean up pcode
* regen
* lil
* jit the pcode
* sendmsg
* cleanups
* inst prefetch lol
2026-01-03 23:06:15 -08:00
George Hotz
8328511808
assembly/amd: make the emu.py code shine ( #13996 )
...
* assembly/amd: make the code shine
* lil clean
* reg back in pcode
* cleanups
* gen fma_mix
* no writelane hacks
* fn cleanup
* dead vgpr_write
* readable
* smem
* cleanup bench_emu
* speedups
* simpler and faster
* direct inst._fn
* split fxn
* Revert "simpler and faster"
This reverts commit e85f6594b3 .
* move lds to wavestate
* dispatcher
* pc in dispatch
* literal isn't wavestate
* cleanups + program
* one readlane
* exec_vop3sd in exec_vop
* cleaner exec_vopd
* fully merge VOP3P
* no special paths
* no SliceProxy
* low=0
* no bigint
* failing tests
* fma on python 3.13
2026-01-03 20:33:09 -08:00
George Hotz
dfb813b760
assembly/amd: add pcode ds ops ( #13939 )
...
* assembly/amd: add pcode ds ops
* refactors
* fix ds op
* update autogen
* fix flat bug
* more tests
* fix emu test
* that's a hack
* generic
* fix all tests
* two tests
* fix test failure
* better
* remove __all__
2026-01-01 16:24:13 -05:00
George Hotz
2bb07d4824
assembly/amd: move Reg out of the psuedocode ( #13934 )
...
* assembly/amd: move Reg out of the psuedocode
* remove extra
* fix pcode tests
* simpler pcode
* simpler
* simpler
* cleaner
* fix mypy
2025-12-31 15:34:51 -05:00
George Hotz
29402034a1
assembly/amd: cleanups to asm and emu ( #13912 )
...
* a bunch of cleanups
* ops are back
* bug fixes
* cleanups
* a lil simpler
* more refactors
* _disasm_vop1
* sops
* more
* continue
* more
* num_srcs
* simpler
* no _is16
* op cleanups
* isinstnace
2025-12-31 12:46:11 -05:00
George Hotz
b998a80b5d
assembly/amd: split generated stuff into enum/ins ( #13924 )
2025-12-31 10:10:52 -05:00
George Hotz
0221b96761
assembly/amd: fix all ops tests ( #13910 )
...
* assembly/amd: fix all ops tests
* test_ops with smaller sizes
* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55
assembly/amd: more refactors ( #13907 )
...
* assembly/amd: more refactors
* more refactors
* more refactors
* simpler emu
* generate.py
* regen all
* cleanups
* more
* work
* more readme
* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6
assembly/amd: refactor asm.py to be simpler ( #13900 )
...
* assembly/amd: refactor asm.py
* assembly/amd: refactor asm.py to be simpler
* multiple fxns
* fast
* more tests pass
* regen
* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
69cdc8066d
assembly/amd: add dtype tests to AMD IDE CI ( #13899 )
...
* add dtype tests to AMD IDE CI
* more tests
* add trig preop
* regen done
* split to amd autogen
* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235
assembly/amd: fix v_perm_b32 + PC fixes ( #13897 )
...
* assembly/amd: fix v_perm_b32
* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8
assembly/amd: fix AMD_LLVM=1 support in emulator ( #13881 )
...
* fix AMD_LLVM=1 support in emulator
* more llvm with dtype
* work
* more fixes
* fix dtype
2025-12-30 09:09:57 -05:00
George Hotz
7322d9ec4a
assembly/amd: add new instruction support to pcode ( #13885 )
...
* assembly/amd: add new instruction support
* more
* regen all
2025-12-29 17:30:17 -05:00
George Hotz
9d8397be11
add CDNA3+RDNA4 support ( #13882 )
...
* fix CI
* remove junk
* rename lib to dsl
* correct
* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab
rename to extra.assembly.amd ( #13879 )
2025-12-29 14:10:55 -05:00