36 Commits

Author SHA1 Message Date
George Hotz
4680247e35 renderer/amd: move in tree (#14702)
* renderer/amd: move in tree

* fix paths in tests

* 24000 lines

* no delete for amd files
2026-02-12 18:09:16 +08:00
George Hotz
d5fc3ea1ba assembly/amd: mypy+ruff passes (#14701)
* assembly/amd: mypy+ruff passes

* touchups
2026-02-12 16:59:42 +08:00
George Hotz
b398335f62 assembly/amd: fix saturation in python remu (#14557)
* PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp

* fix saturation in PYTHON_REMU

* simpler

* more tests, less lines

---------

Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
2026-02-05 18:35:57 +08:00
Christopher Milan
232848d086 PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 (#14546)
* PYTHONREMU: VOP3P integer operations with constants don't cast to fp16

* put that back

* cleaner

* do that once
2026-02-04 20:10:59 -05:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
George Hotz
dd2de4f838 rename all DEFINE_GLOBAL to PARAM (#14511) 2026-02-03 15:09:38 +08:00
George Hotz
6e958dbfd4 assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorization

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
George Hotz
965149a46d assembly/amd: add ds perm instructions (#14486)
* assembly/amd: add ds perm instructions

* NO SKIP

* fix preexisting RDNA3 issues

* pcode

* assert

* asserts

* unify

* simp

* good fix
2026-02-02 16:02:00 +08:00
George Hotz
b705c9143c assembly/amd: test more instructions (#14365)
* assembly/amd: test more instructions

* more

* passing

* revert

* no const fold

* remove junk

* cleaner
2026-01-31 12:40:22 +08:00
Christopher Milan
e575dd8275 prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
George Hotz
202b74b369 assembly/amd: continue refactors (#14386)
* simpler

* merge

* flat

* no ctx

* use the correct apis

* dup code

* write clean code

* remove bad helpers

* bits junk remove

* junk remove

* smem test

* fix tests

* correct fix + tests

* Fmt matters it seems

* wmma refactor

* a lil more

* kimi cleanups

* line
2026-01-28 17:33:03 +08:00
George Hotz
88bc5ee212 assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
be23776ba7 assembly/amd: replace pcode with ucode (#14002)
* a bunch of todos for my boy claude

* uops have types

* lil cleanups

* simpler ucode

* isNAN

* calls

* move more

* cleanup pcode_parse

* cvt functions

* fix parser bugs

* no void

* minmax

* more pcode parse

* pretty print

* transform

* comments

* move to transform

* assign/declare

* simpler norm

* single PM

* just Uops

* simpler

* more typed

* all rewrite

* less verbose

* work

* spec

* transform

* work

* simpler spec

* less spec

* bitcast

* simpler

* simp ucode

* work

* more in pcode_transform

* remove junk

* more functions

* bug

* no void assign

* load/store

* wave

* fixes

* move denorm

* move more functions

* tests

* cat is shape None

* uop syntax

* move a few more

* program_spec

* cat stuff

* assign fix clear

* unused

* nans

* fp bits

* works with simplify

* remove junk

* special

* meh

* more

* more

* update test pcode parse

* improve parser

* parse some for loops

* merge master

* dead files

* tests pass

* emu2

* better emu2

* test_plus works

* uselessly write more instructions

* use pcode

* something

* something

* bench_emu

* progress

* ds works

* work

* work

* more passing

* run compare

* bench_emu

* more pcode

* a few more

* bugfixes

* bugfix

* test fixes

* tests pass without USE_HW

* all hw tests pass

* add more hw tests

* new hw tests

* bit

* less handcode

* parse more

* consolidate pcode

* fixes

* rsrc

* lane pcode

* cleanups

* simpler

* emu bugs

* one cmp test fails

* fix decode and upd name

* fix name and test harness

* _ftz_f32

* fix denorm

* fix VOPD and use load

* fix carry bug

* no load where / just invalid

* clean

* simpler

* merge sops

* refactoring

* simplifications

* bugfixes

* new tests

* f16 sin fix

* assertion and hw tests

* cvt functions

* one more failure

* bugfixes

* bugfix + regression

* more tests

* fmac

* no manual unrolling

* ordering

* LLVM backend is a lot faster

* compile inst

* more bugs

* f16

* bugfix

* fix regression

* one clang call

* 1M inst

* scratch works

* do scratch correctly

* cleanup

* regression

* cmp

* fmamk fixes

* merge

* fix vcmpx

* unify memory

* remove unused code

* ignore oob for test

* cleanups

* fix mbs

* unify cmp

* test

* minor cleanups

* bump timeout

* fix tests

* revert the CMPLE stuff

* remove opt

* less diff

* simpler

* revert

* support multiple backends

* memset is a lot faster

* split out in bench emu

* improve timing

* timing

* cache that

* cache that

* simpler and faster

* tokenize

* binop table

* simpler

* move to parser

* tok for lambda

* refactor

* expr_parser

* delete emu2_pcode

* import cleanup

* lil

* if parse

* work

* simpler

* no v

* trig preop is faster

* durations for tests

* fix cmp bug

* sdst

* remove scratch_size hack

* null behavior

* _MXCSRContext

* bugfixes

* DEBUG >= 3

* test smem crashes my gpu

* debug

* test

* test smem

* profiler

* full inst

* bugfix

* rtag(1)

* pc is 64-bit and word

* pc is real code now

* dynamic

* more dynamic

* fix oob access

* fix crash, more dyn

* all dyn

* really all dyn

* correct null mask

* lit + format

* 21s on the tests

* 13s on the tests

* canonical name

* simm16

* more dyn

* 14s

* proper saddr dedup

* dyn

* debug 5

* better 5

* revert dynamic stuff

* that can be dyn

* negative offsets

* dyn wmma

* f16 wmma support / ops / dtype / dtype_alu

* symbolic changes not needed

* ConstFloat

* more uop.const

* __eq__

* uop tests

* fix f16

* bf16 tensor cores

* whitespace

* remove cast roundtrip

* Revert "remove cast roundtrip"

This reverts commit c5bb0381c3.

* just the fix

* remove dead paths

* llvm runs
2026-01-26 18:04:29 +08:00
George Hotz
49db266b96 ReprEnum for repr roundtrips (#14327)
* ReprEnum for repr roundtrips

* dsl

* bugfixes

* vdsty fixes

* cleaner

* fix

* fix cdna fields

* tests all pass
2026-01-25 18:58:31 +08:00
George Hotz
a51e0a86db assembly/amd: clean up disasm.py + add CDNA support (#14200)
* assembly/amd: clean up disasm.py

* cleanups

* add missing encodings

* decode is pretty

* cdna

* assert on failure

* cdna roundtrip

* cdna passing

* test

* lil cleanup

* variant cleanups

* cleanups
2026-01-18 14:48:44 +09:00
George Hotz
fd60626ea1 assembly/amd: refactor to use op_bits/op_regs (#14156)
* assembly/amd: refactor to use op_bits/op_regs

* remove that skip

* remove another hack

* remove another hack

* precompute mask

* more reg, less hasattr
2026-01-15 11:20:21 +09:00
George Hotz
330a0b686e assembly/amd: clean up dsl and make type verification strict (#14102)
* assembly/amd: start newdsl

* work

* newdsl upd

* Reg is p nice

* cleaner

* work

* getting clean

* all fields

* more BitFields

* redo the pdfs with dsl2 syntax

* no lit

* cleanups

* more defaults

* fix get and remove crap

* aliases

* ugly but kind of works

* NULL, not rawimm

* clean up defaults

* only dsl

* asm fixes

* lit fixup

* more lit

* cleanups

* olddsl

* single pcode dict

* emu sort of works

* trash test

* global is global

* types property

* reg mods

* fix a few tests

* remove monkey patch

* fixes

* less hacks in tests

* less hacks in tests

* 4 test failures

* hw tests all pass

* fix compare emulator

* fix some tests

* 3 more

* fix and shorten sqtt

* handwritten

* fix validation

* test corrections

* all types validate

* fix dsl2 tests

* fix bugs in disasm

* skips on cdna

* work

* repr with reg[]

* fix bitfield tests

* merge pcodes in dsl

* remove override

* disasm uses inst.types

* simpler
2026-01-13 08:52:16 +09:00
George Hotz
91bde927ef assembly/amd: split asm.py into asm.py and disasm.py (#14101)
* split asm.py into asm.py and disasm.py

* split decoder

* move to pcode

* tests
2026-01-12 07:22:02 +09:00
George Hotz
45f7fd073d assembly/amd: pcode bug fixes (#14032)
* bring over pcode parser

* fixes

* pdf test

* delay alu
2026-01-06 00:15:48 -08:00
George Hotz
20653d2996 assembly/amd: make pdf.py code shine (#14029)
* assembly/amd: make pdf.py code shine

* no merge

* pdf2 is the future

* something

* regen enums

* test

* work

* remove junk

* write

* pcode extraction

* pdf2 passes all tests

* simplify

* simpler pdf

* late filter

* remove hacks

* simplify pdf2.py

* field type

* remove defaults

* don't export srcenum

* simple pdf.py

* simpler

* cleaner

* less hack in PDF
2026-01-05 18:49:40 -08:00
George Hotz
404eed6172 assembly/amd: improve tests for asm (#14007)
* assembly/amd: improve tests for asm

* upd

* skip

* tests

* re bug

* more passing

* cleanups

* cdna fixups

* improve tests, better CDNA parsing

* fix CI

* no defs

* simpler

* all pass

* from pdf

* regen
2026-01-04 15:14:08 -08:00
George Hotz
34ea053b26 assembly/amd: clean up pcode, jit pcode instead of static (#14001)
* assembly/amd: clean up pcode

* regen

* lil

* jit the pcode

* sendmsg

* cleanups

* inst prefetch lol
2026-01-03 23:06:15 -08:00
George Hotz
8328511808 assembly/amd: make the emu.py code shine (#13996)
* assembly/amd: make the code shine

* lil clean

* reg back in pcode

* cleanups

* gen fma_mix

* no writelane hacks

* fn cleanup

* dead vgpr_write

* readable

* smem

* cleanup bench_emu

* speedups

* simpler and faster

* direct inst._fn

* split fxn

* Revert "simpler and faster"

This reverts commit e85f6594b3.

* move lds to wavestate

* dispatcher

* pc in dispatch

* literal isn't wavestate

* cleanups + program

* one readlane

* exec_vop3sd in exec_vop

* cleaner exec_vopd

* fully merge VOP3P

* no special paths

* no SliceProxy

* low=0

* no bigint

* failing tests

* fma on python 3.13
2026-01-03 20:33:09 -08:00
George Hotz
dfb813b760 assembly/amd: add pcode ds ops (#13939)
* assembly/amd: add pcode ds ops

* refactors

* fix ds op

* update autogen

* fix flat bug

* more tests

* fix emu test

* that's a hack

* generic

* fix all tests

* two tests

* fix test failure

* better

* remove __all__
2026-01-01 16:24:13 -05:00
George Hotz
2bb07d4824 assembly/amd: move Reg out of the pseudocode (#13934)
* assembly/amd: move Reg out of the pseudocode

* remove extra

* fix pcode tests

* simpler pcode

* simpler

* simpler

* cleaner

* fix mypy
2025-12-31 15:34:51 -05:00
George Hotz
29402034a1 assembly/amd: cleanups to asm and emu (#13912)
* a bunch of cleanups

* ops are back

* bug fixes

* cleanups

* a lil simpler

* more refactors

* _disasm_vop1

* sops

* more

* continue

* more

* num_srcs

* simpler

* no _is16

* op cleanups

* isinstance
2025-12-31 12:46:11 -05:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
George Hotz
0221b96761 assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests

* test_ops with smaller sizes

* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55 assembly/amd: more refactors (#13907)
* assembly/amd: more refactors

* more refactors

* more refactors

* simpler emu

* generate.py

* regen all

* cleanups

* more

* work

* more readme

* lil
2025-12-30 16:13:24 -05:00
George Hotz
49d1bf93d6 assembly/amd: refactor asm.py to be simpler (#13900)
* assembly/amd: refactor asm.py

* assembly/amd: refactor asm.py to be simpler

* multiple fxns

* fast

* more tests pass

* regen

* stop decode
2025-12-30 13:51:40 -05:00
George Hotz
69cdc8066d assembly/amd: add dtype tests to AMD IDE CI (#13899)
* add dtype tests to AMD IDE CI

* more tests

* add trig preop

* regen done

* split to amd autogen

* simpler
2025-12-30 11:09:51 -05:00
George Hotz
9c89be5235 assembly/amd: fix v_perm_b32 + PC fixes (#13897)
* assembly/amd: fix v_perm_b32

* add pc support
2025-12-30 09:25:40 -05:00
George Hotz
2b838dc1d8 assembly/amd: fix AMD_LLVM=1 support in emulator (#13881)
* fix AMD_LLVM=1 support in emulator

* more llvm with dtype

* work

* more fixes

* fix dtype
2025-12-30 09:09:57 -05:00
George Hotz
7322d9ec4a assembly/amd: add new instruction support to pcode (#13885)
* assembly/amd: add new instruction support

* more

* regen all
2025-12-29 17:30:17 -05:00
George Hotz
9d8397be11 add CDNA3+RDNA4 support (#13882)
* fix CI

* remove junk

* rename lib to dsl

* correct

* cleanups
2025-12-29 15:51:29 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00