26 Commits

Author SHA1 Message Date
qazal
2cc64d71b0 simplify mi350x gemm / viz asm tests (#13984)
* mi350x gemm cleanup

* asm tests work

* simpler asm tests
2026-01-03 11:11:07 +09:00
qazal
9726500de8 enable using assembly in Tensor.custom_kernel (#13895) 2026-01-02 00:12:01 +09:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00
George Hotz
f07c39cfa4 hwtest fixes for rdna3 dsl (#13865) 2025-12-28 20:42:29 -05:00
qazal
2180eee5e4 use the asm dsl in remu hwtest.py (#13856)
* remu hw test with the asm dsl

* simpler

* nthreads and exec mask

* cmp/cmpx

* assembler error in s_mov_b32

* vopd in dsl?
2025-12-28 11:32:41 +09:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleanups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
qazal
7622be761f add new remu instructions from #13533 (#13539) 2025-12-03 06:29:20 +08:00
qazal
2f95c10702 remu new instructions / use volatile in emulator tests (#12862)
* remu new instructions

* start moving to volatile

* test_simple works

* test_exec_mov works and lid is still here

* test_exec_cmp_vopc

* clang did s_mov_b32 exec_lo, 1

* don't hardcode v1

* support volatile in tests

* hw_test passes

* only the volatile version

* subrev saturating behavior
2025-10-23 11:13:43 +08:00
qazal
e8c595c29e remu: add new instructions introduced in RANGEIFY (#12363)
* add v_mad_i64_i32 for test_output_padded_conv_transpose2d

* run amd test_ops

* skip test_masked_select
2025-09-30 12:36:29 +03:00
qazal
c7bb561ef9 remu: add v_rsq_f32_e32 instruction (#11947)
https://github.com/tinygrad/tinygrad/pull/11936 introduces a change to
the AMD LLVM renderer that outputs this instruction. Adding both 32 and
64 bit variants.
2025-09-01 11:29:31 +03:00
chenyu
126fcf4129 clean up AMD_LLVM in tests (#11021) 2025-06-28 22:45:47 -04:00
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
qazal
17f0f5e764 add v_rcp_f32_e64 to remu (#10393)
* tests from the box

* add v_rcp_f32_e64 to remu

* f32::from_bits utils

* v_cndmask_b32 tests
2025-05-18 17:08:21 +03:00
Ignacio Sica
a54fd745c3 simpler barrier match in remu (#10339)
* s_barrier

* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
Ignacio Sica
3c453e96a9 add ds_load_b96 and ds_store_b96 instructions (#10338) 2025-05-15 18:11:08 +03:00
qazal
be8202b293 add s_abs_i32 instruction to remu (#10334) 2025-05-15 16:47:58 +03:00
qazal
9210280811 add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3

* from the box
2025-05-10 23:48:25 +03:00
qazal
4ea3e373aa decode lds ops in remu (#10184) 2025-05-07 16:44:18 +08:00
Ignacio Sica
74c25bdc8b add support for ds_load_u8 in remu (#10180)
* add support for ds_load_u8 in remu

* add test for ds_load_u8
2025-05-06 20:31:00 +03:00
qazal
ac37510f60 remu: only write v_cmp result if exec is set (#10084) 2025-04-28 20:31:52 +08:00
qazal
d6b436a815 remu bugfix with -0.0 negation (#10082) 2025-04-28 15:46:42 +08:00
qazal
e1d2b64e92 remu new instructions (#10050)
* remu new instructions

* test_ds_store_half

* test_v_mul_f16
2025-04-26 02:04:12 +03:00
qazal
bba5d0a3e4 remu refactors (#10028)
* remu refactors

* scc is sgpr 253

* remove that

* rename to vcc_lo

* run cargo test in CI

* llvm-mc

* meh

* work

* work_group work 1

* seeded_lanes is dumb

* better than seeded_lanes

* does not need to be address

* 128 sgpr per wave

* scc is sgpr, we don't know which one

* null_src once more

* derive clone, wave init is cleaner

* init comes first
2025-04-26 04:31:10 +08:00
qazal
0b482fb824 add RDNA3 parser to remu (#10025)
* llvm ref

* work

* all of them

* salu

* cleaner

* start

* vector ops

* done

* replace SMEM

* vopd

* sop1

* SOPC

* null stays null_src

* sopp

* SOPK

* sop2

* vop1

* vop2

* remove allow(dead_code)

* vopc
2025-04-24 21:34:07 +08:00
qazal
16dfe0a902 upstream remu (#9921) 2025-04-18 01:57:36 +03:00