Add Radeon and Radeon Pro specifications to the architecture reference (#2960)

* Expand architecture hardware specifications overview

Add supported Radeon and Radeon Pro GPUs

* Remove glossary from gpu architecture hardware specifications
MKKnorr
2024-03-18 18:34:20 +01:00
committed by GitHub
parent e6b4715b4f
commit cac5df504c
2 changed files with 649 additions and 229 deletions


@@ -356,6 +356,7 @@ VSkipped
Vanhoucke
Vulkan
WGP
WGPs
WX
WikiText
Wojna


@@ -5,237 +5,656 @@
GPU architecture hardware specifications
########################################
The following table provides an overview of the hardware specifications for the AMD Instinct accelerators.
The following tables provide an overview of the hardware specifications for AMD Instinct accelerators, and for AMD Radeon and AMD Radeon Pro GPUs.
.. list-table:: AMD Instinct architecture specification table
:header-rows: 1
:name: instinct-arch-spec-table
.. tab-set::
*
- Model
- Architecture
- LLVM target name
- VRAM
- Compute Units
- Wavefront Size
- LDS
- L3 Cache
- L2 Cache
- L1 Vector Cache
- L1 Scalar Cache
- L1 Instruction Cache
- VGPR File
- SGPR File
*
- MI300X
- CDNA3
- gfx941 or gfx942
- 192 GiB
- 304
- 64
- 64 KiB
- 256 MiB
- 32 MiB
- 32 KiB
- 16 KiB per 2 CUs
- 64 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI300A
- CDNA3
- gfx940 or gfx942
- 128 GiB
- 228
- 64
- 64 KiB
- 256 MiB
- 24 MiB
- 32 KiB
- 16 KiB per 2 CUs
- 64 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI250X
- CDNA2
- gfx90a
- 128 GiB
- 220 (110 per GCD)
- 64
- 64 KiB
-
- 16 MiB (8 MiB per GCD)
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI250
- CDNA2
- gfx90a
- 128 GiB
- 208
- 64
- 64 KiB
-
- 16 MiB (8 MiB per GCD)
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI210
- CDNA2
- gfx90a
- 64 GiB
- 104
- 64
- 64 KiB
-
- 8 MiB
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI100
- CDNA
- gfx908
- 32 GiB
- 120
- 64
- 64 KiB
-
- 8 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB VGPR and 256 KiB AccVGPR
- 12.5 KiB
*
- MI60
- GCN 5.1
- gfx906
- 32 GiB
- 64
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI50 (32GB)
- GCN 5.1
- gfx906
- 32 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI50 (16GB)
- GCN 5.1
- gfx906
- 16 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI25
- GCN 5.0
- gfx900
- 16 GiB
- 64
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI8
- GCN 3.0
- gfx803
- 4 GiB
- 64
- 64
- 64 KiB
-
- 2 MiB
- 16 KiB
- 16 KiB per 4 CUs
- 32 KiB per 4 CUs
- 256 KiB
- 12.5 KiB
*
- MI6
- GCN 4.0
- gfx803
- 16 GiB
- 36
- 64
- 64 KiB
-
- 2 MiB
- 16 KiB
- 16 KiB per 4 CUs
- 32 KiB per 4 CUs
- 256 KiB
- 12.5 KiB
.. tab-item:: AMD Instinct GPUs
Glossary
########
.. list-table::
:header-rows: 1
:name: instinct-arch-spec-table
For a more detailed explanation, refer to the :ref:`specific documents and guides <gpu-arch-documentation>`.
*
- Model
- Architecture
- LLVM target name
- VRAM
- Compute Units
- Warp Size
- LDS
- L3 Cache
- L2 Cache
- L1 Vector Cache
- L1 Scalar Cache
- L1 Instruction Cache
- VGPR File
- SGPR File
*
- MI300X
- CDNA3
- gfx941 or gfx942
- 192 GiB
- 304
- 64
- 64 KiB
- 256 MiB
- 32 MiB
- 32 KiB
- 16 KiB per 2 CUs
- 64 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI300A
- CDNA3
- gfx940 or gfx942
- 128 GiB
- 228
- 64
- 64 KiB
- 256 MiB
- 24 MiB
- 32 KiB
- 16 KiB per 2 CUs
- 64 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI250X
- CDNA2
- gfx90a
- 128 GiB
- 220 (110 per GCD)
- 64
- 64 KiB
-
- 16 MiB (8 MiB per GCD)
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI250
- CDNA2
- gfx90a
- 128 GiB
- 208
- 64
- 64 KiB
-
- 16 MiB (8 MiB per GCD)
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI210
- CDNA2
- gfx90a
- 64 GiB
- 104
- 64
- 64 KiB
-
- 8 MiB
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI100
- CDNA
- gfx908
- 32 GiB
- 120
- 64
- 64 KiB
-
- 8 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB VGPR and 256 KiB AccVGPR
- 12.5 KiB
*
- MI60
- GCN5.1
- gfx906
- 32 GiB
- 64
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI50 (32GB)
- GCN5.1
- gfx906
- 32 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI50 (16GB)
- GCN5.1
- gfx906
- 16 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI25
- GCN5.0
- gfx900
- 16 GiB
- 64
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI8
- GCN3.0
- gfx803
- 4 GiB
- 64
- 64
- 64 KiB
-
- 2 MiB
- 16 KiB
- 16 KiB per 4 CUs
- 32 KiB per 4 CUs
- 256 KiB
- 12.5 KiB
*
- MI6
- GCN4.0
- gfx803
- 16 GiB
- 36
- 64
- 64 KiB
-
- 2 MiB
- 16 KiB
- 16 KiB per 4 CUs
- 32 KiB per 4 CUs
- 256 KiB
- 12.5 KiB
LLVM target name
Value to pass to clang via ``--offload-arch`` to compile code for the given architecture.
VRAM
Amount of memory available on the GPU.
Compute Units
Number of compute units on the GPU.
Wavefront Size
Number of work-items that execute in parallel on a single compute unit. This is equivalent to the warp size in HIP.
LDS
The Local Data Share (LDS) is a low-latency, high-bandwidth scratch pad memory. It is local to the compute units, shared by all work-items in a work group. In HIP this is the shared memory, which is shared by all threads in a block.
L3 Cache
Size of the level 3 cache. Shared by all compute units on the same GPU. Caches vector and scalar data and instructions.
L2 Cache
Size of the level 2 cache. Shared by all compute units on the same GCD. Caches vector and scalar data and instructions.
L1 Vector Cache
Size of the level 1 vector data cache. Local to a compute unit. Caches vector data.
L1 Scalar Cache
Size of the level 1 scalar data cache. Usually shared by several compute units. Caches scalar data.
L1 Instruction Cache
Size of the level 1 instruction cache. Usually shared by several compute units.
VGPR File
Size of the Vector General Purpose Register (VGPR) file. Holds data used in vector instructions.
GPUs with matrix cores also have AccVGPRs, which are Accumulation Vector General Purpose Registers, used specifically in matrix instructions.
SGPR File
Size of the Scalar General Purpose Register (SGPR) file. Holds data used in scalar instructions.
GCD
Graphics Compute Die.
.. tab-item:: AMD Radeon Pro GPUs
.. list-table::
:header-rows: 1
:name: radeon-pro-arch-spec-table
*
- Model
- Architecture
- LLVM target name
- VRAM
- Compute Units
- Warp Size
- LDS
- Infinity Cache
- L2 Cache
- Graphics L1 Cache
- L0 Vector Cache
- L0 Scalar Cache
- L0 Instruction Cache
- VGPR File
- SGPR File
*
- Radeon PRO W7900
- RDNA3
- gfx1100
- 48 GiB
- 96
- 32
- 128 KiB
- 96 MiB
- 6 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon PRO W7800
- RDNA3
- gfx1100
- 32 GiB
- 70
- 32
- 128 KiB
- 64 MiB
- 6 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon PRO W7700
- RDNA3
- gfx1101
- 16 GiB
- 48
- 32
- 128 KiB
- 64 MiB
- 4 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon PRO W6800
- RDNA2
- gfx1030
- 32 GiB
- 60
- 32
- 128 KiB
- 128 MiB
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon PRO W6600
- RDNA2
- gfx1032
- 8 GiB
- 28
- 32
- 128 KiB
- 32 MiB
- 2 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon PRO V620
- RDNA2
- gfx1030
- 32 GiB
- 72
- 32
- 128 KiB
- 128 MiB
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon Pro W5500
- RDNA
- gfx1012
- 8 GiB
- 22
- 32
- 128 KiB
-
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon Pro VII
- GCN5.1
- gfx906
- 16 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
-
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
.. tab-item:: AMD Radeon GPUs
.. list-table::
:header-rows: 1
:name: radeon-arch-spec-table
*
- Model
- Architecture
- LLVM target name
- VRAM
- Compute Units
- Warp Size
- LDS
- Infinity Cache
- L2 Cache
- Graphics L1 Cache
- L0 Vector Cache
- L0 Scalar Cache
- L0 Instruction Cache
- VGPR File
- SGPR File
*
- Radeon RX 7900 XTX
- RDNA3
- gfx1100
- 24 GiB
- 96
- 32
- 128 KiB
- 96 MiB
- 6 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon RX 7900 XT
- RDNA3
- gfx1100
- 20 GiB
- 84
- 32
- 128 KiB
- 80 MiB
- 6 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon RX 7900 GRE
- RDNA3
- gfx1100
- 16 GiB
- 80
- 32
- 128 KiB
- 64 MiB
- 6 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon RX 7800 XT
- RDNA3
- gfx1101
- 16 GiB
- 60
- 32
- 128 KiB
- 64 MiB
- 4 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon RX 7700 XT
- RDNA3
- gfx1101
- 12 GiB
- 54
- 32
- 128 KiB
- 48 MiB
- 4 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 384 KiB
- 20 KiB
*
- Radeon RX 7600
- RDNA3
- gfx1102
- 8 GiB
- 32
- 32
- 128 KiB
- 32 MiB
- 2 MiB
- 256 KiB
- 32 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6950 XT
- RDNA2
- gfx1030
- 16 GiB
- 80
- 32
- 128 KiB
- 128 MiB
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6900 XT
- RDNA2
- gfx1030
- 16 GiB
- 80
- 32
- 128 KiB
- 128 MiB
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6800 XT
- RDNA2
- gfx1030
- 16 GiB
- 72
- 32
- 128 KiB
- 128 MiB
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6800
- RDNA2
- gfx1030
- 16 GiB
- 60
- 32
- 128 KiB
- 128 MiB
- 4 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6750 XT
- RDNA2
- gfx1031
- 12 GiB
- 40
- 32
- 128 KiB
- 96 MiB
- 3 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6700 XT
- RDNA2
- gfx1031
- 12 GiB
- 40
- 32
- 128 KiB
- 96 MiB
- 3 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6700
- RDNA2
- gfx1031
- 10 GiB
- 36
- 32
- 128 KiB
- 80 MiB
- 3 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6650 XT
- RDNA2
- gfx1032
- 8 GiB
- 32
- 32
- 128 KiB
- 32 MiB
- 2 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6600 XT
- RDNA2
- gfx1032
- 8 GiB
- 32
- 32
- 128 KiB
- 32 MiB
- 2 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon RX 6600
- RDNA2
- gfx1032
- 8 GiB
- 28
- 32
- 128 KiB
- 32 MiB
- 2 MiB
- 128 KiB
- 16 KiB
- 16 KiB
- 32 KiB
- 256 KiB
- 20 KiB
*
- Radeon VII
- GCN5.1
- gfx906
- 16 GiB
- 60
- 64
- 64 KiB per CU
-
- 4 MiB
-
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
For a detailed explanation of the terms, refer to the :ref:`specific documents and guides <gpu-arch-documentation>` or the :ref:`HIP programming guide <HIP:understand/programming_model>`.
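As an illustration of how two of the table columns are typically used, the following minimal sketch looks up the LLVM target name and warp (wavefront) size for a few models copied from the tables above. It builds the matching ``--offload-arch`` argument and computes how many wavefronts a thread block of a given size occupies. The helper names, the ``hipcc`` invocation, and the file names ``kernel.hip`` and ``kernel`` are assumptions made for this example and are not part of the reference.

```python
import math

# A few rows copied from the specification tables:
# model -> (LLVM target name, warp/wavefront size)
GPU_SPECS = {
    "MI250X": ("gfx90a", 64),
    "MI100": ("gfx908", 64),
    "Radeon PRO W6800": ("gfx1030", 32),
    "Radeon RX 7900 XTX": ("gfx1100", 32),
}


def offload_arch_flag(model: str) -> str:
    """Return the --offload-arch argument for a model listed in GPU_SPECS."""
    target, _ = GPU_SPECS[model]
    return f"--offload-arch={target}"


def wavefronts_per_block(model: str, block_size: int) -> int:
    """Number of wavefronts needed to hold block_size work-items (rounded up)."""
    _, wavefront_size = GPU_SPECS[model]
    return math.ceil(block_size / wavefront_size)


def compile_command(model: str, source: str = "kernel.hip", output: str = "kernel") -> list[str]:
    """Hypothetical compiler invocation; source and output names are placeholders."""
    return ["hipcc", offload_arch_flag(model), source, "-o", output]


if __name__ == "__main__":
    for name in GPU_SPECS:
        print(name, " ".join(compile_command(name)), wavefronts_per_block(name, 256))
```

A block of 256 threads therefore maps to four wavefronts on a GPU with a warp size of 64 and to eight on a GPU with a warp size of 32, which is worth keeping in mind when porting launch configurations between the Instinct and Radeon tables.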