Add instinct gpu architectures information (#2859)

* Add instinct gpu architectures information * Improve gpu architecture table Move table to "reference" instead of "conceptual" * Add HIP terminology to GPU Arch glossary
2026-01-10 23:28:03 -05:00 · 2024-02-29 23:03:23 +01:00
parent 2666c51c54
commit cd586348f5
6 changed files with 269 additions and 0 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -2,6 +2,8 @@ AAC
 ABI
 ACE
 ACEs
+AccVGPR
+AccVGPRs
 ALU
 AMD
 AMDGPU
@@ -103,6 +105,7 @@ GDS
 GEMM
 GEMMs
 GFortran
+GiB
 GIM
 GL
 GLXT
@@ -154,6 +157,7 @@ Ioffe
 JSON
 Jupyter
 KFD
+KiB
 KVM
 Keras
 Khronos
@@ -170,6 +174,7 @@ LoRA
 MEM
 MERCHANTABILITY
 MFMA
+MiB
 MIGraphX
 MIOpen
 MIOpenGEMM
--- a/docs/conceptual/gpu-arch.md
+++ b/docs/conceptual/gpu-arch.md
@@ -5,6 +5,8 @@
  MI100, AMD Instinct">
 </head>

+(gpu-arch-documentation)=
+
 # GPU architecture documentation

 :::::{grid} 1 1 2 2
--- a/docs/index.md
+++ b/docs/index.md
@@ -77,6 +77,8 @@ Our documentation is organized into the following categories:
  * Development
  * Performance analysis
  * System
+* [GPU architectures](./reference/gpu-arch.rst)
+  * [GPU architecture hardware specification overview](./reference/gpu-arch/gpu-arch-spec-overview.rst)
 :::

 :::{grid-item-card}
--- a/docs/reference/gpu-arch.rst
+++ b/docs/reference/gpu-arch.rst
@@ -0,0 +1,13 @@
+.. meta::
+    :description: GPU Architecture reference
+    :keywords: AMD, GPU, architecture, hardware, CDNA, Instinct, reference
+
+.. _gpu-arch-reference:
+
+GPU architecture reference
+##########################
+
+General overview
+""""""""""""""""
+
+* :doc:`GPU architecture hardware specifications overview<gpu-arch/gpu-arch-spec-overview>`
--- a/docs/reference/gpu-arch/gpu-arch-spec-overview.rst
+++ b/docs/reference/gpu-arch/gpu-arch-spec-overview.rst
@@ -0,0 +1,241 @@
+.. meta::
+   :description: AMD Instinct™ GPU architecture information
+   :keywords: Instinct, CDNA, GPU, architecture, VRAM, Compute Units, Cache, Registers, LDS, Register File
+
+GPU architecture hardware specifications
+########################################
+
+The following table provides an overview over the hardware specifications for the AMD Instinct accelerators.
+
+.. list-table:: AMD Instinct architecture specification table
+    :header-rows: 1
+    :name: instinct-arch-spec-table
+
+    *
+      - Model
+      - Architecture
+      - LLVM target name
+      - VRAM
+      - Compute Units
+      - Wavefront Size
+      - LDS
+      - L3 Cache
+      - L2 Cache
+      - L1 Vector Cache
+      - L1 Scalar Cache
+      - L1 Instruction Cache
+      - VGPR File
+      - SGPR File
+    *
+      - MI300X
+      - CDNA3
+      - gfx941 or gfx942
+      - 192 GiB
+      - 304
+      - 64
+      - 64 KiB
+      - 256 MiB
+      - 32 MiB
+      - 32 KiB
+      - 16 KiB per 2 CUs
+      - 64 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI300A
+      - CDNA3
+      - gfx940 or gfx942
+      - 128 GiB
+      - 228
+      - 64
+      - 64 KiB
+      - 256 MiB
+      - 24 MiB
+      - 32 KiB
+      - 16 KiB per 2 CUs
+      - 64 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI250X
+      - CDNA2
+      - gfx90a
+      - 128 GiB
+      - 220 (110 per GCD)
+      - 64
+      - 64 KiB
+      -
+      - 16 MiB (8 MiB per GCD)
+      - 16 KiB
+      - 16 KiB per 2 CUs
+      - 32 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI250
+      - CDNA2
+      - gfx90a
+      - 128 GiB
+      - 208
+      - 64
+      - 64 KiB
+      -
+      - 16 MiB (8 MiB per GCD)
+      - 16 KiB
+      - 16 KiB per 2 CUs
+      - 32 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+       - MI210
+       - CDNA2
+       - gfx90a
+       - 64 GiB
+       - 104
+       - 64
+       - 64 KiB
+       -
+       - 8 MiB
+       - 16 KiB
+       - 16 KiB per 2 CUs
+       - 32 KiB per 2 CUs
+       - 512 KiB
+       - 12.5 KiB
+    *
+      - MI100
+      - CDNA
+      - gfx908
+      - 32 GiB
+      - 120
+      - 64
+      - 64 KiB
+      -
+      - 8 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB VGPR and 256 KiB AccVGPR
+      - 12.5 KiB
+    *
+      - MI60
+      - GCN 5.1
+      - gfx906
+      - 32 GiB
+      - 64
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI50 (32GB)
+      - GCN 5.1
+      - gfx906
+      - 32 GiB
+      - 60
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI50 (16GB)
+      - GCN 5.1
+      - gfx906
+      - 16 GiB
+      - 60
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI25
+      - GCN 5.0
+      - gfx900
+      - 16 GiB
+      - 64
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI8
+      - GCN 3.0
+      - gfx803
+      - 4 GiB
+      - 64
+      - 64
+      - 64 KiB
+      -
+      - 2 MiB
+      - 16 KiB
+      - 16 KiB per 4 CUs
+      - 32 KiB per 4 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI6
+      - GCN 4.0
+      - gfx803
+      - 16 GiB
+      - 36
+      - 64
+      - 64 KiB
+      -
+      - 2 MiB
+      - 16 KiB
+      - 16 KiB per 4 CUs
+      - 32 KiB per 4 CUs
+      - 256 KiB
+      - 12.5 KiB
+
+Glossary
+########
+
+For a more detailed explanation refer to the :ref:`specific documents and guides <gpu-arch-documentation>`.
+
+LLVM target name
+  Argument to pass to clang in `--offload-arch` to compile code for the given architecture.
+VRAM
+  Amount of memory available on the GPU.
+Compute Units
+  Number of compute units on the GPU.
+Wavefront Size
+  Amount of work-items that execute in parallel on a single compute unit. This is equivalent to the warp size in HIP.
+LDS
+  The Local Data Share (LDS) is a low-latency, high-bandwidth scratch pad memory. It is local to the compute units, shared by all work-items in a work group. In HIP this is the shared memory, which is shared by all threads in a block.
+L3 Cache
+  Size of the level 3 cache. Shared by all compute units on the same GPU. Caches vector and scalar data and instructions.
+L2 Cache
+  Size of the level 3 cache. Shared by all compute units on the same GCD. Caches vector and scalar data and instructions.
+L1 Vector Cache
+  Size of the level 1 vector data cache. Local to a compute unit. Caches vector data.
+L1 Scalar Cache
+  Size of the level 1 scalar data cache. Usually shared by several compute units. Caches scalar data.
+L1 Instruction Cache
+  Size of the level 1 instruction cache. Usually shared by several compute units.
+VGPR File
+  Size of the Vector General Purpose Register (VGPR) file. Holds data used in vector instructions.
+  GPUs with matrix cores also have AccVGPRs, which are Accumulation General Purpose Vector Registers, specifically used in matrix instructions.
+SGPR File
+  Size of the Scalar General Purpose Register (SGPR) file. Holds data used in scalar instructions.
+GCD
+  Graphics Compute Die.
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -36,6 +36,12 @@ subtrees:
      title: API libraries
    - file: reference/rocm-tools.md
      title: Tools
+    - file: reference/gpu-arch.rst
+      title: GPU architectures
+      subtrees:
+      - entries:
+        - file: reference/gpu-arch/gpu-arch-spec-overview.rst
+          title: Hardware specifications overview

 - caption: How-to
  entries: