Add glossary (#5935)

Signed-off-by: Jan Stephan <jan.stephan@amd.com>
This commit is contained in:
Jan Stephan
2026-02-20 20:09:51 +01:00
committed by GitHub
parent 221f963d31
commit 61c0c31481
8 changed files with 573 additions and 122 deletions


@@ -69,6 +69,7 @@ ROCm documentation is organized into the following categories:
* [Environment variables](./reference/env-variables.rst)
* [Data types and precision support](./reference/precision-support.rst)
* [Graph safe support](./reference/graph-safe-support.rst)
* [ROCm glossary](./reference/glossary.rst)
<!-- markdownlint-enable MD051 -->
:::


@@ -0,0 +1,24 @@
.. meta::
:description: AMD ROCm Glossary
:keywords: AMD, ROCm, glossary, terminology, device hardware,
device software, host software, performance
.. _glossary:
********************************************************************************
ROCm glossary
********************************************************************************
This glossary provides concise definitions of key terms and concepts in AMD ROCm
programming. Each entry includes a brief description and a link to detailed
documentation for in-depth information.
The glossary is organized into four sections:
* :doc:`glossary/device-hardware` — Hardware components (for example, Compute
Units, cores, memory)
* :doc:`glossary/device-software` — Software abstractions (programming model,
ISA, thread hierarchy)
* :doc:`glossary/host-software` — Development tools (HIP, compilers, libraries,
profilers)
* :doc:`glossary/performance` — Performance metrics and optimization concepts


@@ -0,0 +1,254 @@
.. meta::
:description: Device hardware glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, device hardware, compute units, cores, MFMA,
architecture, register file, cache, HBM
.. _glossary-device-hardware:
************************
Device hardware glossary
************************
This section provides concise definitions of hardware components and architectural
features of AMD GPUs.
.. glossary::
:sorted:
AMD device architecture
AMD device architecture is based on unified, programmable compute
engines known as :term:`compute units (CUs) <Compute units>`. See
:ref:`hip:hardware_implementation` for details.
Compute units
Compute units (CUs) are the fundamental programmable execution engines
in AMD GPUs capable of running complex programs. See
:ref:`hip:compute_unit` for details.
ALU
Arithmetic logic units (ALUs) are the primary arithmetic engines that
execute mathematical and logical operations within
:term:`compute units <Compute units>`. See :ref:`hip:valu` for details.
SALU
Scalar :term:`ALUs <ALU>` (SALUs) operate on a single value per
:term:`wavefront <Wavefront (Warp)>` and manage all control flow.
VALU
Vector :term:`ALUs <ALU>` (VALUs) perform an arithmetic or logical
operation on data for each :term:`work-item <Work-item (Thread)>` in a
:term:`wavefront <Wavefront (Warp)>`, enabling data-parallel execution.
Special function unit
Special function units (SFUs) accelerate transcendental and reciprocal
mathematical functions such as ``exp``, ``log``, ``sin``, and ``cos``.
See :ref:`hip:sfu` for details.
Load/store unit
Load/store units (LSUs) handle data transfer between
:term:`compute units <Compute units>` and the GPU's memory subsystems,
managing thousands of concurrent memory operations. See :ref:`hip:lsu`
for details.
Work-group (Block)
A work-group (also called a block) is a collection of
:term:`wavefronts <Wavefront (Warp)>` scheduled together on a single
:term:`compute unit <Compute units>` that can coordinate through
:term:`Local data share <Local data share>` memory. See
:ref:`hip:inherent_thread_hierarchy_block` for work-group details.
Work-item (Thread)
A work-item (also called a thread) is the smallest unit of execution on
an AMD GPU and represents a single element of work. See
:ref:`hip:work-item` for thread hierarchy details.
Wavefront (Warp)
A wavefront (also called a warp) is a group of
:term:`work-items <Work-item (Thread)>` that execute in parallel on a
single :term:`compute unit <Compute units>`, sharing one
instruction stream. See :ref:`hip:wavefront` for execution details.
Wavefront scheduler
The wavefront scheduler in each :term:`compute unit <Compute units>`
decides which :term:`wavefront <Wavefront (Warp)>` to execute each clock cycle,
enabling rapid context switching for latency hiding. See
:ref:`hip:wave-scheduling` for details.
Wavefront size
The wavefront size is the number of
:term:`work-items <Work-item (Thread)>` that execute together in a
single :term:`wavefront <Wavefront (Warp)>`. For AMD Instinct GPUs, the
wavefront size is 64 threads, while AMD Radeon GPUs have a wavefront
size of 32 threads. See :ref:`hip:wavefront` for details.
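The work-group-to-wavefront mapping described above is plain ceiling division. A minimal sketch (the 64- and 32-thread sizes are the values quoted in this entry):

```python
def wavefronts_per_workgroup(workgroup_size: int, wavefront_size: int) -> int:
    # A work-group is split into ceil(workgroup_size / wavefront_size)
    # wavefronts; a partially filled wavefront still occupies a whole slot.
    return -(-workgroup_size // wavefront_size)

# A 256-thread work-group needs 4 wavefronts on an AMD Instinct GPU
# (wavefront size 64) and 8 on an AMD Radeon GPU (wavefront size 32).
```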
SIMD core
SIMD cores are execution lanes that perform scalar and vector arithmetic
operations inside each :term:`compute unit <Compute units>`. See
:ref:`hip:cdna_architecture` and :ref:`hip:rdna_architecture` for
details.
Matrix cores (MFMA units)
Matrix cores (MFMA units) are specialized execution units that perform
large-scale matrix operations in a single instruction, delivering high
throughput for AI and HPC workloads. See :ref:`hip:mfma_units` for
details.
Data movement engine
Data movement engines (DMEs) are specialized hardware units in AMD
Instinct MI300 and MI350 series GPUs that accelerate multi-dimensional
tensor data copies between global memory and on-chip memory. See
:ref:`hip:dme` for details.
GFX IP
GFX IP (Graphics IP) versions are identifiers that specify which
instruction formats, memory models, and compute features are supported
by each AMD GPU generation. See :ref:`hip:gfx_ip` for versioning
information.
GFX IP major version
The :term:`GFX IP <GFX IP>` major version represents the GPU's core
instruction set and architecture. For example, a GFX IP major version of
``11`` corresponds to the RDNA3 architecture, influencing driver
support and available compute features. See :ref:`hip:gfx_ip` for
versioning information.
GFX IP minor version
The :term:`GFX IP <GFX IP>` minor version represents specific variations
within a :term:`GFX IP <GFX IP>` major version and affects feature sets,
optimizations, and driver behavior. Different GPU models within the same
major version can have unique capabilities, impacting performance and
supported instructions. See :ref:`hip:gfx_ip` for versioning
information.
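The major/minor decomposition above can be read directly off an LLVM gfx target name. A hypothetical parsing sketch, assuming the common ``gfx<major><minor><stepping>`` layout in which the minor version and stepping are single hexadecimal digits:

```python
def parse_gfx_target(name: str) -> tuple[int, int, str]:
    # Assumes the common gfx<major><minor><stepping> layout, where the
    # minor version and stepping are single hexadecimal digits
    # (for example gfx1100 -> (11, 0, "0"), gfx90a -> (9, 0, "a")).
    digits = name.removeprefix("gfx")
    return int(digits[:-2]), int(digits[-2], 16), digits[-1]
```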
Compute unit versioning
:term:`Compute units <Compute units>` are versioned with
:term:`GFX IP <GFX IP>` identifiers that define their microarchitectural
features and instruction set compatibility. See :ref:`hip:gfx_ip` for
details.
Register file
The register file is the primary on-chip memory store in each
:term:`compute unit <Compute units>`, holding data between arithmetic
and memory operations. See :ref:`hip:memory_hierarchy` for details.
SGPR file
The :term:`SGPR <SGPR>` file is the
:term:`register file <Register file>` that holds data used by the
:term:`scalar ALU <SALU>`.
VGPR file
The :term:`VGPR <VGPR>` file is the
:term:`register file <Register file>` that holds data used by the
:term:`vector ALU <VALU>`. GPUs with
:term:`matrix cores <Matrix cores (MFMA units)>` also have
:term:`AccVGPR <AccVGPR>` files, used specifically for matrix
instructions.
L0 instruction cache
On AMD Radeon GPUs, the level 0 (L0) instruction cache is local to each
:term:`WGP <WGP>` and thus shared between the WGP's
:term:`compute units <Compute units>`.
L0 scalar cache
On AMD Radeon GPUs, the level 0 (L0) scalar data cache is local to each
:term:`WGP <WGP>` and thus shared between the WGP's
:term:`compute units <Compute units>`. It provides the
:term:`scalar ALU <SALU>` with fast access to recently used data.
L0 vector cache
On AMD Radeon GPUs, the level 0 (L0) vector data cache is local to each
:term:`WGP <WGP>` and thus shared between the WGP's
:term:`compute units <Compute units>`. It provides the
:term:`vector ALU <VALU>` with fast access to recently used data.
L1 instruction cache
On AMD Instinct GPUs, the level 1 (L1) instruction cache is local to
each :term:`compute unit <Compute units>`. On AMD Radeon GPUs, the
L1 instruction cache does not exist as a separate cache level, and
instructions are stored in the
:term:`L0 instruction cache <L0 instruction cache>`.
L1 scalar cache
On AMD Instinct GPUs, the level 1 (L1) scalar data cache is local to
each :term:`compute unit <Compute units>`, providing the
:term:`scalar ALU <SALU>` with fast access to recently used data. On AMD
Radeon GPUs, the L1 scalar cache does not exist as a separate cache
level, and recently used scalar data is stored in the
:term:`L0 scalar cache <L0 scalar cache>`.
L1 vector cache
On AMD Instinct GPUs, the level 1 (L1) vector data cache is local to
each :term:`compute unit <Compute units>`, providing the
:term:`vector ALU <VALU>` with fast access to recently used data. On AMD
Radeon GPUs, the L1 vector cache does not exist as a separate cache
level, and recently used vector data is stored in the
:term:`L0 vector cache <L0 vector cache>`.
Graphics L1 cache
On AMD Radeon GPUs, the read-only graphics level 1 (L1) cache is local
to groups of :term:`WGPs <WGP>` called shader arrays, providing fast
access to recently used data. AMD Instinct GPUs do not feature the
graphics L1 cache.
L2 cache
On AMD Instinct MI100 series GPUs, the L2 cache is shared across the
entire chip, while for all other AMD GPUs the L2 caches are shared by
the :term:`compute units <Compute units>` on the same :term:`GCD <GCD>`
or :term:`XCD <XCD>`.
Infinity Cache (L3 cache)
On AMD Instinct MI300 and MI350 series GPUs and AMD Radeon GPUs, the
Infinity Cache is the last level cache of the cache hierarchy. It is
shared by all :term:`compute units <Compute units>` and
:term:`WGPs <WGP>` on the GPU.
GPU RAM (VRAM)
GPU RAM, also known as :term:`global memory <Global memory>` in the HIP
programming model, is the large, high-capacity off-chip memory subsystem
accessible by all :term:`compute units <Compute units>`, forming the
foundation of the device's :ref:`memory hierarchy <hip:hbm>`.
Local data share
Local data share (LDS) is fast on-chip memory local to each
:term:`compute unit <Compute units>` and shared among
:term:`work-items <Work-item (Thread)>` in a
:term:`work-group <Work-group (Block)>`, enabling efficient coordination
and data reuse. In the HIP programming model, the LDS is known as shared
memory. See :ref:`hip:lds` for LDS programming details.
Registers
Registers are the lowest level of the memory hierarchy, storing
per-thread temporary variables and intermediate results. See
:ref:`hip:memory_hierarchy` for register usage details.
SGPR
Scalar general-purpose :term:`registers <Registers>` (SGPRs) hold data
produced and consumed by a :term:`compute unit <Compute units>`'s
:term:`scalar ALU <SALU>`.
VGPR
Vector general-purpose :term:`registers <Registers>` (VGPRs) hold data
produced and consumed by a :term:`compute unit <Compute units>`'s
:term:`vector ALU <VALU>`.
AccVGPR
Accumulation vector general-purpose registers (AccVGPRs) are a special
type of :term:`VGPR <VGPR>` used exclusively for matrix operations.
XCD
On AMD Instinct MI300 and MI350 series GPUs, the Accelerator Complex Die
(XCD) contains the GPU's computational elements and lower levels of the
cache hierarchy. See :doc:`../../conceptual/gpu-arch/mi300` for details.
GCD
On AMD Instinct MI100 and MI250 series GPUs and AMD Radeon GPUs, the
Graphics Compute Die (GCD) contains the GPU's computational elements
and lower levels of the cache hierarchy. See
:doc:`../../conceptual/gpu-arch/mi250` for details.
WGP
A Workgroup Processor (WGP) is a hardware unit on AMD Radeon GPUs that
contains two :term:`compute units <Compute units>` and their associated
resources, enabling efficient scheduling and execution of
:term:`wavefronts <Wavefront (Warp)>`. See :ref:`hip:rdna_architecture` for
details.


@@ -0,0 +1,74 @@
.. meta::
:description: Device software glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, device software, programming model, AMDGPU,
assembly, IR, GFX IP, wavefront, work-group, HIP kernel, thread hierarchy
.. _glossary-device-software:
************************
Device software glossary
************************
This section provides brief definitions of software abstractions and programming
models that run on AMD GPUs.
.. glossary::
:sorted:
ROCm programming model
The ROCm programming model defines how AMD GPUs execute massively
parallel programs using hierarchical
:term:`work-groups <Work-group (Block)>`, memory scopes, and barrier
synchronization. See :ref:`hip:programming_model` for complete details.
AMDGPU assembly
AMDGPU assembly (GFX ISA) is the low-level assembly format for programs
running on AMD GPUs, generated by the
:term:`ROCm compiler toolchain <HIP compiler>`. See
:ref:`hip:amdgpu_assembly` for instruction set details.
AMDGPU intermediate representation
AMDGPU IR is an intermediate representation for GPU code, serving as a
virtual instruction set between high-level languages and
:term:`architecture-specific assembly <AMDGPU assembly>`. See
:ref:`hip:amdgpu_ir` for compilation details.
LLVM target name
The LLVM target name is a string identifier corresponding to a specific
:term:`GFX IP <GFX IP>` version that is passed to the
:term:`HIP compiler <HIP compiler>` toolchain to specify the target GPU
architecture for code generation.
See :doc:`llvm-project:reference/rocmcc` for details.
Grid
A grid represents the collection of all
:term:`work-groups <Work-group (Block)>` executing a single
:term:`kernel <HIP kernel>` across the entire GPU. See
:ref:`hip:inherent_thread_hierarchy_grid` for grid execution details.
HIP kernel
A HIP kernel is the unit of GPU code that executes in parallel across
many :term:`threads <Work-item (Thread)>`, distributed across the GPU's
:term:`compute units <Compute units>`. See :ref:`hip:device_program` for
kernel programming details.
HIP thread hierarchy
The thread hierarchy structures parallel work from individual
:term:`threads <Work-item (Thread)>` to
:term:`blocks <Work-group (Block)>` to :term:`grids <Grid>`, mapping
onto hardware from :term:`SIMD lanes <SIMD core>` to
:term:`compute units <Compute units>` to the entire GPU. See
:ref:`hip:inherent_thread_model` for complete details.
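The hierarchy levels translate into simple index arithmetic inside a kernel. A one-dimensional model of it (mirroring the usual ``blockIdx.x * blockDim.x + threadIdx.x`` idiom in HIP kernels):

```python
def global_thread_id(block_id: int, block_dim: int, thread_id: int) -> int:
    # Position of a thread within the whole grid: its block's offset
    # into the grid plus its position within the block.
    return block_id * block_dim + thread_id

# Thread 5 of block 2, with 256 threads per block, is global thread 517.
```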
HIP memory hierarchy
The memory hierarchy pairs each
:term:`thread hierarchy <HIP thread hierarchy>` level with corresponding
memory scopes, from :term:`private registers <Registers>` to
:term:`LDS <Local data share>` to :term:`GPU RAM <GPU RAM (VRAM)>`. See
:ref:`hip:memory_hierarchy` for memory architecture details.
Global memory
Global memory is the :term:`device-wide memory <GPU RAM (VRAM)>`
accessible to all :term:`threads <Work-item (Thread)>`, physically
implemented as HBM or GDDR. See :ref:`hip:memory_hierarchy` for global
memory details.


@@ -0,0 +1,67 @@
.. meta::
:description: Host software glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, host software, HIP, compiler, runtime, libraries,
profiler, amd-smi
.. _glossary-host-software:
**********************
Host software glossary
**********************
This section provides brief definitions of development tools, compilers,
libraries, and runtime environments for programming AMD GPUs.
.. glossary::
:sorted:
ROCm software platform
ROCm is AMD's GPU software stack, providing compiler
toolchains, runtime environments, and performance libraries for HPC and
AI applications. See :doc:`../../what-is-rocm` for a complete component
overview.
HIP C++ language extension
HIP extends the C++ language with additional features designed for
programming heterogeneous applications. These extensions mostly relate
to the kernel language, but some can also be applied to host
functionality. See :doc:`hip:how-to/hip_cpp_language_extensions` for
language fundamentals.
AMD SMI
The ``amd-smi`` command-line utility queries, monitors, and manages
AMD GPU state, providing hardware information and performance metrics.
See :doc:`amdsmi:index` for detailed usage.
HIP runtime API
The HIP runtime API provides an interface for GPU programming, offering
functions for memory management, kernel launches, and synchronization. See
:ref:`hip:hip_runtime_api_how-to` for an API overview.
HIP compiler
The HIP compiler ``amdclang++`` compiles HIP C++ programs into binaries
that contain both host CPU and device GPU code. See
:doc:`llvm-project:reference/rocmcc` for compiler flags and options.
HIP runtime compiler
The HIP Runtime Compiler (HIPRTC) compiles HIP source code at runtime
into :term:`AMDGPU <AMDGPU assembly>` binary code objects, enabling
just-in-time kernel generation, device-specific optimization, and
dynamic code creation for different GPUs. See
:ref:`hip:hip_runtime_compiler_how-to` for API details.
ROCgdb
ROCgdb is AMD's source-level debugger for HIP and ROCm applications,
enabling debugging of both host CPU and GPU device code, including
kernel breakpoints, stepping, and variable inspection. See
:doc:`rocgdb:index` for usage and command reference.
rocprofv3
``rocprofv3`` is AMD's primary performance analysis tool, providing
profiling, tracing, and performance counter collection.
See :ref:`rocprofiler-sdk:using-rocprofv3` for profiling workflows.
ROCm and LLVM binary utilities
ROCm and LLVM binary utilities are command-line tools for examining and
manipulating GPU binaries and code objects. See
:ref:`hip:binary_utilities` for utility details.


@@ -0,0 +1,135 @@
.. meta::
:description: Performance glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, performance, optimization, roofline, bottleneck,
occupancy, bandwidth, latency hiding, divergence
.. _glossary-performance:
*****************************
Performance analysis glossary
*****************************
This section provides brief definitions of performance analysis concepts and
optimization techniques.
.. glossary::
:sorted:
Roofline model
The roofline model is a visual performance model that determines whether
a program is :term:`compute-bound <Compute-bound>` or
:term:`memory-bound <Memory-bound>`. See :ref:`hip:roofline_model` for
roofline analysis.
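The roofline bound itself is a one-line formula: attainable throughput is the lower of the compute roof and the memory roof. A sketch with hypothetical machine numbers:

```python
def attainable_gflops(intensity_flop_per_byte: float,
                      peak_gflops: float,
                      mem_bw_gb_per_s: float) -> float:
    # Attainable performance = min(compute roof, bandwidth * intensity).
    return min(peak_gflops, mem_bw_gb_per_s * intensity_flop_per_byte)

# With a hypothetical 100 GFLOP/s peak and 10 GB/s of bandwidth, a
# kernel at 5 FLOP/byte is memory-bound (50 GFLOP/s attainable) and a
# kernel at 20 FLOP/byte is compute-bound (100 GFLOP/s attainable).
```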
Compute-bound
Compute-bound kernels are limited by the
:term:`arithmetic bandwidth <Arithmetic bandwidth>` of the GPU's
:term:`compute units <Compute units>` rather than
:term:`memory bandwidth <Memory bandwidth>`. See
:ref:`hip:compute_bound` for compute-bound analysis.
Memory-bound
Memory-bound kernels are limited by
:term:`memory bandwidth <Memory bandwidth>` rather than
:term:`arithmetic bandwidth <Arithmetic bandwidth>`, typically due to
low :term:`arithmetic intensity <Arithmetic intensity>`. See
:ref:`hip:memory_bound` for memory-bound analysis.
Arithmetic intensity
Arithmetic intensity is the ratio of arithmetic operations to memory
operations in a kernel, and determines performance characteristics. See
:ref:`hip:arithmetic_intensity` for intensity analysis.
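As a worked example of the ratio, consider single-precision AXPY (``y = a*x + y``): 2 FLOPs per element against 12 bytes of memory traffic (two 4-byte loads and one 4-byte store):

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    # FLOPs performed per byte of memory traffic.
    return flops / bytes_moved

# Single-precision AXPY: 2 FLOPs per element, 12 bytes per element,
# so roughly 0.17 FLOP/byte -- a typically memory-bound intensity.
```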
Overhead
Overhead latency is the time spent with no useful work being done, often
due to CPU-side bottlenecks or kernel launch delays. See
:ref:`hip:performance_bottlenecks` for details.
Little's Law
Little's Law relates concurrency, latency, and throughput, determining
how much independent work must be in flight to hide latency. See
:ref:`hip:littles_law` for latency hiding details.
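Little's Law reduces to a product: the amount of work that must be in flight equals latency times throughput. A sketch with hypothetical latency and issue-rate numbers:

```python
def required_in_flight_ops(latency_cycles: float,
                           throughput_ops_per_cycle: float) -> float:
    # Little's Law: concurrency = latency * throughput.
    return latency_cycles * throughput_ops_per_cycle

# Hiding a hypothetical 400-cycle memory latency at 2 operations per
# cycle requires about 800 independent operations in flight.
```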
Memory bandwidth
Memory bandwidth is the maximum rate at which data can be transferred
between memory hierarchy levels, typically measured in bytes per
second. See :ref:`hip:memory_bound` for details.
Arithmetic bandwidth
Arithmetic bandwidth is the peak rate at which arithmetic work can be
performed, defining the compute roof in
:term:`roofline models <Roofline model>`. See :ref:`hip:compute_bound`
for details.
Latency hiding
Latency hiding masks long-latency operations by running many concurrent
threads, keeping execution pipelines busy. See :ref:`hip:latency_hiding`
for details.
Wavefront execution state
Wavefront execution states (*active*, *stalled*, *eligible*, *selected*)
describe the scheduling status of :term:`wavefronts <Wavefront (Warp)>` on AMD
GPUs. See :ref:`hip:wavefront_execution` for state definitions.
Active cycle
An active cycle is a clock cycle in which a
:term:`compute unit <Compute units>` has at least one active
:term:`wavefront <Wavefront (Warp)>` resident. See
:ref:`hip:wavefront_execution` for details.
Occupancy
Occupancy is the ratio of active :term:`wavefronts <Wavefront (Warp)>` to the
maximum number of wavefronts that can be active on a
:term:`compute unit <Compute units>`. See :ref:`hip:occupancy` for
occupancy analysis.
Pipe utilization
Pipe utilization measures how effectively a kernel uses the execution
pipelines within each :term:`compute unit <Compute units>`. See
:ref:`hip:pipe_utilization` for utilization details.
Peak rate
Peak rate is the theoretical maximum throughput at which a hardware
system can complete work under ideal conditions. See
:ref:`hip:theoretical_performance_limits` for details.
Issue efficiency
Issue efficiency measures how effectively the
:term:`wavefront scheduler <Wavefront scheduler>` keeps
execution pipelines busy by issuing instructions. See
:ref:`hip:issue_efficiency` for efficiency metrics.
CU utilization
CU utilization measures the percentage of time that
:term:`compute units <Compute units>` are actively executing
instructions. See :ref:`hip:cu_utilization` for utilization analysis.
Wavefront divergence
Wavefront divergence occurs when threads within a
:term:`wavefront <Wavefront (Warp)>` take different execution paths due to
conditional statements. See :ref:`hip:branch_efficiency` for divergence
handling details.
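The cost of divergence can be modeled as the number of serialized passes a wavefront makes through a branch, one per distinct path taken. A simplified two-way-branch sketch (real control flow can nest):

```python
def serialized_passes(branch_outcomes: list[bool]) -> int:
    # One pass if all lanes agree; with a two-way split, the wavefront
    # executes both paths in turn with inactive lanes masked off.
    return len(set(branch_outcomes))

# 64 lanes agreeing -> 1 pass; a 32/32 split -> 2 passes.
```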
Branch efficiency
Branch efficiency measures how often all threads within a
:term:`wavefront <Wavefront (Warp)>` take the same execution path, quantifying
control-flow uniformity. See :ref:`hip:branch_efficiency` for branch
analysis.
Memory coalescing
Memory coalescing improves :term:`memory bandwidth <Memory bandwidth>`
by servicing many logical loads or stores with fewer physical memory
transactions. See :ref:`hip:memory_coalescing_theory` for coalescing
patterns.
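A toy model of coalescing counts how many aligned memory segments one wavefront's accesses touch (the 64-byte segment size is an illustrative assumption, not a statement about any particular GPU):

```python
def segments_touched(addresses, segment_bytes: int = 64) -> int:
    # Contiguous addresses collapse into few segments (transactions);
    # large strides force one segment per access.
    return len({addr // segment_bytes for addr in addresses})

# 16 contiguous 4-byte accesses -> 1 segment; the same accesses strided
# by 64 bytes -> 16 segments.
```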
Bank conflict
A bank conflict occurs when multiple threads simultaneously access
different addresses in the same :term:`LDS bank <Local data share>`,
serializing accesses. See :ref:`hip:bank_conflicts_theory` for details.
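A simplified model of LDS banking (32 banks of 4-byte words is an illustrative configuration): distinct words landing in the same bank serialize, while lanes reading the identical word broadcast without conflict:

```python
def lds_conflict_factor(addresses, banks: int = 32, word_bytes: int = 4) -> int:
    # Worst-case number of distinct words mapped to a single bank;
    # 1 means the access pattern is conflict-free.
    per_bank: dict[int, set[int]] = {}
    for addr in addresses:
        word = addr // word_bytes
        per_bank.setdefault(word % banks, set()).add(word)
    return max(len(words) for words in per_bank.values())

# Unit-stride word accesses are conflict-free; a stride of two words
# doubles up on every even bank, serializing each access into two.
```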
Register pressure
Register pressure occurs when excessive register demand limits the
number of active :term:`wavefronts <Wavefront (Warp)>` per
:term:`compute unit <Compute units>`, reducing
:term:`occupancy <Occupancy>`. See
:ref:`hip:register_pressure_theory` for details.
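The occupancy impact of register pressure is a simple budget calculation. A sketch with illustrative limits (256 VGPRs per lane and 8 wavefronts per SIMD are assumptions; real hardware also allocates registers in granules):

```python
def waves_limited_by_vgprs(vgprs_per_thread: int,
                           vgprs_per_lane: int = 256,
                           max_waves: int = 8) -> int:
    # Each resident wavefront reserves its VGPRs for its lifetime, so
    # the per-lane register budget caps the number of resident waves.
    return min(max_waves, vgprs_per_lane // vgprs_per_thread)

# 32 VGPRs/thread leaves the full 8 waves; 128 VGPRs/thread cuts the
# SIMD down to 2 resident waves.
```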


@@ -9,6 +9,12 @@ The following tables provide an overview of the hardware specifications for AMD
For more information about ROCm hardware compatibility, see the ROCm `Compatibility matrix <https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html>`_.
For a description of the terms used in the tables, see the
:ref:`ROCm glossary <glossary>`. For more detailed information about GPU
architecture and programming models, see the
:ref:`specific documents and guides <gpu-arch-documentation>` or
:doc:`Understanding the HIP programming model <hip:understand/programming_model>`.
.. tab-set::
.. tab-item:: AMD Instinct GPUs
@@ -1127,125 +1133,3 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 32
- 11
- 5
Glossary
========
For more information about the terms used, see the
:ref:`specific documents and guides <gpu-arch-documentation>`, or
:doc:`Understanding the HIP programming model <hip:understand/programming_model>`.
**LLVM target name**
Argument to pass to clang in ``--offload-arch`` to compile code for the given
architecture.
**VRAM**
Amount of memory available on the GPU.
**Compute Units**
Number of compute units on the GPU.
**Wavefront Size**
Amount of work items that execute in parallel on a single compute unit. This
is equivalent to the warp size in HIP.
**LDS**
The Local Data Share (LDS) is a low-latency, high-bandwidth scratch pad
memory. It is local to the compute units, and can be shared by all work items
in a work group. In HIP, the LDS can be used for shared memory, which is
shared by all threads in a block.
**L3 Cache (CDNA/GCN only)**
Size of the level 3 cache. Shared by all compute units on the same GPU. Caches
data and instructions. Similar to the Infinity Cache on RDNA architectures.
**Infinity Cache (RDNA only)**
Size of the infinity cache. Shared by all compute units on the same GPU. Caches
data and instructions. Similar to the L3 Cache on CDNA/GCN architectures.
**L2 Cache**
Size of the level 2 cache. Shared by all compute units on the same GCD. Caches
data and instructions.
**Graphics L1 Cache (RDNA only)**
An additional cache level that only exists in RDNA architectures. Local to a
shader array.
**L1 Vector Cache (CDNA/GCN only)**
Size of the level 1 vector data cache. Local to a compute unit. This is the L0
vector cache in RDNA architectures.
**L1 Scalar Cache (CDNA/GCN only)**
Size of the level 1 scalar data cache. Usually shared by several compute
units. This is the L0 scalar cache in RDNA architectures.
**L1 Instruction Cache (CDNA/GCN only)**
Size of the level 1 instruction cache. Usually shared by several compute
units. This is the L0 instruction cache in RDNA architectures.
**L0 Vector Cache (RDNA only)**
Size of the level 0 vector data cache. Local to a compute unit. This is the L1
vector cache in CDNA/GCN architectures.
**L0 Scalar Cache (RDNA only)**
Size of the level 0 scalar data cache. Usually shared by several compute
units. This is the L1 scalar cache in CDNA/GCN architectures.
**L0 Instruction Cache (RDNA only)**
Size of the level 0 instruction cache. Usually shared by several compute
units. This is the L1 instruction cache in CDNA/GCN architectures.
**VGPR File**
Size of the Vector General Purpose Register (VGPR) file. It holds data used
in vector instructions.
GPUs with matrix cores also have AccVGPRs, which are Accumulation General
Purpose Vector Registers, used specifically in matrix instructions.
**SGPR File**
Size of the Scalar General Purpose Register (SGPR) file. Holds data used in
scalar instructions.
**GFXIP**
GFXIP (Graphics IP) is a versioning system used by AMD to identify the GPU
architecture and its instruction set. It helps categorize different generations
of GPUs and their feature sets.
**GFXIP major version**
Defines the GPU's core instruction set and architecture, which determines
compatibility with software stacks such as HIP and OpenCL. For example, a GFXIP
11 major version corresponds to the RDNA 3 (Navi 3x) architecture, influencing
driver support and available compute features.
**GFXIP minor version**
Represents specific variations within a GFXIP major version and affects feature sets,
optimizations, and driver behavior in software stacks such as HIP and OpenCL. Different
GPU models within the same major version can have unique capabilities, impacting
performance and supported instructions.
**GCD**
Graphics Compute Die.
**XCD**
Accelerator Complex Die.


@@ -232,6 +232,18 @@ subtrees:
title: Data types and precision support
- file: reference/graph-safe-support.rst
title: Graph safe support
- file: reference/glossary.rst
title: ROCm glossary
subtrees:
- entries:
- file: reference/glossary/device-hardware.rst
title: Device hardware
- file: reference/glossary/device-software.rst
title: Device software
- file: reference/glossary/host-software.rst
title: Host software
- file: reference/glossary/performance.rst
title: Performance
- caption: Contribute
entries: