mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-27 03:01:52 -04:00
Add glossary
@@ -64,6 +64,7 @@ ROCm documentation is organized into the following categories:

<!-- markdownlint-disable MD051 -->

* [ROCm libraries](./reference/api-libraries.md)
* [ROCm tools, compilers, and runtimes](./reference/rocm-tools.md)
* [ROCm glossary](./reference/glossary.rst)
* [GPU hardware specifications](./reference/gpu-arch-specs.rst)
* [Hardware atomics operation support](./reference/gpu-atomics-operation.rst)
* [Environment variables](./reference/env-variables.rst)
24 docs/reference/glossary.rst Normal file
@@ -0,0 +1,24 @@
.. meta::
   :description: AMD ROCm Glossary
   :keywords: AMD, ROCm, glossary, terminology, device hardware,
      device software, host software, performance

.. _glossary:

********************************************************************************
Glossary
********************************************************************************

This glossary provides concise definitions of key terms and concepts in AMD ROCm
programming. Each entry includes a brief description and a link to detailed
documentation for in-depth information.

The glossary is organized into four sections:

* :doc:`glossary/device-hardware` — Hardware components (Compute Units, cores,
  memory)
* :doc:`glossary/device-software` — Software abstractions (programming model,
  ISA, thread hierarchy)
* :doc:`glossary/host-software` — Development tools (HIP, compilers, libraries,
  profilers)
* :doc:`glossary/performance` — Performance metrics and optimization concepts
84 docs/reference/glossary/device-hardware.rst Normal file
@@ -0,0 +1,84 @@
.. meta::
   :description: Device hardware glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, device hardware, compute units, cores, MFMA,
      architecture, register file, cache, HBM

.. _glossary-device-hardware:

***************
Device hardware
***************

This section provides brief definitions of hardware components and architectural
features of AMD GPUs.

.. glossary::
   :sorted:

   AMD device architecture
      AMD's device architecture is based on unified, programmable compute
      engines called Compute Units. See :ref:`hip:hardware_implementation` for
      details.

   Compute units
      Compute Units (CUs) are the fundamental programmable execution engines
      in AMD GPUs that manage thousands of lightweight threads. See
      :ref:`hip:compute_unit` for details.

   Vector arithmetic logic units
      Vector arithmetic logic units (VALUs) are the primary arithmetic engines
      that execute mathematical and logical operations within AMD Compute
      Units. See :ref:`hip:valu` for details.

   Special function unit
      Special Function Units (SFUs) accelerate transcendental and reciprocal
      mathematical functions such as ``exp``, ``log``, ``sin``, and ``cos``.
      See :ref:`hip:sfu` for details.

   Load and store unit
      Load/Store Units (LSUs) handle data transfer between Compute Units and
      the GPU's memory subsystems, managing thousands of concurrent memory
      operations. See :ref:`hip:lsu` for details.

   Wavefront scheduler
      The wavefront scheduler in each Compute Unit decides which group of
      threads to execute each clock cycle, enabling rapid context switching
      for latency hiding. See :ref:`hip:wave-scheduling` for details.

   SIMD core
      SIMD cores are execution lanes that perform scalar and vector arithmetic
      operations inside each Compute Unit. See :ref:`hip:cdna_architecture`
      and :ref:`hip:rdna_architecture` for details.

   Matrix core and MFMA
      Matrix Cores (MFMA units) are specialized execution units that perform
      large-scale matrix operations in a single instruction, delivering high
      throughput for AI and HPC workloads. See :ref:`hip:mfma_units` for
      details.

   Data movement engine
      Data Movement Engines (DMEs) are specialized hardware units in CDNA3 and
      CDNA4 that accelerate multi-dimensional tensor data copies between
      global memory and on-chip memory. See :ref:`hip:dme` for details.

   Compute unit versioning
      Compute Units are versioned with GFX IP identifiers that define their
      microarchitectural features and instruction set compatibility. See
      :ref:`hip:gfx_ip` for details.

   Register file
      The register file is the primary on-chip memory store in each Compute
      Unit, holding data between arithmetic and memory operations. See
      :ref:`hip:memory_hierarchy` for details.

   L1 data cache
      The L1 data cache is the private on-chip memory associated with each
      Compute Unit, providing fast access to recently used data. See
      :ref:`hip:vl1`, :ref:`hip:sl1`, and :ref:`hip:memory_coherence` for
      details.

   GPU RAM and HBM
      GPU RAM, also known as global memory in the HIP programming model, is
      the large, high-capacity High Bandwidth Memory (HBM) subsystem
      accessible by all Compute Units, forming the foundation of the device's
      data storage hierarchy. See :ref:`hip:hbm` for details.
92 docs/reference/glossary/device-software.rst Normal file
@@ -0,0 +1,92 @@
.. meta::
   :description: Device software glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, device software, programming model, AMDGPU,
      assembly, IR, GFX IP, wavefront, work-group, HIP kernel, thread hierarchy

.. _glossary-device-software:

***************
Device software
***************

This section provides brief definitions of software abstractions and programming
models that run on AMD GPUs.

.. glossary::
   :sorted:

   ROCm programming model
      The ROCm programming model defines how AMD GPUs execute massively
      parallel programs through hierarchical work-groups, memory scopes, and
      barrier synchronization. See :ref:`hip:programming_model` for complete
      details.

   AMDGPU assembly
      AMDGPU assembly (GFX ISA) is the low-level assembly format for programs
      running on AMD GPUs, generated by the ROCm compiler toolchain. See
      :ref:`hip:amdgpu_assembly` for instruction set details.

   AMDGPU intermediate representation
      AMDGPU IR is an intermediate representation for GPU code, serving as a
      virtual instruction set between high-level languages and
      architecture-specific assembly. See :ref:`hip:amdgpu_ir` for compilation
      details.

   GFX IP
      GFX IP versions are identifiers that specify which instruction formats,
      memory models, and compute features are supported by each AMD GPU
      generation. See :ref:`hip:gfx_ip` for versioning information.

   Work-item
      A work-item (also called a thread) is the smallest unit of execution in
      the AMD GPU programming model. See :ref:`hip:work-item` for thread
      hierarchy details.

   Wavefront
      A wavefront is a group of threads that execute together in parallel on a
      single Compute Unit, sharing one instruction stream. See
      :ref:`hip:wavefront` for execution details.
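As a back-of-the-envelope illustration (a sketch, not from the linked HIP docs), the number of wavefronts a work-group occupies follows by rounding up its size to the wavefront width; 64 is the CDNA wavefront width assumed here, while RDNA GPUs use 32:

```python
import math

def wavefronts_per_workgroup(workgroup_size: int, wavefront_width: int = 64) -> int:
    """Number of wavefronts needed to run one work-group.

    The hardware rounds up: a 100-thread work-group on a 64-wide
    wavefront still occupies two full wavefronts, leaving 28 lanes idle.
    """
    return math.ceil(workgroup_size / wavefront_width)

# A 256-thread work-group maps to 4 wavefronts at width 64:
print(wavefronts_per_workgroup(256))
# A 100-thread work-group still needs 2 wavefronts:
print(wavefronts_per_workgroup(100))
```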
   Work-group
      A work-group is a collection of threads scheduled together on a single
      Compute Unit that can coordinate through Local Data Share memory. A
      work-group may consist of multiple wavefronts that execute in parallel
      on the same Compute Unit. See
      :ref:`hip:inherent_thread_hierarchy_block` for work-group details.

   Grid
      A grid represents the collection of all work-groups executing a single
      kernel across the entire GPU. See :ref:`hip:inherent_thread_hierarchy_`
      for grid execution details.

   HIP kernel
      A kernel is the unit of GPU code that executes in parallel across many
      threads, distributed across the GPU's Compute Units. See
      :ref:`hip:device_program` for kernel programming details.

   HIP thread hierarchy
      The thread hierarchy structures parallel work from individual threads to
      work-groups to grids, mapping onto hardware from SIMD lanes to Compute
      Units to the entire GPU. See :ref:`hip:inherent_thread_model` for
      complete details.
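To make the grid/work-group/work-item relationship concrete, the following host-side sketch (illustrative only) enumerates the flat global IDs that the familiar HIP expression ``blockIdx.x * blockDim.x + threadIdx.x`` would produce for a 1D launch:

```python
def global_thread_ids(grid_dim: int, block_dim: int) -> list[int]:
    """Flat global ID of every work-item in a 1D launch.

    Models the HIP index expression
        blockIdx.x * blockDim.x + threadIdx.x
    for a grid of `grid_dim` work-groups of `block_dim` threads each.
    """
    return [block * block_dim + thread
            for block in range(grid_dim)
            for thread in range(block_dim)]

# 3 work-groups of 4 threads cover indices 0..11, one thread per element:
print(global_thread_ids(grid_dim=3, block_dim=4))
```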
   HIP memory hierarchy
      The memory hierarchy pairs each thread hierarchy level with
      corresponding memory scopes, from private registers to shared LDS to
      global HBM. See :ref:`hip:memory_hierarchy` for memory architecture
      details.

   Registers
      Registers are the lowest level of the memory hierarchy, storing
      per-thread temporary variables and intermediate results. See
      :ref:`hip:memory_hierarchy` for register usage details.

   Local data share
      Local Data Share (LDS) is fast on-chip memory shared among threads in a
      work-group, enabling efficient coordination and data reuse. See
      :ref:`hip:lds` for LDS programming details.

   Global memory
      Global memory is the device-wide memory accessible to all threads,
      physically implemented in HBM or GDDR. See
      :ref:`hip:memory_hierarchy` for global memory details.
66 docs/reference/glossary/host-software.rst Normal file
@@ -0,0 +1,66 @@
.. meta::
   :description: Host software glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, host software, HIP, compiler, runtime, libraries,
      profiler, amd-smi

.. _glossary-host-software:

*************
Host software
*************

This section provides brief definitions of development tools, compilers,
libraries, and runtime environments for programming AMD GPUs.

.. glossary::
   :sorted:

   ROCm software platform
      ROCm is AMD's GPU software stack, providing compiler toolchains, runtime
      environments, and performance libraries for HPC and AI applications.
      See :doc:`../../what-is-rocm` for a complete component overview.

   HIP C++ language extension
      HIP extends the C++ language with additional features designed for
      programming heterogeneous applications. These extensions mostly relate
      to the kernel language, but some can also be applied to host
      functionality. See :doc:`hip:how-to/hip_cpp_language_extensions` for
      language fundamentals.

   amd-smi
      The ``amd-smi`` command-line utility queries, monitors, and manages AMD
      GPU state, providing hardware information and performance metrics. See
      :doc:`amdsmi:index` for detailed usage.

   HIP runtime API
      The HIP runtime API provides an interface for GPU programming, offering
      functions for memory management, kernel launches, and synchronization.
      See :ref:`hip:hip_runtime_api_how-to` for an API overview.

   HIP compiler
      The HIP compiler, ``amdclang++``, compiles HIP C++ programs into
      binaries containing both host CPU and device GPU code. See
      :doc:`llvm-project:reference/rocmcc` for compiler flags and options.

   HIP runtime compiler
      The HIP runtime compiler (HIPRTC) compiles HIP source code at runtime
      into AMDGPU binary code objects, enabling just-in-time kernel
      generation, device-specific optimization, and dynamic code creation for
      different GPUs. See :ref:`hip:hip_runtime_compiler_how-to` for API
      details.

   ROCgdb
      ROCgdb is AMD's source-level debugger for HIP and ROCm applications,
      enabling debugging of both host CPU and GPU device code, including
      kernel breakpoints, stepping, and variable inspection. See
      :doc:`rocgdb:index` for usage and command reference.

   ROCm profiler
      The ROCm profiler (``rocprofv3``) is AMD's primary performance analysis
      tool, providing profiling, tracing, and performance counter collection.
      See :ref:`rocprofiler-sdk:using-rocprofv3` for profiling workflows.

   ROCm and LLVM binary utilities
      ROCm and LLVM binary utilities are command-line tools for examining and
      manipulating GPU binaries and code objects. See
      :ref:`hip:binary_utilities` for utility details.
121 docs/reference/glossary/performance.rst Normal file
@@ -0,0 +1,121 @@
.. meta::
   :description: Performance glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, performance, optimization, roofline, bottleneck,
      occupancy, bandwidth, latency hiding, divergence

.. _glossary-performance:

***********
Performance
***********

This section provides brief definitions of performance analysis concepts and
optimization techniques.

.. glossary::
   :sorted:

   Roofline model
      The roofline model is a visual performance model that determines whether
      a program is compute-bound or memory-bound. See
      :ref:`hip:roofline_model` for roofline analysis.
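The roofline bound itself is a one-line formula: attainable throughput is the lesser of the compute roof and the memory roof. The sketch below uses hypothetical device numbers (50 TFLOP/s peak compute, 1.6 TB/s peak bandwidth), not the figures of any specific AMD GPU:

```python
def roofline_attainable_gflops(arithmetic_intensity: float,
                               peak_gflops: float,
                               peak_bw_gbs: float) -> float:
    """Attainable throughput under the roofline model:
    min(compute roof, memory roof), where the memory roof is
    arithmetic intensity (FLOP/byte) times peak bandwidth (GB/s)."""
    return min(peak_gflops, arithmetic_intensity * peak_bw_gbs)

# Hypothetical device: 50 TFLOP/s compute, 1.6 TB/s bandwidth.
# The ridge point is 50000 / 1600 = 31.25 FLOP/byte.
print(roofline_attainable_gflops(0.25, 50_000, 1_600))   # memory-bound side
print(roofline_attainable_gflops(100.0, 50_000, 1_600))  # compute-bound side
```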
   Compute-bound
      Compute-bound kernels are limited by the arithmetic bandwidth of the
      GPU's Compute Units rather than memory bandwidth. See
      :ref:`hip:compute_bound` for compute-bound analysis.

   Memory-bound
      Memory-bound kernels are limited by memory bandwidth rather than
      arithmetic throughput, typically due to low arithmetic intensity. See
      :ref:`hip:memory_bound` for memory-bound analysis.

   Arithmetic intensity
      Arithmetic intensity is the ratio of arithmetic operations to memory
      operations in a kernel, determining performance characteristics. See
      :ref:`hip:arithmetic_intensity` for intensity analysis.
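A worked example makes the ratio concrete. For single-precision SAXPY, ``y[i] = a * x[i] + y[i]``, each element costs 2 FLOPs and moves 12 bytes (read ``x`` and ``y``, write ``y``, 4 bytes each), which is far below the ridge point of any modern GPU:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Arithmetic intensity in FLOP per byte of memory traffic."""
    return flops / bytes_moved

# Single-precision SAXPY, per element:
#   2 FLOPs (one multiply, one add)
#   12 bytes (read x and y, write y; 4 bytes each)
ai = arithmetic_intensity(flops=2, bytes_moved=12)
print(round(ai, 3))  # about 0.167 FLOP/byte: firmly memory-bound
```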
   Overhead
      Overhead latency is time spent with no useful work being done, often
      from CPU-side bottlenecks or kernel launch delays. See
      :ref:`hip:performance_bottlenecks` for details.

   Little's Law
      Little's Law relates concurrency, latency, and throughput, determining
      how much independent work must be in flight to hide latency. See
      :ref:`hip:littles_law` for latency hiding details.
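In its simplest form, concurrency = latency × throughput. The sketch below applies it to memory traffic with hypothetical numbers (1.6 TB/s bandwidth, 800 ns latency), which are illustrative rather than taken from any AMD datasheet:

```python
def bytes_in_flight(latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Little's Law: concurrency = latency x throughput.

    Returns how many bytes must be in flight to sustain the given
    bandwidth when each request takes `latency_s` to complete.
    """
    return latency_s * bandwidth_bytes_per_s

# Hypothetical device: 1.6 TB/s HBM bandwidth, 800 ns memory latency.
needed = bytes_in_flight(800e-9, 1.6e12)
print(round(needed))       # about 1.28 MB must be in flight
print(round(needed / 64))  # i.e. about 20000 outstanding 64-byte requests
```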
   Memory bandwidth
      Memory bandwidth is the maximum rate at which data can be transferred
      between memory hierarchy levels, typically measured in bytes per
      second. See :ref:`hip:memory_bound` for details.

   Arithmetic bandwidth
      Arithmetic bandwidth is the peak rate at which arithmetic work can be
      performed, defining the compute roof in roofline models. See
      :ref:`hip:compute_bound` for details.

   Latency hiding
      Latency hiding masks long-latency operations by running many concurrent
      threads, keeping execution pipelines busy. See :ref:`hip:latency_hiding`
      for details.

   Wavefront execution state
      Wavefront execution states (*active*, *stalled*, *eligible*, *selected*)
      describe the scheduling status of wavefronts on AMD GPUs. See
      :ref:`hip:wavefront_execution` for state definitions.

   Active cycle
      An active cycle is a clock cycle in which a Compute Unit has at least
      one active wavefront resident. See :ref:`hip:wavefront_execution` for
      details.

   Occupancy
      Occupancy is the ratio of active wavefronts to the maximum number of
      wavefronts that can be active on a Compute Unit. See
      :ref:`hip:occupancy` for occupancy analysis.
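The ratio is easy to compute once a limiting resource is known. The sketch below assumes a hypothetical Compute Unit (64 KiB of LDS, 32 wavefront slots); real limits vary by architecture and are listed in the occupancy documentation:

```python
def occupancy(active_wavefronts: int, max_wavefronts: int) -> float:
    """Occupancy: active wavefronts over the CU's wavefront capacity."""
    return active_wavefronts / max_wavefronts

def lds_limited_wavefronts(lds_per_cu: int, lds_per_workgroup: int,
                           wavefronts_per_workgroup: int) -> int:
    """Wavefronts that fit on a CU when LDS is the limiting resource."""
    groups = lds_per_cu // lds_per_workgroup
    return groups * wavefronts_per_workgroup

# Hypothetical CU: 64 KiB LDS, 32 wavefront slots. A kernel using
# 16 KiB of LDS per 4-wavefront work-group fits 4 groups = 16 waves:
fits = lds_limited_wavefronts(64 * 1024, 16 * 1024, 4)
print(fits)
print(occupancy(fits, 32))  # half the wavefront slots are occupied
```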
   Pipe utilization
      Pipe utilization measures how effectively a kernel uses the execution
      pipelines within each Compute Unit. See :ref:`hip:pipe_utilization` for
      utilization details.

   Peak rate
      Peak rate is the theoretical maximum throughput at which a hardware
      system can complete work under ideal conditions. See
      :ref:`hip:theoretical_performance_limits` for details.

   Issue efficiency
      Issue efficiency measures how effectively the wavefront scheduler keeps
      execution pipelines busy by issuing instructions. See
      :ref:`hip:issue_efficiency` for efficiency metrics.

   CU utilization
      CU utilization measures the percentage of time that Compute Units are
      actively executing instructions. See :ref:`hip:cu_utilization` for
      utilization analysis.

   Wavefront divergence
      Wavefront divergence occurs when threads within a wavefront take
      different execution paths due to conditional statements. See
      :ref:`hip:branch_efficiency` for divergence handling details.

   Branch efficiency
      Branch efficiency measures how often all threads within a wavefront take
      the same execution path, quantifying control flow uniformity. See
      :ref:`hip:branch_efficiency` for branch analysis.
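A toy model of the metric: a branch is uniform when every thread in the wavefront takes the same path, so no lanes sit masked while the other path executes. This sketch uses a 4-thread wavefront purely for readability:

```python
def branch_efficiency(wavefront_predicates: list[list[bool]]) -> float:
    """Fraction of observed branches where the whole wavefront agreed.

    Each inner list holds one branch outcome per thread of a wavefront;
    a branch is uniform when all threads took it or none did.
    """
    uniform = sum(1 for preds in wavefront_predicates
                  if all(preds) or not any(preds))
    return uniform / len(wavefront_predicates)

# Three branches observed by a toy 4-thread wavefront:
history = [
    [True, True, True, True],      # uniform: all threads take the branch
    [True, False, True, False],    # divergent: both paths must execute
    [False, False, False, False],  # uniform: no thread takes the branch
]
print(round(branch_efficiency(history), 2))  # 2 of 3 branches were uniform
```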
   Memory coalescing
      Memory coalescing improves memory bandwidth by servicing many logical
      loads or stores with fewer physical memory transactions. See
      :ref:`hip:memory_coalescing_theory` for coalescing patterns.
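The effect can be counted directly: under the simplifying assumption of one transaction per distinct 64-byte line (actual transaction sizes vary by architecture), contiguous per-thread loads collapse into a single transaction while strided loads do not:

```python
def transactions(addresses: list[int], line_bytes: int = 64) -> int:
    """Distinct memory transactions a set of loads needs, assuming
    one transaction per distinct `line_bytes`-sized line touched."""
    return len({addr // line_bytes for addr in addresses})

# 16 threads each loading a 4-byte float:
contiguous = [tid * 4 for tid in range(16)]       # a[tid]
strided    = [tid * 4 * 32 for tid in range(16)]  # a[tid * 32]
print(transactions(contiguous))  # all 16 loads share one 64-byte line
print(transactions(strided))     # every load touches its own line
```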
   Bank conflict
      A bank conflict occurs when multiple threads simultaneously access
      different addresses in the same LDS bank, serializing accesses. See
      :ref:`hip:bank_conflicts_theory` for details.
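A quick model, assuming the common layout of 32 banks of 4-byte words (check the architecture documentation for the actual figures): the bank of an address is ``(addr // 4) % 32``, and distinct addresses landing in the same bank are replayed serially:

```python
from collections import Counter

def lds_conflict_degree(byte_addresses: list[int],
                        banks: int = 32, bank_width: int = 4) -> int:
    """Worst-case serialization factor for one LDS access.

    Duplicate addresses are deduplicated first, modeling the hardware
    broadcast of reads from the same location.
    """
    bank_hits = Counter((addr // bank_width) % banks
                        for addr in set(byte_addresses))
    return max(bank_hits.values())

# 32 threads reading consecutive floats: one bank each, no conflict.
print(lds_conflict_degree([tid * 4 for tid in range(32)]))
# Stride-32 floats: every address maps to bank 0, a 32-way conflict.
print(lds_conflict_degree([tid * 128 for tid in range(32)]))
```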
   Register pressure
      Register pressure occurs when excessive register demand limits the
      number of active wavefronts per Compute Unit, reducing occupancy. See
      :ref:`hip:register_pressure_theory` for details.
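The mechanism is a simple division: the register file is fixed, so the more vector registers each wavefront claims, the fewer wavefronts fit. The SIMD sizes below (512 allocatable VGPRs, 8 wavefront slots) are hypothetical round numbers, not a specific AMD part:

```python
def register_limited_wavefronts(vgprs_per_simd: int,
                                vgprs_per_wavefront: int) -> int:
    """How many wavefronts fit in one SIMD's vector register file
    when registers are the limiting resource."""
    return vgprs_per_simd // vgprs_per_wavefront

# Hypothetical SIMD: 512 allocatable VGPRs, 8 wavefront slots.
# Doubling per-wavefront register use halves the resident wavefronts:
for vgprs in (64, 128, 256):
    waves = min(8, register_limited_wavefronts(512, vgprs))
    print(vgprs, waves)
```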
@@ -224,6 +224,18 @@ subtrees:
          title: ROCm libraries
        - file: reference/rocm-tools.md
          title: ROCm tools, compilers, and runtimes
        - file: reference/glossary.rst
          title: ROCm glossary
          subtrees:
            - entries:
                - file: reference/glossary/device-hardware.rst
                  title: Device hardware
                - file: reference/glossary/device-software.rst
                  title: Device software
                - file: reference/glossary/host-software.rst
                  title: Host software
                - file: reference/glossary/performance.rst
                  title: Performance
        - file: reference/gpu-arch-specs.rst
        - file: reference/gpu-atomics-operation.rst
        - file: reference/env-variables.rst