Add glossary

Jan Stephan
2026-02-06 16:55:53 +01:00
committed by Istvan Kiss
parent a0f56927ba
commit debd213a58
7 changed files with 400 additions and 0 deletions

View File

@@ -64,6 +64,7 @@ ROCm documentation is organized into the following categories:
<!-- markdownlint-disable MD051 -->
* [ROCm libraries](./reference/api-libraries.md)
* [ROCm tools, compilers, and runtimes](./reference/rocm-tools.md)
* [ROCm glossary](./reference/glossary.rst)
* [GPU hardware specifications](./reference/gpu-arch-specs.rst)
* [Hardware atomics operation support](./reference/gpu-atomics-operation.rst)
* [Environment variables](./reference/env-variables.rst)

View File

@@ -0,0 +1,24 @@
.. meta::
:description: AMD ROCm Glossary
:keywords: AMD, ROCm, glossary, terminology, device hardware,
device software, host software, performance
.. _glossary:
********************************************************************************
Glossary
********************************************************************************
This glossary provides concise definitions of key terms and concepts in AMD ROCm
programming. Each entry includes a brief description and a link to detailed
documentation for in-depth information.
The glossary is organized into four sections:
* :doc:`glossary/device-hardware` — Hardware components (Compute Units, cores,
memory)
* :doc:`glossary/device-software` — Software abstractions (programming model,
ISA, thread hierarchy)
* :doc:`glossary/host-software` — Development tools (HIP, compilers, libraries,
profilers)
* :doc:`glossary/performance` — Performance metrics and optimization concepts

View File

@@ -0,0 +1,84 @@
.. meta::
:description: Device hardware glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, device hardware, compute units, cores, MFMA,
architecture, register file, cache, HBM
.. _glossary-device-hardware:
***************
Device hardware
***************
This section provides brief definitions of hardware components and architectural
features of AMD GPUs.
.. glossary::
:sorted:
AMD device architecture
AMD's device architecture is based on unified, programmable compute
engines called Compute Units. See :ref:`hip:hardware_implementation` for
details.
Compute units
Compute Units (CUs) are the fundamental programmable execution engines
in AMD GPUs that manage thousands of lightweight threads. See
:ref:`hip:compute_unit` for details.
Vector arithmetic logic units
Vector arithmetic logic units (VALUs) are the primary arithmetic engines
that execute mathematical and logical operations within AMD Compute
Units. See :ref:`hip:valu` for details.
Special function unit
Special Function Units (SFUs) accelerate transcendental and reciprocal
mathematical functions such as ``exp``, ``log``, ``sin``, and ``cos``.
See :ref:`hip:sfu` for details.
Load and store unit
Load/Store Units (LSUs) handle data transfer between Compute Units and
the GPU's memory subsystems, managing thousands of concurrent memory
operations. See :ref:`hip:lsu` for details.
Wavefront scheduler
The Wavefront Scheduler in each Compute Unit decides which group of
threads to execute each clock cycle, enabling rapid context switching
for latency hiding. See :ref:`hip:wave-scheduling` for details.
SIMD core
SIMD Cores are execution lanes that perform scalar and vector arithmetic
operations inside each Compute Unit. See :ref:`hip:cdna_architecture`
and :ref:`hip:rdna_architecture` for details.
Matrix core and MFMA
Matrix Cores (MFMA units) are specialized execution units that perform
large-scale matrix operations in a single instruction, delivering high
throughput for AI and HPC workloads. See :ref:`hip:mfma_units` for
details.
Data movement engine
Data Movement Engines (DMEs) are specialized hardware units in CDNA3 and
CDNA4 that accelerate multi-dimensional tensor data copies between
global memory and on-chip memory. See :ref:`hip:dme` for details.
Compute unit versioning
Compute Units are versioned with GFX IP identifiers that define their
microarchitectural features and instruction set compatibility. See
:ref:`hip:gfx_ip` for details.
Register file
The register file is the primary on-chip memory store in each Compute
Unit, holding data between arithmetic and memory operations. See
:ref:`hip:memory_hierarchy` for details.
L1 data cache
The L1 Data Cache is the private on-chip memory associated with each
Compute Unit, providing fast access to recently used data. See
:ref:`hip:vl1`, :ref:`hip:sl1`, and :ref:`hip:memory_coherence` for
details.
GPU RAM and HBM
GPU RAM, also known as global memory in the HIP programming model, is
the large, high-capacity High Bandwidth Memory (HBM) subsystem
accessible by all Compute Units, forming the foundation of the device's
data storage hierarchy. See :ref:`hip:hbm` for details.

View File

@@ -0,0 +1,92 @@
.. meta::
:description: Device software glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, device software, programming model, AMDGPU,
assembly, IR, GFX IP, wavefront, work-group, HIP kernel, thread hierarchy
.. _glossary-device-software:
***************
Device software
***************
This section provides brief definitions of software abstractions and programming
models that run on AMD GPUs.
.. glossary::
:sorted:
ROCm programming model
The ROCm programming model defines how AMD GPUs execute massively
parallel programs through hierarchical work-groups, memory scopes, and
barrier synchronization. See :ref:`hip:programming_model` for complete
details.
AMDGPU assembly
AMDGPU Assembly (GFX ISA) is the low-level assembly format for programs
running on AMD GPUs, generated by the ROCm compiler toolchain. See
:ref:`hip:amdgpu_assembly` for instruction set details.
AMDGPU intermediate representation
AMDGPU IR is an intermediate representation for GPU code, serving as a
virtual instruction set between high-level languages and
architecture-specific assembly. See :ref:`hip:amdgpu_ir` for compilation
details.
GFX IP
GFX IP versions are identifiers that specify which instruction formats,
memory models, and compute features are supported by each AMD GPU
generation. See :ref:`hip:gfx_ip` for versioning information.
Work-item
A work-item (also called a thread) is the smallest unit of execution in
the AMD GPU programming model. See :ref:`hip:work-item` for thread
hierarchy details.
Wavefront
A wavefront is a group of threads that execute together in parallel on a
single Compute Unit, sharing one instruction stream. See
:ref:`hip:wavefront` for execution details.
Work-group
A work-group is a collection of threads scheduled together on a single
Compute Unit that can coordinate through Local Data Share memory. A
work-group may consist of multiple wavefronts that execute in parallel
on the same Compute Unit. See
:ref:`hip:inherent_thread_hierarchy_block` for work-group details.
Grid
A grid represents the collection of all work-groups executing a single
kernel across the entire GPU. See :ref:`hip:inherent_thread_hierarchy_`
for grid execution details.
HIP kernel
A kernel is the unit of GPU code that executes in parallel across many
threads, distributed across the GPU's Compute Units. See
:ref:`hip:device_program` for kernel programming details.
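As an illustrative sketch (the kernel name ``vector_add``, the array length, and the work-group size of 256 are examples, not prescribed by HIP), a kernel and its launch might look like this:

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Each work-item computes one element of the output array.
   __global__ void vector_add(const float* a, const float* b, float* c, int n)
   {
       int i = blockIdx.x * blockDim.x + threadIdx.x;  // global work-item index
       if (i < n)
           c[i] = a[i] + b[i];
   }

   int main()
   {
       int n = 1 << 20;
       float *a, *b, *c;
       hipMalloc(&a, n * sizeof(float));
       hipMalloc(&b, n * sizeof(float));
       hipMalloc(&c, n * sizeof(float));

       int threads = 256;                         // work-group size
       int blocks = (n + threads - 1) / threads;  // grid size in work-groups
       vector_add<<<blocks, threads>>>(a, b, c, n);
       hipDeviceSynchronize();

       hipFree(a);
       hipFree(b);
       hipFree(c);
       return 0;
   }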
HIP thread hierarchy
The thread hierarchy structures parallel work from individual threads to
work-groups to grids, mapping onto hardware from SIMD lanes to Compute
Units to the entire GPU. See :ref:`hip:inherent_thread_model` for
complete details.
HIP memory hierarchy
The memory hierarchy pairs each thread hierarchy level with
corresponding memory scopes, from private registers to shared LDS to
global HBM. See :ref:`hip:memory_hierarchy` for memory architecture
details.
Registers
Registers are the lowest level of the memory hierarchy, storing per-thread
temporary variables and intermediate results. See
:ref:`hip:memory_hierarchy` for register usage details.
Local data share
Local Data Share (LDS) is fast on-chip memory shared among threads in a
work-group, enabling efficient coordination and data reuse. See
:ref:`hip:lds` for LDS programming details.
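A minimal sketch of LDS usage, assuming a work-group of exactly 256 threads (the kernel name ``block_sum`` and the buffer names are illustrative):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Each work-group stages its inputs in LDS and reduces them to one value.
   __global__ void block_sum(const float* in, float* out)
   {
       __shared__ float tile[256];              // lives in LDS, one copy per work-group
       int tid = threadIdx.x;
       tile[tid] = in[blockIdx.x * blockDim.x + tid];
       __syncthreads();                         // wait until the whole tile is filled

       // Tree reduction within the work-group (assumes blockDim.x == 256).
       for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
           if (tid < stride)
               tile[tid] += tile[tid + stride];
           __syncthreads();
       }
       if (tid == 0)
           out[blockIdx.x] = tile[0];           // one partial sum per work-group
   }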
Global memory
Global memory is the device-wide memory accessible to all threads,
physically implemented in HBM or GDDR. See
:ref:`hip:memory_hierarchy` for global memory details.

View File

@@ -0,0 +1,66 @@
.. meta::
:description: Host software glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, host software, HIP, compiler, runtime, libraries,
profiler, amd-smi
.. _glossary-host-software:
*************
Host software
*************
This section provides brief definitions of development tools, compilers,
libraries, and runtime environments for programming AMD GPUs.
.. glossary::
:sorted:
ROCm software platform
ROCm is AMD's GPU software stack, providing compiler
toolchains, runtime environments, and performance libraries for HPC and
AI applications. See :doc:`../../what-is-rocm` for a complete component
overview.
HIP C++ language extension
HIP extends the C++ language with additional features designed for
programming heterogeneous applications. These extensions mostly relate
to the kernel language, but some can also be applied to host
functionality. See :doc:`hip:how-to/hip_cpp_language_extensions` for
language fundamentals.
amd-smi
The ``amd-smi`` command-line utility queries, monitors, and manages AMD GPU
state, providing hardware information and performance metrics. See
:doc:`amdsmi:index` for detailed usage.
HIP runtime API
The HIP runtime API provides an interface for GPU programming, offering
functions for memory management, kernel launches, and synchronization. See
:ref:`hip:hip_runtime_api_how-to` for API overview.
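A sketch of typical runtime API usage (buffer size and variable names are illustrative):

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <cstdio>
   #include <vector>

   int main()
   {
       std::vector<float> host(1024, 1.0f);
       float* device = nullptr;

       // Allocate device (global) memory and check for errors.
       hipError_t err = hipMalloc(&device, host.size() * sizeof(float));
       if (err != hipSuccess) {
           std::printf("hipMalloc failed: %s\n", hipGetErrorString(err));
           return 1;
       }

       // Copy input to the device, run work, then copy results back.
       hipMemcpy(device, host.data(), host.size() * sizeof(float), hipMemcpyHostToDevice);
       // ... kernel launches would go here ...
       hipDeviceSynchronize();  // wait for outstanding device work
       hipMemcpy(host.data(), device, host.size() * sizeof(float), hipMemcpyDeviceToHost);

       hipFree(device);
       return 0;
   }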
HIP compiler
The HIP compiler ``amdclang++`` compiles HIP C++ programs into binaries
containing both host CPU and device GPU code. See
:doc:`llvm-project:reference/rocmcc` for compiler flags and options.
HIP runtime compiler
The HIP Runtime Compiler (HIPRTC) compiles HIP source code at runtime
into AMDGPU binary code objects, enabling just-in-time kernel generation,
device-specific optimization, and dynamic code creation for different
GPUs. See :ref:`hip:hip_runtime_compiler_how-to` for API details.
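A condensed sketch of runtime compilation, omitting error checks (the kernel source, its name ``scale``, and the file name ``scale.cu`` are illustrative):

.. code-block:: cpp

   #include <hip/hiprtc.h>
   #include <hip/hip_runtime.h>
   #include <vector>

   int main()
   {
       const char* src = R"(
           extern "C" __global__ void scale(float* x, float s) {
               x[threadIdx.x] *= s;
           })";

       // Compile the HIP source at runtime into a device code object.
       hiprtcProgram prog;
       hiprtcCreateProgram(&prog, src, "scale.cu", 0, nullptr, nullptr);
       hiprtcCompileProgram(prog, 0, nullptr);

       size_t size = 0;
       hiprtcGetCodeSize(prog, &size);
       std::vector<char> code(size);
       hiprtcGetCode(prog, code.data());
       hiprtcDestroyProgram(&prog);

       // Load the code object and look up the kernel for later launching.
       hipModule_t module;
       hipFunction_t kernel;
       hipModuleLoadData(&module, code.data());
       hipModuleGetFunction(&kernel, module, "scale");
       // ... launch with hipModuleLaunchKernel ...
       hipModuleUnload(module);
       return 0;
   }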
ROCgdb
ROCgdb is AMD's source-level debugger for HIP and ROCm applications,
enabling debugging of both host CPU and GPU device code, including
kernel breakpoints, stepping, and variable inspection. See
:doc:`rocgdb:index` for usage and command reference.
ROCm profiler
The ROCm profiler (``rocprofv3``) is AMD's primary performance analysis
tool, providing profiling, tracing, and performance counter collection.
See :ref:`rocprofiler-sdk:using-rocprofv3` for profiling workflows.
ROCm and LLVM binary utilities
ROCm and LLVM binary utilities are command-line tools for examining and
manipulating GPU binaries and code objects. See
:ref:`hip:binary_utilities` for utility details.

View File

@@ -0,0 +1,121 @@
.. meta::
:description: Performance glossary for AMD GPUs
:keywords: AMD, ROCm, GPU, performance, optimization, roofline, bottleneck,
occupancy, bandwidth, latency hiding, divergence
.. _glossary-performance:
***********
Performance
***********
This section provides brief definitions of performance analysis concepts and
optimization techniques.
.. glossary::
:sorted:
Roofline model
The roofline model is a visual performance model that determines whether
a program is compute-bound or memory-bound. See
:ref:`hip:roofline_model` for roofline analysis.
Compute-bound
Compute-bound kernels are limited by the arithmetic bandwidth of the
GPU's compute units rather than memory bandwidth. See
:ref:`hip:compute_bound` for compute-bound analysis.
Memory-bound
Memory-bound kernels are limited by memory bandwidth rather than
arithmetic throughput, typically due to low arithmetic intensity. See
:ref:`hip:memory_bound` for memory-bound analysis.
Arithmetic intensity
Arithmetic intensity is the ratio of arithmetic operations to memory
operations in a kernel, determining performance characteristics. See
:ref:`hip:arithmetic_intensity` for intensity analysis.
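As an informal sketch of how these terms combine in the roofline model (the symbols here are illustrative, not taken from the linked documentation):

.. math::

   I = \frac{\text{arithmetic operations}}{\text{bytes moved}}, \qquad
   P_{\text{attainable}} = \min\left(P_{\text{peak}},\ I \times B_{\text{peak}}\right)

where :math:`P_{\text{peak}}` is the peak arithmetic bandwidth and :math:`B_{\text{peak}}` is the peak memory bandwidth; kernels with intensity below the ridge point :math:`P_{\text{peak}} / B_{\text{peak}}` are memory-bound, and those above it are compute-bound.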
Overhead
Overhead latency is time spent with no useful work being done, often
from CPU-side bottlenecks or kernel launch delays. See
:ref:`hip:performance_bottlenecks` for details.
Little's Law
Little's Law relates concurrency, latency, and throughput, determining
how much independent work must be in flight to hide latency. See
:ref:`hip:littles_law` for latency hiding details.
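Informally, and with terms chosen here only for illustration:

.. math::

   \text{concurrency in flight} \approx \text{latency} \times \text{throughput}

For example, sustaining a throughput of 4 memory operations per cycle against a 400-cycle memory latency requires roughly 1600 independent operations in flight.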
Memory bandwidth
Memory bandwidth is the maximum rate at which data can be transferred
between memory hierarchy levels, typically measured in bytes per
second. See :ref:`hip:memory_bound` for details.
Arithmetic bandwidth
Arithmetic bandwidth is the peak rate at which arithmetic work can be
performed, defining the compute roof in roofline models. See
:ref:`hip:compute_bound` for details.
Latency hiding
Latency hiding masks long-latency operations by running many concurrent
threads, keeping execution pipelines busy. See :ref:`hip:latency_hiding`
for details.
Wavefront execution state
Wavefront execution states (*active*, *stalled*, *eligible*, *selected*)
describe the scheduling status of wavefronts on AMD GPUs. See
:ref:`hip:wavefront_execution` for state definitions.
Active cycle
An active cycle is a clock cycle in which a Compute Unit has at least
one active wavefront resident. See :ref:`hip:wavefront_execution` for
details.
Occupancy
Occupancy is the ratio of active wavefronts to the maximum number of
wavefronts that can be active on a Compute Unit. See
:ref:`hip:occupancy` for occupancy analysis.
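Expressed as a simple ratio (this formulation paraphrases the definition above):

.. math::

   \text{occupancy} = \frac{\text{active wavefronts per CU}}{\text{maximum resident wavefronts per CU}}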
Pipe utilization
Pipe utilization measures how effectively a kernel uses the execution
pipelines within each Compute Unit. See :ref:`hip:pipe_utilization` for
utilization details.
Peak rate
Peak rate is the theoretical maximum throughput at which a hardware
system can complete work under ideal conditions. See
:ref:`hip:theoretical_performance_limits` for details.
Issue efficiency
Issue efficiency measures how effectively the wavefront scheduler keeps
execution pipelines busy by issuing instructions. See
:ref:`hip:issue_efficiency` for efficiency metrics.
CU utilization
CU utilization measures the percentage of time that Compute Units are
actively executing instructions. See :ref:`hip:cu_utilization` for
utilization analysis.
Wavefront divergence
Wavefront divergence occurs when threads within a wavefront take
different execution paths due to conditional statements. See
:ref:`hip:branch_efficiency` for divergence handling details.
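A small sketch of a divergent branch (the kernel and variable names are illustrative):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Even and odd lanes of the same wavefront take different paths, so the
   // hardware executes both branches with part of the wavefront masked off.
   __global__ void divergent(const float* in, float* out, int n)
   {
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i < n) {
           if (i % 2 == 0)
               out[i] = in[i] * 2.0f;
           else
               out[i] = in[i] + 1.0f;
       }
   }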
Branch efficiency
Branch efficiency measures how often all threads within a wavefront take
the same execution path, quantifying control flow uniformity. See
:ref:`hip:branch_efficiency` for branch analysis.
Memory coalescing
Memory coalescing improves memory bandwidth by servicing many logical
loads or stores with fewer physical memory transactions. See
:ref:`hip:memory_coalescing_theory` for coalescing patterns.
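For illustration, two access patterns that move the same useful data with very different transaction counts (the kernel names and the ``stride`` parameter are examples):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Coalesced: consecutive work-items read consecutive addresses, so a
   // wavefront's loads combine into a few wide memory transactions.
   __global__ void copy_coalesced(const float* in, float* out, int n)
   {
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i < n)
           out[i] = in[i];
   }

   // Strided: consecutive work-items touch addresses far apart, so each load
   // falls into a different cache line and many more transactions are issued.
   __global__ void copy_strided(const float* in, float* out, int n, int stride)
   {
       int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
       if (i < n)
           out[i] = in[i];
   }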
Bank conflict
A bank conflict occurs when multiple threads simultaneously access
different addresses in the same LDS bank, serializing accesses. See
:ref:`hip:bank_conflicts_theory` for details.
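A common mitigation is to pad LDS arrays, sketched here under the assumption of a square matrix whose side is a multiple of the tile size and a (32, 32) work-group (names are illustrative):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   constexpr int TILE = 32;

   // Without the +1 padding, work-items reading a column of `tile` would hit
   // the same LDS bank and serialize; the padding shifts each row to a
   // different bank.
   __global__ void transpose_tile(const float* in, float* out, int width)
   {
       __shared__ float tile[TILE][TILE + 1];

       int x = blockIdx.x * TILE + threadIdx.x;
       int y = blockIdx.y * TILE + threadIdx.y;
       tile[threadIdx.y][threadIdx.x] = in[y * width + x];
       __syncthreads();

       int tx = blockIdx.y * TILE + threadIdx.x;  // transposed coordinates
       int ty = blockIdx.x * TILE + threadIdx.y;
       out[ty * width + tx] = tile[threadIdx.x][threadIdx.y];
   }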
Register pressure
Register pressure occurs when excessive register demand limits the
number of active wavefronts per Compute Unit, reducing occupancy. See
:ref:`hip:register_pressure_theory` for details.
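One way to influence this trade-off, shown as a sketch (the bound of 256 threads is an example):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // __launch_bounds__ tells the compiler the largest work-group size this
   // kernel will be launched with, letting it cap register usage so that more
   // wavefronts can remain resident per Compute Unit.
   __global__ void __launch_bounds__(256) scale_in_place(float* data, float s)
   {
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       data[i] *= s;
   }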

View File

@@ -224,6 +224,18 @@ subtrees:
title: ROCm libraries
- file: reference/rocm-tools.md
title: ROCm tools, compilers, and runtimes
- file: reference/glossary.rst
title: ROCm glossary
subtrees:
- entries:
- file: reference/glossary/device-hardware.rst
title: Device hardware
- file: reference/glossary/device-software.rst
title: Device software
- file: reference/glossary/host-software.rst
title: Host software
- file: reference/glossary/performance.rst
title: Performance
- file: reference/gpu-arch-specs.rst
- file: reference/gpu-atomics-operation.rst
- file: reference/env-variables.rst