Merge pull request #2956 from ROCm/roc-6.0.x

Merge roc-6.0.x into docs/6.0.2
2026-01-10 23:28:03 -05:00 · 2024-03-07 09:10:24 -07:00
parent 252dae76a7 f059e9ea1c
commit 893ee4a56b
14 changed files with 851 additions and 9 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -2,6 +2,8 @@ AAC
 ABI
 ACE
 ACEs
+AccVGPR
+AccVGPRs
 ALU
 AMD
 AMDGPU
@@ -103,6 +105,7 @@ GDS
 GEMM
 GEMMs
 GFortran
+GiB
 GIM
 GL
 GLXT
@@ -154,6 +157,7 @@ Ioffe
 JSON
 Jupyter
 KFD
+KiB
 KVM
 Keras
 Khronos
@@ -170,6 +174,7 @@ LoRA
 MEM
 MERCHANTABILITY
 MFMA
+MiB
 MIGraphX
 MIOpen
 MIOpenGEMM
@@ -207,6 +212,7 @@ NUMA
 NVCC
 NVIDIA
 NVPTX
+NaN
 Nano
 Navi
 Noncoherently
@@ -387,6 +393,7 @@ awk
 backend
 backends
 benchmarking
+bfloat
 bilinear
 bitsandbytes
 blit
@@ -446,6 +453,7 @@ el
 embeddings
 enablement
 endpgm
+encodings
 env
 epilog
 etcetera
@@ -620,6 +628,7 @@ subexpression
 subfolder
 subfolders
 supercomputing
+tensorfloat
 th
 tokenization
 tokenize
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -344,7 +344,7 @@ Note: These complex operations are equivalent to corresponding types/functions o
      * `HIP_ROCclr`
    * NVIDIA platform
      * `HIP_PLATFORM_NVCC`
-* The [hcc_detail](https://github.com/ROCm/clr/tree/1949b1621a802ffb1492616adbae6154bfbe64ef/hipamd/include/hip/hcc_detail) and [nvcc_detail](https://github.com/ROCm/clr/tree/1949b1621a802ffb1492616adbae6154bfbe64ef/hipamd/include/hips/nvcc_detail) directories in the clr repository are removed.
+* The `hcc_detail` and `nvcc_detail` directories in the clr repository are removed.
 * Deprecated gcnArch is removed from hip device struct `hipDeviceProp_t`.
 * Deprecated `enum hipMemoryType memoryType;` is removed from HIP struct `hipPointerAttribute_t` union.

--- a/docs/about/compatibility/data-type-support.rst
+++ b/docs/about/compatibility/data-type-support.rst
@@ -0,0 +1,564 @@
+.. meta::
+  :description: Supported data types in ROCm
+  :keywords: int8, float8, float8 (E4M3), float8 (E5M2), bfloat8, float16, half, bfloat16, tensorfloat32, float, float32, float64, double, AMD, ROCm, AMDGPU
+
+.. _rocm-supported-data-types:
+
+*************************************************************
+ROCm data type specifications
+*************************************************************
+
+Integral types
+==========================================
+
+The signed and unsigned integral types that are supported by ROCm™ are listed in the following table,
+together with their corresponding HIP type and a short description.
+
+
+.. list-table::
+    :header-rows: 1
+    :widths: 15,35,50
+
+    *
+      - Type name
+      - HIP type
+      - Description
+    *
+      - int8
+      - ``int8_t``, ``uint8_t``
+      - A signed or unsigned 8-bit integer
+    *
+      - int16
+      - ``int16_t``, ``uint16_t``
+      - A signed or unsigned 16-bit integer
+    *
+      - int32
+      - ``int32_t``, ``uint32_t``
+      - A signed or unsigned 32-bit integer
+    *
+      - int64
+      - ``int64_t``, ``uint64_t``
+      - A signed or unsigned 64-bit integer
+
+Floating-point types
+==========================================
+
+The floating-point types that are supported by ROCm are listed in the following table, together with
+their corresponding HIP type and a short description.
+
+.. image:: ../../data/about/compatibility/floating-point-data-types.png
+    :alt: Supported floating-point types
+
+.. list-table::
+    :header-rows: 1
+    :widths: 15,15,70
+
+    *
+      - Type name
+      - HIP type
+      - Description
+    *
+      - float8 (E4M3)
+      - ``-``
+      - An 8-bit floating-point number that mostly follows IEEE-754 conventions and the **S1E4M3** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_, with expanded range and with no infinity or signed zero. NaN is represented by the negative-zero bit pattern.
+    *
+      - float8 (E5M2)
+      - ``-``
+      - An 8-bit floating-point number that mostly follows IEEE-754 conventions and the **S1E5M2** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_, with expanded range and with no infinity or signed zero. NaN is represented by the negative-zero bit pattern.
+    *
+      - float16
+      - ``half``
+      - A 16-bit floating-point number that conforms to the IEEE 754-2008 half-precision storage format.
+    *
+      - bfloat16
+      - ``bfloat16``
+      - A shortened 16-bit version of the IEEE 754 single-precision storage format.
+    *
+      - tensorfloat32
+      - ``-``
+      - A floating-point number that occupies 32 bits or less of storage, providing improved range compared to half (16-bit) format, at (potentially) greater throughput than single-precision (32-bit) formats.
+    *
+      - float32
+      - ``float``
+      - A 32-bit floating-point number that conforms to the IEEE 754 single-precision storage format.
+    *
+      - float64
+      - ``double``
+      - A 64-bit floating-point number that conforms to the IEEE 754 double-precision storage format.
+
+.. note::
+
+  * The float8 and tensorfloat32 types are internal types used in calculations in Matrix Cores and can be stored in any type of the same size.
+  * The encodings for FP8 (E5M2) and FP8 (E4M3) that are natively supported by MI300 differ from the FP8 (E5M2) and FP8 (E4M3) encodings used in H100 (`FP8 Formats for Deep Learning <https://arxiv.org/abs/2209.05433>`_).
+  * In some AMD documents and articles, float8 (E5M2) is referred to as bfloat8.
+
+ROCm support icons
+==========================================
+
+In the following sections, we use icons to represent the level of support. These icons, described in the
+following table, are also used on the library data type support pages.
+
+.. list-table::
+    :header-rows: 1
+
+    *
+      - Icon
+      - Definition
+    *
+      - ❌
+      - Not supported
+
+    *
+      - ⚠️
+      - Partial support
+
+    *
+      - ✅
+      - Full support
+
+.. note::
+
+  * Full support means that the type is supported natively or with hardware emulation.
+  * Native support means that the operations for that type are implemented in hardware. Types that are not natively supported are emulated with the available hardware. The performance of non-natively supported types can differ from the full instruction throughput rate. For example, 16-bit integer operations can be performed on the 32-bit integer ALUs at full rate; however, 64-bit integer operations might need several instructions on the 32-bit integer ALUs.
+  * Any type can be emulated by software, but this page does not cover such cases.
+
+Hardware type support
+==========================================
+
+AMD GPU hardware support for data types is listed in the following tables.
+
+Compute units support
+-------------------------------------------------------------------------------
+
+The following table lists data type support for compute units.
+
+.. tab-set::
+
+  .. tab-item:: Integral types
+    :sync: integral-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Type name
+        - int8
+        - int16
+        - int32
+        - int64
+      *
+        - MI100
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+      *
+        - MI200 series
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+      *
+        - MI300 series
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+
+  .. tab-item:: Floating-point types
+    :sync: floating-point-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Type name
+        - float8 (E4M3)
+        - float8 (E5M2)
+        - float16
+        - bfloat16
+        - tensorfloat32
+        - float32
+        - float64
+      *
+        - MI100
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+        - ❌
+        - ✅
+        - ✅
+      *
+        - MI200 series
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+        - ❌
+        - ✅
+        - ✅
+      *
+        - MI300 series
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+        - ❌
+        - ✅
+        - ✅
+
+Matrix core support
+-------------------------------------------------------------------------------
+
+The following table lists data type support for AMD GPU matrix cores.
+
+.. tab-set::
+
+  .. tab-item:: Integral types
+    :sync: integral-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Type name
+        - int8
+        - int16
+        - int32
+        - int64
+      *
+        - MI100
+        - ✅
+        - ❌
+        - ❌
+        - ❌
+      *
+        - MI200 series
+        - ✅
+        - ❌
+        - ❌
+        - ❌
+      *
+        - MI300 series
+        - ✅
+        - ❌
+        - ❌
+        - ❌
+
+  .. tab-item:: Floating-point types
+    :sync: floating-point-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Type name
+        - float8 (E4M3)
+        - float8 (E5M2)
+        - float16
+        - bfloat16
+        - tensorfloat32
+        - float32
+        - float64
+      *
+        - MI100
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+        - ❌
+        - ✅
+        - ❌
+      *
+        - MI200 series
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+        - ❌
+        - ✅
+        - ✅
+      *
+        - MI300 series
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+        - ✅
+
+Atomic operations support
+-------------------------------------------------------------------------------
+
+The following table lists data type support for atomic operations.
+
+.. tab-set::
+
+  .. tab-item:: Integral types
+    :sync: integral-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Type name
+        - int8
+        - int16
+        - int32
+        - int64
+      *
+        - MI100
+        - ❌
+        - ❌
+        - ✅
+        - ❌
+      *
+        - MI200 series
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+      *
+        - MI300 series
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+
+  .. tab-item:: Floating-point types
+    :sync: floating-point-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Type name
+        - float8 (E4M3)
+        - float8 (E5M2)
+        - float16
+        - bfloat16
+        - tensorfloat32
+        - float32
+        - float64
+      *
+        - MI100
+        - ❌
+        - ❌
+        - ✅
+        - ❌
+        - ❌
+        - ✅
+        - ❌
+      *
+        - MI200 series
+        - ❌
+        - ❌
+        - ✅
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+      *
+        - MI300 series
+        - ❌
+        - ❌
+        - ✅
+        - ❌
+        - ❌
+        - ✅
+        - ✅
+
+.. note::
+
+  For cases that are not natively supported, you can emulate atomic operations using software.
+  Software-emulated atomic operations have a high negative performance impact when they frequently
+  access the same memory address.
+
+Data type support in ROCm libraries
+==========================================
+
+ROCm library support for int8, float8 (E4M3), float8 (E5M2), int16, float16, bfloat16, int32,
+tensorfloat32, float32, int64, and float64 is listed in the following tables.
+
+Libraries input/output type support
+-------------------------------------------------------------------------------
+
+The following tables list ROCm library support for specific input and output data types. For a detailed
+description, refer to the corresponding library data type support page.
+
+.. tab-set::
+
+  .. tab-item:: Integral types
+    :sync: integral-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Library input/output data type name
+        - int8
+        - int16
+        - int32
+        - int64
+      *
+        - hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
+        - ✅/✅
+        - ❌/❌
+        - ❌/❌
+        - ❌/❌
+      *
+        - rocRAND (:doc:`details<rocrand:data-type-support>`)
+        - -/✅
+        - -/✅
+        - -/✅
+        - -/✅
+      *
+        - hipRAND (:doc:`details<hiprand:data-type-support>`)
+        - -/✅
+        - -/✅
+        - -/✅
+        - -/✅
+      *
+        - rocPRIM (:doc:`details<rocprim:data-type-support>`)
+        - ✅/✅
+        - ✅/✅
+        - ✅/✅
+        - ✅/✅
+      *
+        - hipCUB (:doc:`details<hipcub:data-type-support>`)
+        - ✅/✅
+        - ✅/✅
+        - ✅/✅
+        - ✅/✅
+      *
+        - rocThrust (:doc:`details<rocthrust:data-type-support>`)
+        - ✅/✅
+        - ✅/✅
+        - ✅/✅
+        - ✅/✅
+
+  .. tab-item:: Floating-point types
+    :sync: floating-point-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Library input/output data type name
+        - float8 (E4M3)
+        - float8 (E5M2)
+        - float16
+        - bfloat16
+        - tensorfloat32
+        - float32
+        - float64
+      *
+        - hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
+        - ❌/❌
+        - ❌/❌
+        - ✅/✅
+        - ✅/✅
+        - ❌/❌
+        - ❌/❌
+        - ❌/❌
+      *
+        - rocRAND (:doc:`details<rocrand:data-type-support>`)
+        - -/❌
+        - -/❌
+        - -/✅
+        - -/❌
+        - -/❌
+        - -/✅
+        - -/✅
+      *
+        - hipRAND (:doc:`details<hiprand:data-type-support>`)
+        - -/❌
+        - -/❌
+        - -/✅
+        - -/❌
+        - -/❌
+        - -/✅
+        - -/✅
+      *
+        - rocPRIM (:doc:`details<rocprim:data-type-support>`)
+        - ❌/❌
+        - ❌/❌
+        - ✅/✅
+        - ✅/✅
+        - ❌/❌
+        - ✅/✅
+        - ✅/✅
+      *
+        - hipCUB (:doc:`details<hipcub:data-type-support>`)
+        - ❌/❌
+        - ❌/❌
+        - ✅/✅
+        - ✅/✅
+        - ❌/❌
+        - ✅/✅
+        - ✅/✅
+      *
+        - rocThrust (:doc:`details<rocthrust:data-type-support>`)
+        - ❌/❌
+        - ❌/❌
+        - ⚠️/⚠️
+        - ⚠️/⚠️
+        - ❌/❌
+        - ✅/✅
+        - ✅/✅
+
+
+Libraries internal calculations type support
+-------------------------------------------------------------------------------
+
+The following tables list ROCm library support for specific internal data types. For a detailed
+description, refer to the corresponding library data type support page.
+
+.. tab-set::
+
+  .. tab-item:: Integral types
+    :sync: integral-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Library internal data type name
+        - int8
+        - int16
+        - int32
+        - int64
+      *
+        - hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
+        - ❌
+        - ❌
+        - ✅
+        - ❌
+
+
+  .. tab-item:: Floating-point types
+    :sync: floating-point-type
+
+    .. list-table::
+      :header-rows: 1
+
+      *
+        - Library internal data type name
+        - float8 (E4M3)
+        - float8 (E5M2)
+        - float16
+        - bfloat16
+        - tensorfloat32
+        - float32
+        - float64
+      *
+        - hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
+        - ❌
+        - ❌
+        - ❌
+        - ❌
+        - ❌
+        - ✅
+        - ❌
--- a/docs/conceptual/gpu-arch.md
+++ b/docs/conceptual/gpu-arch.md
@@ -5,6 +5,8 @@
  MI100, AMD Instinct">
 </head>

+(gpu-arch-documentation)=
+
 # GPU architecture documentation

 :::::{grid} 1 1 2 2
--- a/docs/conceptual/gpu-arch/mi300.md
+++ b/docs/conceptual/gpu-arch/mi300.md
@@ -95,7 +95,7 @@ connected via AMD Infinity Fabric™ network on-chip.
 ```{figure} ../../data/conceptual/gpu-arch/image008.png
 ---
 name: mi300-arch
-alt: 
+alt:
 align: center
 ---
 MI300 series system architecture showing MI300A (left) with 6 XCDs and 3 CCDs, while the MI300X (right) has 8 XCDs.
--- a/docs/contribute/doc-structure.md
+++ b/docs/contribute/doc-structure.md
@@ -110,14 +110,14 @@ Sub-subsection title (H4)

 1. Add a tag to the section you want to reference:

-.. _my-section-tag:
+.. _my-section-tag: section-1

 Section 1
 ==========

 2. Link to your tag:

-As shown in :ref:`my-section-tag`.
+As shown in :ref:`section-1`.

 ```

--- a/docs/data/about/compatibility/floating-point-data-types.png
+++ b/docs/data/about/compatibility/floating-point-data-types.png
--- a/docs/index.md
+++ b/docs/index.md
@@ -52,10 +52,11 @@ Our documentation is organized into the following categories:

 * {doc}`System requirements (Linux)<rocm-install-on-linux:reference/system-requirements>`
 * {doc}`System requirements (Windows)<rocm-install-on-windows:reference/system-requirements>`
-* {doc}`Third-party<rocm-install-on-linux:reference/3rd-party-support-matrix>`
+* {doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`
 * {doc}`User/kernel space<rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`
 * {doc}`Docker<rocm-install-on-linux:reference/docker-image-support-matrix>`
 * [OpenMP](./about/compatibility/openmp.md)
+* [Precision support](./about/compatibility/data-type-support.rst)
 * {doc}`ROCm on Radeon GPUs<radeon:index>`
 :::

@@ -77,6 +78,8 @@ Our documentation is organized into the following categories:
  * Development
  * Performance analysis
  * System
+* [GPU architectures](./reference/gpu-arch.rst)
+  * [GPU architecture hardware specification overview](./reference/gpu-arch/gpu-arch-spec-overview.rst)
 :::

 :::{grid-item-card}
--- a/docs/reference/gpu-arch.rst
+++ b/docs/reference/gpu-arch.rst
@@ -0,0 +1,13 @@
+.. meta::
+    :description: GPU Architecture reference
+    :keywords: AMD, GPU, architecture, hardware, CDNA, Instinct, reference
+
+.. _gpu-arch-reference:
+
+GPU architecture reference
+##########################
+
+General overview
+""""""""""""""""
+
+* :doc:`GPU architecture hardware specifications overview<gpu-arch/gpu-arch-spec-overview>`
--- a/docs/reference/gpu-arch/gpu-arch-spec-overview.rst
+++ b/docs/reference/gpu-arch/gpu-arch-spec-overview.rst
@@ -0,0 +1,241 @@
+.. meta::
+   :description: AMD Instinct™ GPU architecture information
+   :keywords: Instinct, CDNA, GPU, architecture, VRAM, Compute Units, Cache, Registers, LDS, Register File
+
+GPU architecture hardware specifications
+########################################
+
+The following table provides an overview of the hardware specifications for the AMD Instinct accelerators.
+
+.. list-table:: AMD Instinct architecture specification table
+    :header-rows: 1
+    :name: instinct-arch-spec-table
+
+    *
+      - Model
+      - Architecture
+      - LLVM target name
+      - VRAM
+      - Compute Units
+      - Wavefront Size
+      - LDS
+      - L3 Cache
+      - L2 Cache
+      - L1 Vector Cache
+      - L1 Scalar Cache
+      - L1 Instruction Cache
+      - VGPR File
+      - SGPR File
+    *
+      - MI300X
+      - CDNA3
+      - gfx941 or gfx942
+      - 192 GiB
+      - 304
+      - 64
+      - 64 KiB
+      - 256 MiB
+      - 32 MiB
+      - 32 KiB
+      - 16 KiB per 2 CUs
+      - 64 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI300A
+      - CDNA3
+      - gfx940 or gfx942
+      - 128 GiB
+      - 228
+      - 64
+      - 64 KiB
+      - 256 MiB
+      - 24 MiB
+      - 32 KiB
+      - 16 KiB per 2 CUs
+      - 64 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI250X
+      - CDNA2
+      - gfx90a
+      - 128 GiB
+      - 220 (110 per GCD)
+      - 64
+      - 64 KiB
+      -
+      - 16 MiB (8 MiB per GCD)
+      - 16 KiB
+      - 16 KiB per 2 CUs
+      - 32 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI250
+      - CDNA2
+      - gfx90a
+      - 128 GiB
+      - 208
+      - 64
+      - 64 KiB
+      -
+      - 16 MiB (8 MiB per GCD)
+      - 16 KiB
+      - 16 KiB per 2 CUs
+      - 32 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI210
+      - CDNA2
+      - gfx90a
+      - 64 GiB
+      - 104
+      - 64
+      - 64 KiB
+      -
+      - 8 MiB
+      - 16 KiB
+      - 16 KiB per 2 CUs
+      - 32 KiB per 2 CUs
+      - 512 KiB
+      - 12.5 KiB
+    *
+      - MI100
+      - CDNA
+      - gfx908
+      - 32 GiB
+      - 120
+      - 64
+      - 64 KiB
+      -
+      - 8 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB VGPR and 256 KiB AccVGPR
+      - 12.5 KiB
+    *
+      - MI60
+      - GCN 5.1
+      - gfx906
+      - 32 GiB
+      - 64
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI50 (32GB)
+      - GCN 5.1
+      - gfx906
+      - 32 GiB
+      - 60
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI50 (16GB)
+      - GCN 5.1
+      - gfx906
+      - 16 GiB
+      - 60
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI25
+      - GCN 5.0
+      - gfx900
+      - 16 GiB
+      - 64
+      - 64
+      - 64 KiB
+      -
+      - 4 MiB
+      - 16 KiB
+      - 16 KiB per 3 CUs
+      - 32 KiB per 3 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI8
+      - GCN 3.0
+      - gfx803
+      - 4 GiB
+      - 64
+      - 64
+      - 64 KiB
+      -
+      - 2 MiB
+      - 16 KiB
+      - 16 KiB per 4 CUs
+      - 32 KiB per 4 CUs
+      - 256 KiB
+      - 12.5 KiB
+    *
+      - MI6
+      - GCN 4.0
+      - gfx803
+      - 16 GiB
+      - 36
+      - 64
+      - 64 KiB
+      -
+      - 2 MiB
+      - 16 KiB
+      - 16 KiB per 4 CUs
+      - 32 KiB per 4 CUs
+      - 256 KiB
+      - 12.5 KiB
+
+Glossary
+########
+
+For a more detailed explanation, refer to the :ref:`specific documents and guides <gpu-arch-documentation>`.
+
+LLVM target name
+  Argument to pass to clang through ``--offload-arch`` to compile code for the given architecture.
+VRAM
+  Amount of memory available on the GPU.
+Compute Units
+  Number of compute units on the GPU.
+Wavefront Size
+  Number of work-items that execute in parallel on a single compute unit. This is equivalent to the warp size in HIP.
+LDS
+  The Local Data Share (LDS) is a low-latency, high-bandwidth scratch pad memory. It is local to the compute units, shared by all work-items in a work group. In HIP this is the shared memory, which is shared by all threads in a block.
+L3 Cache
+  Size of the level 3 cache. Shared by all compute units on the same GPU. Caches vector and scalar data and instructions.
+L2 Cache
+  Size of the level 2 cache. Shared by all compute units on the same GCD. Caches vector and scalar data and instructions.
+L1 Vector Cache
+  Size of the level 1 vector data cache. Local to a compute unit. Caches vector data.
+L1 Scalar Cache
+  Size of the level 1 scalar data cache. Usually shared by several compute units. Caches scalar data.
+L1 Instruction Cache
+  Size of the level 1 instruction cache. Usually shared by several compute units.
+VGPR File
+  Size of the Vector General Purpose Register (VGPR) file. Holds data used in vector instructions.
+  GPUs with matrix cores also have AccVGPRs, which are Accumulation Vector General Purpose Registers, specifically used in matrix instructions.
+SGPR File
+  Size of the Scalar General Purpose Register (SGPR) file. Holds data used in scalar instructions.
+GCD
+  Graphics Compute Die.
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -23,12 +23,16 @@ subtrees:
  - url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
    title: HIP SDK on Windows

- caption: Supported configurations
+- caption: Compatibility
  entries:
  - url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/system-requirements.html
    title: Linux
  - url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/reference/system-requirements.html
    title: Windows
+  - file: about/compatibility/data-type-support.rst
+    title: Precision support
+  - url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/3rd-party-support-matrix.html
+    title: Third-party

 - caption: Reference
  entries:
@@ -36,6 +40,12 @@ subtrees:
      title: API libraries
    - file: reference/rocm-tools.md
      title: Tools
+    - file: reference/gpu-arch.rst
+      title: GPU architectures
+      subtrees:
+      - entries:
+        - file: reference/gpu-arch/gpu-arch-spec-overview.rst
+          title: Hardware specifications overview

 - caption: How-to
  entries:
--- a/docs/sphinx/requirements.in
+++ b/docs/sphinx/requirements.in
@@ -1 +1 @@
-rocm-docs-core==0.35.0
+rocm-docs-core==0.35.1
--- a/docs/sphinx/requirements.txt
+++ b/docs/sphinx/requirements.txt
@@ -100,7 +100,7 @@ requests==2.31.0
    # via
    #   pygithub
    #   sphinx
-rocm-docs-core==0.35.0
+rocm-docs-core==0.35.1
    # via -r requirements.in
 smmap==5.0.0
    # via gitdb
--- a/docs/what-is-rocm.rst
+++ b/docs/what-is-rocm.rst
@@ -59,7 +59,7 @@ see :doc:`ROCm licensing <./about/license>`.
  "`RocBandwidthTest <https://github.com/ROCm/rocm_bandwidth_test/>`_ ", "Tool", "Captures the performance characteristics of buffer copying and kernel read/write operations"
  ":doc:`rocBLAS <rocblas:index>`", "Library (math)", "BLAS implementation (in the HIP programming language) on the ROCm runtime and toolchains"
  ":doc:`rocFFT <rocfft:index>`", "Library (math)", "Software library for computing fast Fourier transforms (FFTs) written in HIP"
-  "`ROCmCC <./reference/rocmcc.md>`_ ", "Tool", "Clang/LLVM-based compiler"
+  ":doc:`ROCmCC <./reference/rocmcc>`", "Tool", "Clang/LLVM-based compiler"
  "`ROCm CMake <https://github.com/ROCm/rocm-cmake>`_ ", "Tool", "Collection of CMake modules for common build and development tasks"
  ":doc:`ROCm Data Center Tool <rdc:index>`", "Tool", "Simplifies administration and addresses key infrastructure challenges in AMD GPUs in cluster and data-center environments"
  "`ROCm Debug Agent (ROCdebug-agent) <https://github.com/ROCm/rocr_debug_agent/>`_ ", "Tool", "Prints the state of all AMD GPU wavefronts that caused a queue error by sending a SIGQUIT signal to the process while the program is running"