Merge pull request #2956 from ROCm/roc-6.0.x

Merge roc-6.0.x into docs/6.0.2
This commit is contained in:
Sam Wu
2024-03-07 09:10:24 -07:00
committed by GitHub
14 changed files with 851 additions and 9 deletions

@@ -2,6 +2,8 @@ AAC
ABI
ACE
ACEs
AccVGPR
AccVGPRs
ALU
AMD
AMDGPU
@@ -103,6 +105,7 @@ GDS
GEMM
GEMMs
GFortran
GiB
GIM
GL
GLXT
@@ -154,6 +157,7 @@ Ioffe
JSON
Jupyter
KFD
KiB
KVM
Keras
Khronos
@@ -170,6 +174,7 @@ LoRA
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
@@ -207,6 +212,7 @@ NUMA
NVCC
NVIDIA
NVPTX
NaN
Nano
Navi
Noncoherently
@@ -387,6 +393,7 @@ awk
backend
backends
benchmarking
bfloat
bilinear
bitsandbytes
blit
@@ -446,6 +453,7 @@ el
embeddings
enablement
endpgm
encodings
env
epilog
etcetera
@@ -620,6 +628,7 @@ subexpression
subfolder
subfolders
supercomputing
tensorfloat
th
tokenization
tokenize

@@ -344,7 +344,7 @@ Note: These complex operations are equivalent to corresponding types/functions o
* `HIP_ROCclr`
* NVIDIA platform
* `HIP_PLATFORM_NVCC`
* The [hcc_detail](https://github.com/ROCm/clr/tree/1949b1621a802ffb1492616adbae6154bfbe64ef/hipamd/include/hip/hcc_detail) and [nvcc_detail](https://github.com/ROCm/clr/tree/1949b1621a802ffb1492616adbae6154bfbe64ef/hipamd/include/hips/nvcc_detail) directories in the clr repository are removed.
* The `hcc_detail` and `nvcc_detail` directories in the clr repository are removed.
* Deprecated gcnArch is removed from hip device struct `hipDeviceProp_t`.
* Deprecated `enum hipMemoryType memoryType;` is removed from HIP struct `hipPointerAttribute_t` union.

@@ -0,0 +1,564 @@
.. meta::
:description: Supported data types in ROCm
:keywords: int8, float8, float8 (E4M3), float8 (E5M2), bfloat8, float16, half, bfloat16, tensorfloat32, float, float32, float64, double, AMD, ROCm, AMDGPU
.. _rocm-supported-data-types:
*************************************************************
ROCm data type specifications
*************************************************************
Integral types
==========================================
The signed and unsigned integral types that are supported by ROCm™ are listed in the following table,
together with their corresponding HIP type and a short description.
.. list-table::
:header-rows: 1
:widths: 15,35,50
*
- Type name
- HIP type
- Description
*
- int8
- ``int8_t``, ``uint8_t``
- A signed or unsigned 8-bit integer
*
- int16
- ``int16_t``, ``uint16_t``
- A signed or unsigned 16-bit integer
*
- int32
- ``int32_t``, ``uint32_t``
- A signed or unsigned 32-bit integer
*
- int64
- ``int64_t``, ``uint64_t``
- A signed or unsigned 64-bit integer
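The HIP types in this table are the standard fixed-width types from ``<cstdint>``. The following minimal C++ sketch checks the widths listed above and shows the resulting value ranges. It is an illustration only and assumes nothing beyond a standard C++11 (or later) compiler.

```cpp
#include <cstdint>
#include <limits>

// Compile-time checks that each HIP integral type has the width listed in the table.
static_assert(sizeof(int8_t) == 1 && sizeof(uint8_t) == 1, "int8 is 8 bits");
static_assert(sizeof(int16_t) == 2 && sizeof(uint16_t) == 2, "int16 is 16 bits");
static_assert(sizeof(int32_t) == 4 && sizeof(uint32_t) == 4, "int32 is 32 bits");
static_assert(sizeof(int64_t) == 8 && sizeof(uint64_t) == 8, "int64 is 64 bits");

// Exact-width signed types are two's complement, so int8_t spans [-128, 127]
// while uint8_t spans [0, 255].
static_assert(std::numeric_limits<int8_t>::min() == -128, "int8 minimum");
static_assert(std::numeric_limits<int8_t>::max() == 127, "int8 maximum");
static_assert(std::numeric_limits<uint8_t>::max() == 255, "uint8 maximum");
```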
Floating-point types
==========================================
The floating-point types that are supported by ROCm are listed in the following table, together with
their corresponding HIP type and a short description.
.. image:: ../../data/about/compatibility/floating-point-data-types.png
:alt: Supported floating-point types
.. list-table::
:header-rows: 1
:widths: 15,15,70
*
- Type name
- HIP type
- Description
*
- float8 (E4M3)
- ``-``
- An 8-bit floating-point number that mostly follows IEEE-754 conventions and the **S1E4M3** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_, with expanded range and with no infinity or signed zero. NaN is represented as negative zero.
*
- float8 (E5M2)
- ``-``
- An 8-bit floating-point number that mostly follows IEEE-754 conventions and the **S1E5M2** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_, with expanded range and with no infinity or signed zero. NaN is represented as negative zero.
*
- float16
- ``half``
- A 16-bit floating-point number that conforms to the IEEE 754-2008 half-precision storage format.
*
- bfloat16
- ``bfloat16``
- A shortened 16-bit version of the IEEE 754 single-precision storage format, with the same 8-bit exponent and a 7-bit mantissa.
*
- tensorfloat32
- ``-``
- A floating-point number with an 8-bit exponent and a 10-bit mantissa (19 bits in total) that occupies 32 bits or less of storage, providing a greater range than the half (16-bit) format at potentially greater throughput than the single-precision (32-bit) format.
*
- float32
- ``float``
- A 32-bit floating-point number that conforms to the IEEE 754 single-precision storage format.
*
- float64
- ``double``
- A 64-bit floating-point number that conforms to the IEEE 754 double-precision storage format.
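The bfloat16 and tensorfloat32 layouts can be illustrated with plain bit manipulation on a 32-bit float. The following C++ sketch is an illustration only, not the conversion routine used by HIP or any ROCm library. The helper names are hypothetical, and it assumes round-to-nearest-even when discarding mantissa bits, which is a common but not the only possible rounding choice.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as uint32_t (and back) without undefined behavior.
static uint32_t float_bits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

// float32 -> bfloat16: keep sign, 8-bit exponent, and the top 7 mantissa bits,
// rounding the 16 discarded bits to nearest, ties to even.
uint16_t float_to_bf16(float f) {
  uint32_t u = float_bits(f);
  if (std::isnan(f)) return static_cast<uint16_t>((u >> 16) | 0x0040);  // stay a quiet NaN
  uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives truncation
  u += 0x7FFFu + lsb;             // round to nearest, ties to even
  return static_cast<uint16_t>(u >> 16);
}

// bfloat16 -> float32 is exact: append 16 zero bits.
float bf16_to_float(uint16_t h) { return bits_float(static_cast<uint32_t>(h) << 16); }

// float32 -> tensorfloat32 value (sign, 8-bit exponent, 10-bit mantissa), kept in
// a 32-bit container: round away the low 13 of the 23 mantissa bits.
float round_to_tf32(float f) {
  if (std::isnan(f) || std::isinf(f)) return f;
  uint32_t u = float_bits(f);
  uint32_t lsb = (u >> 13) & 1u;  // lowest surviving mantissa bit
  u += 0x0FFFu + lsb;             // round to nearest, ties to even
  u &= ~0x1FFFu;                  // clear the discarded bits
  return bits_float(u);
}
```

For example, the float32 value nearest to π (bits ``0x40490FDB``) converts to the bfloat16 pattern ``0x4049``, which widens back to exactly 3.140625.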
.. note::
* The float8 and tensorfloat32 types are internal types used in matrix core calculations and can be stored in any type of the same size.
* The encodings for FP8 (E5M2) and FP8 (E4M3) that are natively supported by MI300 differ from the FP8 (E5M2) and FP8 (E4M3) encodings used in H100 (`FP8 Formats for Deep Learning <https://arxiv.org/abs/2209.05433>`_).
* In some AMD documents and articles, float8 (E5M2) is referred to as bfloat8.
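To make the float8 descriptions concrete, the following C++ sketch decodes an 8-bit pattern into a float32. It assumes the interpretation described above (no infinities, no negative zero, and the would-be negative-zero pattern ``0x80`` means NaN) together with exponent biases of 8 for E4M3 and 16 for E5M2. These biases are not stated on this page; they are an assumption taken from the ``float8_e4m3fnuz`` and ``float8_e5m2fnuz`` conventions (for example, in PyTorch). The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an 8-bit float with 1 sign bit, `exp_bits` exponent bits and `man_bits`
// mantissa bits. ASSUMPTION: no infinities, no -0, 0x80 is NaN, and the given bias.
float decode_fp8_fnuz(uint8_t bits, int exp_bits, int man_bits, int bias) {
  if (bits == 0x80) return std::numeric_limits<float>::quiet_NaN();
  const int sign = bits >> 7;
  const int exp = (bits >> man_bits) & ((1 << exp_bits) - 1);
  const int man = bits & ((1 << man_bits) - 1);
  float mag;
  if (exp == 0) {
    // Subnormal: no implicit leading 1.
    mag = std::ldexp(static_cast<float>(man), 1 - bias - man_bits);
  } else {
    // Normal: implicit leading 1. The all-ones exponent is an ordinary finite
    // value, which is where the expanded range comes from.
    mag = std::ldexp(static_cast<float>((1 << man_bits) | man), exp - bias - man_bits);
  }
  return sign ? -mag : mag;
}

// Hypothetical convenience wrappers under the assumed biases.
float decode_e4m3_fnuz(uint8_t b) { return decode_fp8_fnuz(b, 4, 3, 8); }
float decode_e5m2_fnuz(uint8_t b) { return decode_fp8_fnuz(b, 5, 2, 16); }
```

Under these assumptions, the largest finite values are 240 (E4M3) and 57344 (E5M2), and 1.0 is encoded as ``0x40`` in both formats. In the OCP E4M3 format used on H100, whose bias is 7, 1.0 would instead be ``0x38``, which is one concrete way the encodings differ.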
ROCm support icons
==========================================
In the following sections, we use icons to represent the level of support. These icons, described in the
following table, are also used on the library data type support pages.
.. list-table::
:header-rows: 1
*
- Icon
- Definition
*
- ❌
- Not supported
*
- ⚠️
- Partial support
*
- ✅
- Full support
.. note::
* Full support means that the type is supported natively or with hardware emulation.
* Native support means that the operations for that type are implemented in hardware. Types that are not natively supported are emulated with the available hardware. The performance of non-natively supported types can differ from the full instruction throughput rate. For example, 16-bit integer operations can be performed on the 32-bit integer ALUs at full rate; however, 64-bit integer operations might need several instructions on the 32-bit integer ALUs.
* Any type can be emulated by software, but this page does not cover such cases.
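The remark about 64-bit integer operations needing several instructions on 32-bit ALUs can be illustrated with a short C++ sketch that splits a 64-bit add into two 32-bit adds plus a carry. This is a host-side illustration of the idea, not the instruction sequence the AMDGPU compiler actually emits, and ``add64_via_32`` is a hypothetical name.

```cpp
#include <cstdint>

// Emulate a 64-bit integer add using only 32-bit adds and a carry bit.
uint64_t add64_via_32(uint64_t a, uint64_t b) {
  const uint32_t a_lo = static_cast<uint32_t>(a), a_hi = static_cast<uint32_t>(a >> 32);
  const uint32_t b_lo = static_cast<uint32_t>(b), b_hi = static_cast<uint32_t>(b >> 32);
  const uint32_t lo = a_lo + b_lo;             // first 32-bit add (wraps modulo 2^32)
  const uint32_t carry = lo < a_lo ? 1u : 0u;  // a wrap-around means a carry out
  const uint32_t hi = a_hi + b_hi + carry;     // second 32-bit add consumes the carry
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

A 16-bit add, by contrast, fits in a single 32-bit operation, which is why it can run at full rate.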
Hardware type support
==========================================
AMD GPU hardware support for data types is listed in the following tables.
Compute units support
-------------------------------------------------------------------------------
The following table lists data type support for compute units.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Type name
- int8
- int16
- int32
- int64
*
- MI100
-
-
-
-
*
- MI200 series
-
-
-
-
*
- MI300 series
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- MI100
-
-
-
-
-
-
-
*
- MI200 series
-
-
-
-
-
-
-
*
- MI300 series
-
-
-
-
-
-
-
Matrix core support
-------------------------------------------------------------------------------
The following table lists data type support for AMD GPU matrix cores.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Type name
- int8
- int16
- int32
- int64
*
- MI100
-
-
-
-
*
- MI200 series
-
-
-
-
*
- MI300 series
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- MI100
-
-
-
-
-
-
-
*
- MI200 series
-
-
-
-
-
-
-
*
- MI300 series
-
-
-
-
-
-
-
Atomic operations support
-------------------------------------------------------------------------------
The following table lists data type support for atomic operations.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Type name
- int8
- int16
- int32
- int64
*
- MI100
-
-
-
-
*
- MI200 series
-
-
-
-
*
- MI300 series
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- MI100
-
-
-
-
-
-
-
*
- MI200 series
-
-
-
-
-
-
-
*
- MI300 series
-
-
-
-
-
-
-
.. note::
For cases that are not natively supported, you can emulate atomic operations using software.
Software-emulated atomic operations have a high performance cost when many threads frequently
access the same memory address.
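As a sketch of what such software emulation can look like, the following host-side C++ code emulates an 8-bit atomic add with a 32-bit compare-and-swap (CAS) loop on the word that contains the byte. It is an illustration of the general technique, not ROCm's implementation, and ``atomic_add_u8`` is a hypothetical name. In HIP device code the same pattern would be built on ``atomicCAS``. Every failed exchange repeats the loop, which is why heavy contention on one address is expensive.

```cpp
#include <atomic>
#include <cstdint>

// Atomically add `val` to byte `byte_idx` (0-3) of a 32-bit word using a
// 32-bit CAS loop. The other three bytes are left unchanged.
// Returns the byte's previous value.
uint8_t atomic_add_u8(std::atomic<uint32_t>& word, unsigned byte_idx, uint8_t val) {
  const unsigned shift = byte_idx * 8;
  const uint32_t mask = 0xFFu << shift;
  uint32_t old_word = word.load(std::memory_order_relaxed);
  uint32_t new_word;
  do {
    const uint8_t old_byte = static_cast<uint8_t>((old_word & mask) >> shift);
    const uint8_t new_byte = static_cast<uint8_t>(old_byte + val);  // wraps modulo 256
    new_word = (old_word & ~mask) | (static_cast<uint32_t>(new_byte) << shift);
    // On failure, old_word is refreshed with the current value and we retry.
  } while (!word.compare_exchange_weak(old_word, new_word, std::memory_order_relaxed));
  return static_cast<uint8_t>((old_word & mask) >> shift);
}
```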
Data type support in ROCm libraries
==========================================
ROCm library support for int8, float8 (E4M3), float8 (E5M2), int16, float16, bfloat16, int32,
tensorfloat32, float32, int64, and float64 is listed in the following tables.
Libraries input/output type support
-------------------------------------------------------------------------------
The following tables list ROCm library support for specific input and output data types. For a detailed
description, refer to the corresponding library data type support page.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Library input/output data type name
- int8
- int16
- int32
- int64
*
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
- ✅/✅
- ❌/❌
- ❌/❌
- ❌/❌
*
- rocRAND (:doc:`details<rocrand:data-type-support>`)
- -/✅
- -/✅
- -/✅
- -/✅
*
- hipRAND (:doc:`details<hiprand:data-type-support>`)
- -/✅
- -/✅
- -/✅
- -/✅
*
- rocPRIM (:doc:`details<rocprim:data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
- ✅/✅
*
- hipCUB (:doc:`details<hipcub:data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
- ✅/✅
*
- rocThrust (:doc:`details<rocthrust:data-type-support>`)
- ✅/✅
- ✅/✅
- ✅/✅
- ✅/✅
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Library input/output data type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅
- ✅/✅
- ❌/❌
- ❌/❌
- ❌/❌
*
- rocRAND (:doc:`details<rocrand:data-type-support>`)
- -/❌
- -/❌
- -/✅
- -/❌
- -/❌
- -/✅
- -/✅
*
- hipRAND (:doc:`details<hiprand:data-type-support>`)
- -/❌
- -/❌
- -/✅
- -/❌
- -/❌
- -/✅
- -/✅
*
- rocPRIM (:doc:`details<rocprim:data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅
- ✅/✅
- ❌/❌
- ✅/✅
- ✅/✅
*
- hipCUB (:doc:`details<hipcub:data-type-support>`)
- ❌/❌
- ❌/❌
- ✅/✅
- ✅/✅
- ❌/❌
- ✅/✅
- ✅/✅
*
- rocThrust (:doc:`details<rocthrust:data-type-support>`)
- ❌/❌
- ❌/❌
- ⚠️/⚠️
- ⚠️/⚠️
- ❌/❌
- ✅/✅
- ✅/✅
Libraries internal calculations type support
-------------------------------------------------------------------------------
The following tables list ROCm library support for specific internal data types. For a detailed
description, refer to the corresponding library data type support page.
.. tab-set::
.. tab-item:: Integral types
:sync: integral-type
.. list-table::
:header-rows: 1
*
- Library internal data type name
- int8
- int16
- int32
- int64
*
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
-
-
-
-
.. tab-item:: Floating-point types
:sync: floating-point-type
.. list-table::
:header-rows: 1
*
- Library internal data type name
- float8 (E4M3)
- float8 (E5M2)
- float16
- bfloat16
- tensorfloat32
- float32
- float64
*
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
-
-
-
-
-
-
-

@@ -5,6 +5,8 @@
MI100, AMD Instinct">
</head>
(gpu-arch-documentation)=
# GPU architecture documentation
:::::{grid} 1 1 2 2

@@ -95,7 +95,7 @@ connected via AMD Infinity Fabric™ network on-chip.
```{figure} ../../data/conceptual/gpu-arch/image008.png
---
name: mi300-arch
alt:
alt:
align: center
---
MI300 series system architecture showing MI300A (left) with 6 XCDs and 3 CCDs, while the MI300X (right) has 8 XCDs.

@@ -110,14 +110,14 @@ Sub-subsection title (H4)
1. Add a tag to the section you want to reference:
.. _my-section-tag:
.. _my-section-tag: section-1
Section 1
==========
2. Link to your tag:
As shown in :ref:`my-section-tag`.
As shown in :ref:`section-1`.
```

Binary file not shown (size: 81 KiB).

@@ -52,10 +52,11 @@ Our documentation is organized into the following categories:
* {doc}`System requirements (Linux)<rocm-install-on-linux:reference/system-requirements>`
* {doc}`System requirements (Windows)<rocm-install-on-windows:reference/system-requirements>`
* {doc}`Third-party<rocm-install-on-linux:reference/3rd-party-support-matrix>`
* {doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`
* {doc}`User/kernel space<rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`
* {doc}`Docker<rocm-install-on-linux:reference/docker-image-support-matrix>`
* [OpenMP](./about/compatibility/openmp.md)
* [Precision support](./about/compatibility/data-type-support.rst)
* {doc}`ROCm on Radeon GPUs<radeon:index>`
:::
@@ -77,6 +78,8 @@ Our documentation is organized into the following categories:
* Development
* Performance analysis
* System
* [GPU architectures](./reference/gpu-arch.rst)
* [GPU architecture hardware specification overview](./reference/gpu-arch/gpu-arch-spec-overview.rst)
:::
:::{grid-item-card}

@@ -0,0 +1,13 @@
.. meta::
:description: GPU Architecture reference
:keywords: AMD, GPU, architecture, hardware, CDNA, Instinct, reference
.. _gpu-arch-reference:
GPU architecture reference
##########################
General overview
""""""""""""""""
* :doc:`GPU architecture hardware specifications overview<gpu-arch/gpu-arch-spec-overview>`

@@ -0,0 +1,241 @@
.. meta::
:description: AMD Instinct™ GPU architecture information
:keywords: Instinct, CDNA, GPU, architecture, VRAM, Compute Units, Cache, Registers, LDS, Register File
GPU architecture hardware specifications
########################################
The following table provides an overview of the hardware specifications for the AMD Instinct accelerators.
.. list-table:: AMD Instinct architecture specification table
:header-rows: 1
:name: instinct-arch-spec-table
*
- Model
- Architecture
- LLVM target name
- VRAM
- Compute Units
- Wavefront Size
- LDS
- L3 Cache
- L2 Cache
- L1 Vector Cache
- L1 Scalar Cache
- L1 Instruction Cache
- VGPR File
- SGPR File
*
- MI300X
- CDNA3
- gfx941 or gfx942
- 192 GiB
- 304
- 64
- 64 KiB
- 256 MiB
- 32 MiB
- 32 KiB
- 16 KiB per 2 CUs
- 64 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI300A
- CDNA3
- gfx940 or gfx942
- 128 GiB
- 228
- 64
- 64 KiB
- 256 MiB
- 24 MiB
- 32 KiB
- 16 KiB per 2 CUs
- 64 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI250X
- CDNA2
- gfx90a
- 128 GiB
- 220 (110 per GCD)
- 64
- 64 KiB
-
- 16 MiB (8 MiB per GCD)
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI250
- CDNA2
- gfx90a
- 128 GiB
- 208
- 64
- 64 KiB
-
- 16 MiB (8 MiB per GCD)
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI210
- CDNA2
- gfx90a
- 64 GiB
- 104
- 64
- 64 KiB
-
- 8 MiB
- 16 KiB
- 16 KiB per 2 CUs
- 32 KiB per 2 CUs
- 512 KiB
- 12.5 KiB
*
- MI100
- CDNA
- gfx908
- 32 GiB
- 120
- 64
- 64 KiB
-
- 8 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB VGPR and 256 KiB AccVGPR
- 12.5 KiB
*
- MI60
- GCN 5.1
- gfx906
- 32 GiB
- 64
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI50 (32GB)
- GCN 5.1
- gfx906
- 32 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI50 (16GB)
- GCN 5.1
- gfx906
- 16 GiB
- 60
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI25
- GCN 5.0
- gfx900
- 16 GiB
- 64
- 64
- 64 KiB
-
- 4 MiB
- 16 KiB
- 16 KiB per 3 CUs
- 32 KiB per 3 CUs
- 256 KiB
- 12.5 KiB
*
- MI8
- GCN 3.0
- gfx803
- 4 GiB
- 64
- 64
- 64 KiB
-
- 2 MiB
- 16 KiB
- 16 KiB per 4 CUs
- 32 KiB per 4 CUs
- 256 KiB
- 12.5 KiB
*
- MI6
- GCN 4.0
- gfx803
- 16 GiB
- 36
- 64
- 64 KiB
-
- 2 MiB
- 16 KiB
- 16 KiB per 4 CUs
- 32 KiB per 4 CUs
- 256 KiB
- 12.5 KiB
Glossary
########
For a more detailed explanation, refer to the :ref:`specific documents and guides <gpu-arch-documentation>`.
LLVM target name
Value to pass to clang's ``--offload-arch`` option to compile code for the given architecture.
VRAM
Amount of memory available on the GPU.
Compute Units
Number of compute units on the GPU.
Wavefront Size
Number of work-items that execute in parallel as a single wavefront on a compute unit. This is equivalent to the warp size in HIP.
LDS
The Local Data Share (LDS) is a low-latency, high-bandwidth scratchpad memory. It is local to each compute unit and shared by all work-items in a workgroup. In HIP, this is the shared memory, which is shared by all threads in a block.
L3 Cache
Size of the level 3 cache. Shared by all compute units on the same GPU. Caches vector and scalar data and instructions.
L2 Cache
Size of the level 2 cache. Shared by all compute units on the same GCD. Caches vector and scalar data and instructions.
L1 Vector Cache
Size of the level 1 vector data cache. Local to a compute unit. Caches vector data.
L1 Scalar Cache
Size of the level 1 scalar data cache. Usually shared by several compute units. Caches scalar data.
L1 Instruction Cache
Size of the level 1 instruction cache. Usually shared by several compute units.
VGPR File
Size of the Vector General Purpose Register (VGPR) file. Holds data used in vector instructions.
GPUs with matrix cores also have AccVGPRs, which are Accumulation General Purpose Vector Registers, specifically used in matrix instructions.
SGPR File
Size of the Scalar General Purpose Register (SGPR) file. Holds data used in scalar instructions.
GCD
Graphics Compute Die.
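The glossary values combine in simple arithmetic when sizing kernels. The following C++ sketch uses the MI300X row of the table above to compute how many wavefronts a workgroup needs and how many workgroups fit in one compute unit's LDS. It is a simplified model under stated assumptions: real occupancy is also limited by register usage and other hardware limits, and the struct and function names are hypothetical. At runtime, comparable values can be read from ``hipDeviceProp_t`` (for example ``multiProcessorCount``, ``warpSize``, and ``sharedMemPerBlock``).

```cpp
#include <limits>

// Hypothetical container for a few columns of the specification table.
struct GpuSpec {
  unsigned compute_units;     // "Compute Units" column
  unsigned wavefront_size;    // "Wavefront Size" column
  unsigned lds_bytes_per_cu;  // "LDS" column, in bytes
};

// MI300X row: 304 CUs, wavefront size 64, 64 KiB LDS.
constexpr GpuSpec kMI300X{304, 64, 64 * 1024};

// Wavefronts needed for one workgroup of `block_size` work-items. Rounded up,
// because a partially filled wavefront still occupies a whole wavefront.
constexpr unsigned wavefronts_per_workgroup(unsigned block_size, GpuSpec g) {
  return (block_size + g.wavefront_size - 1) / g.wavefront_size;
}

// Upper bound on workgroups resident on one CU imposed by LDS alone.
// A workgroup that uses no LDS is not limited by it.
constexpr unsigned lds_limited_workgroups_per_cu(unsigned lds_bytes_per_workgroup, GpuSpec g) {
  return lds_bytes_per_workgroup == 0
             ? std::numeric_limits<unsigned>::max()
             : g.lds_bytes_per_cu / lds_bytes_per_workgroup;
}
```

For example, a 256-work-item workgroup needs four wavefronts, and a kernel using 16 KiB of LDS per workgroup fits at most four workgroups per CU, or 1216 across the 304 CUs of an MI300X, before any other limit applies.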

@@ -23,12 +23,16 @@ subtrees:
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
title: HIP SDK on Windows
- caption: Supported configurations
- caption: Compatibility
entries:
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/system-requirements.html
title: Linux
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/reference/system-requirements.html
title: Windows
- file: about/compatibility/data-type-support.rst
title: Precision support
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/3rd-party-support-matrix.html
title: Third-party
- caption: Reference
entries:
@@ -36,6 +40,12 @@ subtrees:
title: API libraries
- file: reference/rocm-tools.md
title: Tools
- file: reference/gpu-arch.rst
title: GPU architectures
subtrees:
- entries:
- file: reference/gpu-arch/gpu-arch-spec-overview.rst
title: Hardware specifications overview
- caption: How-to
entries:

@@ -1 +1 @@
rocm-docs-core==0.35.0
rocm-docs-core==0.35.1

@@ -100,7 +100,7 @@ requests==2.31.0
# via
# pygithub
# sphinx
rocm-docs-core==0.35.0
rocm-docs-core==0.35.1
# via -r requirements.in
smmap==5.0.0
# via gitdb

@@ -59,7 +59,7 @@ see :doc:`ROCm licensing <./about/license>`.
"`RocBandwidthTest <https://github.com/ROCm/rocm_bandwidth_test/>`_ ", "Tool", "Captures the performance characteristics of buffer copying and kernel read/write operations"
":doc:`rocBLAS <rocblas:index>`", "Library (math)", "BLAS implementation (in the HIP programming language) on the ROCm runtime and toolchains"
":doc:`rocFFT <rocfft:index>`", "Library (math)", "Software library for computing fast Fourier transforms (FFTs) written in HIP"
"`ROCmCC <./reference/rocmcc.md>`_ ", "Tool", "Clang/LLVM-based compiler"
":doc:`ROCmCC <./reference/rocmcc>`", "Tool", "Clang/LLVM-based compiler"
"`ROCm CMake <https://github.com/ROCm/rocm-cmake>`_ ", "Tool", "Collection of CMake modules for common build and development tasks"
":doc:`ROCm Data Center Tool <rdc:index>`", "Tool", "Simplifies administration and addresses key infrastructure challenges in AMD GPUs in cluster and data-center environments"
"`ROCm Debug Agent (ROCdebug-agent) <https://github.com/ROCm/rocr_debug_agent/>`_ ", "Tool", "Prints the state of all AMD GPU wavefronts that caused a queue error by sending a SIGQUIT signal to the process while the program is running"