mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-10 23:28:03 -05:00
Merge pull request #2956 from ROCm/roc-6.0.x
Merge roc-6.0.x into docs/6.0.2
@@ -2,6 +2,8 @@ AAC
|
||||
ABI
|
||||
ACE
|
||||
ACEs
|
||||
AccVGPR
|
||||
AccVGPRs
|
||||
ALU
|
||||
AMD
|
||||
AMDGPU
|
||||
@@ -103,6 +105,7 @@ GDS
|
||||
GEMM
|
||||
GEMMs
|
||||
GFortran
|
||||
GiB
|
||||
GIM
|
||||
GL
|
||||
GLXT
|
||||
@@ -154,6 +157,7 @@ Ioffe
|
||||
JSON
|
||||
Jupyter
|
||||
KFD
|
||||
KiB
|
||||
KVM
|
||||
Keras
|
||||
Khronos
|
||||
@@ -170,6 +174,7 @@ LoRA
|
||||
MEM
|
||||
MERCHANTABILITY
|
||||
MFMA
|
||||
MiB
|
||||
MIGraphX
|
||||
MIOpen
|
||||
MIOpenGEMM
|
||||
@@ -207,6 +212,7 @@ NUMA
|
||||
NVCC
|
||||
NVIDIA
|
||||
NVPTX
|
||||
NaN
|
||||
Nano
|
||||
Navi
|
||||
Noncoherently
|
||||
@@ -387,6 +393,7 @@ awk
|
||||
backend
|
||||
backends
|
||||
benchmarking
|
||||
bfloat
|
||||
bilinear
|
||||
bitsandbytes
|
||||
blit
|
||||
@@ -446,6 +453,7 @@ el
|
||||
embeddings
|
||||
enablement
|
||||
endpgm
|
||||
encodings
|
||||
env
|
||||
epilog
|
||||
etcetera
|
||||
@@ -620,6 +628,7 @@ subexpression
|
||||
subfolder
|
||||
subfolders
|
||||
supercomputing
|
||||
tensorfloat
|
||||
th
|
||||
tokenization
|
||||
tokenize
|
||||
|
||||
@@ -344,7 +344,7 @@ Note: These complex operations are equivalent to corresponding types/functions o
|
||||
* `HIP_ROCclr`
|
||||
* NVIDIA platform
|
||||
* `HIP_PLATFORM_NVCC`
|
||||
* The [hcc_detail](https://github.com/ROCm/clr/tree/1949b1621a802ffb1492616adbae6154bfbe64ef/hipamd/include/hip/hcc_detail) and [nvcc_detail](https://github.com/ROCm/clr/tree/1949b1621a802ffb1492616adbae6154bfbe64ef/hipamd/include/hips/nvcc_detail) directories in the clr repository are removed.
|
||||
* The `hcc_detail` and `nvcc_detail` directories in the clr repository are removed.
|
||||
* Deprecated `gcnArch` is removed from the HIP device struct `hipDeviceProp_t`.
|
||||
* Deprecated `enum hipMemoryType memoryType;` is removed from HIP struct `hipPointerAttribute_t` union.
|
||||
|
||||
|
||||
564
docs/about/compatibility/data-type-support.rst
Normal file
@@ -0,0 +1,564 @@
|
||||
.. meta::
|
||||
:description: Supported data types in ROCm
|
||||
:keywords: int8, float8, float8 (E4M3), float8 (E5M2), bfloat8, float16, half, bfloat16, tensorfloat32, float, float32, float64, double, AMD, ROCm, AMDGPU
|
||||
|
||||
.. _rocm-supported-data-types:
|
||||
|
||||
*************************************************************
|
||||
ROCm data type specifications
|
||||
*************************************************************
|
||||
|
||||
Integral types
|
||||
==========================================
|
||||
|
||||
The following table lists the signed and unsigned integral types that ROCm™ supports,
together with their corresponding HIP types and a short description.
|
||||
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
:widths: 15,35,50
|
||||
|
||||
*
|
||||
- Type name
|
||||
- HIP type
|
||||
- Description
|
||||
*
|
||||
- int8
|
||||
- ``int8_t``, ``uint8_t``
|
||||
- A signed or unsigned 8-bit integer
|
||||
*
|
||||
- int16
|
||||
- ``int16_t``, ``uint16_t``
|
||||
- A signed or unsigned 16-bit integer
|
||||
*
|
||||
- int32
|
||||
- ``int32_t``, ``uint32_t``
|
||||
- A signed or unsigned 32-bit integer
|
||||
*
|
||||
- int64
|
||||
- ``int64_t``, ``uint64_t``
|
||||
- A signed or unsigned 64-bit integer
|
||||
|
||||
Floating-point types
|
||||
==========================================
|
||||
|
||||
The following table lists the floating-point types that ROCm supports, together with
their corresponding HIP types and a short description.
|
||||
|
||||
.. image:: ../../data/about/compatibility/floating-point-data-types.png
|
||||
:alt: Supported floating-point types
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
:widths: 15,15,70
|
||||
|
||||
*
|
||||
- Type name
|
||||
- HIP type
|
||||
- Description
|
||||
*
|
||||
- float8 (E4M3)
|
||||
- ``-``
|
||||
- An 8-bit floating-point number that mostly follows IEEE-754 conventions and the **S1E4M3** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_, with expanded range and with no infinity or signed zero. NaN is represented as negative zero.
|
||||
*
|
||||
- float8 (E5M2)
|
||||
- ``-``
|
||||
- An 8-bit floating-point number that mostly follows IEEE-754 conventions and the **S1E5M2** bit layout, as described in `8-bit Numerical Formats for Deep Neural Networks <https://arxiv.org/abs/2206.02915>`_, with expanded range and with no infinity or signed zero. NaN is represented as negative zero.
|
||||
*
|
||||
- float16
|
||||
- ``half``
|
||||
- A 16-bit floating-point number that conforms to the IEEE 754-2008 half-precision storage format.
|
||||
*
|
||||
- bfloat16
|
||||
- ``bfloat16``
|
||||
- A shortened 16-bit version of the IEEE 754 single-precision storage format.
|
||||
*
|
||||
- tensorfloat32
|
||||
- ``-``
|
||||
- A floating-point number that occupies 32 bits or less of storage, providing improved range compared to half (16-bit) format, at (potentially) greater throughput than single-precision (32-bit) formats.
|
||||
*
|
||||
- float32
|
||||
- ``float``
|
||||
- A 32-bit floating-point number that conforms to the IEEE 754 single-precision storage format.
|
||||
*
|
||||
- float64
|
||||
- ``double``
|
||||
- A 64-bit floating-point number that conforms to the IEEE 754 double-precision storage format.
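The bfloat16 row describes the type as a shortened version of the single-precision format. A minimal host-side C++ sketch of that relationship follows. It assumes plain truncation (keeping the upper 16 bits of the float32 bit pattern); production libraries typically round to nearest instead. The helper names ``float_to_bfloat16_bits`` and ``bfloat16_bits_to_float`` are hypothetical and are not part of HIP.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: keep the sign, the 8 exponent bits, and the top
// 7 mantissa bits of a float32 value. This truncates; it does not round.
std::uint16_t float_to_bfloat16_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));  // well-defined type punning
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bfloat16 bits back to float32 by padding
// the dropped mantissa bits with zeros.
float bfloat16_bits_to_float(std::uint16_t bf16) {
    const std::uint32_t bits = static_cast<std::uint32_t>(bf16) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Because the exponent field is unchanged, the round trip keeps the range of float32 and only loses mantissa precision.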
|
||||
|
||||
.. note::
|
||||
|
||||
* The float8 and tensorfloat32 types are internal types used in calculations in Matrix Cores and can be stored in any type of the same size.
|
||||
* The encodings for FP8 (E5M2) and FP8 (E4M3) that are natively supported by MI300 differ from the FP8 (E5M2) and FP8 (E4M3) encodings used in H100 (`FP8 Formats for Deep Learning <https://arxiv.org/abs/2209.05433>`_).
|
||||
* In some AMD documents and articles, float8 (E5M2) is referred to as bfloat8.
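To make the float8 descriptions concrete, the following host-side C++ sketch decodes an 8-bit pattern under the properties stated above: no infinity, no signed zero, and NaN stored as the negative-zero pattern ``0x80``. The exponent bias of ``2^(E-1)`` (8 for E4M3, 16 for E5M2) is an assumption based on the linked paper and is not stated on this page. The function name ``decode_fp8_fnuz`` is hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical decoder for an 8-bit float with 1 sign bit, `exp_bits`
// exponent bits, and `man_bits` mantissa bits (exp_bits + man_bits == 7).
// Assumption: the exponent bias is 2^(exp_bits - 1).
float decode_fp8_fnuz(std::uint8_t bits, int exp_bits, int man_bits) {
    // The pattern that would be negative zero encodes NaN instead.
    if (bits == 0x80) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    const int bias = 1 << (exp_bits - 1);
    const int sign = bits >> 7;
    const int exponent = (bits >> man_bits) & ((1 << exp_bits) - 1);
    const int mantissa = bits & ((1 << man_bits) - 1);

    float magnitude;
    if (exponent == 0) {
        // Subnormal: no implicit leading one.
        magnitude = std::ldexp(static_cast<float>(mantissa),
                               1 - bias - man_bits);
    } else {
        // Normal: implicit leading one; every exponent value is a number
        // because no pattern is reserved for infinity.
        magnitude = std::ldexp(static_cast<float>((1 << man_bits) + mantissa),
                               exponent - bias - man_bits);
    }
    return sign ? -magnitude : magnitude;
}
```

Under these assumptions, ``decode_fp8_fnuz(0x7F, 4, 3)`` gives the largest E4M3 value and ``decode_fp8_fnuz(0x7F, 5, 2)`` the largest E5M2 value.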
|
||||
|
||||
ROCm support icons
|
||||
==========================================
|
||||
|
||||
In the following sections, we use icons to represent the level of support. These icons, described in the
|
||||
following table, are also used on the library data type support pages.
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Icon
|
||||
- Definition
|
||||
*
|
||||
- ❌
|
||||
- Not supported
|
||||
|
||||
*
|
||||
- ⚠️
|
||||
- Partial support
|
||||
|
||||
*
|
||||
- ✅
|
||||
- Full support
|
||||
|
||||
.. note::
|
||||
|
||||
* Full support means that the type is supported natively or with hardware emulation.
|
||||
* Native support means that the operations for that type are implemented in hardware. Types that are not natively supported are emulated with the available hardware. The performance of non-natively supported types can differ from the full instruction throughput rate. For example, 16-bit integer operations can be performed on the 32-bit integer ALUs at full rate; however, 64-bit integer operations might need several instructions on the 32-bit integer ALUs.
|
||||
* Any type can be emulated by software, but this page does not cover such cases.
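The remark about 64-bit integer operations needing several 32-bit instructions can be illustrated with a small host-side C++ sketch. It is only a model of the idea (two 32-bit additions plus a carry), not the instruction sequence that the compiler actually emits. The function name ``add64_with_32bit_ops`` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical model: a 64-bit addition built from 32-bit additions,
// as a 32-bit integer ALU would have to perform it.
std::uint64_t add64_with_32bit_ops(std::uint64_t a, std::uint64_t b) {
    const std::uint32_t a_lo = static_cast<std::uint32_t>(a);
    const std::uint32_t a_hi = static_cast<std::uint32_t>(a >> 32);
    const std::uint32_t b_lo = static_cast<std::uint32_t>(b);
    const std::uint32_t b_hi = static_cast<std::uint32_t>(b >> 32);

    // First 32-bit add: low halves. Unsigned overflow wraps around.
    const std::uint32_t lo = a_lo + b_lo;
    // The low add overflowed exactly when the result is smaller than an input.
    const std::uint32_t carry = (lo < a_lo) ? 1u : 0u;
    // Second 32-bit add: high halves plus the carry.
    const std::uint32_t hi = a_hi + b_hi + carry;

    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

One logical 64-bit operation becomes at least two dependent 32-bit operations, which is why throughput for such types can fall below the full instruction rate.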
|
||||
|
||||
Hardware type support
|
||||
==========================================
|
||||
|
||||
AMD GPU hardware support for data types is listed in the following tables.
|
||||
|
||||
Compute units support
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The following table lists data type support for compute units.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Integral types
|
||||
:sync: integral-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Type name
|
||||
- int8
|
||||
- int16
|
||||
- int32
|
||||
- int64
|
||||
*
|
||||
- MI100
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI200 series
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI300 series
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
|
||||
.. tab-item:: Floating-point types
|
||||
:sync: floating-point-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Type name
|
||||
- float8 (E4M3)
|
||||
- float8 (E5M2)
|
||||
- float16
|
||||
- bfloat16
|
||||
- tensorfloat32
|
||||
- float32
|
||||
- float64
|
||||
*
|
||||
- MI100
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI200 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI300 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
|
||||
Matrix core support
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The following table lists data type support for AMD GPU matrix cores.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Integral types
|
||||
:sync: integral-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Type name
|
||||
- int8
|
||||
- int16
|
||||
- int32
|
||||
- int64
|
||||
*
|
||||
- MI100
|
||||
- ✅
|
||||
- ❌
|
||||
- ❌
|
||||
- ❌
|
||||
*
|
||||
- MI200 series
|
||||
- ✅
|
||||
- ❌
|
||||
- ❌
|
||||
- ❌
|
||||
*
|
||||
- MI300 series
|
||||
- ✅
|
||||
- ❌
|
||||
- ❌
|
||||
- ❌
|
||||
|
||||
.. tab-item:: Floating-point types
|
||||
:sync: floating-point-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Type name
|
||||
- float8 (E4M3)
|
||||
- float8 (E5M2)
|
||||
- float16
|
||||
- bfloat16
|
||||
- tensorfloat32
|
||||
- float32
|
||||
- float64
|
||||
*
|
||||
- MI100
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
*
|
||||
- MI200 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI300 series
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
- ✅
|
||||
|
||||
Atomic operations support
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The following table lists data type support for atomic operations.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Integral types
|
||||
:sync: integral-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Type name
|
||||
- int8
|
||||
- int16
|
||||
- int32
|
||||
- int64
|
||||
*
|
||||
- MI100
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
*
|
||||
- MI200 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI300 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
|
||||
.. tab-item:: Floating-point types
|
||||
:sync: floating-point-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Type name
|
||||
- float8 (E4M3)
|
||||
- float8 (E5M2)
|
||||
- float16
|
||||
- bfloat16
|
||||
- tensorfloat32
|
||||
- float32
|
||||
- float64
|
||||
*
|
||||
- MI100
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
*
|
||||
- MI200 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
*
|
||||
- MI300 series
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ✅
|
||||
|
||||
.. note::
|
||||
|
||||
For cases that are not natively supported, you can emulate atomic operations in software.
Software-emulated atomic operations carry a high performance penalty when they frequently
access the same memory address.
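A common way to emulate an atomic operation in software is a compare-and-swap (CAS) retry loop. The following host-side C++ sketch uses ``std::atomic<double>`` to show the pattern; it is an illustration under that assumption, not the code that ROCm generates. A device-side version would apply the same loop to the value's bit pattern with an integer CAS. The function name ``atomic_add_emulated`` is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: atomic add built from a compare-and-swap loop.
// Returns the value that was stored before the addition.
double atomic_add_emulated(std::atomic<double>& target, double value) {
    double expected = target.load(std::memory_order_relaxed);
    // If another thread changed `target` in the meantime, the exchange
    // fails, `expected` is refreshed with the current value, and the
    // addition is retried.
    while (!target.compare_exchange_weak(expected, expected + value)) {
        // Retry until the update is applied without interference.
    }
    return expected;
}
```

Each failed exchange repeats the whole read-modify-write cycle, so the cost grows when many threads update the same address at once.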
|
||||
|
||||
Data type support in ROCm libraries
|
||||
==========================================
|
||||
|
||||
ROCm library support for int8, float8 (E4M3), float8 (E5M2), int16, float16, bfloat16, int32,
|
||||
tensorfloat32, float32, int64, and float64 is listed in the following tables.
|
||||
|
||||
Libraries input/output type support
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The following tables list ROCm library support for specific input and output data types. For a detailed
|
||||
description, refer to the corresponding library data type support page.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Integral types
|
||||
:sync: integral-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Library input/output data type name
|
||||
- int8
|
||||
- int16
|
||||
- int32
|
||||
- int64
|
||||
*
|
||||
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
|
||||
- ✅/✅
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
*
|
||||
- rocRAND (:doc:`details<rocrand:data-type-support>`)
|
||||
- -/✅
|
||||
- -/✅
|
||||
- -/✅
|
||||
- -/✅
|
||||
*
|
||||
- hipRAND (:doc:`details<hiprand:data-type-support>`)
|
||||
- -/✅
|
||||
- -/✅
|
||||
- -/✅
|
||||
- -/✅
|
||||
*
|
||||
- rocPRIM (:doc:`details<rocprim:data-type-support>`)
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
*
|
||||
- hipCUB (:doc:`details<hipcub:data-type-support>`)
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
*
|
||||
- rocThrust (:doc:`details<rocthrust:data-type-support>`)
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
|
||||
.. tab-item:: Floating-point types
|
||||
:sync: floating-point-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Library input/output data type name
|
||||
- float8 (E4M3)
|
||||
- float8 (E5M2)
|
||||
- float16
|
||||
- bfloat16
|
||||
- tensorfloat32
|
||||
- float32
|
||||
- float64
|
||||
*
|
||||
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
*
|
||||
- rocRAND (:doc:`details<rocrand:data-type-support>`)
|
||||
- -/❌
|
||||
- -/❌
|
||||
- -/✅
|
||||
- -/❌
|
||||
- -/❌
|
||||
- -/✅
|
||||
- -/✅
|
||||
*
|
||||
- hipRAND (:doc:`details<hiprand:data-type-support>`)
|
||||
- -/❌
|
||||
- -/❌
|
||||
- -/✅
|
||||
- -/❌
|
||||
- -/❌
|
||||
- -/✅
|
||||
- -/✅
|
||||
*
|
||||
- rocPRIM (:doc:`details<rocprim:data-type-support>`)
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ❌/❌
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
*
|
||||
- hipCUB (:doc:`details<hipcub:data-type-support>`)
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
- ❌/❌
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
*
|
||||
- rocThrust (:doc:`details<rocthrust:data-type-support>`)
|
||||
- ❌/❌
|
||||
- ❌/❌
|
||||
- ⚠️/⚠️
|
||||
- ⚠️/⚠️
|
||||
- ❌/❌
|
||||
- ✅/✅
|
||||
- ✅/✅
|
||||
|
||||
|
||||
Libraries internal calculations type support
|
||||
-------------------------------------------------------------------------------
|
||||
|
||||
The following tables list ROCm library support for specific internal data types. For a detailed
|
||||
description, refer to the corresponding library data type support page.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Integral types
|
||||
:sync: integral-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Library internal data type name
|
||||
- int8
|
||||
- int16
|
||||
- int32
|
||||
- int64
|
||||
*
|
||||
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
|
||||
|
||||
.. tab-item:: Floating-point types
|
||||
:sync: floating-point-type
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
*
|
||||
- Library internal data type name
|
||||
- float8 (E4M3)
|
||||
- float8 (E5M2)
|
||||
- float16
|
||||
- bfloat16
|
||||
- tensorfloat32
|
||||
- float32
|
||||
- float64
|
||||
*
|
||||
- hipSPARSELt (:doc:`details<hipsparselt:reference/data-type-support>`)
|
||||
- ❌
|
||||
- ❌
|
||||
- ❌
|
||||
- ❌
|
||||
- ❌
|
||||
- ✅
|
||||
- ❌
|
||||
@@ -5,6 +5,8 @@
|
||||
MI100, AMD Instinct">
|
||||
</head>
|
||||
|
||||
(gpu-arch-documentation)=
|
||||
|
||||
# GPU architecture documentation
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
|
||||
@@ -95,7 +95,7 @@ connected via AMD Infinity Fabric™ network on-chip.
|
||||
```{figure} ../../data/conceptual/gpu-arch/image008.png
|
||||
---
|
||||
name: mi300-arch
|
||||
alt:
|
||||
alt:
|
||||
align: center
|
||||
---
|
||||
MI300 series system architecture showing MI300A (left) with 6 XCDs and 3 CCDs, while the MI300X (right) has 8 XCDs.
|
||||
|
||||
@@ -110,14 +110,14 @@ Sub-subsection title (H4)
|
||||
|
||||
1. Add a tag to the section you want to reference:
|
||||
|
||||
.. _my-section-tag:
|
||||
.. _my-section-tag: section-1
|
||||
|
||||
Section 1
|
||||
==========
|
||||
|
||||
2. Link to your tag:
|
||||
|
||||
As shown in :ref:`my-section-tag`.
|
||||
As shown in :ref:`section-1`.
|
||||
|
||||
```
|
||||
|
||||
|
||||
BIN
docs/data/about/compatibility/floating-point-data-types.png
Normal file
Binary file not shown. Size after change: 81 KiB.
@@ -52,10 +52,11 @@ Our documentation is organized into the following categories:
|
||||
|
||||
* {doc}`System requirements (Linux)<rocm-install-on-linux:reference/system-requirements>`
|
||||
* {doc}`System requirements (Windows)<rocm-install-on-windows:reference/system-requirements>`
|
||||
* {doc}`Third-party<rocm-install-on-linux:reference/3rd-party-support-matrix>`
|
||||
* {doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`
|
||||
* {doc}`User/kernel space<rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`
|
||||
* {doc}`Docker<rocm-install-on-linux:reference/docker-image-support-matrix>`
|
||||
* [OpenMP](./about/compatibility/openmp.md)
|
||||
* [Precision support](./about/compatibility/data-type-support.rst)
|
||||
* {doc}`ROCm on Radeon GPUs<radeon:index>`
|
||||
:::
|
||||
|
||||
@@ -77,6 +78,8 @@ Our documentation is organized into the following categories:
|
||||
* Development
|
||||
* Performance analysis
|
||||
* System
|
||||
* [GPU architectures](./reference/gpu-arch.rst)
|
||||
* [GPU architecture hardware specification overview](./reference/gpu-arch/gpu-arch-spec-overview.rst)
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
|
||||
13
docs/reference/gpu-arch.rst
Normal file
@@ -0,0 +1,13 @@
|
||||
.. meta::
|
||||
:description: GPU Architecture reference
|
||||
:keywords: AMD, GPU, architecture, hardware, CDNA, Instinct, reference
|
||||
|
||||
.. _gpu-arch-reference:
|
||||
|
||||
GPU architecture reference
|
||||
##########################
|
||||
|
||||
General overview
|
||||
""""""""""""""""
|
||||
|
||||
* :doc:`GPU architecture hardware specifications overview<gpu-arch/gpu-arch-spec-overview>`
|
||||
241
docs/reference/gpu-arch/gpu-arch-spec-overview.rst
Normal file
@@ -0,0 +1,241 @@
|
||||
.. meta::
|
||||
:description: AMD Instinct™ GPU architecture information
|
||||
:keywords: Instinct, CDNA, GPU, architecture, VRAM, Compute Units, Cache, Registers, LDS, Register File
|
||||
|
||||
GPU architecture hardware specifications
|
||||
########################################
|
||||
|
||||
The following table provides an overview of the hardware specifications for the AMD Instinct accelerators.
|
||||
|
||||
.. list-table:: AMD Instinct architecture specification table
|
||||
:header-rows: 1
|
||||
:name: instinct-arch-spec-table
|
||||
|
||||
*
|
||||
- Model
|
||||
- Architecture
|
||||
- LLVM target name
|
||||
- VRAM
|
||||
- Compute Units
|
||||
- Wavefront Size
|
||||
- LDS
|
||||
- L3 Cache
|
||||
- L2 Cache
|
||||
- L1 Vector Cache
|
||||
- L1 Scalar Cache
|
||||
- L1 Instruction Cache
|
||||
- VGPR File
|
||||
- SGPR File
|
||||
*
|
||||
- MI300X
|
||||
- CDNA3
|
||||
- gfx941 or gfx942
|
||||
- 192 GiB
|
||||
- 304
|
||||
- 64
|
||||
- 64 KiB
|
||||
- 256 MiB
|
||||
- 32 MiB
|
||||
- 32 KiB
|
||||
- 16 KiB per 2 CUs
|
||||
- 64 KiB per 2 CUs
|
||||
- 512 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI300A
|
||||
- CDNA3
|
||||
- gfx940 or gfx942
|
||||
- 128 GiB
|
||||
- 228
|
||||
- 64
|
||||
- 64 KiB
|
||||
- 256 MiB
|
||||
- 24 MiB
|
||||
- 32 KiB
|
||||
- 16 KiB per 2 CUs
|
||||
- 64 KiB per 2 CUs
|
||||
- 512 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI250X
|
||||
- CDNA2
|
||||
- gfx90a
|
||||
- 128 GiB
|
||||
- 220 (110 per GCD)
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 16 MiB (8 MiB per GCD)
|
||||
- 16 KiB
|
||||
- 16 KiB per 2 CUs
|
||||
- 32 KiB per 2 CUs
|
||||
- 512 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI250
|
||||
- CDNA2
|
||||
- gfx90a
|
||||
- 128 GiB
|
||||
- 208
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 16 MiB (8 MiB per GCD)
|
||||
- 16 KiB
|
||||
- 16 KiB per 2 CUs
|
||||
- 32 KiB per 2 CUs
|
||||
- 512 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI210
|
||||
- CDNA2
|
||||
- gfx90a
|
||||
- 64 GiB
|
||||
- 104
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 8 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 2 CUs
|
||||
- 32 KiB per 2 CUs
|
||||
- 512 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI100
|
||||
- CDNA
|
||||
- gfx908
|
||||
- 32 GiB
|
||||
- 120
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 8 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 3 CUs
|
||||
- 32 KiB per 3 CUs
|
||||
- 256 KiB VGPR and 256 KiB AccVGPR
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI60
|
||||
- GCN 5.1
|
||||
- gfx906
|
||||
- 32 GiB
|
||||
- 64
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 4 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 3 CUs
|
||||
- 32 KiB per 3 CUs
|
||||
- 256 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI50 (32GB)
|
||||
- GCN 5.1
|
||||
- gfx906
|
||||
- 32 GiB
|
||||
- 60
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 4 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 3 CUs
|
||||
- 32 KiB per 3 CUs
|
||||
- 256 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI50 (16GB)
|
||||
- GCN 5.1
|
||||
- gfx906
|
||||
- 16 GiB
|
||||
- 60
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 4 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 3 CUs
|
||||
- 32 KiB per 3 CUs
|
||||
- 256 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI25
|
||||
- GCN 5.0
|
||||
- gfx900
|
||||
- 16 GiB
|
||||
- 64
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 4 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 3 CUs
|
||||
- 32 KiB per 3 CUs
|
||||
- 256 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI8
|
||||
- GCN 3.0
|
||||
- gfx803
|
||||
- 4 GiB
|
||||
- 64
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 2 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 4 CUs
|
||||
- 32 KiB per 4 CUs
|
||||
- 256 KiB
|
||||
- 12.5 KiB
|
||||
*
|
||||
- MI6
|
||||
- GCN 4.0
|
||||
- gfx803
|
||||
- 16 GiB
|
||||
- 36
|
||||
- 64
|
||||
- 64 KiB
|
||||
-
|
||||
- 2 MiB
|
||||
- 16 KiB
|
||||
- 16 KiB per 4 CUs
|
||||
- 32 KiB per 4 CUs
|
||||
- 256 KiB
|
||||
- 12.5 KiB
|
||||
|
||||
Glossary
|
||||
########
|
||||
|
||||
For a more detailed explanation, refer to the :ref:`specific documents and guides <gpu-arch-documentation>`.
|
||||
|
||||
LLVM target name
|
||||
Argument to pass to clang in `--offload-arch` to compile code for the given architecture.
|
||||
VRAM
|
||||
Amount of memory available on the GPU.
|
||||
Compute Units
|
||||
Number of compute units on the GPU.
|
||||
Wavefront Size
|
||||
Number of work-items that execute in parallel on a single compute unit. This is equivalent to the warp size in HIP.
|
||||
LDS
|
||||
The Local Data Share (LDS) is a low-latency, high-bandwidth scratchpad memory. It is local to a compute unit and shared by all work-items in a workgroup. In HIP, this is the shared memory, which is shared by all threads in a block.
|
||||
L3 Cache
|
||||
Size of the level 3 cache. Shared by all compute units on the same GPU. Caches vector and scalar data and instructions.
|
||||
L2 Cache
|
||||
Size of the level 2 cache. Shared by all compute units on the same GCD. Caches vector and scalar data and instructions.
|
||||
L1 Vector Cache
|
||||
Size of the level 1 vector data cache. Local to a compute unit. Caches vector data.
|
||||
L1 Scalar Cache
|
||||
Size of the level 1 scalar data cache. Usually shared by several compute units. Caches scalar data.
|
||||
L1 Instruction Cache
|
||||
Size of the level 1 instruction cache. Usually shared by several compute units.
|
||||
VGPR File
|
||||
Size of the Vector General Purpose Register (VGPR) file. Holds data used in vector instructions.
|
||||
GPUs with matrix cores also have AccVGPRs, which are Accumulation Vector General Purpose Registers, used specifically in matrix instructions.
|
||||
SGPR File
|
||||
Size of the Scalar General Purpose Register (SGPR) file. Holds data used in scalar instructions.
|
||||
GCD
|
||||
Graphics Compute Die.
|
||||
@@ -23,12 +23,16 @@ subtrees:
|
||||
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
|
||||
title: HIP SDK on Windows
|
||||
|
||||
- caption: Supported configurations
|
||||
- caption: Compatibility
|
||||
entries:
|
||||
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/system-requirements.html
|
||||
title: Linux
|
||||
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/reference/system-requirements.html
|
||||
title: Windows
|
||||
- file: about/compatibility/data-type-support.rst
|
||||
title: Precision support
|
||||
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/3rd-party-support-matrix.html
|
||||
title: Third-party
|
||||
|
||||
- caption: Reference
|
||||
entries:
|
||||
@@ -36,6 +40,12 @@ subtrees:
|
||||
title: API libraries
|
||||
- file: reference/rocm-tools.md
|
||||
title: Tools
|
||||
- file: reference/gpu-arch.rst
|
||||
title: GPU architectures
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/gpu-arch/gpu-arch-spec-overview.rst
|
||||
title: Hardware specifications overview
|
||||
|
||||
- caption: How-to
|
||||
entries:
|
||||
|
||||
@@ -1 +1 @@
|
||||
rocm-docs-core==0.35.0
|
||||
rocm-docs-core==0.35.1
|
||||
|
||||
@@ -100,7 +100,7 @@ requests==2.31.0
|
||||
# via
|
||||
# pygithub
|
||||
# sphinx
|
||||
rocm-docs-core==0.35.0
|
||||
rocm-docs-core==0.35.1
|
||||
# via -r requirements.in
|
||||
smmap==5.0.0
|
||||
# via gitdb
|
||||
|
||||
@@ -59,7 +59,7 @@ see :doc:`ROCm licensing <./about/license>`.
|
||||
"`RocBandwidthTest <https://github.com/ROCm/rocm_bandwidth_test/>`_ ", "Tool", "Captures the performance characteristics of buffer copying and kernel read/write operations"
|
||||
":doc:`rocBLAS <rocblas:index>`", "Library (math)", "BLAS implementation (in the HIP programming language) on the ROCm runtime and toolchains"
|
||||
":doc:`rocFFT <rocfft:index>`", "Library (math)", "Software library for computing fast Fourier transforms (FFTs) written in HIP"
|
||||
"`ROCmCC <./reference/rocmcc.md>`_ ", "Tool", "Clang/LLVM-based compiler"
|
||||
":doc:`ROCmCC <./reference/rocmcc>`", "Tool", "Clang/LLVM-based compiler"
|
||||
"`ROCm CMake <https://github.com/ROCm/rocm-cmake>`_ ", "Tool", "Collection of CMake modules for common build and development tasks"
|
||||
":doc:`ROCm Data Center Tool <rdc:index>`", "Tool", "Simplifies administration and addresses key infrastructure challenges in AMD GPUs in cluster and data-center environments"
|
||||
"`ROCm Debug Agent (ROCdebug-agent) <https://github.com/ROCm/rocr_debug_agent/>`_ ", "Tool", "Prints the state of all AMD GPU wavefronts that caused a queue error by sending a SIGQUIT signal to the process while the program is running"
|
||||
|
||||