mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-07 22:03:58 -05:00
Main Docs: references of accelerator removal and change to GPU (#5495)
* Docs: references of accelerator removal and change to GPU

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
@@ -46,4 +46,4 @@ ROCm SMI for ROCm 6.1.2
 #### Fixes

 * Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on [GitHub](https://github.com/ROCm/ROCm/issues/3112).
-* Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-series hardware.
+* Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-Series hardware.
@@ -33,10 +33,10 @@ Until a fix is provided, users should rely on ROCm v5.2.3 to support their SRIOV
 #### AMD Instinct™ MI200 firmware updates

-Customers cannot update the Integrated Firmware Image (IFWI) for AMD Instinct™ MI200 accelerators.
+Customers cannot update the Integrated Firmware Image (IFWI) for AMD Instinct™ MI200 GPUs.

 An updated firmware maintenance bundle consisting of an installation tool and images specific to
-AMD Instinct™ MI200 accelerators is under planning and will be available soon.
+AMD Instinct™ MI200 GPUs is under planning and will be available soon.

 #### Known issue with rocThrust and rocPRIM libraries
@@ -50,12 +50,12 @@ fixed in this release.
 #### AMD Instinct™ MI200 firmware IFWI maintenance update #3

-This IFWI release fixes the following issue in AMD Instinct™ MI210/MI250 Accelerators.
+This IFWI release fixes the following issue in AMD Instinct™ MI210/MI250 GPUs.

-After prolonged periods of operation, certain MI200 Instinct™ Accelerators may perform in a degraded
+After prolonged periods of operation, certain MI200 Instinct™ GPUs may perform in a degraded
 way resulting in application failures.

-In this package, AMD delivers a new firmware version for MI200 GPU accelerators and a firmware
+In this package, AMD delivers a new firmware version for MI200 GPU GPUs and a firmware
 installation tool – AMD FW FLASH 1.2.

 | GPU | Productionp part number | SKU | IFWI name |
@@ -10,10 +10,10 @@ New features include:
 * AddressSanitizer for host and device code (GPU) is now available as a beta

 Note that ROCm 5.7.0 is EOS for MI50. 5.7 versions of ROCm are the last major releases in the ROCm 5
-series. This release is Linux-only.
+Series. This release is Linux-only.

 :::{important}
-The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series.
+The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 Series.
 Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime
 API, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.
 :::
@@ -3,7 +3,7 @@
 ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
 support, and improved developer experience. This includes initial enablement of the AMD Instinct™
-MI300 series. Future releases will further enable and optimize this new platform. Key features include:
+MI300 Series. Future releases will further enable and optimize this new platform. Key features include:

 * Improved performance in areas like lower precision math and attention layers.
 * New hipSPARSELt library to accelerate AI workloads via AMD's sparse matrix core technique.
@@ -18,7 +18,7 @@ the [Changelog](https://rocm.docs.amd.com/en/docs-6.0.0/about/CHANGELOG.html).
 ### OS and GPU support changes

-AMD Instinct™ MI300A and MI300X Accelerator support has been enabled for limited operating
+AMD Instinct™ MI300A and MI300X GPU support has been enabled for limited operating
 systems.

 * Ubuntu 22.04.3 (MI300A and MI300X)
@@ -3,7 +3,7 @@ ROCm™ 6.1.1 introduces minor fixes and improvements to some tools and librarie
 ### OS support

-* ROCm 6.1.1 now supports Oracle Linux. It has been tested against version 8.9 (kernel 5.15.0-205) with AMD Instinct MI300X accelerators.
+* ROCm 6.1.1 now supports Oracle Linux. It has been tested against version 8.9 (kernel 5.15.0-205) with AMD Instinct MI300X GPUs.

 * ROCm 6.1.1 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).
@@ -28,7 +28,7 @@ This section introduces notable new features and improvements in ROCm 6.2. See t
 ROCm 6.2.0 introduces the following new components to the ROCm software stack.

 - **Omniperf** -- A kernel-level profiling tool for machine learning and high-performance computing (HPC) workloads
-running on AMD Instinct accelerators. Omniperf offers comprehensive profiling and advanced analysis via command line
+running on AMD Instinct GPUs. Omniperf offers comprehensive profiling and advanced analysis via command line
 or a GUI dashboard. For more information, see
 [Omniperf](https://rocm.docs.amd.com/projects/omniperf/en/latest).
@@ -141,7 +141,7 @@ For more information, see [Model quantization techniques](https://rocm.docs.amd.
 #### Improved vLLM support

-ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, adding
+ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct GPUs, adding
 capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
 ROCm 6.2.0 adds support for the following vLLM features:
@@ -177,12 +177,12 @@ To enable these experimental new features, see
 Use the `rocm/vllm` branch when cloning the GitHub repo. The `vllm/ROCm_performance.md` document outlines
 all the accessible features, and the `vllm/Dockerfile.rocm` file can be used.

-### Enhanced performance tuning on AMD Instinct accelerators
+### Enhanced performance tuning on AMD Instinct GPUs

 ROCm is pre-tuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
-The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct accelerators. It includes
+The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct GPUs. It includes
 detailed instructions on system settings and application tuning suggestions to help you fully leverage the capabilities of these
-accelerators for optimal performance. For more information, see
+GPUs for optimal performance. For more information, see
 [AMD MI300X tuning guides](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/index.html) and
 [AMD MI300A system optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html).
@@ -22,7 +22,7 @@ The following is a significant fix introduced in ROCm 6.2.2.
 ### Fixed Instinct MI300X error recovery failure

-Improved the reliability of AMD Instinct MI300X accelerators in scenarios involving
+Improved the reliability of AMD Instinct MI300X GPUs in scenarios involving
 uncorrectable errors. Previously, error recovery did not occur as expected,
 potentially leaving the system in an undefined state. This fix ensures that error
 recovery functions as expected, maintaining system stability.
@@ -32,7 +32,7 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
 a wider variety of user needs and use cases.

 * Added a new GPU cluster networking guide. See
-[Cluster network performance validation for AMD Instinct accelerators](https://rocm.docs.amd.com/projects/gpu-cluster-networking/en/latest/index.html).
+[Cluster network performance validation for AMD Instinct GPUs](https://rocm.docs.amd.com/projects/gpu-cluster-networking/en/latest/index.html).

 This documentation provides guidelines on validating network configurations
 in single-node and multi-node environments to attain optimal speed and bandwidth
@@ -138,7 +138,7 @@ wider variety of user needs and use cases.
 documentation](https://rocm.docs.amd.com/projects/Tensile/en/docs-6.3.0/src/index.html).

 - New documentation has been added to explain the advantages of enabling the IOMMU in passthrough
-mode for Instinct accelerators and Radeon GPUs. See [Input-Output Memory Management
+mode for Instinct GPUs and Radeon GPUs. See [Input-Output Memory Management
 Unit](https://rocm.docs.amd.com/en/docs-6.3.0/conceptual/iommu.html).

 - The HIP documentation has been updated and includes the following new topics:
@@ -26,9 +26,9 @@ documentation to verify compatibility and system requirements.
 The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
 [Detailed component changes](#detailed-component-changes).

-### Per queue resiliency for Instinct MI300 accelerators
+### Per queue resiliency for Instinct MI300 GPUs

-The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.
+The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 GPUs. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same GPU are unaffected.

 ### ROCm Runfile Installer
@@ -38,7 +38,7 @@ ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubunt
 ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

-* Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators
+* Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X GPUs
 containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See
 [Training a model using ROCm Megatron-LM](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/train-a-model.html)
 to get started.
@@ -45,9 +45,9 @@ bash -c "echo taskset -p \$\$"
 See [issue #3493](https://github.com/ROCm/ROCm/issues/3493) on GitHub.

-### Display issues on servers with Instinct MI300-series accelerators when loading AMDGPU driver
+### Display issues on servers with Instinct MI300-Series GPUs when loading AMDGPU driver

-AMD Instinct MI300-series accelerators and third-party GPUs such as the Matrox G200 have an issue impacting video
+AMD Instinct MI300-Series GPUs and third-party GPUs such as the Matrox G200 have an issue impacting video
 output. The issue was reproduced on a Dell server model PowerEdge XE9680. Servers from other vendors utilizing Matrox
 G200 cards may be impacted as well. This issue was found with ROCm 6.2.0 but is present in older ROCm versions.
@@ -5,7 +5,7 @@ individual components are listed in the [Detailed component changes](detailed-co
 ### Instinct MI300X GPU recovery failure on uncorrectable errors

-For the AMD Instinct MI300X accelerator, GPU recovery resets triggered by uncorrectable errors (UE) might not complete
+For the AMD Instinct MI300X GPU, GPU recovery resets triggered by uncorrectable errors (UE) might not complete
 successfully, which can result in the system being left in an undefined state. A system reboot is needed to recover from
 this state. Additionally, error logging might fail in these situations, hindering diagnostics.
@@ -5,13 +5,13 @@ issues related to individual components, review the [Detailed component changes]
 ### Instinct MI300X reports incorrect raw GPU timestamps

-On MI300X accelerators, the command processor firmware reports incorrect raw GPU timestamps. This
+On MI300X GPUs, the command processor firmware reports incorrect raw GPU timestamps. This
 issue is under investigation and will be addressed in a future release. See [GitHub issue #4079](https://github.com/ROCm/ROCm/issues/4079).

-### Instinct MI300 series: backward weights convolution performance issue
+### Instinct MI300 Series: backward weights convolution performance issue

 A performance issue affects certain tensor shapes during backward weights convolution when using
-FP16 or FP32 data types on Instinct MI300 series accelerators. This issue will be addressed in a future ROCm release.
+FP16 or FP32 data types on Instinct MI300 Series GPUs. This issue will be addressed in a future ROCm release.
 See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).

 To mitigate the issue during model training, set the following environment variables:
@@ -77,7 +77,7 @@ This issue will be addressed in a future ROCm release. See [GitHub issue #4085](
 Canny edge detection kernels might access out-of-bounds memory locations while
 computing gradient intensities on edge pixels. This issue is isolated to
-Canny-specific use cases on Instinct MI300 series accelerators. This issue is
+Canny-specific use cases on Instinct MI300 Series GPUs. This issue is
 resolved in the [MIVisionX `develop` branch](https://github.com/ROCm/mivisionx)
 and will be part of a future ROCm release. See [GitHub issue #4086](https://github.com/ROCm/ROCm/issues/4086).
@@ -3,9 +3,9 @@
 The following are previously known issues resolved in this release. For resolved issues related to
 individual components, review the [Detailed component changes](#detailed-component-changes).

-### Instinct MI300 series: backward weights convolution performance issue
+### Instinct MI300 Series: backward weights convolution performance issue

-Fixed a performance issue affecting certain tensor shapes during backward weights convolution when using FP16 or FP32 data types on Instinct MI300 series accelerators. See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).
+Fixed a performance issue affecting certain tensor shapes during backward weights convolution when using FP16 or FP32 data types on Instinct MI300 Series GPUs. See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).

 ### ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues
@@ -19,7 +19,7 @@ This issue has been fixed in the ROCm 6.3.2 release. See [GitHub issue #4085](ht
 An issue where Canny edge detection kernels accessed out-of-bounds memory locations while
 computing gradient intensities on edge pixels has been fixed. This issue was isolated to
-Canny-specific use cases on Instinct MI300 series accelerators. See [GitHub issue #4086](https://github.com/ROCm/ROCm/issues/4086).
+Canny-specific use cases on Instinct MI300 Series GPUs. See [GitHub issue #4086](https://github.com/ROCm/ROCm/issues/4086).

 ### AMD VCN instability with rocDecode
@@ -1,8 +1,8 @@
 ## Operating system and hardware support changes

-ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at [Debian native installation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/native-install/debian.html).
+ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct GPUs. See the installation instructions at [Debian native installation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/native-install/debian.html).

-ROCm 6.3.1 enables support for AMD Instinct MI325X accelerator. For more information, see [AMD Instinct™ MI325X Accelerators](https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html).
+ROCm 6.3.1 enables support for AMD Instinct MI325X GPU. For more information, see [AMD Instinct™ MI325X GPUs](https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html).

 See the [Compatibility
 matrix](https://rocm.docs.amd.com/en/docs-6.3.1/compatibility/compatibility-matrix.html)
@@ -1,6 +1,6 @@
 ## Operating system and hardware support changes

-ROCm 6.3.2 adds support for Azure Linux 3.0 (kernel: 6.6). Azure Linux is supported only on AMD Instinct accelerators. For more information, see [Azure Linux installation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html).
+ROCm 6.3.2 adds support for Azure Linux 3.0 (kernel: 6.6). Azure Linux is supported only on AMD Instinct GPUs. For more information, see [Azure Linux installation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html).

 See the [Compatibility
 matrix](https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html)