Main Docs: references of accelerator removal and change to GPU (#5495)

* Docs: references of accelerator removal and change to GPU

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
anisha-amd
2025-10-16 11:22:10 -04:00
committed by GitHub
parent 5cb6bfe151
commit a98236a4e3
95 changed files with 414 additions and 422 deletions

@@ -46,4 +46,4 @@ ROCm SMI for ROCm 6.1.2
#### Fixes
* Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on [GitHub](https://github.com/ROCm/ROCm/issues/3112).
* Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-series hardware.
* Fixed the parsing of `pp_od_clk_voltage` in `get_od_clk_volt_info` to work better with MI-Series hardware.

@@ -33,10 +33,10 @@ Until a fix is provided, users should rely on ROCm v5.2.3 to support their SRIOV
#### AMD Instinct™ MI200 firmware updates
Customers cannot update the Integrated Firmware Image (IFWI) for AMD Instinct™ MI200 accelerators.
Customers cannot update the Integrated Firmware Image (IFWI) for AMD Instinct™ MI200 GPUs.
An updated firmware maintenance bundle consisting of an installation tool and images specific to
AMD Instinct™ MI200 accelerators is under planning and will be available soon.
AMD Instinct™ MI200 GPUs is under planning and will be available soon.
#### Known issue with rocThrust and rocPRIM libraries

@@ -50,12 +50,12 @@ fixed in this release.
#### AMD Instinct™ MI200 firmware IFWI maintenance update #3
This IFWI release fixes the following issue in AMD Instinct™ MI210/MI250 Accelerators.
This IFWI release fixes the following issue in AMD Instinct™ MI210/MI250 GPUs.
After prolonged periods of operation, certain MI200 Instinct™ Accelerators may perform in a degraded
After prolonged periods of operation, certain MI200 Instinct™ GPUs may perform in a degraded
way resulting in application failures.
In this package, AMD delivers a new firmware version for MI200 GPU accelerators and a firmware
In this package, AMD delivers a new firmware version for MI200 GPUs and a firmware
installation tool AMD FW FLASH 1.2.
| GPU | Production part number | SKU | IFWI name |

@@ -10,10 +10,10 @@ New features include:
* AddressSanitizer for host and device code (GPU) is now available as a beta
Note that ROCm 5.7.0 is EOS for MI50. 5.7 versions of ROCm are the last major releases in the ROCm 5
series. This release is Linux-only.
Series. This release is Linux-only.
:::{important}
The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series.
The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 Series.
Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime
API, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.
:::

@@ -3,7 +3,7 @@
ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
MI300 series.Future releases will further enable and optimize this new platform. Key features include:
MI300 Series. Future releases will further enable and optimize this new platform. Key features include:
* Improved performance in areas like lower precision math and attention layers.
* New hipSPARSELt library to accelerate AI workloads via AMD's sparse matrix core technique.
@@ -18,7 +18,7 @@ the [Changelog](https://rocm.docs.amd.com/en/docs-6.0.0/about/CHANGELOG.html).
### OS and GPU support changes
AMD Instinct™ MI300A and MI300X Accelerator support has been enabled for limited operating
AMD Instinct™ MI300A and MI300X GPU support has been enabled for limited operating
systems.
* Ubuntu 22.04.3 (MI300A and MI300X)

@@ -3,7 +3,7 @@ ROCm™ 6.1.1 introduces minor fixes and improvements to some tools and librarie
### OS support
* ROCm 6.1.1 now supports Oracle Linux. It has been tested against version 8.9 (kernel 5.15.0-205) with AMD Instinct MI300X accelerators.
* ROCm 6.1.1 now supports Oracle Linux. It has been tested against version 8.9 (kernel 5.15.0-205) with AMD Instinct MI300X GPUs.
* ROCm 6.1.1 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).

@@ -28,7 +28,7 @@ This section introduces notable new features and improvements in ROCm 6.2. See t
ROCm 6.2.0 introduces the following new components to the ROCm software stack.
- **Omniperf** -- A kernel-level profiling tool for machine learning and high-performance computing (HPC) workloads
running on AMD Instinct accelerators. Omniperf offers comprehensive profiling and advanced analysis via command line
running on AMD Instinct GPUs. Omniperf offers comprehensive profiling and advanced analysis via command line
or a GUI dashboard. For more information, see
[Omniperf](https://rocm.docs.amd.com/projects/omniperf/en/latest).
@@ -141,7 +141,7 @@ For more information, see [Model quantization techniques](https://rocm.docs.amd.
#### Improved vLLM support
ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, adding
ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct GPUs, adding
capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
ROCm 6.2.0 adds support for the following vLLM features:
@@ -177,12 +177,12 @@ To enable these experimental new features, see
Use the `rocm/vllm` branch when cloning the GitHub repo. The `vllm/ROCm_performance.md` document outlines
all the accessible features, and the `vllm/Dockerfile.rocm` file can be used.
### Enhanced performance tuning on AMD Instinct accelerators
### Enhanced performance tuning on AMD Instinct GPUs
ROCm is pre-tuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct accelerators. It includes
The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct GPUs. It includes
detailed instructions on system settings and application tuning suggestions to help you fully leverage the capabilities of these
accelerators for optimal performance. For more information, see
GPUs for optimal performance. For more information, see
[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/index.html) and
[AMD MI300A system optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html).

@@ -22,7 +22,7 @@ The following is a significant fix introduced in ROCm 6.2.2.
### Fixed Instinct MI300X error recovery failure
Improved the reliability of AMD Instinct MI300X accelerators in scenarios involving
Improved the reliability of AMD Instinct MI300X GPUs in scenarios involving
uncorrectable errors. Previously, error recovery did not occur as expected,
potentially leaving the system in an undefined state. This fix ensures that error
recovery functions as expected, maintaining system stability.

@@ -32,7 +32,7 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
a wider variety of user needs and use cases.
* Added a new GPU cluster networking guide. See
[Cluster network performance validation for AMD Instinct accelerators](https://rocm.docs.amd.com/projects/gpu-cluster-networking/en/latest/index.html).
[Cluster network performance validation for AMD Instinct GPUs](https://rocm.docs.amd.com/projects/gpu-cluster-networking/en/latest/index.html).
This documentation provides guidelines on validating network configurations
in single-node and multi-node environments to attain optimal speed and bandwidth

@@ -138,7 +138,7 @@ wider variety of user needs and use cases.
documentation](https://rocm.docs.amd.com/projects/Tensile/en/docs-6.3.0/src/index.html).
- New documentation has been added to explain the advantages of enabling the IOMMU in passthrough
mode for Instinct accelerators and Radeon GPUs. See [Input-Output Memory Management
mode for Instinct GPUs and Radeon GPUs. See [Input-Output Memory Management
Unit](https://rocm.docs.amd.com/en/docs-6.3.0/conceptual/iommu.html).
- The HIP documentation has been updated and includes the following new topics:

@@ -26,9 +26,9 @@ documentation to verify compatibility and system requirements.
The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
[Detailed component changes](#detailed-component-changes).
### Per queue resiliency for Instinct MI300 accelerators
### Per queue resiliency for Instinct MI300 GPUs
The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.
The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 GPUs. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same GPU are unaffected.
### ROCm Runfile Installer
@@ -38,7 +38,7 @@ ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubunt
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
* Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators
* Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X GPUs
containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See
[Training a model using ROCm Megatron-LM](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/train-a-model.html)
to get started.

@@ -45,9 +45,9 @@ bash -c "echo taskset -p \$\$"
See [issue #3493](https://github.com/ROCm/ROCm/issues/3493) on GitHub.
### Display issues on servers with Instinct MI300-series accelerators when loading AMDGPU driver
### Display issues on servers with Instinct MI300-Series GPUs when loading AMDGPU driver
AMD Instinct MI300-series accelerators and third-party GPUs such as the Matrox G200 have an issue impacting video
AMD Instinct MI300-Series GPUs and third-party GPUs such as the Matrox G200 have an issue impacting video
output. The issue was reproduced on a Dell server model PowerEdge XE9680. Servers from other vendors utilizing Matrox
G200 cards may be impacted as well. This issue was found with ROCm 6.2.0 but is present in older ROCm versions.

@@ -5,7 +5,7 @@ individual components are listed in the [Detailed component changes](detailed-co
### Instinct MI300X GPU recovery failure on uncorrectable errors
For the AMD Instinct MI300X accelerator, GPU recovery resets triggered by uncorrectable errors (UE) might not complete
For the AMD Instinct MI300X GPU, GPU recovery resets triggered by uncorrectable errors (UE) might not complete
successfully, which can result in the system being left in an undefined state. A system reboot is needed to recover from
this state. Additionally, error logging might fail in these situations, hindering diagnostics.

@@ -5,13 +5,13 @@ issues related to individual components, review the [Detailed component changes]
### Instinct MI300X reports incorrect raw GPU timestamps
On MI300X accelerators, the command processor firmware reports incorrect raw GPU timestamps. This
On MI300X GPUs, the command processor firmware reports incorrect raw GPU timestamps. This
issue is under investigation and will be addressed in a future release. See [GitHub issue #4079](https://github.com/ROCm/ROCm/issues/4079).
### Instinct MI300 series: backward weights convolution performance issue
### Instinct MI300 Series: backward weights convolution performance issue
A performance issue affects certain tensor shapes during backward weights convolution when using
FP16 or FP32 data types on Instinct MI300 series accelerators. This issue will be addressed in a future ROCm release.
FP16 or FP32 data types on Instinct MI300 Series GPUs. This issue will be addressed in a future ROCm release.
See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).
To mitigate the issue during model training, set the following environment variables:
@@ -77,7 +77,7 @@ This issue will be addressed in a future ROCm release. See [GitHub issue #4085](
Canny edge detection kernels might access out-of-bounds memory locations while
computing gradient intensities on edge pixels. This issue is isolated to
Canny-specific use cases on Instinct MI300 series accelerators. This issue is
Canny-specific use cases on Instinct MI300 Series GPUs. This issue is
resolved in the [MIVisionX `develop` branch](https://github.com/ROCm/mivisionx)
and will be part of a future ROCm release. See [GitHub issue #4086](https://github.com/ROCm/ROCm/issues/4086).

@@ -3,9 +3,9 @@
The following are previously known issues resolved in this release. For resolved issues related to
individual components, review the [Detailed component changes](#detailed-component-changes).
### Instinct MI300 series: backward weights convolution performance issue
### Instinct MI300 Series: backward weights convolution performance issue
Fixed a performance issue affecting certain tensor shapes during backward weights convolution when using FP16 or FP32 data types on Instinct MI300 series accelerators. See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).
Fixed a performance issue affecting certain tensor shapes during backward weights convolution when using FP16 or FP32 data types on Instinct MI300 Series GPUs. See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).
### ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues

@@ -19,7 +19,7 @@ This issue has been fixed in the ROCm 6.3.2 release. See [GitHub issue #4085](ht
An issue where Canny edge detection kernels accessed out-of-bounds memory locations while
computing gradient intensities on edge pixels has been fixed. This issue was isolated to
Canny-specific use cases on Instinct MI300 series accelerators. See [GitHub issue #4086](https://github.com/ROCm/ROCm/issues/4086).
Canny-specific use cases on Instinct MI300 Series GPUs. See [GitHub issue #4086](https://github.com/ROCm/ROCm/issues/4086).
### AMD VCN instability with rocDecode

@@ -1,8 +1,8 @@
## Operating system and hardware support changes
ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at [Debian native installation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/native-install/debian.html).
ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct GPUs. See the installation instructions at [Debian native installation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/native-install/debian.html).
ROCm 6.3.1 enables support for AMD Instinct MI325X accelerator. For more information, see [AMD Instinct™ MI325X Accelerators](https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html).
ROCm 6.3.1 enables support for AMD Instinct MI325X GPU. For more information, see [AMD Instinct™ MI325X GPUs](https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html).
See the [Compatibility
matrix](https://rocm.docs.amd.com/en/docs-6.3.1/compatibility/compatibility-matrix.html)

@@ -1,6 +1,6 @@
## Operating system and hardware support changes
ROCm 6.3.2 adds support for Azure Linux 3.0 (kernel: 6.6). Azure Linux is supported only on AMD Instinct accelerators. For more information, see [Azure Linux installation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html).
ROCm 6.3.2 adds support for Azure Linux 3.0 (kernel: 6.6). Azure Linux is supported only on AMD Instinct GPUs. For more information, see [Azure Linux installation](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html).
See the [Compatibility
matrix](https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html)