Mirror of https://github.com/ROCm/ROCm.git, synced 2026-01-11 15:47:59 -05:00
Compare commits: 13 commits, docs/7.0-a ... docs/7.0-r

| Author | SHA1 | Date |
|---|---|---|
| | fa4b6765b3 | |
| | 58b749b326 | |
| | 13abb4086e | |
| | e4bf4302e5 | |
| | 579e72ad4c | |
| | f7e670d36c | |
| | 6cebe026b1 | |
| | 2134313c4a | |
| | 08e5cb4321 | |
| | 22732e81a2 | |
| | 975c4036a4 | |
| | 6df8002b08 | |
| | 0f261049bb | |
@@ -1,9 +1,24 @@
IPC
SPIR
VFS
builtins
crosslane
frontend
gpt
openai
oss
MXFP
SGLang
VMware
amd
bdf
compatiblity
csv
enum
json
subproject
ROCpd
rocpd
STL
XCCs
chiplets
hipRTC
nvRTC
warpSize
Datacenter
GST
IET

CHANGELOG.md: 7348 lines changed (diff suppressed because it is too large)

RELEASE.md: 657 lines removed

@@ -1,657 +0,0 @@
<!-- Do not edit this file! -->
<!-- This file is autogenerated with -->
<!-- tools/autotag/tag_script.py -->
<!-- Disable lints since this is an auto-generated file. -->
<!-- markdownlint-disable blanks-around-headers -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable no-blanks-blockquote -->
<!-- markdownlint-disable ul-indent -->
<!-- markdownlint-disable no-trailing-spaces -->
<!-- markdownlint-disable reference-links-images -->
<!-- markdownlint-disable no-missing-space-atx -->
<!-- spellcheck-disable -->

# ROCm 6.4.1 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

- [Release highlights](#release-highlights)
- [Operating system and hardware support changes](#operating-system-and-hardware-support-changes)
- [ROCm components versioning](#rocm-components)
- [Detailed component changes](#detailed-component-changes)
- [ROCm known issues](#rocm-known-issues)
- [ROCm upcoming changes](#rocm-upcoming-changes)

```{note}
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html)
documentation to verify compatibility and system requirements.
```

## Release highlights

The following are notable new features and improvements in ROCm 6.4.1. For changes to individual components, see [Detailed component changes](#detailed-component-changes).

### Addition of DPX partition mode under NPS2 memory mode

AMD Instinct MI300X now supports DPX partition mode under NPS2 memory mode. For more partitioning information, see the [Deep dive into the MI300 compute and memory partition modes](https://rocm.blogs.amd.com/software-tools-optimization/compute-memory-modes/README.html) blog and [AMD Instinct MI300X system optimization](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#change-gpu-partition-modes).

### Introducing the ROCm Data Science toolkit

The ROCm Data Science toolkit (ROCm-DS) is an open-source software collection for high-performance data science applications built on the core ROCm platform. You can use ROCm-DS to accelerate both new and existing data science workloads, allowing intensive applications to run against larger datasets more quickly. ROCm-DS is in an early access state, and running production workloads is not recommended. For more information, see the [AMD ROCm-DS documentation](https://rocm.docs.amd.com/projects/rocm-ds/en/latest/index.html).

### ROCm Offline Installer Creator updates

The ROCm Offline Installer Creator 6.4.1 now allows you to use the SPACEBAR or ENTER keys for menu item selection in the GUI. It also adds support for Debian 12 and fixes an issue with “full” mode RHEL offline installer creation, where GDM packages were uninstalled during offline installation. See [ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/rocm-offline-installer.html) for more information.

### ROCm Runfile Installer updates

The ROCm Runfile Installer 6.4.1 adds the following improvements:

- Relaxed version checks for installation on different distributions. Provided the dependencies are not installed by the Runfile Installer, you can target installation for a different path from the host system running the installer. For example, the installer can run on a system using Ubuntu 22.04 and install to a partition/system that is using Ubuntu 24.04.
- Performance improvements for detecting a previous ROCm install.
- Removal of the extra `opt` directory created for the target during the ROCm installation. For example, installing to `target=/home/amd` now installs ROCm to `/home/amd/rocm-6.4.1` rather than `/home/amd/opt/rocm-6.4.1`. Installs using `target=/` continue to use `/opt/`.
- The Runfile Installer can now be used to uninstall any Runfile-based installation of the driver.
- In the CLI interface, the `postrocm` argument can now be run separately from the `rocm` argument. If `postrocm` was missed during the initial ROCm install, it can now be run on the same target folder. For example, if you installed ROCm 6.4.1 using `install.run target=/myrocm rocm`, you can run the post-installation separately using `install.run target=/myrocm/rocm-6.4.1 postrocm`.
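The `opt` directory removal described above changes where ROCm lands for a given `target=`. The rule can be sketched as follows (an illustrative Python model of the documented path behavior, not installer code; `rocm_install_prefix` is a hypothetical name):

```python
import os

def rocm_install_prefix(target: str, version: str = "6.4.1") -> str:
    """Model of the 6.4.1 Runfile Installer path rule described above:
    installs land in <target>/rocm-<version>, except that target "/"
    keeps the traditional /opt/rocm-<version> location."""
    if os.path.normpath(target) == "/":
        return f"/opt/rocm-{version}"
    return os.path.join(os.path.normpath(target), f"rocm-{version}")

print(rocm_install_prefix("/home/amd"))  # /home/amd/rocm-6.4.1
print(rocm_install_prefix("/"))          # /opt/rocm-6.4.1
```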

For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/rocm-runfile-installer.html).

### ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

* [Tutorials for AI developers](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/) have been expanded with five new tutorials. These tutorials are Jupyter notebook-based, easy-to-follow documents. They are ideal for AI developers who want to learn about specific topics, including inference, fine-tuning, and training. For more information about the changes, see [Changelog for the AI Developer Hub](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/changelog.html).
* The [Training a model with LLM Foundry](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry.html) performance testing guide has been added. This guide describes how to use the preconfigured [ROCm/pytorch-training](https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5) training environment and [https://github.com/ROCm/MAD](https://github.com/ROCm/MAD) to test the training performance of the LLM Foundry framework on AMD Instinct MI325X and MI300X accelerators using the [MPT-30B](https://huggingface.co/mosaicml/mpt-30b) model.
* The [Training a model with PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) performance testing guide has been updated to feature the latest [ROCm/pytorch-training](https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5) Docker image (a preconfigured training environment with ROCm and PyTorch). Support for [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) has been added.
* The [Training a model with JAX MaxText](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.html) performance testing guide has been updated to feature the latest [ROCm/jax-training](https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.5/images/sha256-4e0516358a227cae8f552fb866ec07e2edcf244756f02e7b40212abfbab5217b) Docker image (a preconfigured training environment with ROCm, JAX, and [MaxText](https://github.com/AI-Hypercomputer/maxtext)). Support for [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) has been added.
* The [vLLM inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/vllm-benchmark.html?model=pyt_vllm_qwq-32b) guide has been updated to feature the latest [ROCm/vLLM](https://hub.docker.com/layers/rocm/vllm/latest/images/sha256-5c8b4436dd0464119d9df2b44c745fadf81512f18ffb2f4b5dc235c71ebe26b4) Docker image (a preconfigured environment for inference with ROCm and [vLLM](https://docs.vllm.ai/en/latest/)). Support for the [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) model has been added.
* The [PyTorch inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/pytorch-inference-benchmark.html?model=pyt_clip_inference) guide has been added, featuring the [ROCm/PyTorch](https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-ab1d350b818b90123cfda31363019d11c0d41a8f12a19e3cb2cb40cf0261137d) Docker image (a preconfigured inference environment with ROCm and PyTorch) with initial support for the [CLIP](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K) and [Chai-1](https://huggingface.co/chaidiscovery/chai-1) models.

## Operating system and hardware support changes

ROCm 6.4.1 introduces support for the RDNA4 architecture-based [Radeon AI PRO R9700](https://www.amd.com/en/products/graphics/workstations/radeon-ai-pro/ai-9000-series/amd-radeon-ai-pro-r9700.html), [Radeon RX 9070](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070.html), [Radeon RX 9070 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html), Radeon RX 9070 GRE, and [Radeon RX 9060 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9060xt.html) GPUs for compute workloads. It also adds support for RDNA3 architecture-based [Radeon PRO W7700](https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7700.html) and [Radeon RX 7800 XT](https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7800-xt.html) GPUs. These GPUs are supported on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.
For details, see the full list of [Supported GPUs (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus).

See the [Compatibility matrix](../../docs/compatibility/compatibility-matrix.rst) for more information about operating system and hardware compatibility.

## ROCm components

The following table lists the versions of ROCm components for ROCm 6.4.1, including any version
changes from 6.4.0 to 6.4.1. Click the component's updated version to go to a list of its changes.
Click {fab}`github` to go to the component's source code on GitHub.

<div class="pst-scrollable-table-container">
<table id="rocm-rn-components" class="table">
<thead>
<tr>
<th>Category</th>
<th>Group</th>
<th>Name</th>
<th>Version</th>
<th></th>
</tr>
</thead>
<colgroup>
<col span="1">
<col span="1">
</colgroup>
<tbody class="rocm-components-libs rocm-components-ml">
<tr>
<th rowspan="9">Libraries</th>
<th rowspan="9">Machine learning and computer vision</th>
<td><a href="https://rocm.docs.amd.com/projects/composable_kernel/en/docs-6.4.1/index.html">Composable Kernel</a></td>
<td>1.1.0</td>
<td><a href="https://github.com/ROCm/composable_kernel"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/AMDMIGraphX/en/docs-6.4.1/index.html">MIGraphX</a></td>
<td>2.12.0</td>
<td><a href="https://github.com/ROCm/AMDMIGraphX"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIOpen/en/docs-6.4.1/index.html">MIOpen</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/MIOpen"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIVisionX/en/docs-6.4.1/index.html">MIVisionX</a></td>
<td>3.2.0</td>
<td><a href="https://github.com/ROCm/MIVisionX"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocAL/en/docs-6.4.1/index.html">rocAL</a></td>
<td>2.2.0</td>
<td><a href="https://github.com/ROCm/rocAL"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocDecode/en/docs-6.4.1/index.html">rocDecode</a></td>
<td>0.10.0</td>
<td><a href="https://github.com/ROCm/rocDecode"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocJPEG/en/docs-6.4.1/index.html">rocJPEG</a></td>
<td>0.8.0</td>
<td><a href="https://github.com/ROCm/rocJPEG"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocPyDecode/en/docs-6.4.1/index.html">rocPyDecode</a></td>
<td>0.3.1</td>
<td><a href="https://github.com/ROCm/rocPyDecode"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rpp/en/docs-6.4.1/index.html">RPP</a></td>
<td>1.9.10</td>
<td><a href="https://github.com/ROCm/rpp"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-libs rocm-components-communication tbody-reverse-zebra">
<tr>
<th rowspan="2"></th>
<th rowspan="2">Communication</th>
<td><a href="https://rocm.docs.amd.com/projects/rccl/en/docs-6.4.1/index.html">RCCL</a></td>
<td>2.22.3 ⇒ <a href="#rccl-2-22-3">2.22.3</a></td>
<td><a href="https://github.com/ROCm/rccl"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSHMEM/en/docs-6.4.1/index.html">rocSHMEM</a></td>
<td>2.0.0</td>
<td><a href="https://github.com/ROCm/rocSHMEM"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-libs rocm-components-math tbody-reverse-zebra">
<tr>
<th rowspan="16"></th>
<th rowspan="16">Math</th>
<td><a href="https://rocm.docs.amd.com/projects/hipBLAS/en/docs-6.4.1/index.html">hipBLAS</a></td>
<td>2.4.0</td>
<td><a href="https://github.com/ROCm/hipBLAS"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.4.1/index.html">hipBLASLt</a></td>
<td>0.12.0 ⇒ <a href="#hipblaslt-0-12-1">0.12.1</a></td>
<td><a href="https://github.com/ROCm/hipBLASLt"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipFFT/en/docs-6.4.1/index.html">hipFFT</a></td>
<td>1.0.18</td>
<td><a href="https://github.com/ROCm/hipFFT"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipfort/en/docs-6.4.1/index.html">hipfort</a></td>
<td>0.6.0</td>
<td><a href="https://github.com/ROCm/hipfort"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipRAND/en/docs-6.4.1/index.html">hipRAND</a></td>
<td>2.12.0</td>
<td><a href="https://github.com/ROCm/hipRAND"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSOLVER/en/docs-6.4.1/index.html">hipSOLVER</a></td>
<td>2.4.0</td>
<td><a href="https://github.com/ROCm/hipSOLVER"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSE/en/docs-6.4.1/index.html">hipSPARSE</a></td>
<td>3.2.0</td>
<td><a href="https://github.com/ROCm/hipSPARSE"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSELt/en/docs-6.4.1/index.html">hipSPARSELt</a></td>
<td>0.2.3</td>
<td><a href="https://github.com/ROCm/hipSPARSELt"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocALUTION/en/docs-6.4.1/index.html">rocALUTION</a></td>
<td>3.2.2 ⇒ <a href="#rocalution-3-2-3">3.2.3</a></td>
<td><a href="https://github.com/ROCm/rocALUTION"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.4.1/index.html">rocBLAS</a></td>
<td>4.4.0</td>
<td><a href="https://github.com/ROCm/rocBLAS"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocFFT/en/docs-6.4.1/index.html">rocFFT</a></td>
<td>1.0.32</td>
<td><a href="https://github.com/ROCm/rocFFT"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocRAND/en/docs-6.4.1/index.html">rocRAND</a></td>
<td>3.3.0</td>
<td><a href="https://github.com/ROCm/rocRAND"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSOLVER/en/docs-6.4.1/index.html">rocSOLVER</a></td>
<td>3.28.0</td>
<td><a href="https://github.com/ROCm/rocSOLVER"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSPARSE/en/docs-6.4.1/index.html">rocSPARSE</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/rocSPARSE"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocWMMA/en/docs-6.4.1/index.html">rocWMMA</a></td>
<td>1.7.0</td>
<td><a href="https://github.com/ROCm/rocWMMA"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/Tensile/en/docs-6.4.1/src/index.html">Tensile</a></td>
<td>4.43.0</td>
<td><a href="https://github.com/ROCm/Tensile"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-libs rocm-components-primitives tbody-reverse-zebra">
<tr>
<th rowspan="4"></th>
<th rowspan="4">Primitives</th>
<td><a href="https://rocm.docs.amd.com/projects/hipCUB/en/docs-6.4.1/index.html">hipCUB</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/hipCUB"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipTensor/en/docs-6.4.1/index.html">hipTensor</a></td>
<td>1.5.0</td>
<td><a href="https://github.com/ROCm/hipTensor"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocPRIM/en/docs-6.4.1/index.html">rocPRIM</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/rocPRIM"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocThrust/en/docs-6.4.1/index.html">rocThrust</a></td>
<td>3.3.0</td>
<td><a href="https://github.com/ROCm/rocThrust"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-tools rocm-components-system tbody-reverse-zebra">
<tr>
<th rowspan="7">Tools</th>
<th rowspan="7">System management</th>
<td><a href="https://rocm.docs.amd.com/projects/amdsmi/en/docs-6.4.1/index.html">AMD SMI</a></td>
<td>25.3.0 ⇒ <a href="#amd-smi-25-4-2">25.4.2</a></td>
<td><a href="https://github.com/ROCm/amdsmi"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rdc/en/docs-6.4.1/index.html">ROCm Data Center Tool</a></td>
<td>0.3.0 ⇒ <a href="#rocm-data-center-tool-0-3-0">0.3.0</a></td>
<td><a href="https://github.com/ROCm/rdc"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocminfo/en/docs-6.4.1/index.html">rocminfo</a></td>
<td>1.0.0</td>
<td><a href="https://github.com/ROCm/rocminfo"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocm_smi_lib/en/docs-6.4.1/index.html">ROCm SMI</a></td>
<td>7.5.0 ⇒ <a href="#rocm-smi-7-5-0">7.5.0</a></td>
<td><a href="https://github.com/ROCm/rocm_smi_lib"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/docs-6.4.1/index.html">ROCmValidationSuite</a></td>
<td>1.1.0</td>
<td><a href="https://github.com/ROCm/ROCmValidationSuite"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-tools rocm-components-perf">
<tr>
<th rowspan="6"></th>
<th rowspan="6">Performance</th>
<td><a href="https://rocm.docs.amd.com/projects/rocm_bandwidth_test/en/docs-6.4.1/index.html">ROCm Bandwidth Test</a></td>
<td>1.4.0</td>
<td><a href="https://github.com/ROCm/rocm_bandwidth_test/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.4.1/index.html">ROCm Compute Profiler</a></td>
<td>3.1.0</td>
<td><a href="https://github.com/ROCm/rocprofiler-compute"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-systems/en/docs-6.4.1/index.html">ROCm Systems Profiler</a></td>
<td>1.0.0 ⇒ <a href="#rocm-systems-profiler-1-0-1">1.0.1</a></td>
<td><a href="https://github.com/ROCm/rocprofiler-systems"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.4.1/index.html">ROCProfiler</a></td>
<td>2.0.0</td>
<td><a href="https://github.com/ROCm/ROCProfiler/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/docs-6.4.1/index.html">ROCprofiler-SDK</a></td>
<td>0.6.0</td>
<td><a href="https://github.com/ROCm/rocprofiler-sdk/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/roctracer/en/docs-6.4.1/index.html">ROCTracer</a></td>
<td>4.1.0</td>
<td><a href="https://github.com/ROCm/ROCTracer/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-tools rocm-components-dev">
<tr>
<th rowspan="5"></th>
<th rowspan="5">Development</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.4.1/index.html">HIPIFY</a></td>
<td>19.0.0</td>
<td><a href="https://github.com/ROCm/HIPIFY/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCdbgapi/en/docs-6.4.1/index.html">ROCdbgapi</a></td>
<td>0.77.2</td>
<td><a href="https://github.com/ROCm/ROCdbgapi/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCmCMakeBuildTools/en/docs-6.4.1/index.html">ROCm CMake</a></td>
<td>0.14.0</td>
<td><a href="https://github.com/ROCm/rocm-cmake/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCgdb/en/docs-6.4.1/index.html">ROCm Debugger (ROCgdb)</a></td>
<td>15.2</td>
<td><a href="https://github.com/ROCm/ROCgdb/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocr_debug_agent/en/docs-6.4.1/index.html">ROCr Debug Agent</a></td>
<td>2.0.4</td>
<td><a href="https://github.com/ROCm/rocr_debug_agent/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-compilers tbody-reverse-zebra">
<tr>
<th rowspan="2" colspan="2">Compilers</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPCC/en/docs-6.4.1/index.html">HIPCC</a></td>
<td>1.1.1</td>
<td><a href="https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.1/index.html">llvm-project</a></td>
<td>19.0.0</td>
<td><a href="https://github.com/ROCm/llvm-project/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
<tbody class="rocm-components-runtimes tbody-reverse-zebra">
<tr>
<th rowspan="2" colspan="2">Runtimes</th>
<td><a href="https://rocm.docs.amd.com/projects/HIP/en/docs-6.4.1/index.html">HIP</a></td>
<td>6.4.0 ⇒ <a href="#hip-6-4-1">6.4.1</a></td>
<td><a href="https://github.com/ROCm/HIP/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCR-Runtime/en/docs-6.4.1/index.html">ROCr Runtime</a></td>
<td>1.15.0 ⇒ <a href="#rocr-runtime-1-15-0">1.15.0</a></td>
<td><a href="https://github.com/ROCm/ROCR-Runtime/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
</table>
</div>

## Detailed component changes

The following sections describe key changes to ROCm components.

```{note}
For a historical overview of ROCm component updates, see the {doc}`ROCm consolidated changelog </release/changelog>`.
```

### **AMD SMI** (25.4.2)

#### Added

* Dumping of CPER entries from the RAS tool via `amdsmi_get_gpu_cper_entries()` in the Python and C APIs.
  - Dumped CPER entries consist of `amdsmi_cper_hdr_t`.
  - Dumping CPER entries is also enabled in the CLI interface through `sudo amd-smi ras --cper`.
* `amdsmi_get_gpu_busy_percent` in the C API.
#### Changed

* Modified the VRAM display for `amd-smi monitor -v`.

#### Optimized

* Improved load times for CLI commands when the GPU has multiple partitions.

#### Resolved issues

* Fixed partition enumeration in `amd-smi list -e`, `amdsmi_get_gpu_enumeration_info()`, `amdsmi_enumeration_info_t`, and the `drm_card` and `drm_render` fields.

#### Known issues

* When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.

```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```

### **HIP** (6.4.1)

#### Added

* The new log mask enumeration `LOG_COMGR` enables logging of precise code object information.

#### Changed

* The HIP runtime now uses device bitcode before SPIR-V.
* The implementation that prevents `hipLaunchKernel` latency from degrading with the number of idle streams has been reverted and is disabled by default.

#### Optimized

* Improved kernel logging, including demangling of shader names.
* Refined the implementation of the HIP APIs `hipEventRecord` and `hipStreamWaitEvent` to improve performance.

#### Resolved issues

* Fixed stale state during graph capture. The return error was fixed, and the HIP runtime now always uses the latest dependent nodes during `hipEventRecord` capture.
* Fixed a segmentation fault during kernel execution. The HIP runtime now allows the maximum stack size permitted by the ISA on the GPU device.

### **hipBLASLt** (0.12.1)

#### Resolved issues

* Fixed an accuracy issue for some solutions using an `FP32` or `TF32` data type with a TT transpose.

### **RCCL** (2.22.3)

#### Changed

* MSCCL++ is now disabled by default. To enable it, set `RCCL_MSCCLPP_ENABLE=1`.

#### Resolved issues

* Fixed an issue where early termination could, in rare circumstances, cause the application to stop responding. Synchronization is now added before destroying a proxy thread.
* Fixed the accuracy issue for the MSCCLPP `allreduce7` kernel in graph mode.

#### Known issues

* When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault. The recommended workaround is to disable MSCCL with `export RCCL_MSCCL_ENABLE=0`. This issue will be fixed in a future ROCm release.
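A minimal sketch of the workaround from inside a Python launcher (an assumption here is that the variable must be set before the process initializes RCCL; `export RCCL_MSCCL_ENABLE=0` in the launching shell is equivalent):

```python
import os

# Workaround sketch: disable MSCCL before any RCCL communicators are created,
# i.e. before importing/initializing the RCCL-backed framework.
os.environ["RCCL_MSCCL_ENABLE"] = "0"

print(os.environ["RCCL_MSCCL_ENABLE"])  # 0
```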

* Within the RCCL-UnitTests test suite, failures occur in tests ending with the `.ManagedMem` and `.ManagedMemGraph` suffixes. These failures only affect the test results and do not affect the RCCL component itself. This issue will be resolved in a future ROCm release.

### **rocALUTION** (3.2.3)

#### Added

* The `-a` option has been added to the `rmake.py` build script. This option allows you to select specific architectures when building on Microsoft Windows.

#### Resolved issues

* Fixed an issue where the `HIP_PATH` environment variable was being ignored when compiling on Microsoft Windows.

### **ROCm Data Center Tool** (0.3.0)

#### Added

- Support for GPU partitions.
- The `RDC_FI_GPU_BUSY_PERCENT` metric.

#### Changed

- Updated `rdc_field` to align with `rdc_bootstrap` for current metrics.

#### Resolved issues

- Fixed [ROCProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.4.0/index.html) eval metrics and memory leaks.

### **ROCm SMI** (7.5.0)

#### Resolved issues

- Fixed partition enumeration. It now refers to the correct DRM Render and Card paths.

```{note}
See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```

### **ROCm Systems Profiler** (1.0.1)

#### Added

* A how-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).

#### Resolved issues

* Fixed a build issue with Dyninst on GCC 13.

### **ROCr Runtime** (1.15.0)
|
||||
|
||||
#### Resolved issues
|
||||
|
||||
* Fixed a rare occurrence issue on AMD Instinct MI25, MI50, and MI100 GPUs, where the `SDMA` copies might start before the dependent Kernel finishes and could cause memory corruption.
|
||||
|
||||
## ROCm known issues

ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
issues related to individual components, review the [Detailed component changes](#detailed-component-changes).

### Radeon AI PRO R9700 hangs when running Stable Diffusion 2.1 at batch sizes above four

Radeon AI PRO R9700 GPUs might hang when running [Stable Diffusion
2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) with batch sizes
greater than four. As a workaround, limit batch sizes to four or fewer. This issue
will be addressed in a future ROCm release. See [issue #4770](https://github.com/ROCm/ROCm/issues/4770) on GitHub.

### RCCL MSCCL initialization failure

When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault. The recommended workaround is to disable MSCCL with `export RCCL_MSCCL_ENABLE=0`.
This issue will be fixed in a future ROCm release. See [issue #4769](https://github.com/ROCm/ROCm/issues/4769) on GitHub.

### AMD SMI CLI: CPER entries not dumped continuously when using follow flag

When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.
See [issue #4768](https://github.com/ROCm/ROCm/issues/4768) on GitHub.

### ROCm SMI uninstallation issue on RHEL and SLES

`rocm-smi-lib` does not get uninstalled and remains orphaned on RHEL and SLES systems when:

* [Uninstalling ROCm using the AMDGPU installer](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html#uninstalling-rocm) with `amdgpu-install --uninstall`

* [Uninstalling via package manager](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/install-methods/package-manager/package-manager-rhel.html#uninstall-rocm-packages)
  with `dnf remove rocm-core` on RHEL or `zypper remove rocm-core` on SLES.

As a workaround, manually remove the `rocm-smi-lib` package using `sudo dnf remove rocm-smi-lib` or `sudo zypper remove rocm-smi-lib`.
See [issue #4767](https://github.com/ROCm/ROCm/issues/4767) on GitHub.
## ROCm upcoming changes

The following changes to the ROCm software stack are anticipated for future releases.

### ROCm SMI deprecation

[ROCm SMI](https://github.com/ROCm/rocm_smi_lib) will be phased out in an
upcoming ROCm release and will enter maintenance mode. After this transition,
only critical bug fixes will be addressed and no further feature development
will take place.

It's strongly recommended to transition your projects to [AMD
SMI](https://github.com/ROCm/amdsmi), the successor to ROCm SMI. AMD SMI
includes all the features of ROCm SMI and will continue to receive regular
updates, new functionality, and ongoing support. For more information on AMD
SMI, see the [AMD SMI documentation](https://rocm.docs.amd.com/projects/amdsmi/en/latest/).

### ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation

Development and support for ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the `rocprofv3` tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta today and will be production-ready in a future ROCm release.

ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are anticipated to reach end of life in a future release, targeting Q1 2026.
### AMDGPU wavefront size compiler macro deprecation

Access to the wavefront size as a compile-time constant via the `__AMDGCN_WAVEFRONT_SIZE`
and `__AMDGCN_WAVEFRONT_SIZE__` macros or the `constexpr warpSize` variable is deprecated
and will be disabled in a future release.

* The `__AMDGCN_WAVEFRONT_SIZE__` macro and `__AMDGCN_WAVEFRONT_SIZE` alias will be removed in an upcoming release.
  It is recommended to remove any use of this macro. For more information, see
  [AMDGPU support](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.0/LLVM/clang/html/AMDGPUSupport.html).
* `warpSize` will only be available as a non-`constexpr` variable. Where required,
  the wavefront size should be queried via the `warpSize` variable in device code,
  or via `hipGetDeviceProperties` in host code. Neither of these will result in a compile-time constant.
* For cases where compile-time evaluation of the wavefront size cannot be avoided,
  uses of `__AMDGCN_WAVEFRONT_SIZE`, `__AMDGCN_WAVEFRONT_SIZE__`, or `warpSize`
  can be replaced with a user-defined macro or `constexpr` variable with the wavefront
  size(s) for the target hardware. For example:

  ```cpp
  #if defined(__GFX9__)
  #define MY_MACRO_FOR_WAVEFRONT_SIZE 64
  #else
  #define MY_MACRO_FOR_WAVEFRONT_SIZE 32
  #endif
  ```
### HIPCC Perl scripts deprecation

The HIPCC Perl scripts (`hipcc.pl` and `hipconfig.pl`) will be removed in an upcoming release.

### Changes to ROCm Object Tooling

The ROCm Object Tooling tools ``roc-obj-ls``, ``roc-obj-extract``, and ``roc-obj`` are
deprecated in ROCm 6.4 and will be removed in a future release. Functionality
has been added to the ``llvm-objdump --offloading`` tool option to extract all
clang-offload-bundles found within the input objects or executables into
individual code objects. The ``llvm-objdump --offloading`` tool option also
supports the ``--arch-name`` option, which extracts only the code objects built for
the specified target architecture. See [llvm-objdump](https://llvm.org/docs/CommandGuide/llvm-objdump.html)
for more information.

### HIP runtime API changes

A number of changes to the HIP runtime API are planned for an upcoming major release
that are not backward compatible with prior releases. Most of these changes increase
alignment between HIP and CUDA APIs or behavior. Others clean up header files,
remove namespace collisions, and establish a clear separation between
`hipRTC` and the HIP runtime.
docs/conf.py (24 changed lines)

```diff
@@ -27,28 +27,14 @@ project = "ROCm Documentation"
 project_path = os.path.abspath(".").replace("\\", "/")
 author = "Advanced Micro Devices, Inc."
 copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
-version = "7.0 Alpha 2"
-release = "7.0 Alpha 2"
+version = "7.0 RC1"
+release = "7.0 RC1"
 setting_all_article_info = True
-all_article_info_os = ["linux", "windows"]
+all_article_info_os = ["linux"]
 all_article_info_author = ""

 # pages with specific settings
 article_pages = [
     {"file": "preview/index", "os": ["linux"],},
     {"file": "preview/release", "os": ["linux"],},
     {"file": "preview/versions", "os": ["linux"],},
     {"file": "preview/install/index", "os": ["linux"],},
     {"file": "preview/install/instinct-driver", "os": ["linux"],},
     {"file": "preview/install/rocm", "os": ["linux"],},
     {"file": "preview/benchmark-docker/index", "os": ["linux"],},
     {"file": "preview/benchmark-docker/training", "os": ["linux"],},
     {"file": "preview/benchmark-docker/pre-training-megatron-lm-llama-3-8b", "os": ["linux"],},
     {"file": "preview/benchmark-docker/pre-training-torchtitan-llama-3-70b", "os": ["linux"],},
     {"file": "preview/benchmark-docker/fine-tuning-lora-llama-2-70b", "os": ["linux"],},
     {"file": "preview/benchmark-docker/inference", "os": ["linux"],},
     {"file": "preview/benchmark-docker/inference-vllm-llama-3.1-405b-fp4", "os": ["linux"],},
     {"file": "preview/benchmark-docker/inference-sglang-deepseek-r1-fp4", "os": ["linux"],},
     {"file": "preview/release", "date": "2025-08-07",},
 ]

 external_toc_path = "./sphinx/_toc.yml"
@@ -73,7 +59,7 @@ html_static_path = ["sphinx/static/css", "sphinx/static/js"]
 html_css_files = ["rocm_custom.css", "rocm_rn.css"]
 html_js_files = ["preview-version-list.js"]

-html_title = "ROCm 7.0 Alpha 2 documentation"
+html_title = "ROCm 7.0 RC1 documentation"

 html_theme_options = {"link_main_doc": False}
```
```diff
@@ -1,9 +1,5 @@
 .. meta::
   :description: Benchmarking AI model training, fine-tuning, and inference
   :keywords: composable kernel, CK, ROCm, API, documentation

 *******************************************
-Docker images for AI training and inference
+Docker images for AI inference
 *******************************************

 .. note::
@@ -15,27 +11,30 @@ Docker images for AI training and inference
    <https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
    documentation.

-This page accompanies preview Docker images designed to validate and reproduce
-training performance on AMD Instinct™ MI355X and MI350X accelerators. The images provide access to
-Alpha versions of the ROCm 7.0 software stack and are targeted at early-access users evaluating
-training workloads using next-generation AMD accelerators.
-
-This preview offers hands-on benchmarking using representative large-scale
-language and reasoning models with optimized compute precisions and
-configurations.
+This page accompanies preview Docker images designed to reproduce
+inference performance on AMD Instinct™ MI355X, MI350X, and MI300X series
+accelerators. The images provide access to preview versions of the ROCm 7.0
+software stack and are targeted at early-access users evaluating AI
+inference workloads using next-generation AMD accelerators.

 .. important::

-   The following AI workload benchmarks only support the ROCm 7.0 Alpha release on AMD Instinct
-   MI355X and MI350X accelerators.
+   The following AI workload benchmarks use the ROCm 7.0 release candidate
+   preview on AMD Instinct MI355X, MI350X, and MI300X series accelerators.

-   If you're looking for production-level workloads for the MI300X series, see
+   If you're looking for production-level workloads for MI300X series accelerators, see
    `Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html>`_.

 .. grid:: 2

-   .. grid-item-card:: Training
+   .. grid-item-card:: Inference

-      * :doc:`pre-training-megatron-lm-llama-3-8b`
+      * :doc:`inference-vllm-llama-3.1-405b-fp4`

-      * :doc:`pre-training-torchtitan-llama-3-70b`
+      * :doc:`inference-vllm-llama-3.3-70b-fp8`

+      * :doc:`inference-vllm-gpt-oss-120b`

+      * :doc:`inference-sglang-deepseek-r1-fp4`

+      * :doc:`inference-sglang-deepseek-r1-fp8`
```
@@ -0,0 +1,108 @@
***********************************************
Benchmark DeepSeek R1 FP4 inference with SGLang
***********************************************

.. note::

   For the latest iteration of AI training and inference performance for ROCm
   7.0, see `Infinity Hub
   <https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
   and the `ROCm 7.0 AI training and inference performance
   <https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
   documentation.

This section provides instructions to test the inference performance of DeepSeek R1
with FP4 precision via the SGLang serving framework.
The accompanying Docker image integrates the ROCm 7.0 preview with SGLang, and is
tailored for AMD Instinct MI355X and MI350X accelerators. This
benchmark does not support other accelerators.

Follow these steps to pull the required image, spin up the container with the
appropriate options, download the model, and run the benchmark.

Pull the Docker image
=====================

Use the following command to pull the `Docker image
<https://hub.docker.com/layers/rocm/7.0-preview/rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2_mi35x_rc1/images/sha256-2c2a78219b478421482db0c4dce612cca11ce163274f5dbad2b305067fb86012>`__.

.. code-block:: shell

   docker pull rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2_mi35x_rc1

Download the model
==================

See the model card on Hugging Face at `DeepSeek-R1-MXFP4-Preview
<https://huggingface.co/amd/DeepSeek-R1-MXFP4-Preview>`__. This model uses
microscaling 4-bit floating point (MXFP4) quantization through `AMD Quark
<https://quark.docs.amd.com/latest/>`_ for efficient inference on AMD
accelerators.

.. code-block:: shell

   pip install huggingface_hub[cli] hf_transfer hf_xet
   HF_HUB_ENABLE_HF_TRANSFER=1 \
   HF_HOME=/data/huggingface-cache \
   HF_TOKEN="<HF_TOKEN>" \
   huggingface-cli download amd/DeepSeek-R1-0528-MXFP4-Preview --exclude "original/*"

Run the inference benchmark
===========================

1. Start the container using the following command.

   .. code-block:: shell

      docker run -it \
        --user root \
        --group-add video \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        -w /app/ \
        --ipc=host \
        --network=host \
        --shm-size 64G \
        --mount type=bind,src=/data,dst=/data \
        --device=/dev/kfd \
        --device=/dev/dri \
        -e SGLANG_USE_AITER=1 \
        rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2_mi35x_rc1

2. Start the server.

   .. code-block:: shell

      python3 -m sglang.launch_server \
        --model-path amd/DeepSeek-R1-0528-MXFP4-Preview \
        --host localhost \
        --port 8000 \
        --tensor-parallel-size 8 \
        --trust-remote-code \
        --chunked-prefill-size 196608 \
        --mem-fraction-static 0.8 \
        --disable-radix-cache \
        --num-continuous-decode-steps 4 \
        --max-prefill-tokens 196608 \
        --cuda-graph-max-bs 128 &

3. Run the benchmark with the following options.

   .. code-block:: shell

      input_tokens=1024
      output_tokens=1024
      max_concurrency=64
      num_prompts=128

      python3 -m sglang.bench_serving \
        --host localhost \
        --port 8000 \
        --model amd/DeepSeek-R1-0528-MXFP4-Preview \
        --dataset-name random \
        --random-input ${input_tokens} \
        --random-output ${output_tokens} \
        --random-range-ratio 1.0 \
        --max-concurrency ${max_concurrency} \
        --num-prompt ${num_prompts}
@@ -0,0 +1,141 @@
***********************************************
Benchmark DeepSeek R1 FP8 inference with SGLang
***********************************************

.. note::

   For the latest iteration of AI training and inference performance for ROCm
   7.0, see `Infinity Hub
   <https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
   and the `ROCm 7.0 AI training and inference performance
   <https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
   documentation.

This section provides instructions to test the inference performance of DeepSeek R1
with FP8 precision via the SGLang serving framework.
The accompanying Docker image integrates the ROCm 7.0 preview with SGLang, and is
tailored for AMD Instinct MI355X, MI350X, and MI300X series accelerators. This
benchmark does not support other accelerators.

Follow these steps to pull the required image, spin up the container with the
appropriate options, download the model, and run the benchmark.

Pull the Docker image
=====================

Use the following command to pull the appropriate `Docker image <https://hub.docker.com/r/rocm/7.0-preview/tags>`_
for your system.

.. tab-set::

   .. tab-item:: MI350 series
      :sync: mi35x

      .. code-block:: shell

         docker pull rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2_mi35x_rc1

   .. tab-item:: MI300X series
      :sync: mi30x

      .. code-block:: shell

         docker pull rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2-mi30x_rc1

Download the model
==================

See the model card on Hugging Face at `deepseek-ai/DeepSeek-R1-0528 <https://huggingface.co/deepseek-ai/DeepSeek-R1-0528>`__.

.. code-block:: shell

   pip install huggingface_hub[cli] hf_transfer hf_xet
   HF_HUB_ENABLE_HF_TRANSFER=1 \
   HF_HOME=/data/huggingface-cache \
   HF_TOKEN="<HF_TOKEN>" \
   huggingface-cli download deepseek-ai/DeepSeek-R1-0528 --exclude "original/*"

Run the inference benchmark
===========================

1. Start the container using the following command.

   .. tab-set::

      .. tab-item:: MI350 series
         :sync: mi35x

         .. code-block:: shell

            docker run -it \
              --user root \
              --group-add video \
              --cap-add=SYS_PTRACE \
              --security-opt seccomp=unconfined \
              -w /app/ \
              --ipc=host \
              --network=host \
              --shm-size 64G \
              --mount type=bind,src=/data,dst=/data \
              --device=/dev/kfd \
              --device=/dev/dri \
              -e SGLANG_USE_AITER=1 \
              rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2_mi35x_rc1

      .. tab-item:: MI300X series
         :sync: mi30x

         .. code-block:: shell

            docker run -it \
              --user root \
              --group-add video \
              --cap-add=SYS_PTRACE \
              --security-opt seccomp=unconfined \
              -w /app/ \
              --ipc=host \
              --network=host \
              --shm-size 64G \
              --mount type=bind,src=/data,dst=/data \
              --device=/dev/kfd \
              --device=/dev/dri \
              -e SGLANG_USE_AITER=1 \
              rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_sgl-dev-v0.5.2rc2-mi30x_rc1

2. Start the server.

   .. code-block:: shell

      python3 -m sglang.launch_server \
        --model-path deepseek-ai/DeepSeek-R1-0528 \
        --host localhost \
        --port 8000 \
        --tensor-parallel-size 8 \
        --trust-remote-code \
        --chunked-prefill-size 196608 \
        --mem-fraction-static 0.8 \
        --disable-radix-cache \
        --num-continuous-decode-steps 4 \
        --max-prefill-tokens 196608 \
        --cuda-graph-max-bs 128 &

3. Run the benchmark with the following options.

   .. code-block:: shell

      input_tokens=1024
      output_tokens=1024
      max_concurrency=64
      num_prompts=128

      python3 -m sglang.bench_serving \
        --host localhost \
        --port 8000 \
        --model deepseek-ai/DeepSeek-R1-0528 \
        --dataset-name random \
        --random-input ${input_tokens} \
        --random-output ${output_tokens} \
        --random-range-ratio 1.0 \
        --max-concurrency ${max_concurrency} \
        --num-prompt ${num_prompts}
docs/preview/benchmark-docker/inference-vllm-gpt-oss-120b.rst (new file, 107 lines)

@@ -0,0 +1,107 @@
******************************************
Benchmark GPT OSS 120B inference with vLLM
******************************************

.. note::

   For the latest iteration of AI training and inference performance for ROCm
   7.0, see `Infinity Hub
   <https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
   and the `ROCm 7.0 AI training and inference performance
   <https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
   documentation.

This section provides instructions to test the inference performance of OpenAI
GPT OSS 120B on the vLLM inference engine. The accompanying Docker image integrates
the ROCm 7.0 preview with vLLM, and is tailored for AMD Instinct
MI355X and MI350X accelerators. This benchmark does not support other
GPUs.

Follow these steps to pull the required image, spin up the container with the
appropriate options, download the model, and run the throughput test.

Pull the Docker image
=====================

Use the following command to pull the `Docker image <https://hub.docker.com/layers/rocm/7.0-preview/rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1/images/sha256-eee29678dc4dc8f2e054de889555be6f4fd74e58053bf7277d56ace1a850513e>`__.

.. code-block:: shell

   docker pull rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1

Download the model
==================

See the model card on Hugging Face at `openai/gpt-oss-120b
<https://huggingface.co/openai/gpt-oss-120b>`__.

.. code-block:: shell

   pip install huggingface_hub[cli] hf_transfer hf_xet
   HF_HUB_ENABLE_HF_TRANSFER=1 \
   HF_HOME=/data/huggingface-cache \
   HF_TOKEN="<HF_TOKEN>" \
   huggingface-cli download openai/gpt-oss-120b --local-dir /data/gpt-oss-120b

Run the inference benchmark
===========================

1. Start the container using the following command.

   .. code-block:: shell

      docker run --rm -d \
        --network host \
        --ipc host \
        --privileged \
        --cap-add=CAP_SYS_ADMIN \
        --device=/dev/kfd \
        --device=/dev/dri \
        --device=/dev/mem \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --shm-size 32G \
        -v /data/huggingface-cache:/root/.cache/huggingface/hub/ \
        -v "$PWD/.vllm_cache/":/root/.cache/vllm/ \
        -v /data/gpt-oss-120b:/data/gpt-oss-120b \
        -e VLLM_USE_AITER_TRITON_FUSED_SPLIT_QKV_ROPE=1 \
        -e VLLM_USE_AITER_TRITON_FUSED_ADD_RMSNORM_PAD=1 \
        -e VLLM_USE_AITER_TRITON_GEMM=1 \
        -e VLLM_ROCM_USE_AITER=1 \
        -e VLLM_USE_AITER_UNIFIED_ATTENTION=1 \
        -e VLLM_ROCM_USE_AITER_MHA=0 \
        -e TRITON_HIP_PRESHUFFLE_SCALES=1 \
        -e VLLM_DISABLE_COMPILE_CACHE=1 \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        --name vllm-server \
        rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1

2. Set environment variables and start the server.

   .. code-block:: shell

      vllm serve /data/gpt-oss-120b/ \
        --tensor-parallel 1 \
        --no-enable-prefix-caching --disable-log-requests \
        --compilation-config '{"compile_sizes": [1, 2, 4, 8, 16, 24, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192], "cudagraph_capture_sizes":[8192,4096,2048,1024,1008,992,976,960,944,928,912,896,880,864,848,832,816,800,784,768,752,736,720,704,688,672,656,640,624,608,592,576,560,544,528,512,496,480,464,448,432,416,400,384,368,352,336,320,304,288,272,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1], "full_cuda_graph": true}' \
        --block-size 64 \
        --swap-space 16 \
        --gpu-memory-utilization 0.95 \
        --async-scheduling

3. Run the benchmark with the following options.

   .. code-block:: shell

      vllm bench serve \
        --model /data/gpt-oss-120b/ \
        --backend vllm \
        --host 0.0.0.0 \
        --dataset-name "random" \
        --random-input-len 1024 \
        --random-output-len 1024 \
        --random-prefix-len 0 \
        --num-prompts 32 \
        --max-concurrency 16 \
        --request-rate "inf" \
        --ignore-eos
@@ -0,0 +1,135 @@
************************************************
Benchmark Llama 3.1 405B FP4 inference with vLLM
************************************************

.. note::

   For the latest iteration of AI training and inference performance for ROCm
   7.0, see `Infinity Hub
   <https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
   and the `ROCm 7.0 AI training and inference performance
   <https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
   documentation.

This section provides instructions to test the inference performance of Llama
3.1 405B on the vLLM inference engine. The accompanying Docker image integrates
the ROCm 7.0 preview with vLLM, and is tailored for AMD Instinct
MI355X and MI350X accelerators. This benchmark does not support other
GPUs.

Follow these steps to pull the required image, spin up the container with the
appropriate options, download the model, and run the throughput test.

Pull the Docker image
=====================

Use the following command to pull the `Docker image <https://hub.docker.com/layers/rocm/7.0-preview/rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1/images/sha256-eee29678dc4dc8f2e054de889555be6f4fd74e58053bf7277d56ace1a850513e>`__.

.. code-block:: shell

   docker pull rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1

Download the model
==================

See the model card on Hugging Face at
`amd/Llama-3.1-405B-Instruct-MXFP4-Preview
<https://huggingface.co/amd/Llama-3.1-405B-Instruct-MXFP4-Preview>`__. This
model uses microscaling 4-bit floating point (MXFP4) quantization via `AMD
Quark <https://quark.docs.amd.com/latest/>`_ for efficient inference on AMD
accelerators.

.. code-block:: shell

   pip install huggingface_hub[cli] hf_transfer hf_xet
   HF_HUB_ENABLE_HF_TRANSFER=1 \
   HF_HOME=/data/huggingface-cache \
   HF_TOKEN="<HF_TOKEN>" \
   huggingface-cli download amd/Llama-3.1-405B-Instruct-MXFP4-Preview --exclude "original/*"

Run the inference benchmark
===========================

1. Start the container using the following command.

   .. code-block:: shell

      docker run -it \
        --ipc=host \
        --network=host \
        --privileged \
        --cap-add=CAP_SYS_ADMIN \
        --device=/dev/kfd \
        --device=/dev/dri \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        -v /data:/data \
        -e HF_HOME=/data/huggingface-cache \
        -e HF_HUB_OFFLINE=1 \
        -e VLLM_USE_V1=1 \
        -e VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \
        -e AMDGCN_USE_BUFFER_OPS=1 \
        -e VLLM_USE_AITER_TRITON_ROPE=1 \
        -e TRITON_HIP_ASYNC_COPY_BYPASS_PERMUTE=1 \
        -e TRITON_HIP_USE_ASYNC_COPY=1 \
        -e TRITON_HIP_USE_BLOCK_PINGPONG=1 \
        -e TRITON_HIP_ASYNC_FAST_SWIZZLE=1 \
        -e VLLM_ROCM_USE_AITER=1 \
        -e VLLM_ROCM_USE_AITER_RMSNORM=1 \
        -e VLLM_TRITON_FP4_GEMM_USE_ASM=1 \
        -e VLLM_TRITON_FP4_GEMM_SPLITK_USE_BF16=1 \
        --name vllm-server \
        rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1

2. Start the server.

   .. code-block:: shell

      max_model_len=10240
      max_num_seqs=1024
      max_num_batched_tokens=131072
      max_seq_len_to_capture=16384
      tensor_parallel_size=8

      vllm serve amd/Llama-3.1-405B-Instruct-MXFP4-Preview \
        --host localhost \
        --port 8000 \
        --swap-space 64 \
        --disable-log-requests \
        --dtype auto \
        --max-model-len ${max_model_len} \
        --tensor-parallel-size ${tensor_parallel_size} \
        --max-num-seqs ${max_num_seqs} \
        --distributed-executor-backend mp \
        --kv-cache-dtype fp8 \
        --gpu-memory-utilization 0.92 \
        --max-seq-len-to-capture ${max_seq_len_to_capture} \
        --max-num-batched-tokens ${max_num_batched_tokens} \
        --no-enable-prefix-caching \
        --async-scheduling

      # Wait for the model to load and for the server to be ready to accept requests.

3. Open another terminal on the same machine and run the benchmark with the following options.

   .. code-block:: shell

      # Connect to the server container
      docker exec -it vllm-server bash

      # Run the client benchmark
      input_tokens=1024
      output_tokens=1024
      max_concurrency=4
      num_prompts=32

      python3 /app/vllm/benchmarks/benchmark_serving.py --host localhost --port 8000 \
        --model amd/Llama-3.1-405B-Instruct-MXFP4-Preview \
        --dataset-name random \
        --random-input-len ${input_tokens} \
        --random-output-len ${output_tokens} \
        --max-concurrency ${max_concurrency} \
        --num-prompts ${num_prompts} \
        --percentile-metrics ttft,tpot,itl,e2el \
        --ignore-eos
@@ -0,0 +1,132 @@
|
||||
************************************************
|
||||
Benchmark Llama 3.3 70B FP8 inference with vLLM
|
||||
************************************************
|
||||
|
||||
.. note::
|
||||
|
||||
For the latest iteration of AI training and inference performance for ROCm
|
||||
7.0, see `Infinity Hub
|
||||
<https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
|
||||
and the `ROCm 7.0 AI training and inference performance
|
||||
<https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
|
||||
documentation.
|
||||
|
||||
This section provides instructions to test the inference performance of Llama
|
||||
3.3 70B on the vLLM inference engine. The accompanying Docker image integrates
|
||||
the ROCm 7.0 preview with vLLM, and is tailored for AMD Instinct
|
||||
MI355X, MI350X, and MI300X series accelerators. This benchmark does not support other
|
||||
GPUs.
|
||||
|
||||
Follow these steps to pull the required image, spin up the container with the
|
||||
appropriate options, download the model, and run the throughput test.
|
||||
|
||||
Pull the Docker image
|
||||
=====================
|
||||
|
||||
Use the following command to pull the `Docker image <https://hub.docker.com/layers/rocm/7.0-preview/rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1/images/sha256-eee29678dc4dc8f2e054de889555be6f4fd74e58053bf7277d56ace1a850513e>`__.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker pull rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1
|
||||
|
||||
Download the model
|
||||
==================
|
||||
|
||||
See the model card on Hugging Face at
|
||||
`amd/Llama-3.3-70B-Instruct-FP8-KV <https://huggingface.co/amd/Llama-3.3-70B-Instruct-FP8-KV>`__.
|
||||
This model uses FP8 quantization via `AMD Quark
|
||||
<https://quark.docs.amd.com/latest/>`_ for efficient inference on AMD
|
||||
accelerators.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
pip install huggingface_hub[cli] hf_transfer hf_xet
|
||||
HF_HUB_ENABLE_HF_TRANSFER=1 \
|
||||
HF_HOME=/data/huggingface-cache \
|
||||
HF_TOKEN="<HF_TOKEN>" \
|
||||
huggingface-cli download amd/Llama-3.3-70B-Instruct-FP8-KV --exclude "original/*"
|
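Before downloading, it can help to confirm there is enough room in the cache location. A rough sketch; the ``/data`` mount point matches the ``HF_HOME`` used above, and the 80 GB margin is an assumption (FP8 weights for a 70B model are on the order of 70 GB):

.. code-block:: shell

   # Rough free-space check before downloading (assumed 80 GB margin).
   need_gb=80
   avail_gb=$(df -BG --output=avail /data | tail -1 | tr -dc '0-9')
   if [ "${avail_gb:-0}" -lt "${need_gb}" ]; then
     echo "only ${avail_gb:-0} GB free under /data; need at least ${need_gb} GB" >&2
   fi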
||||
|
||||
Run the inference benchmark
|
||||
===========================
|
||||
|
||||
1. Start the container using the following command.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker run -it \
|
||||
--ipc=host \
|
||||
--network=host \
|
||||
--privileged \
|
||||
--cap-add=CAP_SYS_ADMIN \
|
||||
--device=/dev/kfd \
|
||||
--device=/dev/dri \
|
||||
--cap-add=SYS_PTRACE \
|
||||
--security-opt seccomp=unconfined \
|
||||
-v /data:/data \
|
||||
-e HF_HOME=/data/huggingface-cache \
|
||||
-e HF_HUB_OFFLINE=1 \
|
||||
-e VLLM_USE_V1=1 \
|
||||
-e VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 \
|
||||
-e AMDGCN_USE_BUFFER_OPS=1 \
|
||||
-e VLLM_USE_AITER_TRITON_ROPE=1 \
|
||||
-e TRITON_HIP_ASYNC_COPY_BYPASS_PERMUTE=1 \
|
||||
-e TRITON_HIP_USE_ASYNC_COPY=1 \
|
||||
-e TRITON_HIP_USE_BLOCK_PINGPONG=1 \
|
||||
-e TRITON_HIP_ASYNC_FAST_SWIZZLE=1 \
|
||||
-e VLLM_ROCM_USE_AITER=1 \
|
||||
-e VLLM_ROCM_USE_AITER_RMSNORM=1 \
|
||||
--name vllm-server \
|
||||
rocm/7.0-preview:rocm7.0_preview_ubuntu_22.04_vllm_0.10.1_instinct_rc1
|
||||
|
||||
2. Start the server.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
max_model_len=10240
|
||||
max_num_seqs=1024
|
||||
max_num_batched_tokens=131072
|
||||
max_seq_len_to_capture=16384
|
||||
tensor_parallel_size=1
|
||||
|
||||
vllm serve amd/Llama-3.3-70B-Instruct-FP8-KV \
|
||||
--host localhost \
|
||||
--port 8000 \
|
||||
--swap-space 64 \
|
||||
--disable-log-requests \
|
||||
--dtype auto \
|
||||
--max-model-len ${max_model_len} \
|
||||
--tensor-parallel-size ${tensor_parallel_size} \
|
||||
--max-num-seqs ${max_num_seqs} \
|
||||
--distributed-executor-backend mp \
|
||||
--kv-cache-dtype fp8 \
|
||||
--gpu-memory-utilization 0.94 \
|
||||
--max-seq-len-to-capture ${max_seq_len_to_capture} \
|
||||
--max-num-batched-tokens ${max_num_batched_tokens} \
|
||||
--no-enable-prefix-caching \
|
||||
--async-scheduling
|
||||
|
||||
# Wait for the model to load and the server to be ready to accept requests
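# One way to wait (an illustrative sketch, not part of the official steps):
# poll vLLM's health endpoint, using the host/port passed to vllm serve above.
until curl -sf http://localhost:8000/health > /dev/null; do
  sleep 10
done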
|
||||
|
||||
3. Open another terminal on the same machine and run the benchmark with the following options.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
# Connect to server
|
||||
docker exec -it vllm-server bash
|
||||
|
||||
# Run the client benchmark
|
||||
input_tokens=8192
|
||||
output_tokens=1024
|
||||
max_concurrency=4
|
||||
num_prompts=32
|
||||
|
||||
python3 /app/vllm/benchmarks/benchmark_serving.py --host localhost --port 8000 \
|
||||
--model amd/Llama-3.3-70B-Instruct-FP8-KV \
|
||||
--dataset-name random \
|
||||
--random-input-len ${input_tokens} \
|
||||
--random-output-len ${output_tokens} \
|
||||
--max-concurrency ${max_concurrency} \
|
||||
--num-prompts ${num_prompts} \
|
||||
--percentile-metrics ttft,tpot,itl,e2el \
|
||||
--ignore-eos
|
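The reported percentile metrics are related: with ``--ignore-eos``, each request's end-to-end latency (``e2el``) decomposes into time to first token (``ttft``) plus one time-per-output-token interval (``tpot``) for each remaining token. A sketch with illustrative, assumed numbers:

.. code-block:: shell

   # Illustrative values only; substitute the numbers your run reports.
   ttft_ms=500        # time to first token
   tpot_ms=20         # time per output token
   output_tokens=1024
   e2el_ms=$((ttft_ms + tpot_ms * (output_tokens - 1)))
   echo "approximate end-to-end latency: ${e2el_ms} ms"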
||||
|
||||
@@ -1,96 +0,0 @@
|
||||
*****************************************************
|
||||
Benchmarking Llama 3 8B pre-training with Megatron-LM
|
||||
*****************************************************
|
||||
|
||||
.. note::
|
||||
|
||||
For the latest iteration of AI training and inference performance for ROCm
|
||||
7.0, see `Infinity Hub
|
||||
<https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
|
||||
and the `ROCm 7.0 AI training and inference performance
|
||||
<https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
|
||||
documentation.
|
||||
|
||||
This section details how to benchmark Llama 3 8B pre-training using the
|
||||
Megatron-LM framework. It includes configurations for both ``FP8`` and
|
||||
``BF16`` precision to measure throughput.
|
||||
The accompanying Docker image integrates the ROCm 7.0 Alpha with Megatron-LM, and is
|
||||
tailored for AMD Instinct MI355X and MI350X accelerators. This
|
||||
benchmark does not support other accelerators.
|
||||
|
||||
Follow these steps to pull the required image, spin up the container with the
|
||||
appropriate options, download the model, and run the throughput test.
|
||||
|
||||
1. Pull the Docker image.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker pull rocm/7.0-preview:rocm7.0_preview_pytorch_training_mi35X_alpha
|
||||
|
||||
2. Start the container.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker run -it --device /dev/dri --device /dev/kfd \
|
||||
--network host --ipc host --group-add video \
|
||||
--cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged \
|
||||
-v $HOME:$HOME \
|
||||
-v $HOME/.ssh:/root/.ssh \
|
||||
--shm-size 64G \
|
||||
-w /workspace/Megatron-LM \
|
||||
--name training_benchmark \
|
||||
rocm/7.0-preview:rocm7.0_preview_pytorch_training_mi35X_alpha
|
||||
|
||||
.. note::
|
||||
|
||||
This containerized environment includes all necessary dependencies and pre-tuned
|
||||
configurations for the supported models and precision types.
|
||||
|
||||
3. Run the training script for Llama 3 8B with the appropriate options for your desired precision.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: FP8 precision
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
bash examples/llama/train_llama3.sh \
|
||||
TEE_OUTPUT=1 \
|
||||
MBS=4 \
|
||||
BS=512 \
|
||||
TP=1 \
|
||||
TE_FP8=1 \
|
||||
SEQ_LENGTH=8192 \
|
||||
MODEL_SIZE=8 \
|
||||
TOTAL_ITERS=10 \
|
||||
GEMM_TUNING=0
|
||||
|
||||
.. tab-item:: BF16 precision
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
bash examples/llama/train_llama3.sh \
|
||||
TEE_OUTPUT=1 \
|
||||
MBS=4 \
|
||||
BS=256 \
|
||||
TP=1 \
|
||||
TE_FP8=0 \
|
||||
SEQ_LENGTH=8192 \
|
||||
MODEL_SIZE=8 \
|
||||
TOTAL_ITERS=10
|
||||
|
||||
.. note::
|
||||
|
||||
The ``train_llama3.sh`` script accepts the following options:
|
||||
|
||||
* ``MBS``: Micro-batch size per GPU
|
||||
|
||||
* ``BS``: Global batch size
|
||||
|
||||
* ``TP``: Tensor parallelism
|
||||
|
||||
* ``SEQ_LENGTH``: Maximum input token sequence length
|
||||
|
||||
* ``TE_FP8``: Toggle to enable FP8
|
||||
|
||||
* ``TOTAL_ITERS``: Number of training iterations to execute
|
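These options are tied together by Megatron-LM's batch hierarchy: the global batch size must be divisible by the micro-batch size times the data-parallel size, and the quotient is the number of gradient-accumulation steps per iteration. A sketch for the FP8 configuration above, assuming a single node with 8 GPUs (the GPU count is an assumption, not part of the script):

.. code-block:: shell

   # Assumed: 8 GPUs on one node; MBS, BS, and TP match the FP8 run above.
   NUM_GPUS=8
   MBS=4   # micro-batch size per GPU
   BS=512  # global batch size
   TP=1    # tensor parallelism

   DP=$((NUM_GPUS / TP))            # data-parallel size
   GRAD_ACCUM=$((BS / (MBS * DP)))  # gradient-accumulation steps
   echo "DP=${DP}, gradient accumulation steps=${GRAD_ACCUM}"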
||||
@@ -1,79 +0,0 @@
|
||||
*****************************************************
|
||||
Benchmarking Llama 3 70B pre-training with torchtitan
|
||||
*****************************************************
|
||||
|
||||
.. note::
|
||||
|
||||
For the latest iteration of AI training and inference performance for ROCm
|
||||
7.0, see `Infinity Hub
|
||||
<https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
|
||||
and the `ROCm 7.0 AI training and inference performance
|
||||
<https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
|
||||
documentation.
|
||||
|
||||
This guide provides instructions for benchmarking the pre-training throughput
|
||||
of the Llama 3 70B model using torchtitan. By following these steps, you will
|
||||
use a pre-configured Docker container, download the necessary Llama 3 assets,
|
||||
and run the training script to measure performance in either ``FP8`` or ``BF16``
|
||||
precision.
|
||||
The accompanying Docker image integrates the ROCm 7.0 Alpha with torchtitan, and is
|
||||
tailored for next-generation AMD Instinct MI355X and MI350X accelerators. This
|
||||
benchmark does not support other accelerators.
|
||||
|
||||
Follow these steps to pull the required image, spin up the container with the
|
||||
appropriate options, download the model, and run the throughput test.
|
||||
|
||||
1. Pull the Docker image.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker pull rocm/7.0-preview:rocm7.0_preview_pytorch_training_mi35X_alpha
|
||||
|
||||
2. Start the container.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker run -it --device /dev/dri --device /dev/kfd \
|
||||
--network host --ipc host --group-add video \
|
||||
--cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged \
|
||||
-v $HOME:$HOME \
|
||||
-v $HOME/.ssh:/root/.ssh \
|
||||
--shm-size 64G \
|
||||
-w /workspace/torchtitan \
|
||||
--name training_benchmark \
|
||||
rocm/7.0-preview:rocm7.0_preview_pytorch_training_mi35X_alpha
|
||||
|
||||
.. note::
|
||||
|
||||
This containerized environment includes all necessary dependencies and pre-tuned
|
||||
configurations for the supported models and precision types.
|
||||
|
||||
3. Download the Llama 3 tokenizer. Make sure to set ``HF_TOKEN`` using a valid Hugging Face access
|
||||
token with Llama model permissions.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
export HF_TOKEN= #{your huggingface token with Llama 3 access}
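# Optional sanity check (not part of the original steps): fail fast if the
# token was left empty before running the download below.
[ -n "${HF_TOKEN}" ] || { echo "HF_TOKEN is empty; set it first" >&2; exit 1; }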
|
||||
python scripts/download_tokenizer.py --repo_id meta-llama/Meta-Llama-3-70B --tokenizer_path "original" --hf_token=${HF_TOKEN}
|
||||
|
||||
4. Run the training script for Llama 3 70B with the appropriate configuration file for your desired
|
||||
precision.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: FP8 precision
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
CONFIG_FILE="./llama3_70b_fsdp_fp8.toml" ./run_train.sh
|
||||
|
||||
.. tab-item:: BF16 precision
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
CONFIG_FILE="./llama3_70b_fsdp_bf16.toml" ./run_train.sh
|
||||
|
||||
.. note::
|
||||
|
||||
These configuration files define batch size, FSDP strategy, optimizer settings, and precision
|
||||
type for each benchmarking run.
|
||||
@@ -1,31 +0,0 @@
|
||||
************************
|
||||
Benchmark model training
|
||||
************************
|
||||
|
||||
.. note::
|
||||
|
||||
For the latest iteration of AI training and inference performance for ROCm
|
||||
7.0, see `Infinity Hub
|
||||
<https://www.amd.com/en/developer/resources/infinity-hub.html#q=ROCm%207>`__
|
||||
and the `ROCm 7.0 AI training and inference performance
|
||||
<https://rocm.docs.amd.com/en/docs-7.0-docker/benchmark-docker/index.html>`__
|
||||
documentation.
|
||||
|
||||
The process of training models is computationally intensive, requiring
|
||||
specialized hardware like GPUs to accelerate computations and reduce training
|
||||
time. Training models on AMD GPUs involves leveraging the parallel processing
|
||||
capabilities of these GPUs to significantly speed up the model training process
|
||||
in deep learning tasks.
|
||||
|
||||
Training models on AMD GPUs with the ROCm software platform allows you to use
|
||||
the powerful parallel processing capabilities and efficient compute resource
|
||||
management, significantly improving training time and overall performance in
|
||||
machine learning applications.
|
||||
|
||||
.. grid:: 1
|
||||
|
||||
.. grid-item-card:: Training benchmarking
|
||||
|
||||
* :doc:`pre-training-megatron-lm-llama-3-8b`
|
||||
|
||||
* :doc:`pre-training-torchtitan-llama-3-70b`
|
||||
@@ -1,18 +1,18 @@
|
||||
---
|
||||
myst:
|
||||
html_meta:
|
||||
"description": "AMD ROCm 7.0 Alpha 2 documentation"
|
||||
"description": "AMD ROCm 7.0 RC1 documentation"
|
||||
"keywords": "Radeon, open, compute, platform, install, how, conceptual, reference, home, docs"
|
||||
---
|
||||
|
||||
# AMD ROCm 7.0 Alpha 2 documentation
|
||||
# AMD ROCm 7.0 RC1 documentation
|
||||
|
||||
AMD ROCm is an open-source software platform optimized to extract HPC and AI
|
||||
workload performance from AMD Instinct™ accelerators while maintaining
|
||||
workload performance from AMD Instinct™ accelerators and GPUs while maintaining
|
||||
compatibility with industry software frameworks.
|
||||
|
||||
This documentation provides early access information about ROCm 7.0
|
||||
Alpha 2. This preview release provides access to new
|
||||
This documentation provides early access information about the ROCm 7.0
|
||||
RC1. This preview release provides access to new
|
||||
features under development for testing so users can provide feedback.
|
||||
It is not recommended for production use.
|
||||
|
||||
@@ -23,5 +23,5 @@ For a complete list of ROCm 7.0 preview releases, see the [ROCm 7.0 preview rele
|
||||
|
||||
The documentation includes:
|
||||
|
||||
- [ROCm 7.0 Alpha 2 release notes](release.rst) with feature details and support matrix
|
||||
- [Installation instructions](install/index.rst) for the ROCm 7.0 Alpha 2 and the Instinct Driver
|
||||
- [ROCm 7.0 RC1 release notes](release.rst) with feature details and support matrix
|
||||
- [Installation instructions](install/index.rst) for the ROCm 7.0 RC1 and the Instinct Driver
|
||||
|
||||
@@ -4,24 +4,24 @@
|
||||
ROCm
|
||||
|
||||
****************************************
|
||||
ROCm 7.0 Alpha installation instructions
|
||||
ROCm 7.0 RC1 installation instructions
|
||||
****************************************
|
||||
|
||||
The ROCm 7.0 Alpha must be installed using your Linux distribution's native
|
||||
The ROCm 7.0 RC1 must be installed using your Linux distribution's native
|
||||
package manager. This release supports specific hardware and software
|
||||
configurations -- before installing, see the :ref:`supported OSes and hardware
|
||||
<alpha-2-system-requirements>` outlined in the Alpha 2 release notes.
|
||||
<rc1-system-requirements>` outlined in the RC1 release notes.
|
||||
|
||||
.. important::
|
||||
|
||||
Upgrades and downgrades are not supported. You must uninstall any existing
|
||||
ROCm installation before installing the Alpha 2 build.
|
||||
ROCm installation before installing the RC1 build.
|
||||
|
||||
.. grid:: 2
|
||||
|
||||
.. grid-item-card:: Install ROCm
|
||||
|
||||
See :doc:`Install the ROCm 7.0 Alpha 2 via package manager <rocm>`.
|
||||
See :doc:`Install the ROCm 7.0 RC1 via package manager <rocm>`.
|
||||
|
||||
.. grid-item-card:: Install Instinct Driver
|
||||
|
||||
|
||||
@@ -2,8 +2,10 @@
|
||||
Install the Instinct Driver via package manager
|
||||
***********************************************
|
||||
|
||||
This section describes how to install the Instinct Driver using ``apt`` on
|
||||
Ubuntu 22.04 or 24.04, or ``dnf`` on Red Hat Enterprise Linux 9.6.
|
||||
This page describes how to install the Instinct Driver using
|
||||
your Linux distribution's package manager. Before installing, see the
|
||||
:ref:`supported hardware and distros <rc1-system-requirements>` to make sure
|
||||
your system is compatible.
|
||||
|
||||
.. important::
|
||||
|
||||
@@ -35,6 +37,59 @@ Before installing, complete the following prerequisites.
|
||||
|
||||
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
|
||||
|
||||
.. tab-item:: Debian 12
|
||||
:sync: debian-12
|
||||
|
||||
Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
|
||||
|
||||
.. tab-item:: RHEL 8.10
|
||||
:sync: rhel-810
|
||||
|
||||
1. Register your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
subscription-manager register --username <username> --password <password>
|
||||
subscription-manager attach --auto
|
||||
|
||||
2. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf update --releasever=8.10 --exclude=\*release\*
|
||||
|
||||
3. Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install "kernel-headers-$(uname -r)" "kernel-devel-$(uname -r)"
|
||||
|
||||
.. tab-item:: RHEL 9.4
|
||||
:sync: rhel-94
|
||||
|
||||
1. Register your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
subscription-manager register --username <username> --password <password>
|
||||
subscription-manager attach --auto
|
||||
|
||||
2. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf update --releasever=9.4 --exclude=\*release\*
|
||||
|
||||
3. Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install "kernel-headers-$(uname -r)" "kernel-devel-$(uname -r)" "kernel-devel-matched-$(uname -r)"
|
||||
|
||||
.. tab-item:: RHEL 9.6
|
||||
:sync: rhel-96
|
||||
|
||||
@@ -57,6 +112,78 @@ Before installing, complete the following prerequisites.
|
||||
|
||||
sudo dnf install "kernel-headers-$(uname -r)" "kernel-devel-$(uname -r)" "kernel-devel-matched-$(uname -r)"
|
||||
|
||||
.. tab-item:: Oracle Linux 8.10
|
||||
:sync: ol-810
|
||||
|
||||
1. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf update --releasever=8.10 --exclude=\*release\*
|
||||
|
||||
2. Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install "kernel-uek-devel-$(uname -r)"
|
||||
|
||||
.. tab-item:: Oracle Linux 9.6
|
||||
:sync: ol-96
|
||||
|
||||
1. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf update --releasever=9.6 --exclude=\*release\*
|
||||
|
||||
2. Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install "kernel-uek-devel-$(uname -r)"
|
||||
|
||||
.. tab-item:: SLES 15 SP6
|
||||
:sync: sles-156
|
||||
|
||||
1. Register your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo SUSEConnect -r <REGCODE>
|
||||
|
||||
2. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper update
|
||||
|
||||
3. Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper install kernel-default-devel
|
||||
|
||||
.. tab-item:: SLES 15 SP7
|
||||
:sync: sles-157
|
||||
|
||||
1. Register your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo SUSEConnect -r <REGCODE>
|
||||
|
||||
2. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper update
|
||||
|
||||
3. Install kernel headers.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper install kernel-default-devel
|
||||
|
||||
Register ROCm repositories
|
||||
==========================
|
||||
|
||||
@@ -81,7 +208,7 @@ Register ROCm repositories
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.10_alpha2/ubuntu jammy main" \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.10_rc1/ubuntu jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
|
||||
@@ -104,10 +231,65 @@ Register ROCm repositories
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.10_alpha2/ubuntu noble main" \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.10_rc1/ubuntu noble main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
|
||||
.. tab-item:: Debian 12
|
||||
:sync: debian-12
|
||||
|
||||
1. Add the package signing key.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
# Make the directory if it doesn't exist yet.
|
||||
# This location is recommended by the distribution maintainers.
|
||||
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
|
||||
# Download the key, convert the signing-key to a full
|
||||
# keyring required by apt and store in the keyring directory.
|
||||
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
|
||||
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
|
||||
|
||||
2. Register the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/30.10_rc1/ubuntu jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
|
||||
.. tab-item:: RHEL 8.10
|
||||
:sync: rhel-810
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/rhel/8.10/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo dnf clean all
|
||||
|
||||
.. tab-item:: RHEL 9.4
|
||||
:sync: rhel-94
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/rhel/9.4/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo dnf clean all
|
||||
|
||||
.. tab-item:: RHEL 9.6
|
||||
:sync: rhel-96
|
||||
|
||||
@@ -116,7 +298,7 @@ Register ROCm repositories
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_alpha2/rhel/9.6/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/rhel/9.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -124,6 +306,70 @@ Register ROCm repositories
|
||||
EOF
|
||||
sudo dnf clean all
|
||||
|
||||
.. tab-item:: Oracle Linux 8.10
|
||||
:sync: ol-810
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/rhel/8.10/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo dnf clean all
|
||||
|
||||
.. tab-item:: Oracle Linux 9.6
|
||||
:sync: ol-96
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/el/9.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo dnf clean all
|
||||
|
||||
.. tab-item:: SLES 15 SP6
|
||||
:sync: sles-156
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/sle/15.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper refresh
|
||||
|
||||
.. tab-item:: SLES 15 SP7
|
||||
:sync: sles-157
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/30.10_rc1/sle/15.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper refresh
|
||||
|
||||
Install the kernel driver
|
||||
=========================
|
||||
|
||||
@@ -135,6 +381,7 @@ Install the kernel driver
|
||||
.. code-block:: shell
|
||||
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Ubuntu 24.04
|
||||
:sync: ubuntu-24
|
||||
@@ -142,6 +389,31 @@ Install the kernel driver
|
||||
.. code-block:: shell
|
||||
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Debian 12
|
||||
:sync: debian-12
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: RHEL 8.10
|
||||
:sync: rhel-810
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: RHEL 9.4
|
||||
:sync: rhel-94
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: RHEL 9.6
|
||||
:sync: rhel-96
|
||||
@@ -149,6 +421,39 @@ Install the kernel driver
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Oracle Linux 8.10
|
||||
:sync: ol-810
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Oracle Linux 9.6
|
||||
:sync: ol-96
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: SLES 15 SP6
|
||||
:sync: sles-156
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: SLES 15 SP7
|
||||
:sync: sles-157
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
sudo reboot
|
||||
|
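After the reboot, you can confirm the driver came up. A minimal check (a sketch, assuming DKMS built the module for the running kernel):

.. code-block:: shell

   # Confirm the amdgpu kernel module is loaded and registered with DKMS.
   lsmod | grep -q '^amdgpu' && echo "amdgpu module loaded"
   dkms status amdgpu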
||||
Uninstalling
|
||||
============
|
||||
@@ -173,6 +478,8 @@ Uninstalling
|
||||
sudo rm -rf /var/cache/apt/*
|
||||
sudo apt clean all
|
||||
sudo apt update
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Ubuntu 24.04
|
||||
:sync: ubuntu-24
|
||||
@@ -192,6 +499,69 @@ Uninstalling
|
||||
sudo rm -rf /var/cache/apt/*
|
||||
sudo apt clean all
|
||||
sudo apt update
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Debian 12
|
||||
:sync: debian-12
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo apt autoremove amdgpu-dkms
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo rm /etc/apt/sources.list.d/amdgpu.list
|
||||
# Clear the cache and clean the system
|
||||
sudo rm -rf /var/cache/apt/*
|
||||
sudo apt clean all
|
||||
sudo apt update
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: RHEL 8.10
|
||||
:sync: rhel-810
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf remove amdgpu-dkms
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo rm /etc/yum.repos.d/amdgpu.repo
|
||||
# Clear the cache and clean the system
|
||||
sudo rm -rf /var/cache/dnf
|
||||
sudo dnf clean all
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: RHEL 9.4
|
||||
:sync: rhel-94
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf remove amdgpu-dkms
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo rm /etc/yum.repos.d/amdgpu.repo
|
||||
# Clear the cache and clean the system
|
||||
sudo rm -rf /var/cache/dnf
|
||||
sudo dnf clean all
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: RHEL 9.6
|
||||
:sync: rhel-96
|
||||
@@ -210,3 +580,85 @@ Uninstalling
|
||||
# Clear the cache and clean the system
|
||||
sudo rm -rf /var/cache/dnf
|
||||
sudo dnf clean all
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Oracle Linux 8.10
|
||||
:sync: ol-810
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf remove amdgpu-dkms
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo rm /etc/yum.repos.d/amdgpu.repo
|
||||
# Clear the cache and clean the system
|
||||
sudo rm -rf /var/cache/dnf
|
||||
sudo dnf clean all
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: Oracle Linux 9.6
|
||||
:sync: ol-96
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf remove amdgpu-dkms
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo rm /etc/yum.repos.d/amdgpu.repo
|
||||
# Clear the cache and clean the system
|
||||
sudo rm -rf /var/cache/dnf
|
||||
sudo dnf clean all
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: SLES 15 SP6
|
||||
:sync: sles-156
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper remove amdgpu-dkms amdgpu-dkms-firmware
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper removerepo "amdgpu"
|
||||
# Clear the cache and clean the system
|
||||
sudo zypper clean --all
|
||||
sudo zypper refresh
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
.. tab-item:: SLES 15 SP7
|
||||
:sync: sles-157
|
||||
|
||||
1. Uninstall the kernel mode driver.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper remove amdgpu-dkms amdgpu-dkms-firmware
|
||||
|
||||
2. Remove AMDGPU repositories.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo zypper removerepo "amdgpu"
|
||||
# Clear the cache and clean the system
|
||||
sudo zypper clean --all
|
||||
sudo zypper refresh
|
||||
# Restart the system
|
||||
sudo reboot
|
||||
|
||||
@@ -1,9 +1,13 @@
|
||||
************************************************
|
||||
Install the ROCm 7.0 Alpha 2 via package manager
|
||||
Install the ROCm 7.0 RC1 via package manager
|
||||
************************************************
|
||||
|
||||
This page describes how to install the ROCm 7.0 Alpha 2 build using ``apt`` on
|
||||
Ubuntu 22.04 or 24.04, or ``dnf`` on Red Hat Enterprise Linux 9.6.
|
||||
This page describes how to install the AMD ROCm 7.0 RC1 build using
|
||||
your Linux distribution's package manager. Before installing, see the
|
||||
:ref:`supported hardware and distros <rc1-system-requirements>` to make sure
|
||||
your system is compatible.
|
||||
|
||||
.. _rc1-system-requirements:
|
||||
|
||||
.. important::
|
||||
|
||||
@@ -47,6 +51,109 @@ Before installing, complete the following prerequisites.
|
||||
|
||||
sudo usermod -a -G render,video $LOGNAME
|
||||
|
||||
.. tab-item:: Debian 12
|
||||
:sync: debian-12
|
||||
|
||||
1. Install development packages.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo apt install python3-setuptools python3-wheel
|
||||
|
||||
2. Configure user permissions for GPU access.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo usermod -a -G render,video $LOGNAME
|
||||
|
||||
.. tab-item:: RHEL 8.10
|
||||
:sync: rhel-810
|
||||
|
||||
1. Register your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
subscription-manager register --username <username> --password <password>
|
||||
subscription-manager attach --auto
|
||||
|
||||
2. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf update --releasever=8.10 --exclude=\*release\*
|
||||
|
||||
3. Install additional package repositories.
|
||||
|
||||
Add the EPEL repository:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
|
||||
sudo rpm -ivh epel-release-latest-8.noarch.rpm
|
||||
|
||||
Enable the CodeReady Linux Builder (CRB) repository.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install dnf-plugin-config-manager
|
||||
sudo crb enable
|
||||
|
||||
4. Install development packages.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install python3-setuptools python3-wheel
|
||||
|
||||
5. Configure user permissions for GPU access.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo usermod -a -G render,video $LOGNAME
|
||||
|
||||
.. tab-item:: RHEL 9.4
|
||||
:sync: rhel-94
|
||||
|
||||
1. Register your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
subscription-manager register --username <username> --password <password>
|
||||
subscription-manager attach --auto
|
||||
|
||||
2. Update your Enterprise Linux.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf update --releasever=9.4 --exclude=\*release\*
|
||||
|
||||
3. Install additional package repositories.
|
||||
|
||||
Add the EPEL repository:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
|
||||
sudo rpm -ivh epel-release-latest-9.noarch.rpm
|
||||
|
||||
Enable the CodeReady Linux Build (CRB) repository.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install dnf-plugin-config-manager
|
||||
sudo crb enable
|
||||
|
||||
4. Install development packages.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo dnf install python3-setuptools python3-wheel
|
||||
|
||||
5. Configure user permissions for GPU access.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
sudo usermod -a -G render,video $LOGNAME
|
||||
|
||||
.. tab-item:: RHEL 9.6
|
||||
:sync: rhel-96
|
||||
|
||||
@@ -91,6 +198,158 @@ Before installing, complete the following prerequisites.

            sudo usermod -a -G render,video $LOGNAME

   .. tab-item:: Oracle Linux 8.10
      :sync: ol-810

      1. Update your Enterprise Linux.

         .. code-block:: shell

            sudo dnf update --releasever=8.10 --exclude=\*release\*

      2. Install additional package repositories.

         Add the EPEL repository:

         .. code-block:: shell

            wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
            sudo rpm -ivh epel-release-latest-8.noarch.rpm

         Enable the CodeReady Linux Builder (CRB) repository:

         .. code-block:: shell

            sudo dnf install dnf-plugin-config-manager
            sudo crb enable

      3. Install development packages.

         .. code-block:: shell

            sudo dnf install python3-setuptools python3-wheel

      4. Configure user permissions for GPU access.

         .. code-block:: shell

            sudo usermod -a -G render,video $LOGNAME

   .. tab-item:: Oracle Linux 9.6
      :sync: ol-96

      1. Update your Enterprise Linux.

         .. code-block:: shell

            sudo dnf update --releasever=9.6 --exclude=\*release\*

      2. Install additional package repositories.

         Add the EPEL repository:

         .. code-block:: shell

            wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
            sudo rpm -ivh epel-release-latest-9.noarch.rpm

         Enable the CodeReady Linux Builder (CRB) repository:

         .. code-block:: shell

            sudo dnf install dnf-plugin-config-manager
            sudo crb enable

      3. Install development packages.

         .. code-block:: shell

            sudo dnf install python3-setuptools python3-wheel

      4. Configure user permissions for GPU access.

         .. code-block:: shell

            sudo usermod -a -G render,video $LOGNAME

   .. tab-item:: SLES 15 SP6
      :sync: sles-156

      1. Register your Enterprise Linux.

         .. code-block:: shell

            sudo SUSEConnect -r <REGCODE>

      2. Update your Enterprise Linux.

         .. code-block:: shell

            sudo zypper update

      3. Install additional package repositories.

         Add a few modules with SUSEConnect and the science repository.

         .. code-block:: shell

            sudo SUSEConnect -p sle-module-desktop-applications/15.6/x86_64
            sudo SUSEConnect -p sle-module-development-tools/15.6/x86_64
            sudo SUSEConnect -p PackageHub/15.6/x86_64
            sudo zypper install zypper
            sudo zypper addrepo https://download.opensuse.org/repositories/science/SLE_15_SP5/science.repo

      4. Install development packages.

         .. code-block:: shell

            sudo zypper install python3-setuptools python3-wheel

      5. Configure user permissions for GPU access.

         .. code-block:: shell

            sudo usermod -a -G render,video $LOGNAME

   .. tab-item:: SLES 15 SP7
      :sync: sles-157

      1. Register your Enterprise Linux.

         .. code-block:: shell

            sudo SUSEConnect -r <REGCODE>

      2. Update your Enterprise Linux.

         .. code-block:: shell

            sudo zypper update

      3. Install additional package repositories.

         Add a few modules with SUSEConnect and the science repository.

         .. code-block:: shell

            sudo SUSEConnect -p sle-module-desktop-applications/15.7/x86_64
            sudo SUSEConnect -p sle-module-development-tools/15.7/x86_64
            sudo SUSEConnect -p PackageHub/15.7/x86_64
            sudo zypper install zypper
            sudo zypper addrepo https://download.opensuse.org/repositories/science/SLE_15_SP5/science.repo

      4. Install development packages.

         .. code-block:: shell

            sudo zypper install python3-setuptools python3-wheel

      5. Configure user permissions for GPU access.

         .. code-block:: shell

            sudo usermod -a -G render,video $LOGNAME

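Across all of the distributions above, the final prerequisite is the same ``usermod`` call, and group changes only apply to new login sessions. The sketch below is illustrative (the ``sample_groups`` value stands in for real ``id -nG`` output) and shows one way to confirm both groups are active:

```shell
# After logging back in, `id -nG` should list both groups added by usermod.
# Here the check runs against an illustrative sample of that output.
sample_groups="alice adm render video"
echo "$sample_groups" | tr ' ' '\n' | grep -Ecx 'render|video'
# A count of 2 means both render and video are active for the session.
```

On a real system, replace the sample with ``$(id -nG)`` in a fresh login shell.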
Register ROCm repositories
==========================

@@ -115,10 +374,10 @@ Register ROCm repositories

         .. code-block:: shell

-           echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.0_alpha2 jammy main" \
            echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.0_rc1 jammy main" \
               | sudo tee /etc/apt/sources.list.d/rocm.list

-           echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.0_alpha2/ubuntu jammy main" \
            echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.0_rc1/ubuntu jammy main" \
               | sudo tee /etc/apt/sources.list.d/rocm-graphics.list

            echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
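Each ``echo … | sudo tee`` line above writes a one-line apt source. When a registration step misbehaves, it can help to pull the URL field back out of the written line; a small sketch using the exact Ubuntu 22.04 entry from this page:

```shell
# An apt deb line has the form:
#   deb [options] <url> <suite> <component>
# so the URL is the third field from the end.
line='deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.0_rc1 jammy main'
echo "$line" | awk '{print $(NF-2)}'
# Prints: https://repo.radeon.com/rocm/apt/7.0_rc1
```

On a real system, feed ``/etc/apt/sources.list.d/rocm.list`` through the same ``awk`` to confirm what was registered.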
@@ -144,25 +403,54 @@ Register ROCm repositories

         .. code-block:: shell

-           echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.0_alpha2 noble main" \
            echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.0_rc1 noble main" \
               | sudo tee /etc/apt/sources.list.d/rocm.list

-           echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.0_alpha2/ubuntu noble main" \
            echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.0_rc1/ubuntu noble main" \
               | sudo tee /etc/apt/sources.list.d/rocm-graphics.list

            echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
               | sudo tee /etc/apt/preferences.d/rocm-pin-600
            sudo apt update

-  .. tab-item:: RHEL 9.6
-     :sync: rhel-96
   .. tab-item:: Debian 12
      :sync: debian-12

      1. Add the package signing key.

         .. code-block:: shell

            # Make the directory if it doesn't exist yet.
            # This location is recommended by the distribution maintainers.
            sudo mkdir --parents --mode=0755 /etc/apt/keyrings
            # Download the key, convert the signing-key to a full
            # keyring required by apt and store in the keyring directory.
            wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
               gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null

      2. Register ROCm packages.

         .. code-block:: shell

            echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.0_rc1 jammy main" \
               | sudo tee /etc/apt/sources.list.d/rocm.list

            echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.0_rc1/ubuntu jammy main" \
               | sudo tee /etc/apt/sources.list.d/rocm-graphics.list

            echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
               | sudo tee /etc/apt/preferences.d/rocm-pin-600
            sudo apt update

   .. tab-item:: RHEL 8.10
      :sync: rhel-810

      .. code-block:: shell

         sudo tee /etc/yum.repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
-        baseurl=https://repo.radeon.com/rocm/el9/7.0_alpha2/main
         baseurl=https://repo.radeon.com/rocm/el8/7.0_rc1/main
         enabled=1
         priority=50
         gpgcheck=1
@@ -172,7 +460,7 @@ Register ROCm repositories

         sudo tee /etc/yum.repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
-        baseurl=https://repo.radeon.com/graphics/7.0_alpha2/rhel/9/main/x86_64/
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/rhel/8.10/main/x86_64/
         enabled=1
         priority=50
         gpgcheck=1
@@ -180,6 +468,160 @@ Register ROCm repositories

         EOF
         sudo dnf clean all

   .. tab-item:: RHEL 9.4
      :sync: rhel-94

      .. code-block:: shell

         sudo tee /etc/yum.repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
         baseurl=https://repo.radeon.com/rocm/el9/7.0_rc1/main
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo tee /etc/yum.repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/rhel/9.4/main/x86_64/
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF
         sudo dnf clean all

   .. tab-item:: RHEL 9.6
      :sync: rhel-96

      .. code-block:: shell

         sudo tee /etc/yum.repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
         baseurl=https://repo.radeon.com/rocm/el9/7.0_rc1/main
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo tee /etc/yum.repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/rhel/9.6/main/x86_64/
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF
         sudo dnf clean all

   .. tab-item:: Oracle Linux 8.10
      :sync: ol-810

      .. code-block:: shell

         sudo tee /etc/yum.repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
         baseurl=https://repo.radeon.com/rocm/el8/7.0_rc1/main
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo tee /etc/yum.repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/el/8.10/main/x86_64/
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF
         sudo dnf clean all

   .. tab-item:: Oracle Linux 9.6
      :sync: ol-96

      .. code-block:: shell

         sudo tee /etc/yum.repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
         baseurl=https://repo.radeon.com/rocm/el9/7.0_rc1/main
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo tee /etc/yum.repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/el/9.6/main/x86_64/
         enabled=1
         priority=50
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF
         sudo dnf clean all

   .. tab-item:: SLES 15 SP6
      :sync: sles-156

      .. code-block:: shell

         sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
         baseurl=https://repo.radeon.com/rocm/zyp/7.0_rc1/main
         enabled=1
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo tee /etc/zypp/repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/sle/15.6/main/x86_64
         enabled=1
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo zypper refresh

   .. tab-item:: SLES 15 SP7
      :sync: sles-157

      .. code-block:: shell

         sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
         [ROCm-7.0.0]
         name=ROCm7.0.0
         baseurl=https://repo.radeon.com/rocm/zyp/7.0_rc1/main
         enabled=1
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo tee /etc/zypp/repos.d/rocm-graphics.repo <<EOF
         [ROCm-7.0.0-Graphics]
         name=ROCm7.0.0-Graphics
         baseurl=https://repo.radeon.com/graphics/7.0_rc1/sle/15.7/main/x86_64
         enabled=1
         gpgcheck=1
         gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
         EOF

         sudo zypper refresh
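The ``.repo`` files written by the heredocs above are plain INI text, so a registration can be double-checked by reading the ``baseurl`` back out. A minimal sketch, shown on the SLES entry used on this page:

```shell
# Strip the key to recover the registered repository URL.
repo_line='baseurl=https://repo.radeon.com/rocm/zyp/7.0_rc1/main'
echo "${repo_line#baseurl=}"
# Prints: https://repo.radeon.com/rocm/zyp/7.0_rc1/main
```

On a real system, run ``grep ^baseurl /etc/zypp/repos.d/rocm.repo`` (or the matching file under ``/etc/yum.repos.d``) and compare against the URL shown in this guide.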

Install ROCm
============

@@ -199,6 +641,27 @@ Install ROCm

         sudo apt install rocm

   .. tab-item:: Debian 12
      :sync: debian-12

      .. code-block:: shell

         sudo apt install rocm

   .. tab-item:: RHEL 8.10
      :sync: rhel-810

      .. code-block:: shell

         sudo dnf install rocm

   .. tab-item:: RHEL 9.4
      :sync: rhel-94

      .. code-block:: shell

         sudo dnf install rocm

   .. tab-item:: RHEL 9.6
      :sync: rhel-96

@@ -206,6 +669,34 @@ Install ROCm

         sudo dnf install rocm

   .. tab-item:: Oracle Linux 8.10
      :sync: ol-810

      .. code-block:: shell

         sudo dnf install rocm

   .. tab-item:: Oracle Linux 9.6
      :sync: ol-96

      .. code-block:: shell

         sudo dnf install rocm

   .. tab-item:: SLES 15 SP6
      :sync: sles-156

      .. code-block:: shell

         sudo zypper --gpg-auto-import-keys install rocm

   .. tab-item:: SLES 15 SP7
      :sync: sles-157

      .. code-block:: shell

         sudo zypper --gpg-auto-import-keys install rocm

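Whichever package manager performed the install, a quick smoke test is to look for the ROCm tools on disk. This sketch assumes the default ``/opt/rocm`` prefix (an assumption, not stated in the steps above) and degrades to a message when ROCm is absent:

```shell
# List a few installed ROCm tools, or report that the prefix is missing.
# /opt/rocm is the assumed default install prefix; adjust if yours differs.
if test -d /opt/rocm/bin; then
  ls /opt/rocm/bin | head
else
  echo "/opt/rocm/bin not found"
fi
```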
.. _uninstall-rocm:

Uninstalling
============
@@ -237,6 +728,8 @@ Uninstalling

            sudo rm -rf /var/cache/apt/*
            sudo apt clean all
            sudo apt update
            # Restart the system
            sudo reboot

   .. tab-item:: Ubuntu 24.04
      :sync: ubuntu-24

@@ -262,6 +755,87 @@ Uninstalling

            sudo rm -rf /var/cache/apt/*
            sudo apt clean all
            sudo apt update
            # Restart the system
            sudo reboot

   .. tab-item:: Debian 12
      :sync: debian-12

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo apt autoremove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo apt autoremove rocm-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo rm /etc/apt/sources.list.d/rocm*.list
            # Clear the cache and clean the system
            sudo rm -rf /var/cache/apt/*
            sudo apt clean all
            sudo apt update
            # Restart the system
            sudo reboot

   .. tab-item:: RHEL 8.10
      :sync: rhel-810

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo dnf remove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo dnf remove rocm-core amdgpu-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo rm /etc/yum.repos.d/rocm*.repo*
            # Clear the cache and clean the system
            sudo rm -rf /var/cache/dnf
            sudo dnf clean all
            # Restart the system
            sudo reboot

   .. tab-item:: RHEL 9.4
      :sync: rhel-94

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo dnf remove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo dnf remove rocm-core amdgpu-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo rm /etc/yum.repos.d/rocm*.repo*
            # Clear the cache and clean the system
            sudo rm -rf /var/cache/dnf
            sudo dnf clean all
            # Restart the system
            sudo reboot

   .. tab-item:: RHEL 9.6
      :sync: rhel-96

@@ -286,3 +860,111 @@ Uninstalling

            # Clear the cache and clean the system
            sudo rm -rf /var/cache/dnf
            sudo dnf clean all
            # Restart the system
            sudo reboot

   .. tab-item:: Oracle Linux 8.10
      :sync: ol-810

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo dnf remove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo dnf remove rocm-core amdgpu-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo rm /etc/yum.repos.d/rocm*.repo*
            # Clear the cache and clean the system
            sudo rm -rf /var/cache/dnf
            sudo dnf clean all
            # Restart the system
            sudo reboot

   .. tab-item:: Oracle Linux 9.6
      :sync: ol-96

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo dnf remove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo dnf remove rocm-core amdgpu-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo rm /etc/yum.repos.d/rocm*.repo*
            # Clear the cache and clean the system
            sudo rm -rf /var/cache/dnf
            sudo dnf clean all
            # Restart the system
            sudo reboot

   .. tab-item:: SLES 15 SP6
      :sync: sles-156

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo zypper remove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo zypper remove rocm-core amdgpu-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo zypper removerepo "ROCm-7.0.0"
            sudo zypper removerepo "ROCm-7.0.0-Graphics"
            # Clear the cache and clean the system
            sudo zypper clean --all
            sudo zypper refresh
            # Restart the system
            sudo reboot

   .. tab-item:: SLES 15 SP7
      :sync: sles-157

      1. Uninstall specific meta packages.

         .. code-block:: shell

            sudo zypper remove rocm

      2. Uninstall ROCm packages.

         .. code-block:: shell

            sudo zypper remove rocm-core amdgpu-core

      3. Remove ROCm repositories.

         .. code-block:: shell

            sudo zypper removerepo "ROCm-7.0.0"
            sudo zypper removerepo "ROCm-7.0.0-Graphics"
            # Clear the cache and clean the system
            sudo zypper clean --all
            sudo zypper refresh
            # Restart the system
            sudo reboot

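After any of the uninstall flows above, it is worth confirming that no ROCm packages linger. The filter below is shown against an illustrative package list; on a real system, pipe ``rpm -qa`` (or ``dpkg -l``) into the same ``grep``:

```shell
# An empty grep result means the uninstall removed everything;
# the fallback message makes that explicit.
sample_pkgs='kernel-core
bash
coreutils'
echo "$sample_pkgs" | grep -i rocm || echo "no rocm packages found"
# Prints: no rocm packages found
```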
@@ -1,147 +1,388 @@
-******************************
-ROCm 7.0 Alpha 2 release notes
-******************************
***************************
ROCm 7.0 RC1 release notes
***************************

-The ROCm 7.0 Alpha 2 is a preview of the upcoming ROCm 7.0 release,
-which includes functional support for AMD Instinct™ MI355X and MI350X
-on bare metal, single-node systems. It also introduces new ROCm features for
-MI300X, MI200, and MI100 series accelerators. This is an Alpha-quality release;
-expect issues and limitations that will be addressed in upcoming previews.
The ROCm 7.0 RC1 is a release candidate for the upcoming ROCm 7.0 major
release, which introduces functional support for AMD Instinct™ MI355X and
MI350X on single node systems and new features for current-generation
accelerators.
In this RC1, system support is widened to include more AMD GPUs, Linux
distributions, and virtualization options. This preview includes enhancements
to the HIP runtime, ROCm libraries, and system management tooling.

This is a first release candidate; expect issues and limitations that will be
addressed in upcoming previews.

.. important::

-   The Alpha 2 release is not intended for performance evaluation.
   This preview is not intended for performance evaluation.
   For the latest stable release for use in production, see the `ROCm documentation <https://rocm.docs.amd.com/en/latest/>`__.

-This page provides a high-level summary of key changes added to the Alpha 2
-release since `the previous Alpha
-<https://rocm.docs.amd.com/en/docs-7.0-alpha/preview/index.html>`_.
This document highlights the key changes in the RC1 build since the
`Beta <https://rocm.docs.amd.com/en/docs-7.0-beta/preview/release.html>`__.
For a complete history, see the :doc:`ROCm 7.0 preview release history <versions>`.

-.. _alpha-2-system-requirements:
.. _rc1-system-requirements:

Operating system and hardware support
=====================================

-Only the accelerators and operating systems listed here are supported. Multi-node systems,
-virtualized environments, and GPU partitioning are not supported in the Alpha 2 release.
This preview supports the following AMD accelerators and Linux distributions in single node setups.

-* AMD Instinct accelerator: MI355X, MI350X, MI325X [#mi325x]_, MI300X, MI300A, MI250X, MI250, MI210, MI100
-* Operating system: Ubuntu 22.04, Ubuntu 24.04, RHEL 9.6
-* System type: Bare metal, single node only
-* Partitioning: Not supported
-
-.. [#mi325x] MI325X is only supported with Ubuntu 22.04.
-
-.. _alpha-2-highlights:
-
-Alpha 2 release highlights
-==========================
-
-This section highlights key features enabled in the ROCm 7.0 Alpha 2 release.

.. tab-set::

   .. tab-item:: Instinct MI355X, MI350X

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6

         * - Oracle Linux
           - 9

   .. tab-item:: Instinct MI325X

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 22.04

   .. tab-item:: Instinct MI300X

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6, 8.10

         * - SLES
           - 15 SP7, 15 SP6

         * - Oracle Linux
           - 9, 8

         * - Debian
           - 12

   .. tab-item:: Instinct MI300A

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6, 8.10

         * - SLES
           - 15 SP7, 15 SP6

   .. tab-item:: Instinct MI250X, MI250, MI210

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6, 9.4, 8.10

         * - SLES
           - 15 SP7, 15 SP6

   .. tab-item:: Instinct MI100

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6, 8.10

         * - SLES
           - 15 SP7, 15 SP6

   .. tab-item:: Radeon PRO V710, V620

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6, 8.10

         * - SLES
           - 15 SP7, 15 SP6

   .. tab-item:: Radeon RX 9000 series

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6

         * - SLES
           - 15 SP7, 15 SP6

   .. tab-item:: Radeon RX 7000 series

      .. list-table::
         :stub-columns: 1
         :widths: 30, 70

         * - Ubuntu
           - 24.04, 22.04

         * - RHEL
           - 9.6, 8.10

         * - SLES
           - 15 SP7, 15 SP6

See the :doc:`installation instructions <install/index>` to install ROCm 7.0 RC1 and the Instinct Driver
for your hardware and distribution.

Virtualization support
----------------------

The RC1 includes support for GPU virtualization on KVM-based SR-IOV and VMware ESXi 8.
The following tables detail supported OS configurations per AMD accelerator.

.. tab-set::

   .. tab-item:: KVM SR-IOV

      All supported configurations require the `GIM SR-IOV driver version
      8.3.0K <https://github.com/amd/MxGPU-Virtualization/releases>`__.

      .. list-table::
         :header-rows: 1

         * - AMD accelerator
           - Host OS
           - Guest OS

         * - Instinct MI350X
           - Ubuntu 24.04
           - Ubuntu 24.04

         * - Instinct MI325X
           - Ubuntu 22.04
           - Ubuntu 22.04

         * - Instinct MI300X
           - Ubuntu 22.04
           - Ubuntu 22.04

         * - Instinct MI210
           - RHEL 9.4
           - Ubuntu 22.04 or RHEL 9.4

         * - Radeon PRO V710
           - Ubuntu 22.04
           - Ubuntu 24.04

   .. tab-item:: ESXi 8

      The following configurations are supported on hosts running VMware ESXi 8.

      .. list-table::
         :header-rows: 1

         * - AMD accelerator
           - Guest OS

         * - Instinct MI325X
           - Ubuntu 24.04

         * - Instinct MI300X
           - Ubuntu 24.04

         * - Instinct MI210
           - Ubuntu 24.04

.. _rc1-highlights:

RC1 release highlights
======================

This section highlights key features enabled in the ROCm 7.0 RC1.

AI frameworks
|
||||
-------------
|
||||
|
||||
The ROCm 7.0 Alpha 2 release supports PyTorch 2.7, TensorFlow 2.19, and Triton 3.3.0.
|
||||
The ROCm 7.0 RC1 supports PyTorch 2.7, TensorFlow 2.19, and Triton 3.3.0.
|
||||
|
||||
Libraries
|
||||
---------
|
||||
|
||||
MIGraphX
|
||||
~~~~~~~~
|
||||
Composable Kernel
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
Added support for the Open Compute Project (OCP) ``FP8`` data type on MI350X accelerators.
|
||||
The RC1 adds functional support for microscaling (MX) data type ``FP6`` in
|
||||
Composable Kernel. This builds upon `MX data type support (ROCm 7.0 Alpha)
|
||||
<https://rocm.docs.amd.com/en/docs-7.0-alpha/preview/release.html#new-data-type-support>`__.
|
||||
|
||||
hipBLASLt
|
||||
~~~~~~~~~
|
||||
|
||||
GEMM performance has been improved for ``FP8``, ``FP16``, ``BF16``, and ``FP32`` data types.
|
||||
|
||||
RCCL support
|
||||
~~~~~~~~~~~~
|
||||
|
||||
RCCL is supported for single-node functional usage only. Multi-node communication capabilities will
|
||||
be supported in future preview releases.
|
||||
RCCL is supported for single node functional usage only. Multi-node communication capabilities will
|
||||
be supported in a future release.
|
||||
|
||||
HIP
|
||||
---
|
||||
|
||||
The HIP runtime includes support for:
|
||||
The following changes improve functionality and runtime performance:
|
||||
|
||||
* Added ``constexpr`` operators for ``FP16`` and ``BF16``.
|
||||
* Improved launch latency for device-to-device (D2D) copies and ``memset`` operations on AMD Instinct MI300 series accelerators.
|
||||
|
||||
* Added ``__syncwarp`` operation.
|
||||
* Added ``hipMemGetHandleForAddressRange`` to retrieve a handle for a specified
|
||||
memory address range. This provides functional parity with CUDA
|
||||
``cuMemGetHandleForAddressRange``.
|
||||
|
||||
* The ``_sync()`` version of crosslane builtins such as ``shfl_sync()`` and
|
||||
``__reduce_add_sync`` are enabled by default. These can be disabled by
|
||||
setting the preprocessor macro ``HIP_DISABLE_WARP_SYNC_BUILTINS``.
|
||||
|
||||
In addition, the HIP runtime includes the following functional enhancements which improve runtime
|
||||
performance and user experience:
|
||||
|
||||
* HIP runtime now enables peer-to-peer (P2P) memory copies to utilize all
|
||||
available SDMA engines, rather than being limited to a single engine. It also
|
||||
selects the best engine first to give optimal bandwidth.
|
||||
|
||||
* To match CUDA runtime behavior more closely, HIP runtime APIs no longer check
|
||||
the stream validity with streams passed as input parameters. If the input
|
||||
stream is invalid, it causes a segmentation fault instead of returning
|
||||
an error code ``hipErrorContextIsDestroyed``.
|
||||
|
||||
The following issues have been resolved:
|
||||
|
||||
* An issue when retrieving a memory object from the IPC memory handle causing
|
||||
failures in some framework test applications.
|
||||
|
||||
* An issue causing the incorrect return error ``hipErrorNoDevice`` when a crash occurred
|
||||
on a GPU due to an illegal operation or memory violation. The HIP runtime now
|
||||
handles the failure on the GPU side properly and reports the precise error
|
||||
code based on the last error seen on the GPU.
|
||||
|
||||
See :ref:`HIP compatibility <hip-known-limitation>` for more information about upcoming API changes.
|
||||
* Resolved an issue causing crashes in TensorFlow applications. The HIP runtime now
|
||||
combines multiple definitions of ``callbackQueue`` into a single function; in
|
||||
case of an exception, it passes its handler to the application and provides
|
||||
the proper error code.
|
||||
|
||||
Compilers
|
||||
---------
|
||||
|
||||
The Alpha 2 release introduces the AMD Next-Gen Fortran compiler. ``llvm-flang``
|
||||
(sometimes called ``new-flang`` or ``flang-18``) is a re-implementation of the
|
||||
Fortran frontend. It is a strategic replacement for ``classic-flang`` and is
|
||||
developed in LLVM's upstream repo at `<https://github.com/llvm/llvm-project/tree/main/flang>`__.
|
||||
``llvm-strip`` now supports AMD GPU device code objects (``EM_AMDGPU``).
|
||||
|
||||
Key enhancements include:
|
||||
HIPCC
|
||||
~~~~~
|
||||
|
||||
* Compiler:
|
||||
The legacy Perl-based HIPCC scripts -- ``hipcc.pl`` and ``hipconfig.pl`` -- have been removed.
|
||||
|
||||
* Improved memory load and store instructions.
|
||||
* Updated clang/llvm to `AMD clang version 20.0.0git` (equivalent to LLVM 20.0.0 with additional out-of-tree patches).
* Support added for separate debug file generation for device code.
* Comgr:

  * Added support for an in-memory virtual file system (VFS) for storing temporary files
    generated during intermediate compilation steps. This is designed to
    improve performance by reducing on-disk file I/O. Currently, VFS is
    supported only for the device library link step, with plans for expanded
    support in future releases.

* SPIR-V:

  * Improved `target-specific extensions <https://github.com/ROCm/llvm-project/blob/c2535466c6e40acd5ecf6ba1676a4e069c6245cc/clang/docs/LanguageExtensions.rst>`_:

    * Added a new target-specific builtin ``__builtin_amdgcn_processor_is`` for late or deferred queries of the current target processor.
    * Added a new target-specific builtin ``__builtin_amdgcn_is_invocable``, enabling fine-grained, per-builtin feature availability.

* HIPIFY now supports NVIDIA CUDA 12.8.0 APIs:

  * Added support for all new device and host APIs, including ``FP4``, ``FP6``, and ``FP128``, as well as the corresponding ROCm HIP equivalents.

* Deprecated features:

  * ROCm components no longer use the ``__AMDGCN_WAVEFRONT_SIZE`` and
    ``__AMDGCN_WAVEFRONT_SIZE__`` macros nor HIP's ``warpSize`` variable as
    ``constexpr`` values. These macros and reliance on ``warpSize`` as a
    ``constexpr`` are deprecated and will be disabled in a future release.
    Users are encouraged to update their code if needed to ensure future
    compatibility.

AMD SMI
-------

* Added:

  * New default view when using ``amd-smi`` without arguments. The
    improved default view provides a snapshot of commonly requested information
    such as bdf, current partition mode, version information, and more. You can
    obtain the same information in other output formats using ``amd-smi default
    --json`` or ``amd-smi default --csv``.
  * New APIs:

    * ``amdsmi_get_gpu_bad_page_threshold()`` to get bad page threshold counts.
    * ``amdsmi_get_cpu_model_name()`` to get CPU model names (not sourced from the E-SMI library).
    * ``amdsmi_get_cpu_affinity_with_scope()`` to get CPU affinity.

  * API enhancements:

    * ``amdsmi_get_power_info()`` now populates ``socket_power``.
    * ``amdsmi_asic_info_t`` now also includes ``subsystem_id``.

  * CLI enhancements:

    * ``amd-smi topology`` is now available in guest environments.
    * ``amd-smi monitor -p`` now displays the power cap alongside power.

* Optimized:

  * Improved overall performance by reducing the number of backend API calls for ``amd-smi`` CLI commands.
  * Removed partition information from the default ``amd-smi static`` CLI
    command to avoid waking the GPU unnecessarily. This information remains
    available via ``amd-smi`` (default view) and ``amd-smi static -p``.
  * Optimized the CLI command ``amd-smi topology`` in partition mode.

* Changed:

  * Updated ``amdsmi_get_clock_info`` in ``amdsmi_interface.py``. The ``clk_deep_sleep`` field now returns the sleep integer value.
  * The char arrays in the following structures have been changed:

    * ``amdsmi_vbios_info_t`` member ``build_date`` changed from ``AMDSMI_MAX_DATE_LENGTH`` to ``AMDSMI_MAX_STRING_LENGTH``.
    * ``amdsmi_dpm_policy_entry_t`` member ``policy_description`` changed from ``AMDSMI_MAX_NAME`` to ``AMDSMI_MAX_STRING_LENGTH``.
    * ``amdsmi_name_value_t`` member ``name`` changed from ``AMDSMI_MAX_NAME`` to ``AMDSMI_MAX_STRING_LENGTH``.

  * Added new event notification types to ``amdsmi_evt_notification_type_t``:
    ``AMDSMI_EVT_NOTIF_EVENT_MIGRATE_START``,
    ``AMDSMI_EVT_NOTIF_EVENT_MIGRATE_END``,
    ``AMDSMI_EVT_NOTIF_EVENT_PAGE_FAULT_START``,
    ``AMDSMI_EVT_NOTIF_EVENT_PAGE_FAULT_END``,
    ``AMDSMI_EVT_NOTIF_EVENT_QUEUE_EVICTION``,
    ``AMDSMI_EVT_NOTIF_EVENT_QUEUE_RESTORE``,
    ``AMDSMI_EVT_NOTIF_EVENT_UNMAP_FROM_GPU``,
    ``AMDSMI_EVT_NOTIF_PROCESS_START``, and ``AMDSMI_EVT_NOTIF_PROCESS_END``.
  * The ``amdsmi_bdf_t`` union was changed to include an identical unnamed struct for backwards compatibility.

* Removed:

  * Cleaned up and unified the API by removing unused definitions and redundant components.
  * Removed the unneeded API ``amdsmi_free_name_value_pairs()``.
  * Removed unused definitions: ``AMDSMI_MAX_NAME``, ``AMDSMI_256_LENGTH``,
    ``AMDSMI_MAX_DATE_LENGTH``, ``MAX_AMDSMI_NAME_LENGTH``, ``AMDSMI_LIB_VERSION_YEAR``,
    ``AMDSMI_DEFAULT_VARIANT``, ``AMDSMI_MAX_NUM_POWER_PROFILES``, and
    ``AMDSMI_MAX_DRIVER_VERSION_LENGTH``.
  * Removed the unused member ``year`` in struct ``amdsmi_version_t``.
  * Replaced ``amdsmi_io_link_type_t`` with the unified ``amdsmi_link_type_t``; ``amdsmi_io_link_type_t`` is no longer needed.
    Code using the old enum might need to be updated. This change also affects ``amdsmi_link_metrics_t``, where the type of the
    ``link_type`` field changed from ``amdsmi_io_link_type_t`` to ``amdsmi_link_type_t``.
  * Removed the ``amdsmi_get_power_info_v2()`` function; its functionality is now unified in ``amdsmi_get_power_info()``.
  * Removed the ``AMDSMI_EVT_NOTIF_RING_HANG`` event notification type from ``amdsmi_evt_notification_type_t``.
  * Removed the enum ``amdsmi_vram_vendor_type_t``. ``amdsmi_get_gpu_vram_info()`` now provides vendor names as a string.
  * Removed backwards compatibility for the ``jpeg_activity`` and ``vcn_activity`` fields in ``amdsmi_get_gpu_metrics_info()``.
    Use ``xcp_stats.jpeg_busy`` or ``xcp_stats.vcn_busy`` instead. This change removes ambiguity between new and old fields
    and supports the expanded metrics available in modern ASICs.

* Resolved issues:

  * Removed duplicated GPU IDs when receiving events using the ``amd-smi event`` command.

Instinct Driver/ROCm packaging separation
-----------------------------------------

The Instinct Driver is now distributed separately from the ROCm software stack and is now stored
...
information.

Forward and backward compatibility between the Instinct Driver and ROCm is not supported in the
Alpha 2 release. See the :doc:`installation instructions <install/index>`.

Known limitations
=================

.. _hip-known-limitation:

HIP compatibility
-----------------

HIP runtime APIs in the ROCm 7.0 Alpha 2 release don't include the upcoming
backward-incompatible changes. See `HIP 7.0 Is Coming: What You Need to Know to Stay Ahead
<https://rocm.blogs.amd.com/ecosystems-and-partners/transition-to-hip-7.0-blog/README.html>`_
to learn about the changes expected for HIP.

root: preview/index
subtrees:
- entries:
  - file: preview/release.rst
    title: RC1 release notes
  - file: preview/install/index.rst
    title: Installation
    subtrees:
    - entries:
      ...
        title: Install ROCm
      - file: preview/install/instinct-driver
        title: Install Instinct Driver
  - file: preview/benchmark-docker/index.rst
    title: Docker images for AI
    subtrees:
    - entries:
      - file: preview/benchmark-docker/inference-vllm-llama-3.1-405b-fp4.rst
      - file: preview/benchmark-docker/inference-vllm-llama-3.3-70b-fp8.rst
      - file: preview/benchmark-docker/inference-vllm-gpt-oss-120b.rst
      - file: preview/benchmark-docker/inference-sglang-deepseek-r1-fp4.rst
      - file: preview/benchmark-docker/inference-sglang-deepseek-r1-fp8.rst

ready(() => {
  // ...
  const versionListLink = document.querySelector(
    "a.header-all-versions[href='https://rocm.docs.amd.com/en/latest/release/versions.html']",
  );
  versionListLink.textContent = "Preview versions"
  versionListLink.href = "https://rocm.docs.amd.com/en/docs-7.0-rc1/preview/versions.html"

  const headerLogoText = document.querySelector("div.header-logo a:not(.navbar-brand):not(.header-all-versions)")
  headerLogoText.textContent = "ROCm™ Software 7.0 RC1"
});