mirror of https://github.com/ROCm/ROCm.git (synced 2026-04-27 03:01:52 -04:00)
Compare commits: docs_7.2.2...develop (26 commits)
@@ -20,3 +20,7 @@ build:
     - "doxygen"
     - "gfortran" # For pre-processing fortran sources
     - "graphviz" # For dot graphs in doxygen
+
+search:
+  ignore:
+    - "**/previous-versions/**"
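The `search.ignore` glob added above excludes archived docs under any `previous-versions` directory from the site's search index. As a rough way to preview which paths such a glob catches, here is a sketch using Python's `fnmatch` (whose `*` also matches `/`, so it approximates `**` for this pattern; the sample paths are hypothetical):

```python
from fnmatch import fnmatch

# The search config above ignores "**/previous-versions/**".
# fnmatch's "*" matches across "/" too, so it approximates "**" here.
PATTERN = "**/previous-versions/**"

# Hypothetical paths for illustration only.
paths = [
    "docs/previous-versions/6.4/index.html",
    "docs/release/changelog.html",
]
ignored = [p for p in paths if fnmatch(p, PATTERN)]
```

Only the first sample path contains a `previous-versions` directory component, so only it ends up in `ignored`.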
285
RELEASE.md
@@ -10,15 +10,157 @@
 <!-- markdownlint-disable reference-links-images -->
 <!-- markdownlint-disable no-missing-space-atx -->
 <!-- spellcheck-disable -->
-# ROCm 7.2.1 release notes
+# ROCm 7.2.2 release notes
+
+ROCm 7.2.2 is a quality release that resolves the issue listed in the Release highlights.
+
+## Release highlights
+
+The following are the notable changes in ROCm 7.2.2.
+
+### ROCTracer failure to report kernel operations is fixed
+
+In ROCm 7.2.1, applications using [ROCTracer](https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html) failed to receive some or all kernel operation events due to a ROCTracer reporting failure. This issue has been resolved, and the fix has been applied to ROCTracer.
+
+### User space, driver, and firmware dependent changes
+
+The software for AMD Data Center GPU products requires maintaining a hardware
+and software stack with interdependencies among the GPU and baseboard
+firmware, AMD GPU drivers, and the ROCm user space software. While AMD publishes drivers and ROCm user space components, your server or infrastructure provider publishes the GPU and baseboard firmware by bundling AMD firmware releases via an AMD Platform Level Data Model (PLDM) bundle, which includes the Integrated Firmware Image (IFWI).
+
+GPU and baseboard firmware versioning might differ across GPU families.
+
+<div class="pst-scrollable-table-container">
+<table class="table table--middle-left">
+<thead>
+<tr>
+<th class="head">
+<p>ROCm Version</p>
+</th>
+<th class="head">
+<p>GPU</p>
+</th>
+<th class="head">
+<p>PLDM Bundle (Firmware)</p>
+</th>
+<th class="head">
+<p>AMD GPU Driver (amdgpu)</p>
+</th>
+<th class="head">
+<p>AMD GPU <br>
+Virtualization Driver (GIM)</p>
+</th>
+</tr>
+</thead>
+<style>
+tbody#virtualization-support-instinct tr:last-child {
+border-bottom: 2px solid var(--pst-color-primary);
+}
+</style>
+<tr>
+<td rowspan="9" style="vertical-align: middle;">ROCm 7.2.2</td>
+<td>MI355X</td>
+<td>
+01.26.00.02<br>
+01.25.17.07<br>
+01.25.16.03
+</td>
+<td>
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<br>
+30.10.x where x (0-2)
+</td>
+<td rowspan="3" style="vertical-align: middle;">8.7.1.K</td>
+</tr>
+<tr>
+<td>MI350X</td>
+<td>
+01.26.00.02<br>
+01.25.17.07<br>
+01.25.16.03
+</td>
+<td>
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<br>
+30.10.x where x (0-2)
+</td>
+</tr>
+<tr>
+<td>MI325X<a href="#footnote1"><sup>[1]</sup></a></td>
+<td>
+01.25.06.08<br>
+01.25.04.02
+</td>
+<td>30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<a href="#footnote1"><sup>[1]</sup></a><br>
+30.10.x where x (0-2)<br>
+6.4.z where z (0-3)<br>
+6.3.3
+</td>
+</tr>
+<tr>
+<td>MI300X<a href="#footnote2"><sup>[2]</sup></a></td>
+<td>01.25.06.04<br>
+01.25.03.12<br>
+01.25.02.04</td>
+<td rowspan="6" style="vertical-align: middle;">
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<br>
+30.10.x where x (0-2)<br>
+6.4.z where z (0-3)<br>
+6.3.3
+</td>
+<td>8.7.1.K</td>
+</tr>
+<tr>
+<td>MI300A</td>
+<td>BKC 26.1</td>
+<td rowspan="3" style="vertical-align: middle;">Not Applicable</td>
+</tr>
+<tr>
+<td>MI250X</td>
+<td>IFWI 47 (or later)</td>
+</tr>
+<tr>
+<td>MI250</td>
+<td>MU5 w/ IFWI 75 (or later)</td>
+</tr>
+<tr>
+<td>MI210</td>
+<td>MU5 w/ IFWI 75 (or later)</td>
+<td>8.7.1.K</td>
+</tr>
+<tr>
+<td>MI100</td>
+<td>VBIOS D3430401-037</td>
+<td>Not Applicable</td>
+</tr>
+</table>
+</div>
+
+<p id="footnote1">[1]: For AMD Instinct MI325X KVM SR-IOV users, don't use AMD GPU driver (amdgpu) 30.20.0.</p>
+<p id="footnote2">[2]: AMD Instinct MI300X KVM SR-IOV with Multi-VF (8 VF) support requires a compatible firmware BKC bundle, which will be released in the coming months.</p>
+
+### ROCm documentation updates
+
+ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider range of user needs and use cases.
+
+* The new [AMD RDNA3.5 system optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/rdna3-5.html) topic describes how to optimize systems powered by AMD Ryzen APUs with RDNA3.5 architecture. These APUs combine high-performance CPU cores with integrated RDNA3.5 graphics, and support LPDDR5X-8000 or DDR5 memory.
+
+```{note}
+ROCm 7.2.2 doesn't include any other significant changes or feature additions. For comprehensive changes, new features, and enhancements in ROCm 7.2.1, refer to the [ROCm 7.2.1 release notes](#rocm-7-2-1-release-notes) below.
+```
+
+## ROCm 7.2.1 release notes
+
 The release notes provide a summary of notable changes since the previous ROCm release.
 
-- [Release highlights](#release-highlights)
+- [Release highlights](#id1)
 
 - [Supported hardware, operating system, and virtualization changes](#supported-hardware-operating-system-and-virtualization-changes)
 
-- [User space, driver, and firmware dependent changes](#user-space-driver-and-firmware-dependent-changes)
+- [User space, driver, and firmware dependent changes](#id2)
 
 - [ROCm components versioning](#rocm-components)
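The driver columns in the table above express the supported amdgpu series as patterns such as `30.30.x where x (0-2)`. As a sketch of how a version string could be checked against such a table mechanically (the `SUPPORTED` mapping is transcribed from the MI355X/MI350X rows, and `is_supported` is an illustrative helper, not a ROCm tool):

```python
import re

# Supported amdgpu driver series for ROCm 7.2.2 on MI355X/MI350X,
# transcribed from the table above: "30.30.x where x (0-2)", etc.
SUPPORTED = {
    "30.30": range(0, 3),  # 30.30.0 through 30.30.2
    "30.20": range(0, 2),  # 30.20.0 through 30.20.1
    "30.10": range(0, 3),  # 30.10.0 through 30.10.2
}

def is_supported(version: str) -> bool:
    """Return True if `version` (e.g. "30.30.1") falls in a supported series."""
    m = re.fullmatch(r"(\d+\.\d+)\.(\d+)", version)
    if not m:
        return False
    series, patch = m.group(1), int(m.group(2))
    return series in SUPPORTED and patch in SUPPORTED[series]
```

For example, `is_supported("30.20.1")` is true while `is_supported("30.20.2")` is false, matching the `30.20.x where x (0-1)` row.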
@@ -31,16 +173,15 @@ The release notes provide a summary of notable changes since the previous ROCm r
 - [ROCm upcoming changes](#rocm-upcoming-changes)
 
 ```{note}
-If you’re using AMD Radeon GPUs or Ryzen APUs in a workstation setting with a display connected, see the [Use ROCm on Radeon and Ryzen](https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/index.html)
-documentation to verify compatibility and system requirements.
+If you’re using AMD Radeon™ GPUs or Ryzen™ for graphics workloads, see the [Use ROCm on Radeon and Ryzen](https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/index.html) documentation to verify compatibility and system requirements.
 ```
 
-## Release highlights
+### Release highlights
 
 The following are notable new features and improvements in ROCm 7.2.1. For changes to individual components, see
 [Detailed component changes](#detailed-component-changes).
 
-### Supported hardware, operating system, and virtualization changes
+#### Supported hardware, operating system, and virtualization changes
 
 Hardware support remains unchanged in this release.
@@ -52,11 +193,11 @@ For more information about:
 * Operating systems, see [Supported operating systems](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.1/reference/system-requirements.html#supported-operating-systems) and [ROCm installation for Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.1/).
 
-#### Virtualization support
+##### Virtualization support
 
 Virtualization support remains unchanged in this release. For more information, see [Virtualization support](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.1/reference/system-requirements.html#virtualization-support).
 
-### User space, driver, and firmware dependent changes
+#### User space, driver, and firmware dependent changes
 
 The software for AMD Data Center GPU products requires maintaining a hardware
 and software stack with interdependencies among the GPU and baseboard
@@ -100,13 +241,9 @@ GPU and baseboard firmware versioning might differ across GPU families.
 01.25.16.03
 </td>
 <td>
-30.30.1<br>
-30.30.0<br>
-30.20.1<br>
-30.20.0<br>
-30.10.2<br>
-30.10.1<br>
-30.10
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<br>
+30.10.x where x (0-2)
 </td>
 <td rowspan="3" style="vertical-align: middle;">8.7.1.K</td>
 </tr>
@@ -118,27 +255,21 @@ GPU and baseboard firmware versioning might differ across GPU families.
 01.25.16.03
 </td>
 <td>
-30.30.1<br>
-30.30.0<br>
-30.20.1<br>
-30.20.0<br>
-30.10.2<br>
-30.10.1<br>
-30.10
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<br>
+30.10.x where x (0-2)
 </td>
 </tr>
 <tr>
 <td>MI325X<a href="#footnote1"><sup>[1]</sup></a></td>
 <td>
 01.25.06.08<br>
 01.25.04.02
 </td>
-<td>30.30.1<br>
-30.30.0<br>
-30.20.1<br>
-30.20.0<a href="#footnote1"><sup>[1]</sup></a><br>
-30.10.2<br>
-30.10.1<br>
-30.10<br>
+<td>
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<a href="#footnote1"><sup>[1]</sup></a><br>
+30.10.x where x (0-2)<br>
 6.4.z where z (0-3)<br>
 6.3.3
 </td>
@@ -149,13 +280,9 @@ GPU and baseboard firmware versioning might differ across GPU families.
 01.25.03.12<br>
 01.25.02.04</td>
 <td rowspan="6" style="vertical-align: middle;">
-30.30.1<br>
-30.30.0<br>
-30.20.1<br>
-30.20.0<br>
-30.10.2<br>
-30.10.1<br>
-30.10<br>
+30.30.x where x (0-2)<br>
+30.20.x where x (0-1)<br>
+30.10.x where x (0-2)<br>
 6.4.z where z (0-3)<br>
 6.3.3
 </td>
@@ -190,24 +317,24 @@ GPU and baseboard firmware versioning might differ across GPU families.
 <p id="footnote1">[1]: For AMD Instinct MI325X KVM SR-IOV users, don't use AMD GPU driver (amdgpu) 30.20.0.</p>
 <p id="footnote2">[2]: AMD Instinct MI300X KVM SR-IOV with Multi-VF (8 VF) support requires a compatible firmware BKC bundle, which will be released in the coming months.</p>
 
-### hipBLASLt updates
+#### hipBLASLt updates
 
 hipBLASLt has improved performance for MXFP8 and MXFP4 GEMMs.
 
-### Deep learning and AI framework updates
+#### Deep learning and AI framework updates
 
 ROCm provides a comprehensive ecosystem for deep learning development. For more information, see [Deep learning frameworks for ROCm](../../docs/how-to/deep-learning-rocm.rst) and the [Compatibility
 matrix](../../docs/compatibility/compatibility-matrix.rst) for the complete list of deep learning and AI framework versions tested for compatibility with ROCm. AMD ROCm has officially updated support for the following deep learning and AI frameworks:
 
-#### JAX
+##### JAX
 
 ROCm 7.2.1 enables support for JAX 0.8.2. For more information, see [JAX compatibility](../../docs/compatibility/ml-compatibility/jax-compatibility.rst).
 
-#### ROCm Offline Installer Creator discontinuation
+### ROCm Offline Installer Creator discontinuation
 
 The ROCm Offline Installer Creator is discontinued in ROCm 7.2.1. Equivalent installation capabilities are available through the ROCm Runfile Installer, a self-extracting installer that is not based on OS package managers. For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.1/install/rocm-runfile-installer.html).
 
-### ROCm documentation updates
+#### ROCm documentation updates
 
 ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider range of user needs and use cases.
@@ -231,7 +358,7 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
 * [Host software glossary](https://rocm.docs.amd.com/en/docs-7.2.1/reference/glossary/host-software.html): Provides brief definitions of development tools, compilers, libraries, and runtime environments for programming AMD GPUs.
 * [Performance glossary](https://rocm.docs.amd.com/en/docs-7.2.1/reference/glossary/performance.html): Provides brief definitions of performance analysis concepts and optimization techniques.
 
-## ROCm components
+### ROCm components
 
 The following table lists the versions of ROCm components for ROCm 7.2.1, including any version
 changes from 7.2.0 to 7.2.1. Click the component's updated version to go to a list of its changes.
@@ -561,7 +688,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
 </table>
 </div>
 
-## Detailed component changes
+### Detailed component changes
 
 The following sections describe key changes to ROCm components.
@@ -569,13 +696,13 @@ The following sections describe key changes to ROCm components.
 For a historical overview of ROCm component updates, see the {doc}`ROCm consolidated changelog </release/changelog>`.
 ```
 
-### **AMD SMI** (26.2.2)
+#### **AMD SMI** (26.2.2)
 
-#### Added
+##### Added
 
 * GPU board and base board temperature sensors to the `amd-smi monitor` command.
 
-#### Resolved issues
+##### Resolved issues
 
 * JSON output was not formatted correctly when using watch mode with metrics.
 * Output was not properly redirected to file when using JSON format.
@@ -583,75 +710,75 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
 * Invalid CPER files caused garbage output for AFID lists.
 * JSON output was not formatted correctly for reset commands.
 
-### **HIP** (7.2.1)
+#### **HIP** (7.2.1)
 
-#### Resolved issues
+##### Resolved issues
 
 * Corrected the validation of stream capture in global-capture mode. It is no longer affected by any thread-local capture-mode sequences occurring in other threads.
 * Corrected the return value of `hipEventQuery` and `hipEventSynchronize`. The HIP runtime now properly handles and restricts stream capture within these APIs.
 * Corrected an issue in the batch-dispatch doorbell for AQL packets to avoid a potential CPU hang.
 * To address potential delays in memory-object destruction that could affect application logic, the HIP runtime disables memory-object reference counting in direct-dispatch mode.
 
-#### Changed
+##### Changed
 
 * The `AMD_DIRECT_DISPATCH` environment variable has been deprecated in the HIP runtime.
 
-### **hipBLASLt** (1.2.2)
+#### **hipBLASLt** (1.2.2)
 
-#### Changed
+##### Changed
 
 * Enumeration value update for the Sigmoid Activation Function feature.
 
-### **rocDecode** (1.7.0)
+#### **rocDecode** (1.7.0)
 
-#### Upcoming changes
+##### Upcoming changes
 
 * The rocDecode GitHub repository will be officially moved to [https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdecode](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdecode) in an upcoming release.
 
-### **rocJPEG** (1.4.0)
+#### **rocJPEG** (1.4.0)
 
-#### Changed
+##### Changed
 
 * Bug fixes and performance improvements.
 
-#### Upcoming changes
+##### Upcoming changes
 
 * The rocJPEG GitHub repository will be officially moved to [https://github.com/ROCm/rocm-systems/tree/develop/projects/rocjpeg](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocjpeg) in an upcoming release.
 
-### **rocSHMEM** (3.2.0)
+#### **rocSHMEM** (3.2.0)
 
-#### Added
+##### Added
 * Warnings to notify if large BAR is not available.
 
-#### Resolved issues
+##### Resolved issues
 
 * The GDA backend now disables itself when no GDA-compatible NICs are available, rather than crashing.
 * Fixed memory coherency issues on gfx1201.
 
-#### Known issues
+##### Known issues
 
 * Only 64-bit rocSHMEM atomic APIs are implemented for the GDA conduit.
 
-### **RPP** (2.2.1)
+#### **RPP** (2.2.1)
 
-#### Added
+##### Added
 
 * Error-code capture in test scripts for all C++ tests.
 
-#### Optimized
+##### Optimized
 
 * Optimized F16 variants by replacing scalar load/store operations with AVX2 intrinsics for spatter, log, blend, color_cast, flip, crop_mirror_normalize, and exposure kernels.
 
-## ROCm known issues
+### ROCm known issues
 
 ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
 issues related to individual components, review the [Detailed component changes](#detailed-component-changes).
 
-### hipBLASLt performance regression for specific GEMM configurations
+#### hipBLASLt performance regression for specific GEMM configurations
 
 You might observe a noticeable performance regression if you’re using hipBLASLt with the following GPUs for LLMs with specific GEMM configurations:
 
-#### AMD Instinct MI300X and MI325X GPUs
+##### AMD Instinct MI300X and MI325X GPUs
 
 Affected GEMM configurations:
@@ -661,7 +788,7 @@ Affected GEMM configurations:
 * 9728 × 8192 × 65536 (F8F8S, TN)
 
-#### AMD Instinct MI350 Series GPUs
+##### AMD Instinct MI350 Series GPUs
 
 Affected GEMM configurations:
@@ -683,20 +810,28 @@ GEMM operations using hipBLASLt might result in longer runtime on AMD Instinct M
 
 Applications that use [ROCTracer](https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html) might fail to receive some or all kernel operation events due to a ROCTracer reporting failure. ROCTracer is already deprecated and is scheduled to reach end of support (EoS) by the end of 2026 Q2. For more details on ROCTracer deprecation, see [ROCm upcoming changes](#roctracer-rocprofiler-rocprof-and-rocprofv2-deprecation). This issue will be resolved in a future PyTorch on ROCm release that replaces ROCTracer with [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/). See [GitHub issue #6102](https://github.com/ROCm/ROCm/issues/6102).
 
-## ROCm resolved issues
+#### Longer runtime for hipBLASLt GEMM operations on Instinct MI300X GPUs in partitioned mode
+
+GEMM operations using hipBLASLt might result in longer runtime on AMD Instinct MI300X GPUs configured in CPX or NPS4 partition mode (38 control units or CUs). This issue occurs when hipBLASLt fails to find applicable pre-tuned kernels. As a result, it performs an extensive kernel search, which increases both search time and the overall operation runtime. This issue is resolved in the {fab}`github` [hipBLASLt develop branch](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipblaslt) and will be part of a future ROCm release. See [GitHub issue #6066](https://github.com/ROCm/ROCm/issues/6066).
+
+#### ROCTracer might fail to report kernel operations
+
+Applications that use [ROCTracer](https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html) might fail to receive some or all kernel operation events due to a ROCTracer reporting failure. ROCTracer is already deprecated and is scheduled to reach end of support (EoS) by the end of 2026 Q2. For more details on ROCTracer deprecation, see [ROCm upcoming changes](#roctracer-rocprofiler-rocprof-and-rocprofv2-deprecation). This issue will be resolved in a future PyTorch on ROCm release that replaces ROCTracer with [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/). See [GitHub issue #6102](https://github.com/ROCm/ROCm/issues/6102).
+
+### ROCm resolved issues
 
 The following are previously known issues resolved in this release. For resolved issues related to
 individual components, review the [Detailed component changes](#detailed-component-changes).
 
-### Increased runtime latency of the HIP hipStreamCreate API
+#### Increased runtime latency of the HIP hipStreamCreate API
 
 An issue that resulted in doubling of the runtime latency of the [HIP](https://rocmdocs.amd.com/projects/HIP/en/latest/doxygen/html/group___stream.html) `hipStreamCreate` API has been resolved. See [GitHub issue #5978](https://github.com/ROCm/ROCm/issues/5978).
 
-## ROCm upcoming changes
+### ROCm upcoming changes
 
 The following changes to the ROCm software stack are anticipated for future releases.
 
-### ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation
+#### ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation
 
 ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are deprecated. It's strongly recommended to upgrade to the latest version of the [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/) library and the `rocprofv3` tool to ensure continued support and access to new features.
@@ -704,7 +839,7 @@ To learn about key feature improvements and benefits of ROCprofiler-SDK over the
 
 It's anticipated that ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` will reach end of support (EoS) by the end of 2026 Q2.
 
-### ROCm SMI deprecation
+#### ROCm SMI deprecation
 
 [ROCm SMI](https://github.com/ROCm/rocm_smi_lib) will be phased out in an
 upcoming ROCm release and will enter maintenance mode. After this transition,
@@ -717,7 +852,7 @@ includes all the features of the ROCm SMI and will continue to receive regular
 updates, new functionality, and ongoing support. For more information on AMD
 SMI, see the [AMD SMI documentation](https://rocm.docs.amd.com/projects/amdsmi/en/latest/).
 
-### Changes to ROCm Object Tooling
+#### Changes to ROCm Object Tooling
 
 ROCm Object Tooling tools ``roc-obj-ls``, ``roc-obj-extract``, and ``roc-obj`` were
 deprecated in ROCm 6.4, and will be removed in a future release. Functionality
@@ -726,4 +861,4 @@ clang-offload-bundles into individual code objects found within the objects
 or executables passed as input. The ``llvm-objdump --offloading`` tool option also
 supports the ``--arch-name`` option, and only extracts code objects found with
 the specified target architecture. See [llvm-objdump](https://llvm.org/docs/CommandGuide/llvm-objdump.html)
-for more information.
+for more information.
@@ -1,4 +1,4 @@
-ROCm Version,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
+ROCm Version,7.2.2/7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
 :ref:`Operating systems & kernels <OS-kernel-versions>` [#os-compatibility-past-60]_,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
 ,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
 ,,,,,,,,,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
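In the matrix above, one column can cover several ROCm versions with a slash-separated cell such as `7.2.2/7.2.1`. A minimal sketch of expanding those combined cells for programmatic lookups (`expand_versions` is a hypothetical helper, not part of any ROCm tooling):

```python
# Some matrix cells list several ROCm versions sharing one column,
# e.g. "7.2.2/7.2.1". Expand them into individual version strings.
def expand_versions(cell: str) -> list[str]:
    """Split a slash-separated version cell into individual versions."""
    return [v.strip() for v in cell.split("/")]

# Illustrative header fragment modeled on the CSV row above.
header = "ROCm Version,7.2.2/7.2.1,7.2.0,7.1.1".split(",")[1:]
versions = [v for cell in header for v in expand_versions(cell)]
```

With the fragment above, `versions` contains each ROCm version once, in column order.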
@@ -46,7 +46,7 @@ ROCm Version,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6
 CUB,2.8.5,2.8.5,2.8.5,2.8.5,2.6.0,2.6.0,2.5.0,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
 ,,,,,,,,,,,,,,,,,,,,,,,,
 DRIVER & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
-:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.1, 30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x, 6.3.x","30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x, 6.3.x, 6.2.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
+:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.2, 30.30.1, 30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x, 6.3.x","30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x, 6.3.x, 6.2.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
 ,,,,,,,,,,,,,,,,,,,,,,,,
 ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
 :doc:`Composable Kernel <composable_kernel:index>`,1.2.0,1.2.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
@@ -90,7 +90,7 @@ ROCm Version,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6
 ,,,,,,,,,,,,,,,,,,,,,,,,
 SUPPORT LIBS,,,,,,,,,,,,,,,,,,,,,,,,
 `hipother <https://github.com/ROCm/hipother>`_,7.2.53211,7.2.26015,7.1.52802,7.1.25424,7.0.51831,7.0.51830,6.4.43483,6.4.43483,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
-`rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
+`rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.2/7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
 `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
 ,,,,,,,,,,,,,,,,,,,,,,,,
 SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
@@ -104,9 +104,9 @@ ROCm Version,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6
 :doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,2.6.0,2.6.0,2.6.0,2.6.0,2.6.0,2.6.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
 :doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.4.0,3.4.0,3.3.1,3.3.0,3.2.3,3.2.3,3.1.1,3.1.1,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
 :doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.3.0,1.2.1,1.2.0,1.1.1,1.1.0,1.0.2,1.0.2,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
-:doc:`ROCProfiler <rocprofiler:index>`,2.0.70201,2.0.70200,2.0.70101,2.0.70100,2.0.70002,2.0.70000,2.0.60403,2.0.60402,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
+:doc:`ROCProfiler <rocprofiler:index>`,2.0.70202/2.0.70201,2.0.70200,2.0.70101,2.0.70100,2.0.70002,2.0.70000,2.0.60403,2.0.60402,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
 :doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,1.1.0,1.1.0,1.0.0,1.0.0,1.0.0,1.0.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
-:doc:`ROCTracer <roctracer:index>`,4.1.70201,4.1.70200,4.1.70101,4.1.70100,4.1.70002,4.1.70000,4.1.60403,4.1.60402,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
+:doc:`ROCTracer <roctracer:index>`,4.1.70202/4.1.70201,4.1.70200,4.1.70101,4.1.70100,4.1.70002,4.1.70000,4.1.60403,4.1.60402,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
 ,,,,,,,,,,,,,,,,,,,,,,,,
 DEVELOPMENT TOOLS,,,,,,,,,,,,,,,,,,,,,,,,
 :doc:`HIPIFY <hipify:index>`,22.0.0,22.0.0,20.0.0,20.0.0,20.0.0,20.0.0,19.0.0,19.0.0,19.0.0,19.0.0,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
@@ -22,12 +22,12 @@ compatibility and system requirements.
.. container:: format-big-table

.. csv-table::
:header: "ROCm Version", "7.2.1", "7.2.0", "6.4.0"
:header: "ROCm Version", "7.2.2/7.2.1", "7.2.0", "6.4.0"
:stub-columns: 1

:ref:`Operating systems & kernels <OS-kernel-versions>` [#os-compatibility]_,Ubuntu 24.04.4,Ubuntu 24.04.3,Ubuntu 24.04.2
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5
,"RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 9.5, 9.4"
,"RHEL 10.1, 10.0, |br| 9.7, 9.6, 9.4","RHEL 10.1, 10.0, |br| 9.7, 9.6, 9.4","RHEL 9.5, 9.4"
,RHEL 8.10,RHEL 8.10,RHEL 8.10
,SLES 15 SP7,SLES 15 SP7,SLES 15 SP6
,"Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 9, 8"
@@ -69,7 +69,7 @@ compatibility and system requirements.
CUB,2.8.5,2.8.5,2.5.0
,,,
DRIVER & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.1, 30.30.0, 30.20.1, |br| 30.20.0 [#mi325x_KVM]_, 30.10.2, |br| 30.10.1 [#driver_patch]_, 30.10, 6.4.x","30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, |br| 30.10, 6.4.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.2, 30.30.1, 30.30.0, |br| 30.20.1, 30.20.0 [#mi325x_KVM]_, 30.10.2, |br| 30.10.1 [#driver_patch]_, 30.10, 6.4.x","30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, |br| 30.10, 6.4.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.2.0,1.2.0,1.1.0
@@ -113,7 +113,7 @@ compatibility and system requirements.
,,,
SUPPORT LIBS,,,
`hipother <https://github.com/ROCm/hipother>`_,7.2.53211,7.2.26015,6.4.43482
`rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.1,7.2.0,6.4.0
`rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.2/7.2.1,7.2.0,6.4.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_
,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix:,,
@@ -127,9 +127,9 @@ compatibility and system requirements.
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,2.6.0,2.6.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.4.0,3.4.0,3.1.0
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.3.0,1.0.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.70201,2.0.70200,2.0.60400
:doc:`ROCProfiler <rocprofiler:index>`,2.0.70202/2.0.70201,2.0.70200,2.0.60400
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,1.1.0,1.1.0,0.6.0
:doc:`ROCTracer <roctracer:index>`,4.1.70201,4.1.70200,4.1.60400
:doc:`ROCTracer <roctracer:index>`,4.1.70202/4.1.70201,4.1.70200,4.1.60400
,,,
DEVELOPMENT TOOLS,,,
:doc:`HIPIFY <hipify:index>`,22.0.0,22.0.0,19.0.0
@@ -155,8 +155,8 @@ compatibility and system requirements.

.. rubric:: Footnotes

.. [#os-compatibility] Some operating systems are supported on specific GPUs. For detailed information about operating systems supported on ROCm 7.2.1, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
.. [#gpu-compatibility] Some GPUs have limited operating system support. For detailed information about GPUs supporting ROCm 7.2.1, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
.. [#os-compatibility] Some operating systems are supported on specific GPUs. For detailed information about operating systems supported on ROCm 7.2.2/7.2.1, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
.. [#gpu-compatibility] Some GPUs have limited operating system support. For detailed information about GPUs supporting ROCm 7.2.2/7.2.1, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
.. [#dgl_compat] DGL is supported only on ROCm 7.0.0, ROCm 6.4.3, and ROCm 6.4.0.
.. [#mi325x_KVM] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.
.. [#driver_patch] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
@@ -168,7 +168,7 @@ compatibility and system requirements.
Operating systems, kernel and Glibc versions
*********************************************

For detailed information on operating systems supported on ROCm 7.2.1 and the associated kernel and Glibc versions, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
For detailed information on operating systems supported on ROCm 7.2.2/7.2.1 and the associated kernel and Glibc versions, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.

.. note::

docs/conf.py
@@ -81,7 +81,7 @@ latex_elements = {
}

html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "rocm.docs.amd.com")
html_context = {"docs_header_version": "7.2.1"}
html_context = {"docs_header_version": "7.2.2"}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True

@@ -93,15 +93,15 @@ project = "ROCm Documentation"
project_path = os.path.abspath(".").replace("\\", "/")
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved."
version = "7.2.1"
release = "7.2.1"
version = "7.2.2"
release = "7.2.2"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""

# pages with specific settings
article_pages = [
{"file": "about/release-notes", "os": ["linux"], "date": "2026-03-25"},
{"file": "about/release-notes", "os": ["linux"], "date": "2026-04-14"},
{"file": "release/changelog", "os": ["linux"],},
{"file": "compatibility/compatibility-matrix", "os": ["linux"]},
{"file": "compatibility/ml-compatibility/pytorch-compatibility", "os": ["linux"]},
@@ -146,6 +146,7 @@ article_pages = [
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.5", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.6", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/xdit-diffusion-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.7", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.8", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.9", "os": ["linux"]},
@@ -204,6 +205,7 @@ article_pages = [
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-25.13", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-26.1", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-26.2", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-26.3", "os": ["linux"]},

{"file": "how-to/rocm-for-ai/inference/deploy-your-model", "os": ["linux"]},

@@ -244,7 +246,7 @@ external_projects_current_project = "rocm"
# external_projects_remote_repository = ""

html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "https://rocm-stg.amd.com/")
html_context = {"docs_header_version": "7.2.1"}
html_context = {"docs_header_version": "7.2.2"}
if os.environ.get("READTHEDOCS", "") == "True":
html_context["READTHEDOCS"] = True
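The ``conf.py`` hunks above bump the documentation version from 7.2.1 to 7.2.2 and gate Read the Docs-specific context on an environment variable. A minimal standalone sketch of that pattern (only ``version``, ``release``, and ``html_context`` come from the diff; the rest of the Sphinx configuration is assumed):

```python
import os

# Version strings consumed by Sphinx; bumped in one place per release.
version = "7.2.2"
release = "7.2.2"

# Header version shown by the docs theme.
html_context = {"docs_header_version": version}

# Read the Docs sets READTHEDOCS=True in its build environment;
# only then is the RTD-specific context flag added.
if os.environ.get("READTHEDOCS", "") == "True":
    html_context["READTHEDOCS"] = True
```

Because both ``version`` and ``html_context`` derive from the same string, a release bump touches only the two assignment lines shown in the diff.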
@@ -0,0 +1,354 @@
docker:
pull_tag: rocm/pytorch-xdit:v26.3
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v26.3/images/sha256-ac78a03d2911bf1b49c001d3be2e8bd745c1bc455cb49ae972825a7986880902
ROCm: 7.12.0
whats_new:
- "Qwen-Image support"
- "Qwen-Image-Edit support"
- "Aiter update to support Sage attention v2"
- "xDiT update to support MXFP4 GEMMs in Wan2.2, Wan2.1 and Flux.2"
components:
TheRock:
version: e40a6da
url: https://github.com/ROCm/TheRock
rocm-libraries:
version: 9e9e900
url: https://github.com/ROCm/rocm-libraries
rocm-systems:
version: ca89a1a
url: https://github.com/ROCm/rocm-systems
torch:
version: 91be249
url: https://github.com/ROCm/pytorch
torchaudio:
version: e3c6ee2
url: https://github.com/pytorch/audio
torchvision:
version: b919bd0
url: https://github.com/pytorch/vision
triton:
version: a272dfa
url: https://github.com/ROCm/triton
accelerate:
version: 46ba481
url: https://github.com/huggingface/accelerate
aiter:
version: 82d733f
url: https://github.com/ROCm/aiter
diffusers:
version: a80b192
url: https://github.com/huggingface/diffusers
xfuser:
version: 2882027
url: https://github.com/xdit-project/xDiT
yunchang:
version: 631bdfd
url: https://github.com/feifeibear/long-context-attention
supported_models:
- group: Hunyuan Video
js_tag: hunyuan
models:
- model: Hunyuan Video
model_repo: tencent/HunyuanVideo
revision: refs/pr/18
url: https://huggingface.co/tencent/HunyuanVideo
github: https://github.com/Tencent-Hunyuan/HunyuanVideo
mad_tag: pyt_xdit_hunyuanvideo
js_tag: hunyuan_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--prompt "In the large cage, two puppies were wagging their tails at each other." \'
- '--batch_size 1 \'
- '--height 720 --width 1280 \'
- '--seed 1168860793 \'
- '--num_frames 129 \'
- '--num_inference_steps 50 \'
- '--warmup_calls 1 \'
- '--num_iterations 1 \'
- '--ulysses_degree 8 \'
- '--enable_tiling --enable_slicing \'
- '--guidance_scale 6.0 \'
- '--use_torch_compile \'
- '--attention_backend aiter \'
- '--output_directory results'
- model: Hunyuan Video 1.5
model_repo: hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v
url: https://huggingface.co/hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v
github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
mad_tag: pyt_xdit_hunyuanvideo_1_5
js_tag: hunyuan_1_5_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--prompt "In the large cage, two puppies were wagging their tails at each other." \'
- '--task t2v \'
- '--height 720 --width 1280 \'
- '--seed 1168860793 \'
- '--num_frames 129 \'
- '--num_inference_steps 50 \'
- '--num_iterations 1 \'
- '--ulysses_degree 8 \'
- '--enable_tiling --enable_slicing \'
- '--use_torch_compile \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: Wan-AI
js_tag: wan
models:
- model: Wan2.1
model_repo: Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
url: https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
github: https://github.com/Wan-Video/Wan2.1
mad_tag: pyt_xdit_wan_2_1
js_tag: wan_21_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline''s intricate details and the refreshing atmosphere of the seaside." \'
- '--height 720 \'
- '--width 1280 \'
- '--input_images /app/data/wan_input.jpg \'
- '--num_frames 81 \'
- '--ulysses_degree 8 \'
- '--seed 42 \'
- '--num_iterations 1 \'
- '--num_inference_steps 40 \'
- '--use_torch_compile \'
- '--attention_backend aiter \'
- '--output_directory results'
- model: Wan2.2
model_repo: Wan-AI/Wan2.2-I2V-A14B-Diffusers
url: https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers
github: https://github.com/Wan-Video/Wan2.2
mad_tag: pyt_xdit_wan_2_2
js_tag: wan_22_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline''s intricate details and the refreshing atmosphere of the seaside." \'
- '--height 720 \'
- '--width 1280 \'
- '--input_images /app/data/wan_input.jpg \'
- '--num_frames 81 \'
- '--ulysses_degree 8 \'
- '--seed 42 \'
- '--num_iterations 1 \'
- '--num_inference_steps 40 \'
- '--use_torch_compile \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: FLUX
js_tag: flux
models:
- model: FLUX.1
model_repo: black-forest-labs/FLUX.1-dev
url: https://huggingface.co/black-forest-labs/FLUX.1-dev
github: https://github.com/black-forest-labs/flux
mad_tag: pyt_xdit_flux
js_tag: flux_1_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "A small cat" \'
- '--height 1024 \'
- '--width 1024 \'
- '--num_inference_steps 25 \'
- '--max_sequence_length 256 \'
- '--warmup_calls 5 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--guidance_scale 0.0 \'
- '--num_iterations 50 \'
- '--attention_backend aiter \'
- '--output_directory results'
- model: FLUX.1 Kontext
model_repo: black-forest-labs/FLUX.1-Kontext-dev
url: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
github: https://github.com/black-forest-labs/flux
mad_tag: pyt_xdit_flux_kontext
js_tag: flux_1_kontext_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "Add a cool hat to the cat" \'
- '--height 1024 \'
- '--width 1024 \'
- '--num_inference_steps 30 \'
- '--max_sequence_length 512 \'
- '--warmup_calls 5 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--input_images /app/data/flux_cat.png \'
- '--guidance_scale 2.5 \'
- '--num_iterations 25 \'
- '--attention_backend aiter \'
- '--output_directory results'
- model: FLUX.2
model_repo: black-forest-labs/FLUX.2-dev
url: https://huggingface.co/black-forest-labs/FLUX.2-dev
github: https://github.com/black-forest-labs/flux2
mad_tag: pyt_xdit_flux_2
js_tag: flux_2_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "Add a cool hat to the cat" \'
- '--height 1024 \'
- '--width 1024 \'
- '--num_inference_steps 50 \'
- '--max_sequence_length 512 \'
- '--warmup_calls 5 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--input_images /app/data/flux_cat.png \'
- '--guidance_scale 4.0 \'
- '--num_iterations 25 \'
- '--attention_backend aiter \'
- '--output_directory results'
- model: FLUX.2 Klein
model_repo: black-forest-labs/FLUX.2-klein-9B
url: https://huggingface.co/black-forest-labs/FLUX.2-klein-9B
github: https://github.com/black-forest-labs/flux2
mad_tag: pyt_xdit_flux_2_klein
js_tag: flux_2_klein_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "A spectacular sunset over the ocean" \'
- '--height 2048 \'
- '--width 2048 \'
- '--num_inference_steps 4 \'
- '--warmup_calls 5 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--guidance_scale 1.0 \'
- '--num_iterations 25 \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: StableDiffusion
js_tag: stablediffusion
models:
- model: stable-diffusion-3.5-large
model_repo: stabilityai/stable-diffusion-3.5-large
url: https://huggingface.co/stabilityai/stable-diffusion-3.5-large
github: https://github.com/Stability-AI/sd3.5
mad_tag: pyt_xdit_sd_3_5
js_tag: stable_diffusion_3_5_large_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--prompt "A capybara holding a sign that reads Hello World" \'
- '--num_iterations 50 \'
- '--num_inference_steps 28 \'
- '--pipefusion_parallel_degree 4 \'
- '--use_cfg_parallel \'
- '--use_torch_compile \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: Z-Image
js_tag: z_image
models:
- model: Z-Image Turbo
model_repo: Tongyi-MAI/Z-Image-Turbo
url: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
github: https://github.com/Tongyi-MAI/Z-Image
mad_tag: pyt_xdit_z_image_turbo
js_tag: z_image_turbo_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "A crowded beach" \'
- '--height 1088 \'
- '--width 1920 \'
- '--num_inference_steps 9 \'
- '--ulysses_degree 2 \'
- '--use_torch_compile \'
- '--guidance_scale 0.0 \'
- '--num_iterations 50 \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: LTX
js_tag: ltx
models:
- model: LTX-2
model_repo: Lightricks/LTX-2
url: https://huggingface.co/Lightricks/LTX-2
github: https://github.com/Lightricks/LTX-2
mad_tag: pyt_xdit_ltx2
js_tag: ltx2_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "Cinematic action packed shot. The man says silently: \"We need to run.\". The camera zooms in on his mouth then immediately screams: \"NOW!\". The camera zooms back out, he turns around and bolts it." \'
- '--height 1088 \'
- '--width 1920 \'
- '--num_inference_steps 40 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--guidance_scale 4.0 \'
- '--num_iterations 1 \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: Qwen-Image
js_tag: qwen_image
models:
- model: Qwen-Image
model_repo: Qwen/Qwen-Image
url: https://huggingface.co/Qwen/Qwen-Image
github: https://github.com/QwenLM/Qwen-Image
mad_tag: pyt_xdit_qwen_image
js_tag: qwen_image_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "A cat holding a sign that says hello world" \'
- '--height 2048 \'
- '--width 2048 \'
- '--num_inference_steps 50 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--num_iterations 1 \'
- '--attention_backend aiter \'
- '--output_directory results'
- model: Qwen-Image-Edit
model_repo: Qwen/Qwen-Image-Edit
url: https://huggingface.co/Qwen/Qwen-Image-Edit
github: https://github.com/QwenLM/Qwen-Image
mad_tag: pyt_xdit_qwen_image_edit
js_tag: qwen_image_edit_tag
benchmark_command:
- mkdir results
- 'xdit \'
- '--model {model_repo} \'
- '--seed 42 \'
- '--prompt "Add a cool hat to the cat." \'
- '--negative_prompt "" \'
- '--input_images /app/data/flux_cat.png \'
- '--height 2048 \'
- '--width 2048 \'
- '--num_inference_steps 50 \'
- '--ulysses_degree 8 \'
- '--use_torch_compile \'
- '--num_iterations 1 \'
- '--attention_backend aiter \'
- '--output_directory results'
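Each ``benchmark_command`` in the data above is a list of shell fragments with trailing backslash continuations and a ``{model_repo}`` placeholder. A minimal sketch of how such a list could be flattened into one runnable command string (the ``render_command`` helper is hypothetical, for illustration only, and is not part of the shipped Docker tooling):

```python
def render_command(fragments, model_repo):
    """Join benchmark_command fragments into one shell command.

    Trailing '\\' continuations are stripped before joining, and the
    '{model_repo}' placeholder is filled in, mirroring how the YAML
    fragments are written.
    """
    joined = " ".join(f.rstrip("\\").strip() for f in fragments)
    return joined.format(model_repo=model_repo)


# Abbreviated fragments in the same shape as the YAML above.
fragments = [
    'xdit \\',
    '--model {model_repo} \\',
    '--num_inference_steps 25 \\',
    '--output_directory results',
]
cmd = render_command(fragments, "black-forest-labs/FLUX.1-dev")
print(cmd)
# → xdit --model black-forest-labs/FLUX.1-dev --num_inference_steps 25 --output_directory results
```

The same join-then-substitute step works for every model entry, since all of them share the fragment-plus-placeholder convention.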
@@ -1,24 +1,24 @@
docker:
pull_tag: rocm/pytorch-xdit:v26.3
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v26.3/images/sha256-ac78a03d2911bf1b49c001d3be2e8bd745c1bc455cb49ae972825a7986880902
pull_tag: rocm/pytorch-xdit:v26.4
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v26.4/images/sha256-b4296a638eb8dc7ebcafc808e180b78a3c44177580c21986082ec9539496067c
ROCm: 7.12.0
whats_new:
- "Qwen-Image support"
- "Qwen-Image-Edit support"
- "Aiter update to support Sage attention v2"
- "xDiT update to support MXFP4 GEMMs in Wan2.2, Wan2.1 and Flux.2"
- "Qwen-Image-2512 support"
- "Z-Image support"
- "Parallel VAE decode support for Wan models"
- "Batch inference and data parallel support"
components:
TheRock:
version: e40a6da
version: 9b611c6
url: https://github.com/ROCm/TheRock
rocm-libraries:
version: 9e9e900
version: 7567d83
url: https://github.com/ROCm/rocm-libraries
rocm-systems:
version: ca89a1a
version: 93bc019
url: https://github.com/ROCm/rocm-systems
torch:
version: 91be249
version: ff65f5b
url: https://github.com/ROCm/pytorch
torchaudio:
version: e3c6ee2
@@ -33,13 +33,16 @@ docker:
version: 46ba481
url: https://github.com/huggingface/accelerate
aiter:
version: 82d733f
version: a169e14
url: https://github.com/ROCm/aiter
diffusers:
version: a80b192
url: https://github.com/huggingface/diffusers
distvae:
version: bf7531e
url: https://github.com/xdit-project/DistVAE
xfuser:
version: 2882027
version: 45c44e7
url: https://github.com/xdit-project/xDiT
yunchang:
version: 631bdfd
@@ -114,6 +117,7 @@ docker:
- '--input_images /app/data/wan_input.jpg \'
- '--num_frames 81 \'
- '--ulysses_degree 8 \'
- '--use_parallel_vae \'
- '--seed 42 \'
- '--num_iterations 1 \'
- '--num_inference_steps 40 \'
@@ -136,6 +140,7 @@ docker:
- '--input_images /app/data/wan_input.jpg \'
- '--num_frames 81 \'
- '--ulysses_degree 8 \'
- '--use_parallel_vae \'
- '--seed 42 \'
- '--num_iterations 1 \'
- '--num_inference_steps 40 \'
@@ -262,12 +267,12 @@ docker:
- group: Z-Image
js_tag: z_image
models:
- model: Z-Image Turbo
model_repo: Tongyi-MAI/Z-Image-Turbo
url: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
- model: Z-Image
model_repo: Tongyi-MAI/Z-Image
url: https://huggingface.co/Tongyi-MAI/Z-Image
github: https://github.com/Tongyi-MAI/Z-Image
mad_tag: pyt_xdit_z_image_turbo
js_tag: z_image_turbo_tag
mad_tag: pyt_xdit_z_image
js_tag: z_image_tag
benchmark_command:
- mkdir results
- 'xdit \'
@@ -276,11 +281,13 @@ docker:
- '--prompt "A crowded beach" \'
- '--height 1088 \'
- '--width 1920 \'
- '--num_inference_steps 9 \'
- '--num_inference_steps 50 \'
- '--ulysses_degree 2 \'
- '--ring_degree 2 \'
- '--use_cfg_parallel \'
- '--use_torch_compile \'
- '--guidance_scale 0.0 \'
- '--num_iterations 50 \'
- '--guidance_scale 4.0 \'
- '--num_iterations 25 \'
- '--attention_backend aiter \'
- '--output_directory results'
- group: LTX
@@ -311,8 +318,8 @@ docker:
js_tag: qwen_image
models:
- model: Qwen-Image
model_repo: Qwen/Qwen-Image
url: https://huggingface.co/Qwen/Qwen-Image
model_repo: Qwen/Qwen-Image-2512
url: https://huggingface.co/Qwen/Qwen-Image-2512
github: https://github.com/QwenLM/Qwen-Image
mad_tag: pyt_xdit_qwen_image
js_tag: qwen_image_tag
@@ -127,7 +127,7 @@ Download the base model and fine-tuning dataset
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
huggingface-cli login
|
||||
hf auth login
|
||||
|
||||
.. note::
|
||||
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X accelerators using AMD MAD and the ROCm vLLM Docker image.
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the ROCm vLLM Docker image.
|
||||
@@ -479,4 +480,4 @@ Previous versions
|
||||
=================
|
||||
|
||||
See :doc:`vllm-history` to find documentation for previous releases
|
||||
of the ``ROCm/vllm`` Docker image.
|
||||
of the ``ROCm/vllm`` Docker image.
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the ROCm vLLM Docker image.
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the unified
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the unified
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
:orphan:
|
||||
:no-search:
|
||||
|
||||
.. meta::
|
||||
:description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the
|
||||
|
||||
@@ -1,4 +1,5 @@
 :orphan:
+:no-search:

 .. meta::
    :description: Learn to validate diffusion model video generation on MI300X, MI350X and MI355X accelerators using
@@ -0,0 +1,321 @@
:orphan:
:no-search:

.. meta::
   :description: Learn to validate diffusion model video generation on MI300X, MI350X and MI355X accelerators using
      prebuilt and optimized docker images.
   :keywords: xDiT, diffusion, video, video generation, image, image generation, validate, benchmark

************************
xDiT diffusion inference
************************

.. caution::

   This documentation does not reflect the latest version of the xDiT diffusion
   inference performance documentation. See
   :doc:`/how-to/rocm-for-ai/inference/xdit-diffusion-inference` for the latest
   version.

.. _xdit-video-diffusion-263:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   The `rocm/pytorch-xdit <{{ docker.docker_hub_url }}>`_ Docker image offers a prebuilt, optimized environment based on `xDiT <https://github.com/xdit-project/xDiT>`_ for
   benchmarking diffusion model video and image generation on gfx942 and gfx950 series (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X) GPUs.
   The image runs ROCm **{{docker.ROCm}}** (preview) based on `TheRock <https://github.com/ROCm/TheRock>`_
   and includes the following components:

   .. dropdown:: Software components - {{ docker.pull_tag.split('-')|last }}

      .. list-table::
         :header-rows: 1

         * - Software component
           - Version

         {% for component_name, component_data in docker.components.items() %}
         * - `{{ component_name }} <{{ component_data.url }}>`_
           - {{ component_data.version }}
         {% endfor %}

Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark.
For preview and development releases, see `amdsiloai/pytorch-xdit <https://hub.docker.com/r/amdsiloai/pytorch-xdit>`_.
What's new
==========

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   {% for item in docker.whats_new %}
   * {{ item }}
   {% endfor %}

.. _xdit-video-diffusion-supported-models-263:

Supported models
================

The following models are supported for inference performance benchmarking.
Some instructions, commands, and recommendations in this documentation might
vary by model -- select one to get started.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
         <div class="row gx-0">
            <div class="col-2 me-1 px-2 model-param-head">Model</div>
            <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
               <div class="col-6 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.js_tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
            </div>
         </div>

         <div class="row gx-0 pt-1">
            <div class="col-2 me-1 px-2 model-param-head">Variant</div>
            <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
               <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
               <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
            </div>
         </div>
      </div>
   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. note::

         To learn more about your specific model see the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_
         or visit the `GitHub page <{{ model.github }}>`__. Note that some models require access authorization before use via an
         external license agreement through a third party.

   {% endfor %}
   {% endfor %}

System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
Pull the Docker image
=====================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   For this tutorial, it's recommended to use the latest ``{{ docker.pull_tag }}`` Docker image.
   Pull the image using the following command:

   .. code-block:: shell

      docker pull {{ docker.pull_tag }}
Validate and benchmark
======================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   Once the image has been downloaded, you can follow these steps to
   run benchmarks and generate outputs.

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.js_tag}}

      The following commands are written for {{ model.model }}.
      See :ref:`xdit-video-diffusion-supported-models-263` to switch to another available model.

   {% endfor %}
   {% endfor %}

Choose your setup method
------------------------

You can either use an existing Hugging Face cache or download the model fresh inside the container.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}
   .. container:: model-doc {{model.js_tag}}

      .. tab-set::

         .. tab-item:: Option 1: Use existing Hugging Face cache

            If you already have models downloaded on your host system, you can mount your existing cache.

            1. Set your Hugging Face cache location.

               .. code-block:: shell

                  export HF_HOME=/your/hf_cache/location

            2. Download the model (if not already cached).

               .. code-block:: shell

                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            3. Launch the container with mounted cache.

               .. code-block:: shell

                  docker run \
                     -it --rm \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --user root \
                     --device=/dev/kfd \
                     --device=/dev/dri \
                     --group-add video \
                     --ipc=host \
                     --network host \
                     --privileged \
                     --shm-size 128G \
                     --name pytorch-xdit \
                     -e HSA_NO_SCRATCH_RECLAIM=1 \
                     -e OMP_NUM_THREADS=16 \
                     -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                     -e HF_HOME=/app/huggingface_models \
                     -v $HF_HOME:/app/huggingface_models \
                     {{ docker.pull_tag }}

         .. tab-item:: Option 2: Download inside container

            Use this option if you prefer to keep the container self-contained or don't have an existing cache.

            1. Launch the container.

               .. code-block:: shell

                  docker run \
                     -it --rm \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --user root \
                     --device=/dev/kfd \
                     --device=/dev/dri \
                     --group-add video \
                     --ipc=host \
                     --network host \
                     --privileged \
                     --shm-size 128G \
                     --name pytorch-xdit \
                     -e HSA_NO_SCRATCH_RECLAIM=1 \
                     -e OMP_NUM_THREADS=16 \
                     -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                     {{ docker.pull_tag }}

            2. Inside the container, set the Hugging Face cache location and download the model.

               .. code-block:: shell

                  export HF_HOME=/app/huggingface_models
                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            .. warning::

               Models will be downloaded to the container's filesystem and will be lost when the container is removed unless you persist the data with a volume.
   {% endfor %}
   {% endfor %}
Run inference
=============

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.3-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. tab-set::

         .. tab-item:: MAD-integrated benchmarking

            1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
               directory and install the required packages on the host machine.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD
                  pip install -r requirements.txt

            2. On the host machine, use this command to run the performance benchmark test on
               the `{{model.model}} <{{ model.url }}>`_ model using one node.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
                  madengine run \
                     --tags {{model.mad_tag}} \
                     --keep-model-dir \
                     --live-output

            MAD launches a Docker container with the name
            ``container_ci-{{model.mad_tag}}``. The throughput and serving reports of the
            model are collected in the following paths: ``{{ model.mad_tag }}_throughput.csv``
            and ``{{ model.mad_tag }}_serving.csv``.

         .. tab-item:: Standalone benchmarking

            To run the benchmarks for {{ model.model }}, use the following command:

            .. code-block:: shell

               {{ model.benchmark_command
                  | map('replace', '{model_repo}', model.model_repo)
                  | map('trim')
                  | join('\n ') }}

            The generated content and timing information will be stored under the results directory.

   {% endfor %}
   {% endfor %}

Previous versions
=================

See
:doc:`/how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-history`
to find documentation for previous releases of xDiT diffusion
performance testing.
@@ -15,11 +15,20 @@ benchmarking, see the version-specific documentation.
     - Components
     - Resources

-  * - ``rocm/pytorch-xdit:v26.3`` (latest)
+  * - ``rocm/pytorch-xdit:v26.4`` (latest)
     -
-      * TheRock e40a6da
+      * `ROCm 7.12.0 preview <https://rocm.docs.amd.com/en/7.12.0-preview/about/release-notes.html>`__
+      * TheRock 9b611c6
     -
       * :doc:`Documentation </how-to/rocm-for-ai/inference/xdit-diffusion-inference>`
       * `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-xdit/v26.4/images/sha256-b4296a638eb8dc7ebcafc808e180b78a3c44177580c21986082ec9539496067c>`__

   * - ``rocm/pytorch-xdit:v26.3``
     -
       * `ROCm 7.12.0 preview <https://rocm.docs.amd.com/en/7.12.0-preview/about/release-notes.html>`__
       * TheRock e40a6da
     -
       * :doc:`Documentation <xdit-26.3>`
       * `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-xdit/v26.3/images/sha256-ac78a03d2911bf1b49c001d3be2e8bd745c1bc455cb49ae972825a7986880902>`__

   * - ``rocm/pytorch-xdit:v26.2``
@@ -692,7 +692,7 @@ This performance test supports the following models:
 * [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528)

 To set up your environment and download the models using the Hugging Face CLI,
-use the following commands. Modify the `huggingface-cli download` command
+use the following commands. Modify the `hf download` command
 to download the desired model.

 ```bash
@@ -704,7 +704,7 @@ pip install huggingface_hub

 # Download the model to the shared NFS mount point
 # Replace 'deepseek-ai/DeepSeek-R1-0528' with your desired model
-huggingface-cli download --token <your_hf_token> \
+hf download --token <your_hf_token> \
   deepseek-ai/DeepSeek-R1-0528 \
   --local-dir /mount/point/models/DeepSeek-R1
 ```
@@ -387,7 +387,7 @@ source ~/venvs/hf/bin/activate
 pip install huggingface_hub

 # Download the model to the shared NFS mount point
-huggingface-cli download --token <your_hf_token> \
+hf download --token <your_hf_token> \
   EmbeddedLLM/deepseek-r1-FP8-Dynamic \
   --local-dir /mount/point/models/EmbeddedLLM/deepseek-r1-FP8-Dynamic
 ```
@@ -35,3 +35,5 @@ training, fine-tuning, and inference. It leverages popular machine learning frameworks
- :doc:`xDiT diffusion inference <xdit-diffusion-inference>`

- :doc:`Deploying your model <deploy-your-model>`

- :doc:`xDiT diffusion inference <xdit-diffusion-inference>`
@@ -15,7 +15,7 @@ xDiT diffusion inference

 The `rocm/pytorch-xdit <{{ docker.docker_hub_url }}>`_ Docker image offers a prebuilt, optimized environment based on `xDiT <https://github.com/xdit-project/xDiT>`_ for
 benchmarking diffusion model video and image generation on gfx942 and gfx950 series (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X) GPUs.
-The image runs ROCm **{{docker.ROCm}}** (preview) based on `TheRock <https://github.com/ROCm/TheRock>`_
+The image runs `ROCm {{docker.ROCm}} (preview) <https://rocm.docs.amd.com/en/7.12.0-preview/about/release-notes.html>`__ based on `TheRock <https://github.com/ROCm/TheRock>`_
 and includes the following components:

 .. dropdown:: Software components - {{ docker.pull_tag.split('-')|last }}
@@ -36,6 +36,7 @@ For preview and development releases, see `amdsiloai/pytorch-xdit <https://hub.d

 What's new
 ==========

 .. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/xdit-inference-models.yaml

 {% set docker = data.docker %}
@@ -179,7 +180,7 @@ You can either use an existing Hugging Face cache or download the model fresh in

    .. code-block:: shell

-      huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}
+      hf download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

 3. Launch the container with mounted cache.

@@ -236,7 +237,7 @@ You can either use an existing Hugging Face cache or download the model fresh in
    .. code-block:: shell

       export HF_HOME=/app/huggingface_models
-      huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}
+      hf download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

 .. warning::

@@ -1,4 +1,5 @@
 :orphan:
+:no-search:

 .. meta::
    :description: How to train a model using JAX MaxText for ROCm.
@@ -1,4 +1,5 @@
 :orphan:
+:no-search:

 *****************************************************************
 Migrating workloads to Primus (Megatron backend) from Megatron-LM
@@ -1,4 +1,5 @@
 :orphan:
+:no-search:

 .. meta::
    :description: How to train a model using ROCm Megatron-LM
@@ -1,4 +1,5 @@
 :orphan:
+:no-search:

 .. meta::
    :description: How to train a model using Megatron-LM for ROCm.
@@ -1,4 +1,5 @@
 :orphan:
+:no-search:

 .. meta::
    :description: How to train a model using PyTorch for ROCm.
@@ -631,8 +631,8 @@ To launch the training job on a SLURM cluster for Llama 3.3 70B, run the following

 .. code-block:: shell

-   huggingface-cli login # Get access to HF Llama model space
-   huggingface-cli download meta-llama/Llama-3.3-70B-Instruct --local-dir ./models/Llama-3.3-70B-Instruct # Download the Llama 3.3 model locally
+   hf auth login # Get access to HF Llama model space
+   hf download meta-llama/Llama-3.3-70B-Instruct --local-dir ./models/Llama-3.3-70B-Instruct # Download the Llama 3.3 model locally
    # In the MAD repository
    cd scripts/pytorch_train
    sbatch Torchtune_Multinode.sh
@@ -14,7 +14,7 @@ optimize performance for specific types of workloads or use-cases. The contents

 .. grid-item-card:: AMD RDNA

-   * :doc:`AMD RDNA3.5 system optimization <strixhalo>`
+   * :doc:`AMD RDNA3.5 system optimization <rdna3-5>`
    * :doc:`AMD RDNA2 system optimization <w6000-v620>`

 .. grid-item-card:: AMD Instinct
docs/how-to/system-optimization/rdna3-5.rst (273 lines, new file)
@@ -0,0 +1,273 @@
.. meta::
   :description: System optimization of AMD RDNA3.5 Ryzen APUs (gfx1150/gfx1151/gfx1152) systems. Learn about VRAM, GTT, TTM tuning, shared memory configuration, and required Linux kernel support.
   :keywords: AMD RDNA3.5, Ryzen APU, gfx1150, gfx1151, gfx1152, ROCm, VRAM, GTT, GART, TTM, GPUVM, system optimization

:orphan:

.. _strix-halo-optimization:

===============================
AMD RDNA3.5 system optimization
===============================

This topic describes how to optimize systems powered by AMD Ryzen APUs with
RDNA3.5 architecture. These APUs combine high-performance CPU cores with
integrated RDNA3.5 graphics, and support LPDDR5X-8000 or DDR5 memory, making
them particularly well-suited for:

* LLM development and inference systems
* High-performance workstations
* Virtualization hosts running multiple VMs
* GPU compute and parallel processing
* Gaming systems
* Home servers and AI development platforms
.. _memory-settings:

Memory settings
===============

On AMD Ryzen APUs with RDNA3.5 architecture (gfx1150, gfx1151, and gfx1152 LLVM
targets), memory access is handled through GPU Virtual Memory (GPUVM), which
provides per-process GPU virtual address spaces (VMIDs) rather than a separate,
discrete VRAM pool.

As a result, memory on RDNA3.5 APUs is mapped rather than physically
partitioned. The terms Graphics Address Remapping Table (GART) and Graphics
Translation Table (GTT) describe limits on how much system memory can be mapped
into GPU address spaces and who can use it, rather than distinct types of
physical memory.

* **GART**

  Defines the amount of platform address space (system RAM or Memory-Mapped I/O)
  that can be mapped into the GPU virtual address space used by the kernel driver.
  On systems with physically shared CPU and GPU memory, such as RDNA3.5-based
  systems, this mapped system memory effectively serves as VRAM for the GPU.
  GART is typically kept relatively small to limit GPU page-table size and is
  primarily used for driver-internal operations.

* **GTT**

  Defines the amount of system RAM that can be mapped into GPU virtual address
  spaces for user processes. This is the memory pool used by applications such
  as PyTorch and other AI/compute workloads. GTT allocations are dynamic and
  not permanently reserved, allowing the operating system to reclaim memory when
  the GPU isn't actively using it. By default, the GTT limit is set to
  approximately 50 percent of total system RAM.
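As a rough illustration of that default, the following shell sketch computes the approximate GTT limit for a hypothetical machine with 128 GiB of RAM (the 128 GiB figure is an assumption for the example; on a real system, read ``MemTotal`` from ``/proc/meminfo`` instead):

```shell
# Hypothetical example: MemTotal for a 128 GiB machine, expressed in KiB
# (the unit /proc/meminfo reports).
mem_kb=$((128 * 1024 * 1024))

# The default GTT limit is approximately half of system RAM.
gtt_gib=$(( mem_kb / 2 / 1024 / 1024 ))

echo "Approximate default GTT limit: ${gtt_gib} GiB"
# prints "Approximate default GTT limit: 64 GiB"
```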
.. note::

   On systems with physically shared CPU and GPU memory, such as RDNA3.5-based
   systems, several terms are often used interchangeably in firmware menus,
   documentation, and community discussions:

   * VRAM
   * Carve-out
   * GART
   * Dedicated GPU memory
   * Firmware-reserved GPU memory

   In this topic, the term VRAM is used going forward.

You can adjust the amount of memory available to the GPU by:

* Increasing the VRAM in BIOS, or

* Reducing the configured GTT size to be smaller than the reserved amount.

If the GTT size is larger than the VRAM, the AMD GPU driver performs VRAM
allocations using GTT (GTT-backed allocations), as described in the
`torvalds/linux@759e764 <https://github.com/torvalds/linux/commit/759e764f7d587283b4e0b01ff930faca64370e59>`_
GitHub commit.

Because memory is physically shared, there's no performance distinction
like that of discrete GPUs, where dedicated VRAM is significantly faster than
system memory. Firmware may optionally reserve some memory exclusively for GPU
use, but this provides little benefit for most workloads while permanently
reducing available system memory.

For this reason, AI frameworks work more efficiently with GTT-backed allocations. GTT
allows large, flexible mappings without permanently reserving memory, resulting
in better overall system utilization on unified memory systems.
Configuring shared memory limits on Linux
-----------------------------------------

The maximum amount of shared GPU-accessible memory can be increased by changing
the kernel **Translation Table Manager (TTM)** page limit. This setting controls
how many system memory pages can be mapped for GPU use and is exposed at:

::

   /sys/module/ttm/parameters/pages_limit

The value is expressed in **pages**, and not bytes or gigabytes (GB).
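Because the limit is a page count, converting between gigabytes and pages requires the page size. A minimal sketch, assuming the common 4 KiB page size (verify on your system with ``getconf PAGESIZE``):

```shell
# Convert a desired shared-memory limit in GiB to TTM pages,
# assuming 4 KiB (4096-byte) pages.
limit_gib=100
pages=$(( limit_gib * 1024 * 1024 * 1024 / 4096 ))
echo "${limit_gib} GiB = ${pages} pages"
# prints "100 GiB = 26214400 pages"
```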
.. note::

   It's recommended to keep the dedicated VRAM reservation in BIOS small
   (for example, 0.5 GB) and to increase the shared (TTM/GTT) limit instead.

A helper utility is available to simplify configuration.

1. Install ``pipx``:

   ::

      sudo apt install pipx
      pipx ensurepath

2. Install the AMD debug tools:

   ::

      pipx install amd-debug-tools

3. Query the current shared memory configuration:

   ::

      amd-ttm

4. Set the usable shared memory (in GB):

   ::

      amd-ttm --set <NUM>

5. Reboot for changes to take effect.

.. note::

   ``amd-ttm`` converts the page count to GB to make the values easier to read.
Example with output
^^^^^^^^^^^^^^^^^^^

Check the current settings:

::

   amd-ttm
   💻 Current TTM pages limit: 16469033 pages (62.82 GB)
   💻 Total system memory: 125.65 GB

Change the usable shared memory:

::

   ❯ amd-ttm --set 100
   🐧 Successfully set TTM pages limit to 26214400 pages (100.00 GB)
   🐧 Configuration written to /etc/modprobe.d/ttm.conf
   ○ NOTE: You need to reboot for changes to take effect.
   Would you like to reboot the system now? (y/n): y

Revert to kernel defaults:

::

   ❯ amd-ttm --clear
   🐧 Configuration /etc/modprobe.d/ttm.conf removed
   Would you like to reboot the system now? (y/n): y
.. _operating-system-support:

Operating system support
========================

The ROCm compatibility tables can be found at the following links:

- `System requirements (Linux) <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html>`_
- `System requirements (Microsoft Windows) <https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html>`_

AMD Ryzen AI Max series APUs (gfx1151) have additional kernel version
requirements, as described in the following section.

Required kernel version
-----------------------

Support for AMD Ryzen AI Max series APUs requires specific Linux kernel fixes
that update internal limits in the AMD KFD driver to ensure correct queue
creation and memory availability checks. Without these updates, GPU compute
workloads might fail to initialize or exhibit unpredictable behavior.

The following commits are required for AMD Ryzen AI Max series support:

- `gregkh/linux@7f26af7 <https://github.com/gregkh/linux/commit/7f26af7bf9b76c2c2a1a761aab5803e52be21eea>`_
- `gregkh/linux@7445db6 <https://github.com/gregkh/linux/commit/7445db6a7d5a0242d8214582b480600b266cba9e>`_

These patches are available in the following minimum kernel versions:

- Ubuntu 24.04 Hardware Enablement (HWE): ``6.17.0-19.19~24.04.2`` or later
- Ubuntu 24.04 Original Equipment Manufacturer (OEM): ``6.14.0-1018`` or later
- All other distributions: Linux kernel ``6.18.4`` or later

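On a generic distribution, a quick way to check whether the running kernel meets the 6.18.4 minimum is a ``sort -V`` comparison (the ``version_ge`` helper name is ours, offered as a sketch):

```shell
# Succeeds when version $1 >= version $2 (GNU sort -V version ordering).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}
# Strip the local suffix (e.g. "-generic") before comparing.
if version_ge "$(uname -r | cut -d- -f1)" 6.18.4; then
  echo "kernel meets the 6.18.4 minimum"
else
  echo "kernel is older than 6.18.4"
fi
```

Ubuntu 24.04 HWE and OEM kernels carry the fixes as backports, so this check does not apply to them.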
The table below reflects compatibility for AMD-released pre-built ROCm
binaries only. Distributions that ship native ROCm packaging might
provide different support levels.

.. list-table::
   :header-rows: 0
   :widths: 10 90

   * - ❌
     - Unsupported combination
   * - ⚠️
     - Unstable/experimental combination
   * - ✅
     - Stable and supported combination

.. list-table::
   :header-rows: 1
   :widths: 20 50 15 15

   * - ROCm Release
     - | Ubuntu 24.04 HWE (>= 6.17.0-19.19~24.04.2),
       | Ubuntu 24.04 OEM (>= 6.14.0-1018) or
       | Ubuntu 26.04 Generic
     - Other distributions >= 6.18.4
     - Other distributions < 6.18.4

   * - 7.11.0 or 7.12.0
     - ✅
     - ✅
     - ⚠️

   * - 7.10.0 or 7.9.0
     - ❌
     - ❌
     - ⚠️

   * - 7.2.1
     - ✅
     - ✅
     - ⚠️

   * - 7.2.0
     - ✅
     - ✅
     - ❌

   * - 7.1.x
     - ❌
     - ❌
     - ⚠️

   * - 6.4.x
     - ❌
     - ❌
     - ⚠️

.. note::

   Ubuntu 24.04 HWE kernels earlier than ``6.17.0-19.19~24.04.2`` and Ubuntu
   24.04 OEM kernels earlier than ``6.14.0-1018`` are not supported for
   RDNA3.5 APUs.

The following distributions include the required fixes in their native
packaging, independent of AMD pre-built binaries:

- Fedora 43
- Ubuntu 26.04
- Arch Linux 2026.02.01

@@ -1,289 +0,0 @@
.. meta::
   :description: Learn about system settings and performance tuning for AMD Strix Halo (Ryzen AI MAX/MAX+) APUs.
   :keywords: Strix Halo, Ryzen AI MAX, workstation, BIOS, installation, APU, optimization, ROCm

.. _strix-halo-optimization:

==========================================
AMD Strix Halo system optimization
==========================================

This document provides guidance for optimizing systems powered by AMD Ryzen AI
MAX and MAX+ processors (codenamed Strix Halo). These APUs combine
high-performance CPU cores with integrated RDNA 3.5 graphics and support up to
128GB of unified LPDDR5X-8000 memory, making them particularly well-suited for:

* LLM development and inference systems
* High-performance workstations
* Virtualization hosts running multiple VMs
* GPU compute and parallel processing
* Gaming systems
* Home servers and AI development platforms

The main purpose of this document is to help users utilize Strix Halo APUs to
their full potential through proper system configuration.

.. _memory-settings:

Memory settings
===============

On Strix Halo GPUs (gfx1151) memory access is handled through GPU Virtual Memory
(GPUVM), which provides per-process GPU virtual address spaces (VMIDs) rather
than a separate, discrete VRAM pool.

As a result, memory on Strix Halo is **mapped**, not physically partitioned.
The terms Graphics Address Remapping Table (GART) and GTT (Graphics Translation
Table) describe limits on how much system memory can be mapped into GPU address
spaces and who can use it, rather than distinct types of physical memory.

* **GART**

  Defines the amount of platform address space (system RAM or Memory-Mapped I/O)
  that can be mapped into the GPU virtual address space used by the kernel driver.
  On systems with physically shared CPU and GPU memory, such as Strix Halo, this
  mapped system memory effectively serves as VRAM for the GPU. GART is typically
  kept relatively small to limit GPU page-table size and is mainly used for
  driver-internal operations.

* **GTT**

  Defines the amount of system RAM that can be mapped into GPU virtual address
  spaces for user processes. This is the memory pool used by applications such
  as PyTorch and other AI/compute workloads. GTT allocations are dynamic and are
  not permanently reserved, allowing the operating system to reclaim memory when
  it is not actively used by the GPU. By default, the GTT limit is set to
  approximately 50% of total system RAM.

.. note::

   On systems with physically shared CPU and GPU memory such as Strix Halo,
   several terms are often used interchangeably in firmware menus, documentation,
   and community discussions:

   * VRAM
   * Carve-out
   * GART
   * Dedicated GPU memory
   * Firmware-reserved GPU memory

   In this document, we will use VRAM from this point onward.

If desired, you can adjust how much memory is preferentially available to the
GPU by:

* Increasing the VRAM in BIOS, or
* Reducing the configured GTT size so it is smaller than the reserved amount.

If the GTT size is bigger than VRAM, the amdgpu driver services VRAM allocations
from GTT (GTT-backed allocations), as implemented in commit
`torvalds/linux@759e764 <https://github.com/torvalds/linux/commit/759e764f7d587283b4e0b01ff930faca64370e59>`_.

Because memory is physically shared, there is no performance distinction
similar to discrete GPUs where dedicated VRAM is significantly faster than
system memory. Firmware may optionally reserve some memory exclusively for GPU
use, but this provides little benefit for most workloads while permanently
reducing available system memory.

For this reason, AI frameworks typically prefer GTT-backed allocations. GTT
allows large, flexible mappings without permanently reserving memory, resulting
in better overall system utilization on unified memory systems.

Configuring shared memory limits on Linux
-----------------------------------------

The maximum amount of shared GPU-accessible memory can be increased by changing
the kernel **Translation Table Manager (TTM)** page limit. This setting controls
how many system memory pages may be mapped for GPU use and is exposed at:

::

   /sys/module/ttm/parameters/pages_limit

The value is expressed in **pages**, not bytes or gigabytes (GB).

.. note::

   AMD recommends keeping the dedicated VRAM reservation in BIOS small
   (for example 0.5 GB) and increasing the shared (TTM/GTT) limit instead.

A helper utility is available to simplify configuration.

1. Install ``pipx``:

   ::

      sudo apt install pipx
      pipx ensurepath

2. Install the AMD debug tools:

   ::

      pipx install amd-debug-tools

3. Query the current shared memory configuration:

   ::

      amd-ttm

4. Set the usable shared memory (in GB):

   ::

      amd-ttm --set <NUM>

5. Reboot for changes to take effect.

.. note::

   ``amd-ttm`` converts between pages and GB on the user's behalf.

**Example with output**

Check the current settings:

::

   amd-ttm
   💻 Current TTM pages limit: 16469033 pages (62.82 GB)
   💻 Total system memory: 125.65 GB

Change the usable shared memory:

::

   ❯ amd-ttm --set 100
   🐧 Successfully set TTM pages limit to 26214400 pages (100.00 GB)
   🐧 Configuration written to /etc/modprobe.d/ttm.conf
   ○ NOTE: You need to reboot for changes to take effect.
   Would you like to reboot the system now? (y/n): y

Revert to kernel defaults:

::

   ❯ amd-ttm --clear
   🐧 Configuration /etc/modprobe.d/ttm.conf removed
   Would you like to reboot the system now? (y/n): y

.. _operating-system-support:

Operating system support
========================

The ROCm compatibility tables can be found at the following links:

- `System requirements (Linux) <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html>`_
- `System requirements (Windows) <https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html>`_

However, for Strix Halo there are additional kernel version requirements,
which are described in the following section.

Required kernel version
-----------------------

Support for Strix Halo requires specific fixes in the Linux kernel that
update internal limits in the AMD KFD driver to ensure correct queue
creation and memory availability checks. Without these updates, GPU
compute workloads may fail to initialize or behave unpredictably. The
necessary Linux kernel patches have been merged upstream and are
included in Linux kernel 6.18.4 and newer releases.

The following commits are required for Strix Halo support:

- `gregkh/linux@7f26af7 <https://github.com/gregkh/linux/commit/7f26af7bf9b76c2c2a1a761aab5803e52be21eea>`_
- `gregkh/linux@7445db6 <https://github.com/gregkh/linux/commit/7445db6a7d5a0242d8214582b480600b266cba9e>`_

The table below reflects compatibility for **AMD-released pre-built ROCm
binaries only**. Distributions that ship **native ROCm packaging** may
provide different support levels.

.. list-table::
   :header-rows: 0
   :widths: 10 90

   * - ❌
     - Unsupported combination
   * - ⚠️
     - Unstable / experimental combination
   * - ✅
     - Stable and supported combination

.. list-table::
   :header-rows: 1
   :widths: 12 14 14 16 14 16 16

   * - ROCm Release
     - Ubuntu 24.04 HWE
     - Ubuntu 24.04 OEM (<= 6.14.0-1017)
     - Ubuntu 24.04 OEM (>= 6.14.0-1018)
     - Ubuntu 26.04 Generic
     - Generic Distro < 6.18.4
     - Generic Distro >= 6.18.4

   * - 7.11.0
     - ⚠️
     - ⚠️
     - ✅
     - ✅
     - ⚠️
     - ✅

   * - 7.10.0
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

   * - 7.9.0
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

   * - 7.2.1
     - ⚠️
     - ⚠️
     - ✅
     - ✅
     - ⚠️
     - ✅

   * - 7.2.0
     - ❌
     - ✅
     - ✅
     - ✅
     - ❌
     - ✅

   * - 7.1.x
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

   * - 6.4.x
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

The following distributions include the required fixes in their
native packaging, independent of AMD pre-built binaries:

- Fedora 43
- Ubuntu 26.04
- Arch Linux

@@ -15,7 +15,7 @@ compatibility with industry software frameworks. For more information, see
 ROCm supports multiple programming languages and programming interfaces such as
 {doc}`HIP <hip:index>`, OpenCL, and OpenMP, as explained in the [Programming guide](./how-to/programming_guide.rst).

-If you're using AMD Radeon GPUs or Ryzen APUs in a workstation setting with a display connected, review {doc}`ROCm on Radeon and Ryzen documentation<radeon:index>`.
+If you're using AMD Radeon™ GPUs or Ryzen™ APUs for graphics workloads, see the {doc}`ROCm on Radeon and Ryzen <radeon:index>` documentation.

 ```{note}
 The [AMD ROCm Programming Guide](https://rocm-handbook.amd.com/projects/amd-rocm-programming-guide/en/latest/)
@@ -10,6 +10,7 @@
 | Version | Release date |
 | ------- | ------------ |
+| [7.2.2](https://rocm.docs.amd.com/en/docs-7.2.2/) | April 14, 2026 |
 | [7.2.1](https://rocm.docs.amd.com/en/docs-7.2.1/) | March 25, 2026 |
 | [7.2.0](https://rocm.docs.amd.com/en/docs-7.2.0/) | January 21, 2026 |
 | [7.1.1](https://rocm.docs.amd.com/en/docs-7.1.1/) | November 26, 2025 |
@@ -37,7 +37,7 @@ click==8.3.1
 # sphinx-external-toc
 comm==0.2.3
 # via ipykernel
-cryptography==46.0.6
+cryptography==46.0.7
 # via pyjwt
 debugpy==1.8.19
 # via ipykernel
@@ -162,7 +162,7 @@ pygments==2.20.0
 # ipython
 # pydata-sphinx-theme
 # sphinx
-pyjwt[crypto]==2.10.1
+pyjwt[crypto]==2.12.0
 # via pygithub
 pynacl==1.6.2
 # via pygithub
@@ -261,7 +261,7 @@ tabulate==0.9.0
 # via jupyter-cache
 tomli==2.4.0
 # via sphinx
-tornado==6.5.4
+tornado==6.5.5
 # via
 # ipykernel
 # jupyter-client