Mirror of https://github.com/ROCm/ROCm.git (synced 2026-04-05 03:01:17 -04:00)

Compare commits: `develop...amd/hsivas` (1 commit, `3a0807dd1d`)
Spelling wordlist:

```diff
@@ -83,7 +83,6 @@ Cavium
 CentOS
 ChatGPT
 Cholesky
-cholesky
 CoRR
 Codespaces
 Commitizen
@@ -171,7 +170,6 @@ FluxBenchmark
 Fortran
 Fuyu
 GALB
-GART
 GAT
 GATNE
 GCC
@@ -207,13 +205,11 @@ GPT
 GPU
 GPU's
 GPUDirect
-GPUVM
 GPUs
 GraphBolt
 GraphSage
 GRBM
 GRE
-GTT
 GenAI
 GenZ
 GitHub
@@ -300,11 +296,9 @@ LLMs
 LLVM
 LM
 logsumexp
-LPDDR
 LRU
 LSAN
 LSan
-lstsq
 LTS
 LSTMs
 LteAll
@@ -450,7 +444,6 @@ QPS
 Qcycles
 QoS
 Qwen
-Radix
 RAII
 RAS
 RCCL
@@ -530,7 +523,6 @@ Skylake
 Softmax
 Spack
 SplitK
-Strix
 Supermicro
 Szegedy
 TagRAM
@@ -541,7 +533,6 @@ TCI
 TCIU
 TCP
 TCR
-TTM
 TVM
 THREADGROUPS
 threadgroups
@@ -596,9 +587,6 @@ verl's
 VGPR
 VGPRs
 VM
-VMID
-VMIDs
-VMs
 VMEM
 VMWare
 VRAM
@@ -686,7 +674,6 @@ cmake
 cmd
 coalescable
 codename
-codenamed
 collater
 comgr
 compat
@@ -883,7 +870,6 @@ netplan
 num
 numref
 ocl
-openai
 opencl
 opencv
 openmp
```
CHANGELOG.md (85 changed lines):
```diff
@@ -4,84 +4,6 @@ This page is a historical overview of changes made to ROCm components. This
 consolidated changelog documents key modifications and improvements across
 different versions of the ROCm software stack and its components.
 
-## ROCm 7.2.1
-
-See the [ROCm 7.2.1 release notes](https://rocm.docs.amd.com/en/docs-7.2.1/about/release-notes.html#rocm-7-2-1-release-notes)
-for a complete overview of this release.
-
-### **AMD SMI** (26.2.2)
-
-#### Added
-
-* GPU board and base board temperature sensors to the `amd-smi monitor` command.
-
-#### Resolved issues
-
-* JSON output was not formatted correctly when using watch mode with metrics.
-* Output was not properly redirected to file when using JSON format.
-* CPER component output was not redirected when using the `--follow` option.
-* Invalid CPER files caused garbage output for AFID lists.
-* JSON output was not formatted correctly for reset commands.
-
-### **HIP** (7.2.1)
-
-#### Resolved issues
-
-* Corrected the validation of stream capture in global-capture mode. It is no longer affected by any thread-local capture-mode sequences occurring in other threads.
-* Corrected the return value of `hipEventQuery` and `hipEventSynchronize`. The HIP runtime now properly handles and restricts stream capture within these APIs.
-* Corrected an issue in the batch-dispatch doorbell for AQL packets to avoid a potential CPU hang.
-* To address potential delays in memory-object destruction that could affect application logic, the HIP runtime disables memory-object reference counting in direct-dispatch mode.
-
-#### Changed
-
-* The `AMD_DIRECT_DISPATCH` environment variable has been deprecated in the HIP runtime.
-
-### **hipBLASLt** (1.2.2)
-
-#### Changed
-
-* Updated an enumeration value for the Sigmoid activation function feature.
-
-### **rocDecode** (1.7.0)
-
-#### Upcoming changes
-
-* The rocDecode GitHub repository will be officially moved to [https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdecode](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdecode) in an upcoming release.
-
-### **rocJPEG** (1.4.0)
-
-#### Changed
-
-* Bug fixes and performance improvements.
-
-#### Upcoming changes
-
-* The rocJPEG GitHub repository will be officially moved to [https://github.com/ROCm/rocm-systems/tree/develop/projects/rocjpeg](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocjpeg) in an upcoming release.
-
-### **rocSHMEM** (3.2.0)
-
-#### Added
-* Warnings that notify when large BAR is not available.
-
-#### Resolved issues
-
-* The GDA backend now disables itself, rather than crashing, when no GDA-compatible NICs are available.
-* Fixed memory coherency issues on gfx1201.
-
-#### Known issues
-
-* Only 64-bit rocSHMEM atomic APIs are implemented for the GDA conduit.
-
-### **RPP** (2.2.1)
-
-#### Added
-
-* Error-code capture in test scripts for all C++ tests.
-
-#### Optimized
-
-* Optimized F16 variants by replacing scalar load/store operations with AVX2 intrinsics for the spatter, log, blend, color_cast, flip, crop_mirror_normalize, and exposure kernels.
-
 ## ROCm 7.2.0
 
 See the [ROCm 7.2.0 release notes](https://rocm.docs.amd.com/en/docs-7.2.0/about/release-notes.html#rocm-7-2-0-release-notes)
@@ -769,13 +691,6 @@ for a complete overview of this release.
 #### Resolved issues
 * Test Suite - Error Code Capture updates.
 
-### **Tensile** (4.45.0)
-
-#### Removed
-
-- `op_sel` modifiers for `v_dot4` from Tensile codegen.
-- Dependency on `rocm-agent-enumerator` during build.
-
 ## ROCm 7.1.1
 
 See the [ROCm 7.1.1 release notes](https://rocm.docs.amd.com/en/docs-7.1.1/about/release-notes.html#rocm-7-1-1-release-notes)
```
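Several of the AMD SMI entries in the changelog concern JSON output in watch mode. As a rough, hedged illustration of consuming such line-delimited JSON metric samples, here is a small Python sketch; the field names (`gpu`, `power`, `temperature`, `junction`) are invented for this example and are not the tool's actual schema.

```python
import json

# Hypothetical line-delimited JSON samples, similar in spirit to what a
# watch-mode metrics stream might emit; the fields below are illustrative only.
samples = [
    '{"gpu": 0, "power": 215, "temperature": {"edge": 54, "junction": 61}}',
    '{"gpu": 0, "power": 230, "temperature": {"edge": 57, "junction": 65}}',
]

def max_junction_temp(lines):
    """Parse each JSON sample and return the highest junction temperature seen."""
    temps = [json.loads(line)["temperature"]["junction"] for line in lines]
    return max(temps)

print(max_junction_temp(samples))  # prints 65
```

Because each sample is a complete JSON object on its own line, redirecting the stream to a file (one of the resolved issues above) keeps it trivially machine-readable.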
README.md (168 changed lines):
```diff
@@ -1,165 +1,49 @@
-<div align="center">
+# AMD ROCm Software
-<img src="docs/data/amd-rocm-logo.png" width="200px" alt="ROCm logo">
 
-<h3 align="center">
-
-Open-source stack designed for GPU computation
-
-</h3>
-
-<p align="center">
-
-<a href="https://rocm.docs.amd.com/en/latest/"><b>Docs</b></a> • <a href="https://rocm.blogs.amd.com/"><b>Blogs</b></a> • <a href="https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/"><b>Tutorials</b></a> • <a href="https://rocm.docs.amd.com/en/latest/how-to/deep-learning-rocm.html"><b>Deep learning frameworks</b></a> • <a href="https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/index.html"><b>ROCm for AI</b></a>
-
-</p>
-
-</div>
-
-# AMD ROCm™ software
-
 ROCm is an open-source stack, composed primarily of open-source software, designed for graphics
 processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and
 APIs that enable GPU programming from low-level kernel to end-user applications.
 
-You can customize the ROCm software to meet your specific needs. You can develop,
-collaborate, test, and deploy your applications in a free, open-source, integrated, and secure software
+With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
+collaborate, test, and deploy your applications in a free, open source, integrated, and secure software
 ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
-artificial intelligence (AI), scientific computing, and computer-aided design (CAD).
+artificial intelligence (AI), scientific computing, and computer aided design (CAD).
 
-ROCm is powered by [HIP](https://github.com/ROCm/rocm-systems/tree/develop/projects/hip),
-a C++ runtime API and kernel language for AMD GPUs. HIP allows developers to create portable
-applications by providing a programming interface that is similar to NVIDIA CUDA™.
+ROCm is powered by AMD’s
+[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm/HIP),
+an open-source software C++ GPU programming environment and its corresponding runtime. HIP
+allows ROCm developers to create portable applications on different platforms by deploying code on a
+range of platforms, from dedicated gaming GPUs to exascale HPC clusters.
 
-ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary
-open-source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
+ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open
+source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
 (ML) frameworks, such as PyTorch and TensorFlow.
 
 > [!IMPORTANT]
-> A new open-source build platform for ROCm is under development at
+> A new open source build platform for ROCm is under development at
 > https://github.com/ROCm/TheRock, featuring a unified CMake build with bundled
-> dependencies, Microsoft Windows support, and more.
+> dependencies, Windows support, and more.
 
-## Table of contents
+## Getting and Building ROCm from Source
 
-- [Supported hardware and operating systems](#supported-hardware-and-operating-systems)
+Please use [TheRock](https://github.com/ROCm/TheRock) build system to build ROCm from source.
-- [Quick start](#quick-start)
-  - [Get started with ROCm](#get-started-with-rocm)
-  - [Get started with PyTorch on ROCm](#get-started-with-pytorch-on-rocm)
-- [Core components](#core-components)
-  - [Math libraries](#math-libraries)
-  - [ML and computer vision](#ml-and-computer-vision)
-  - [Collective communication and primitives](#collective-communication-and-primitives)
-  - [System management tools](#system-management-tools)
-  - [Profiling tools](#profiling-tools)
-  - [Development tools](#development-tools)
-  - [Runtimes and compilers](#runtimes-and-compilers)
-- [Release notes](#release-notes)
-- [Licenses](#licenses)
-- [ROCm release history](#rocm-release-history)
-- [Contribute](#contribute)
-
----
+## ROCm documentation
 
-## Supported hardware and operating systems
+This repository contains the [manifest file](https://gerrit.googlesource.com/git-repo/+/HEAD/docs/manifest-format.md)
+for ROCm releases, changelogs, and release information.
 
-Use the [Compatibility matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html) for official support across ROCm versions, operating system kernels, and GPU architectures (CDNA/Instinct™, RDNA/Radeon™, and Radeon Pro). Recent releases cover Ubuntu, RHEL, SLES, Oracle Linux, Debian, Rocky Linux, and more. GPU targets include CDNA4, CDNA3, CDNA2, RDNA4, and RDNA3.
+The `default.xml` file contains information for all repositories and the associated commit used to build
+the current ROCm release; `default.xml` uses the [Manifest Format repository](https://gerrit.googlesource.com/git-repo/).
 
-If you’re using AMD Radeon GPUs or Ryzen APUs in a workstation setting with a display connected, see the [ROCm on Radeon and Ryzen documentation](https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/index.html) for operating system/framework support and step-by-step installation instructions.
+Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
+`develop` branch of our repositories contains content for the next ROCm release.
 
----
+The ROCm documentation homepage is [rocm.docs.amd.com](https://rocm.docs.amd.com).
 
-## Quick start
+For information on how to contribute to the ROCm documentation, see [Contributing to the ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
 
-Follow these instructions to start using ROCm.
+## Older ROCm releases
 
-### Get started with ROCm
+For release information for older ROCm releases, refer to the
 
-Follow the [ROCm installation guide](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html) to install ROCm on your system.
-
-### Get started with PyTorch on ROCm
-
-Follow the [PyTorch on ROCm installation guide](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/3rd-party/pytorch-install.html) to install PyTorch with ROCm support in a Docker environment.
-
----
-
-## Core components
-
-The core ROCm stack consists of the following components:
-
-### Math libraries
-
-- [rocBLAS](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocblas), [hipBLAS](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipblas), and [hipBLASLt](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipblaslt)
-- [rocFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocfft) and [hipFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipfft)
-- [rocRAND](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocrand) and [hipRAND](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hiprand)
-- [rocSOLVER](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocsolver) and [hipSOLVER](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipsolver)
-- [rocSPARSE](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocsparse) and [hipSPARSE](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipsparse)
-- [rocWMMA](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocwmma) and [hipTensor](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hiptensor)
-
-### ML and computer vision
-
-- [Composable Kernel](https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel)
-- [MIGraphX](https://github.com/ROCm/AMDMIGraphX/)
-- [MIOpen](https://github.com/ROCm/rocm-libraries/tree/develop/projects/miopen)
-- [MIVisionX](https://github.com/ROCm/MIVisionX)
-- [ROCm Performance Primitives (RPP)](https://github.com/ROCm/rpp)
-
-### Collective communication and primitives
-
-- [hipCUB](https://github.com/ROCm/rocm-libraries/tree/develop/projects/hipcub)
-- [RCCL](https://github.com/ROCm/rocm-systems/tree/develop/projects/rccl)
-- [rocPRIM](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocprim)
-- [rocSHMEM](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocshmem)
-- [rocThrust](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocthrust)
-
-### System management tools
-
-- [AMD SMI](https://github.com/ROCm/rocm-systems/tree/develop/projects/amdsmi)
-- [rocminfo](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocminfo)
-
-### Profiling tools
-
-- [ROCprofiler-SDK](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk)
-- [ROCm Compute Profiler](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-compute)
-
-### Development tools
-
-- [ROCm Debugger (ROCgdb)](https://github.com/ROCm/ROCgdb)
-- [ROCdbgapi](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdbgapi)
-
-### Runtimes and compilers
-
-- [HIP](https://github.com/ROCm/rocm-systems/tree/develop/projects/hip)
-- [LLVM](https://github.com/ROCm/llvm-project)
-- [ROCR Runtime (ROCR)](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocr-runtime)
-
-For a complete list of ROCm components and version information, see the
-[ROCm components](https://rocm.docs.amd.com/en/latest/about/release-notes.html#rocm-components).
-
----
-
-## Release notes
-
-- [Latest version of ROCm](https://rocm.docs.amd.com/en/latest/about/release-notes.html) - production
-- [ROCm 7.12.0](https://rocm.docs.amd.com/en/7.12.0-preview/about/release-notes.html) – preview stream
-
----
-
-## Licenses
-
-- [ROCm licenses](https://rocm.docs.amd.com/en/latest/about/license.html)
-
----
-
-## ROCm release history
-
-For information on older ROCm releases, see the
 [ROCm release history](https://rocm.docs.amd.com/en/latest/release/versions.html).
-
----
-
-## Contribute
-
-AMD welcomes ROCm contributions using GitHub PRs or issues. See the links
-below for contribution guidelines.
-
-- [ROCm](CONTRIBUTING.md)
-- [TheRock](https://github.com/ROCm/TheRock/blob/main/CONTRIBUTING.md)
-- [ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html)
-- [ROCm Systems](https://github.com/ROCm/rocm-systems/blob/develop/CONTRIBUTING.md)
-- [ROCm Libraries](https://github.com/ROCm/rocm-libraries/blob/develop/CONTRIBUTING.md)
```
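The README text above describes `default.xml` as a `repo` manifest that pins every repository to the commit used for the current ROCm release. A minimal sketch of reading such a manifest follows; the project names and revisions in the XML below are invented for illustration, only the element structure (`remote`, `default`, `project`) comes from the manifest format.

```python
import xml.etree.ElementTree as ET

# A minimal repo-manifest fragment in the spirit of default.xml.
# The project names and revisions here are made up for illustration.
manifest_xml = """
<manifest>
  <remote name="rocm" fetch="https://github.com/ROCm/"/>
  <default remote="rocm" revision="develop"/>
  <project name="example-project-a" revision="0123abc"/>
  <project name="example-project-b" revision="4567def"/>
</manifest>
"""

def pinned_projects(xml_text):
    """Return a {project name: pinned revision} mapping from a repo manifest."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("revision") for p in root.iter("project")}

print(pinned_projects(manifest_xml))
```

In practice the manifest is consumed by the `repo` tool itself; parsing it directly like this is only useful for auditing which commit each component was built from.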
RELEASE.md (1206 changed lines): file diff suppressed because it is too large.
```diff
@@ -31,7 +31,7 @@ additional licenses. Please review individual repositories for more information.
 | [aomp-extras](https://github.com/ROCm/aomp-extras/) | [MIT](https://github.com/ROCm/aomp-extras/blob/aomp-dev/LICENSE) |
 | [AQLprofile](https://github.com/ROCm/rocm-systems/tree/develop/projects/aqlprofile/) | [MIT](https://github.com/ROCm/rocm-systems/blob/develop/projects/aqlprofile/LICENSE.md) |
 | [Code Object Manager (Comgr)](https://github.com/ROCm/llvm-project/tree/amd-staging/amd/comgr) | [The University of Illinois/NCSA](https://github.com/ROCm/llvm-project/blob/amd-staging/amd/comgr/LICENSE.txt) |
-| [Composable Kernel](https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel) | [MIT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/composablekernel/LICENSE) |
+| [Composable Kernel](https://github.com/ROCm/composable_kernel) | [MIT](https://github.com/ROCm/composable_kernel/blob/develop/LICENSE) |
 | [half](https://github.com/ROCm/half/) | [MIT](https://github.com/ROCm/half/blob/rocm/LICENSE.txt) |
 | [HIP](https://github.com/ROCm/rocm-systems/tree/develop/projects/hip/) | [MIT](https://github.com/ROCm/rocm-systems/blob/develop/projects/hip/LICENSE.md) |
 | [hipamd](https://github.com/ROCm/rocm-systems/tree/develop/projects/clr/hipamd/) | [MIT](https://github.com/ROCm/rocm-systems/blob/develop/projects/clr/hipamd/LICENSE.md) |
@@ -56,10 +56,10 @@ additional licenses. Please review individual repositories for more information.
 | [rocALUTION](https://github.com/ROCm/rocALUTION/) | [MIT](https://github.com/ROCm/rocALUTION/blob/develop/LICENSE.md) |
 | [rocBLAS](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocblas/) | [MIT](https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocblas/LICENSE.md) |
 | [ROCdbgapi](https://github.com/ROCm/ROCdbgapi/) | [MIT](https://github.com/ROCm/ROCdbgapi/blob/amd-staging/LICENSE.txt) |
-| [rocDecode](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdecode) | [MIT](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocdecode/LICENSE) |
+| [rocDecode](https://github.com/ROCm/rocDecode) | [MIT](https://github.com/ROCm/rocDecode/blob/develop/LICENSE) |
 | [rocFFT](https://github.com/ROCm/rocm-libraries/tree/develop/projects/rocfft/) | [MIT](https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocfft/LICENSE.md) |
 | [ROCgdb](https://github.com/ROCm/ROCgdb/) | [GNU General Public License v3.0](https://github.com/ROCm/ROCgdb/blob/amd-staging/COPYING3) |
-| [rocJPEG](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocjpeg) | [MIT](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocjpeg/LICENSE) |
+| [rocJPEG](https://github.com/ROCm/rocJPEG/) | [MIT](https://github.com/ROCm/rocJPEG/blob/develop/LICENSE) |
 | [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
 | [rocminfo](https://github.com/ROCm/rocm-systems/tree/develop/projects/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm-systems/blob/develop/projects/rocminfo/License.txt) |
 | [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [MIT](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
```
@@ -1,130 +1,136 @@
|
|||||||
ROCm Version,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
|
ROCm Version,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
|
||||||
:ref:`Operating systems & kernels <OS-kernel-versions>` [#os-compatibility-past-60]_,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
|
:ref:`Operating systems & kernels <OS-kernel-versions>` [#os-compatibility-past-60]_,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
|
||||||
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
|
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
|
||||||
,,,,,,,,,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
|
,,,,,,,,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
|
||||||
,"RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.0, 9.6, 9.4","RHEL 10.0, 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
|
,"RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.0, 9.6, 9.4","RHEL 10.0, 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
|
||||||
,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
|
,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
|
||||||
,SLES 15 SP7,SLES 15 SP7,SLES 15 SP7,SLES 15 SP7,SLES 15 SP7,SLES 15 SP7,"SLES 15 SP7, SP6","SLES 15 SP7, SP6",SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,,,,,,,,,,,,,,,,,,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
,"Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 10, 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8","Oracle Linux 9, 8",Oracle Linux 8.10,Oracle Linux 8.10,Oracle Linux 8.10,Oracle Linux 8.10,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,Oracle Linux 8.9,,,
,"Debian 13, 12","Debian 13, 12","Debian 13, 12","Debian 13, 12","Debian 13, 12",Debian 12,Debian 12,Debian 12,Debian 12,Debian 12,Debian 12,Debian 12,Debian 12,,,,,,,,,,,
,,,,,Azure Linux 3.0,Azure Linux 3.0,Azure Linux 3.0,Azure Linux 3.0,Azure Linux 3.0,Azure Linux 3.0,Azure Linux 3.0,Azure Linux 3.0,,,,,,,,,,,,
,Rocky Linux 9,Rocky Linux 9,Rocky Linux 9,Rocky Linux 9,Rocky Linux 9,Rocky Linux 9,,,,,,,,,,,,,,,,,,
,.. _architecture-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA4,CDNA4,CDNA4,CDNA4,CDNA4,CDNA4,,,,,,,,,,,,,,,,,,
,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA
,RDNA4,RDNA4,RDNA4,RDNA4,RDNA4,RDNA4,RDNA4,RDNA4,RDNA4,,,,,,,,,,,,,,,
,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>` [#gpu-compatibility-past-60]_,gfx950,gfx950,gfx950,gfx950,gfx950,gfx950,,,,,,,,,,,,,,,,,,
,gfx1201,gfx1201,gfx1201,gfx1201,gfx1201,gfx1201,gfx1201,gfx1201,gfx1201,,,,,,,,,,,,,,,
,gfx1200,gfx1200,gfx1200,gfx1200,gfx1200,gfx1200,gfx1200,gfx1200,gfx1200,,,,,,,,,,,,,,,
,gfx1101,gfx1101,gfx1101,gfx1101,gfx1101,gfx1101,gfx1101,gfx1101,gfx1101,,,,,,,,,,,,,,,
,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942
,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,,,,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.9.1, 2.8.0, 2.7.1","2.9.1, 2.8.0, 2.7.1","2.9, 2.8, 2.7","2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.7, 2.6, 2.5","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.20.0, 2.19.1, 2.18.1","2.20.0, 2.19.1, 2.18.1","2.20.0, 2.19.1, 2.18.1","2.20.0, 2.19.1, 2.18.1","2.19.1, 2.18.1, 2.17.1 [#tf-mi350-past-60]_","2.19.1, 2.18.1, 2.17.1 [#tf-mi350-past-60]_","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.8.2,0.8.0,0.7.1,0.7.1,0.6.0,0.6.0,0.4.35,0.4.35,0.4.35,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,2.4.0,2.4.0,N/A,N/A,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`verl <../compatibility/ml-compatibility/verl-compatibility>` [#verl_compat-past-60]_,,N/A,N/A,N/A,N/A,0.6.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.3.0.post0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>` [#stanford-megatron-lm_compat-past-60]_,,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,85f95ae,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`Megablocks <../compatibility/ml-compatibility/megablocks-compatibility>` [#megablocks_compat-past-60]_,,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.7.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`Ray <../compatibility/ml-compatibility/ray-compatibility>` [#ray_compat-past-60]_,,N/A,N/A,N/A,N/A,N/A,N/A,N/A,2.48.0.post0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat-past-60]_,,N/A,N/A,N/A,N/A,b6652,b6356,b6356,b6356,b5997,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`FlashInfer <../compatibility/ml-compatibility/flashinfer-compatibility>` [#flashinfer_compat-past-60]_,,N/A,v0.2.5,N/A,N/A,N/A,N/A,N/A,v0.2.5,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.23.2,1.23.2,1.23.1,1.22.0,1.22.0,1.22.0,1.20.0,1.20.0,1.20.0,1.20.0,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,,,,,,,,,,,,,,,,,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.6.0,>=1.4.0,>=1.4.0,>=1.4.0,>=1.4.0,>=1.4.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.17.0,>=1.17.0,>=1.17.0,>=1.17.0,>=1.17.0,>=1.17.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1
,,,,,,,,,,,,,,,,,,,,,,,,
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
Thrust,2.8.5,2.8.5,2.8.5,2.8.5,2.6.0,2.6.0,2.5.0,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
CUB,2.8.5,2.8.5,2.8.5,2.8.5,2.6.0,2.6.0,2.5.0,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,,,,,,,,,,,,,,,,,,,
DRIVER & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.1, 30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.20.1, 30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.20.0 [#mi325x_KVM-past-60]_, 30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x","30.10.2, 30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x, 6.3.x","30.10.1 [#driver_patch-past-60]_, 30.10, 6.4.x, 6.3.x, 6.2.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
,,,,,,,,,,,,,,,,,,,,,,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.2.0,1.2.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.15.0,2.15.0,2.14.0,2.14.0,2.13.0,2.13.0,2.12.0,2.12.0,2.12.0,2.12.0,2.11.0,2.11.0,2.11.0,2.11.0,2.10.0,2.10.0,2.10.0,2.10.0,2.9.0,2.9.0,2.9.0,2.9.0,2.8.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.5.1,3.5.1,3.5.1,3.5.1,3.5.0,3.5.0,3.4.0,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,3.5.0,3.5.0,3.4.0,3.4.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0
:doc:`rocAL <rocal:index>`,2.5.0,2.5.0,2.4.0,2.4.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0,2.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`rocDecode <rocdecode:index>`,1.7.0,1.5.0,1.4.0,1.4.0,1.0.0,1.0.0,0.10.0,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,N/A,N/A
:doc:`rocJPEG <rocjpeg:index>`,1.4.0,1.3.0,1.2.0,1.2.0,1.1.0,1.1.0,0.8.0,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`rocPyDecode <rocpydecode:index>`,0.8.0,0.8.0,0.7.0,0.7.0,0.6.0,0.6.0,0.3.1,0.3.1,0.3.1,0.3.1,0.2.0,0.2.0,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`RPP <rpp:index>`,2.2.1,2.2.0,2.1.0,2.1.0,2.0.0,2.0.0,1.9.10,1.9.10,1.9.10,1.9.10,1.9.1,1.9.1,1.9.1,1.9.1,1.8.0,1.8.0,1.8.0,1.8.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0
,,,,,,,,,,,,,,,,,,,,,,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`RCCL <rccl:index>`,2.27.7,2.27.7,2.27.7,2.27.7,2.26.6,2.26.6,2.22.3,2.22.3,2.22.3,2.22.3,2.21.5,2.21.5,2.21.5,2.21.5,2.20.5,2.20.5,2.20.5,2.20.5,2.18.6,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
:doc:`rocSHMEM <rocshmem:index>`,3.2.0,3.2.0,3.1.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.0,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
,,,,,,,,,,,,,,,,,,,,,,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,3.2.0,3.2.0,3.1.0,3.1.0,3.0.2,3.0.0,2.4.0,2.4.0,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,1.2.2,1.2.1,1.1.0,1.1.0,1.0.0,1.0.0,0.12.1,0.12.1,0.12.1,0.12.0,0.10.0,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.7.0,0.7.0,0.7.0,0.7.0,0.6.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.22,1.0.22,1.0.21,1.0.21,1.0.20,1.0.20,1.0.18,1.0.18,1.0.18,1.0.18,1.0.17,1.0.17,1.0.17,1.0.17,1.0.16,1.0.15,1.0.15,1.0.14,1.0.14,1.0.14,1.0.14,1.0.14,1.0.13,1.0.13
:doc:`hipfort <hipfort:index>`,0.7.1,0.7.1,0.7.1,0.7.1,0.7.0,0.7.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.1,0.5.1,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0,2.12.0,2.12.0,2.12.0,2.12.0,2.11.1,2.11.1,2.11.1,2.11.0,2.11.1,2.11.0,2.11.0,2.11.0,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,3.2.0,3.2.0,3.1.0,3.1.0,3.0.0,3.0.0,2.4.0,2.4.0,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.1,2.1.1,2.1.1,2.1.0,2.0.0,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,4.2.0,4.2.0,4.1.0,4.1.0,4.0.1,4.0.1,3.2.0,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.1.1,3.1.1,3.1.1,3.1.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.6,0.2.6,0.2.5,0.2.5,0.2.4,0.2.4,0.2.3,0.2.3,0.2.3,0.2.3,0.2.2,0.2.2,0.2.2,0.2.2,0.2.1,0.2.1,0.2.1,0.2.1,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,4.1.0,4.1.0,4.0.1,4.0.1,4.0.0,4.0.0,3.2.3,3.2.3,3.2.3,3.2.2,3.2.1,3.2.1,3.2.1,3.2.1,3.2.1,3.2.0,3.2.0,3.2.0,3.1.1,3.1.1,3.1.1,3.1.1,3.0.3,3.0.3
:doc:`rocBLAS <rocblas:index>`,5.2.0,5.2.0,5.1.1,5.1.0,5.0.2,5.0.0,4.4.1,4.4.1,4.4.0,4.4.0,4.3.0,4.3.0,4.3.0,4.3.0,4.2.4,4.2.1,4.2.1,4.2.0,4.1.2,4.1.2,4.1.0,4.1.0,4.0.0,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.36,1.0.36,1.0.35,1.0.35,1.0.34,1.0.34,1.0.32,1.0.32,1.0.32,1.0.32,1.0.31,1.0.31,1.0.31,1.0.31,1.0.30,1.0.29,1.0.29,1.0.28,1.0.27,1.0.27,1.0.27,1.0.26,1.0.25,1.0.23
:doc:`rocRAND <rocrand:index>`,4.2.0,4.2.0,4.1.0,4.1.0,4.0.0,4.0.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.1,3.1.0,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.32.0,3.32.0,3.31.0,3.31.0,3.30.1,3.30.0,3.28.2,3.28.2,3.28.0,3.28.0,3.27.0,3.27.0,3.27.0,3.27.0,3.26.2,3.26.0,3.26.0,3.26.0,3.25.0,3.25.0,3.25.0,3.25.0,3.24.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,4.2.0,4.2.0,4.1.0,4.1.0,4.0.2,4.0.2,3.4.0,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.0.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,2.2.0,2.2.0,2.1.0,2.0.0,2.0.0,2.0.0,1.7.0,1.7.0,1.7.0,1.7.0,1.6.0,1.6.0,1.6.0,1.6.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0
:doc:`Tensile <tensile:src/index>`,4.45.0,4.45.0,4.44.0,4.44.0,4.44.0,4.44.0,4.43.0,4.43.0,4.43.0,4.43.0,4.42.0,4.42.0,4.42.0,4.42.0,4.41.0,4.41.0,4.41.0,4.41.0,4.40.0,4.40.0,4.40.0,4.40.0,4.39.0,4.39.0
,,,,,,,,,,,,,,,,,,,,,,,,
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`hipCUB <hipcub:index>`,4.2.0,4.2.0,4.1.0,4.1.0,4.0.0,4.0.0,3.4.0,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,2.2.0,2.2.0,2.0.0,2.0.0,2.0.0,2.0.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0,1.3.0,1.3.0,1.2.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,4.2.0,4.2.0,4.1.0,4.1.0,4.0.1,4.0.0,3.4.1,3.4.1,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.2,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,4.2.0,4.2.0,4.1.0,4.1.0,4.0.0,4.0.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.1.1,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
,,,,,,,,,,,,,,,,,,,,,,,,
SUPPORT LIBS,,,,,,,,,,,,,,,,,,,,,,,,
`hipother <https://github.com/ROCm/hipother>`_,7.2.53211,7.2.26015,7.1.52802,7.1.25424,7.0.51831,7.0.51830,6.4.43483,6.4.43483,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.1,7.2.0,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
,,,,,,,,,,,,,,,,,,,,,,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
:doc:`AMD SMI <amdsmi:index>`,26.2.2,26.2.1,26.2.0,26.1.0,26.0.2,26.0.0,25.5.1,25.5.1,25.4.2,25.3.0,24.7.1,24.7.1,24.7.1,24.7.1,24.6.3,24.6.3,24.6.3,24.6.2,24.5.1,24.5.1,24.5.1,24.4.1,23.4.2,23.4.2
:doc:`ROCm Data Center Tool <rdc:index>`,1.2.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.8.0,7.8.0,7.8.0,7.8.0,7.8.0,7.8.0,7.7.0,7.5.0,7.5.0,7.5.0,7.4.0,7.4.0,7.4.0,7.4.0,7.3.0,7.3.0,7.3.0,7.3.0,7.2.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.3.0,1.3.0,1.3.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60105,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,,,,,,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,,,,,,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,2.6.0,2.6.0,2.6.0,2.6.0,2.6.0,2.6.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.4.0,3.4.0,3.3.1,3.3.0,3.2.3,3.2.3,3.1.1,3.1.1,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.3.0,1.2.1,1.2.0,1.1.1,1.1.0,1.0.2,1.0.2,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCProfiler <rocprofiler:index>`,2.0.70201,2.0.70200,2.0.70101,2.0.70100,2.0.70002,2.0.70000,2.0.60403,2.0.60402,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,1.1.0,1.1.0,1.0.0,1.0.0,1.0.0,1.0.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
|
,,,,,,,,,,,,,,,,,,,,,,,
|
||||||
:doc:`ROCTracer <roctracer:index>`,4.1.70201,4.1.70200,4.1.70101,4.1.70100,4.1.70002,4.1.70000,4.1.60403,4.1.60402,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
|
PERFORMANCE TOOLS,,,,,,,,,,,,,,,,,,,,,,,
|
||||||
,,,,,,,,,,,,,,,,,,,,,,,,
|
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,2.6.0,2.6.0,2.6.0,2.6.0,2.6.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
|
||||||
DEVELOPMENT TOOLS,,,,,,,,,,,,,,,,,,,,,,,,
|
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.4.0,3.3.1,3.3.0,3.2.3,3.2.3,3.1.1,3.1.1,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
|
||||||
:doc:`HIPIFY <hipify:index>`,22.0.0,22.0.0,20.0.0,20.0.0,20.0.0,20.0.0,19.0.0,19.0.0,19.0.0,19.0.0,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.2.1,1.2.0,1.1.1,1.1.0,1.0.2,1.0.2,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
|
||||||
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.13.0,0.13.0,0.13.0,0.13.0,0.12.0,0.12.0,0.12.0,0.12.0,0.11.0,0.11.0
|
:doc:`ROCProfiler <rocprofiler:index>`,2.0.70200,2.0.70101,2.0.70100,2.0.70002,2.0.70000,2.0.60403,2.0.60402,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
|
||||||
:doc:`ROCdbgapi <rocdbgapi:index>`,0.77.4,0.77.4,0.77.4,0.77.4,0.77.4,0.77.3,0.77.2,0.77.2,0.77.2,0.77.2,0.77.0,0.77.0,0.77.0,0.77.0,0.76.0,0.76.0,0.76.0,0.76.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0
|
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,1.1.0,1.0.0,1.0.0,1.0.0,1.0.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
|
||||||
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,16.3.0,16.3.0,16.3.0,16.3.0,16.3.0,16.3.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,14.2.0,14.2.0,14.2.0,14.2.0,14.1.0,14.1.0,14.1.0,14.1.0,13.2.0,13.2.0
|
:doc:`ROCTracer <roctracer:index>`,4.1.70200,4.1.70101,4.1.70100,4.1.70002,4.1.70000,4.1.60403,4.1.60402,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
|
||||||
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.3.0,0.3.0,0.3.0,0.3.0,N/A,N/A
|
,,,,,,,,,,,,,,,,,,,,,,,
|
||||||
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.1.0,2.1.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.4,2.0.4,2.0.4,2.0.4,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3
|
DEVELOPMENT TOOLS,,,,,,,,,,,,,,,,,,,,,,,
|
||||||
,,,,,,,,,,,,,,,,,,,,,,,,
|
:doc:`HIPIFY <hipify:index>`,22.0.0,20.0.0,20.0.0,20.0.0,20.0.0,19.0.0,19.0.0,19.0.0,19.0.0,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
||||||
COMPILERS,.. _compilers-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
|
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.13.0,0.13.0,0.13.0,0.13.0,0.12.0,0.12.0,0.12.0,0.12.0,0.11.0,0.11.0
|
||||||
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0
|
:doc:`ROCdbgapi <rocdbgapi:index>`,0.77.4,0.77.4,0.77.4,0.77.4,0.77.3,0.77.2,0.77.2,0.77.2,0.77.2,0.77.0,0.77.0,0.77.0,0.77.0,0.76.0,0.76.0,0.76.0,0.76.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0
|
||||||
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
|
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,16.3.0,16.3.0,16.3.0,16.3.0,16.3.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,14.2.0,14.2.0,14.2.0,14.2.0,14.1.0,14.1.0,14.1.0,14.1.0,13.2.0,13.2.0
|
||||||
`Flang <https://github.com/ROCm/flang>`_,22.0.0.26084,22.0.0.26014,20.0.025444,20.0.025425,20.0.0.25385,20.0.0.25314,19.0.0.25224,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.3.0,0.3.0,0.3.0,0.3.0,N/A,N/A
|
||||||
:doc:`llvm-project <llvm-project:index>`,22.0.0.26084,22.0.0.26014,20.0.025444,20.0.025425,20.0.0.25385,20.0.0.25314,19.0.0.25224,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.1.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.4,2.0.4,2.0.4,2.0.4,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3
|
||||||
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,22.0.0.26084,22.0.0.26014,20.0.025444,20.0.025425,20.0.0.25385,20.0.0.25314,19.0.0.25224,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
,,,,,,,,,,,,,,,,,,,,,,,
|
||||||
,,,,,,,,,,,,,,,,,,,,,,,,
|
COMPILERS,.. _compilers-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,
|
||||||
RUNTIMES,.. _runtime-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,,
|
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0
|
||||||
:doc:`AMD CLR <hip:understand/amd_clr>`,7.2.53211,7.2.26015,7.1.52802,7.1.25424,7.0.51831,7.0.51830,6.4.43484,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
|
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
|
||||||
:doc:`HIP <hip:index>`,7.2.53211,7.2.26015,7.1.52802,7.1.25424,7.0.51831,7.0.51830,6.4.43484,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
|
`Flang <https://github.com/ROCm/flang>`_,22.0.0.26014,20.0.025444,20.0.025425,20.0.0.25385,20.0.0.25314,19.0.0.25224,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
||||||
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0
|
:doc:`llvm-project <llvm-project:index>`,22.0.0.26014,20.0.025444,20.0.025425,20.0.0.25385,20.0.0.25314,19.0.0.25224,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
||||||
:doc:`ROCr Runtime <rocr-runtime:index>`,1.18.0,1.18.0,1.18.0,1.18.0,1.18.0,1.18.0,1.15.0,1.15.0,1.15.0,1.15.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.13.0,1.13.0,1.13.0,1.13.0,1.13.0,1.12.0,1.12.0
|
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,22.0.0.26014,20.0.025444,20.0.025425,20.0.0.25385,20.0.0.25314,19.0.0.25224,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
|
||||||
|
,,,,,,,,,,,,,,,,,,,,,,,
|
||||||
|
RUNTIMES,.. _runtime-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,,,
|
||||||
|
:doc:`AMD CLR <hip:understand/amd_clr>`,7.2.26015,7.1.52802,7.1.25424,7.0.51831,7.0.51830,6.4.43484,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
|
||||||
|
:doc:`HIP <hip:index>`,7.2.26015,7.1.52802,7.1.25424,7.0.51831,7.0.51830,6.4.43484,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
|
||||||
|
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0
|
||||||
|
:doc:`ROCr Runtime <rocr-runtime:index>`,1.18.0,1.18.0,1.18.0,1.18.0,1.18.0,1.15.0,1.15.0,1.15.0,1.15.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.13.0,1.13.0,1.13.0,1.13.0,1.13.0,1.12.0,1.12.0
|
||||||
|
|||||||
|
@@ -22,10 +22,10 @@ compatibility and system requirements.
 .. container:: format-big-table

    .. csv-table::
-      :header: "ROCm Version", "7.2.1", "7.2.0", "6.4.0"
+      :header: "ROCm Version", "7.2.0", "7.1.1", "6.4.0"
       :stub-columns: 1

-      :ref:`Operating systems & kernels <OS-kernel-versions>` [#os-compatibility]_,Ubuntu 24.04.4,Ubuntu 24.04.3,Ubuntu 24.04.2
+      :ref:`Operating systems & kernels <OS-kernel-versions>` [#os-compatibility]_,Ubuntu 24.04.3,Ubuntu 24.04.3,Ubuntu 24.04.2
       ,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5
       ,"RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 10.1, 10.0, 9.7, 9.6, 9.4","RHEL 9.5, 9.4"
       ,RHEL 8.10,RHEL 8.10,RHEL 8.10
@@ -54,14 +54,16 @@ compatibility and system requirements.
       ,gfx908,gfx908,gfx908
       ,,,
       FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
-      :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.9.1, 2.8.0, 2.7.1","2.9.1, 2.8.0, 2.7.1","2.6, 2.5, 2.4, 2.3"
+      :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.9.1, 2.8.0, 2.7.1","2.9, 2.8, 2.7","2.6, 2.5, 2.4, 2.3"
       :doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.20.0, 2.19.1, 2.18.1","2.20.0, 2.19.1, 2.18.1","2.18.1, 2.17.1, 2.16.2"
-      :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.8.2,0.8.0,0.4.35
+      :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.8.0,0.7.1,0.4.35
       :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat]_,N/A,N/A,2.4.0
-      `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.23.2,1.23.2,1.20.0
+      :doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat]_,N/A,N/A,b5997
+      :doc:`FlashInfer <../compatibility/ml-compatibility/flashinfer-compatibility>` [#flashinfer_compat]_,N/A,v0.2.5,N/A
+      `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.23.2,1.23.1,1.20.0
       ,,,
       THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
-      `UCC <https://github.com/ROCm/ucc>`_,>=1.6.0,>=1.4.0,>=1.3.0
+      `UCC <https://github.com/ROCm/ucc>`_,>=1.4.0,>=1.4.0,>=1.3.0
       `UCX <https://github.com/ROCm/ucx>`_,>=1.17.0,>=1.17.0,>=1.15.0
       ,,,
       THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix:,,
@@ -69,55 +71,55 @@ compatibility and system requirements.
       CUB,2.8.5,2.8.5,2.5.0
       ,,,
       DRIVER & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
-      :doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.1, 30.30.0, 30.20.1, |br| 30.20.0 [#mi325x_KVM]_, 30.10.2, |br| 30.10.1 [#driver_patch]_, 30.10, 6.4.x","30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, |br| 30.10, 6.4.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
+      :doc:`AMD GPU Driver <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"30.30.0, 30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, |br| 30.10, 6.4.x","30.20.1, 30.20.0 [#mi325x_KVM]_, |br| 30.10.2, 30.10.1 [#driver_patch]_, |br| 30.10, 6.4.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
       ,,,
       ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
-      :doc:`Composable Kernel <composable_kernel:index>`,1.2.0,1.2.0,1.1.0
+      :doc:`Composable Kernel <composable_kernel:index>`,1.2.0,1.1.0,1.1.0
-      :doc:`MIGraphX <amdmigraphx:index>`,2.15.0,2.15.0,2.12.0
+      :doc:`MIGraphX <amdmigraphx:index>`,2.15.0,2.14.0,2.12.0
       :doc:`MIOpen <miopen:index>`,3.5.1,3.5.1,3.4.0
-      :doc:`MIVisionX <mivisionx:index>`,3.5.0,3.5.0,3.2.0
+      :doc:`MIVisionX <mivisionx:index>`,3.5.0,3.4.0,3.2.0
-      :doc:`rocAL <rocal:index>`,2.5.0,2.5.0,2.2.0
+      :doc:`rocAL <rocal:index>`,2.5.0,2.4.0,2.2.0
-      :doc:`rocDecode <rocdecode:index>`,1.7.0,1.5.0,0.10.0
+      :doc:`rocDecode <rocdecode:index>`,1.5.0,1.4.0,0.10.0
-      :doc:`rocJPEG <rocjpeg:index>`,1.4.0,1.3.0,0.8.0
+      :doc:`rocJPEG <rocjpeg:index>`,1.3.0,1.2.0,0.8.0
-      :doc:`rocPyDecode <rocpydecode:index>`,0.8.0,0.8.0,0.3.1
+      :doc:`rocPyDecode <rocpydecode:index>`,0.8.0,0.7.0,0.3.1
-      :doc:`RPP <rpp:index>`,2.2.1,2.2.0,1.9.10
+      :doc:`RPP <rpp:index>`,2.2.0,2.1.0,1.9.10
       ,,,
       COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
       :doc:`RCCL <rccl:index>`,2.27.7,2.27.7,2.22.3
-      :doc:`rocSHMEM <rocshmem:index>`,3.2.0,3.2.0,2.0.0
+      :doc:`rocSHMEM <rocshmem:index>`,3.2.0,3.1.0,2.0.0
       ,,,
       MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
       `half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
-      :doc:`hipBLAS <hipblas:index>`,3.2.0,3.2.0,2.4.0
+      :doc:`hipBLAS <hipblas:index>`,3.2.0,3.1.0,2.4.0
-      :doc:`hipBLASLt <hipblaslt:index>`,1.2.2,1.2.1,0.12.0
+      :doc:`hipBLASLt <hipblaslt:index>`,1.2.1,1.1.0,0.12.0
-      :doc:`hipFFT <hipfft:index>`,1.0.22,1.0.22,1.0.18
+      :doc:`hipFFT <hipfft:index>`,1.0.22,1.0.21,1.0.18
       :doc:`hipfort <hipfort:index>`,0.7.1,0.7.1,0.6.0
       :doc:`hipRAND <hiprand:index>`,3.1.0,3.1.0,2.12.0
-      :doc:`hipSOLVER <hipsolver:index>`,3.2.0,3.2.0,2.4.0
+      :doc:`hipSOLVER <hipsolver:index>`,3.2.0,3.1.0,2.4.0
-      :doc:`hipSPARSE <hipsparse:index>`,4.2.0,4.2.0,3.2.0
+      :doc:`hipSPARSE <hipsparse:index>`,4.2.0,4.1.0,3.2.0
-      :doc:`hipSPARSELt <hipsparselt:index>`,0.2.6,0.2.6,0.2.3
+      :doc:`hipSPARSELt <hipsparselt:index>`,0.2.6,0.2.5,0.2.3
-      :doc:`rocALUTION <rocalution:index>`,4.1.0,4.1.0,3.2.2
+      :doc:`rocALUTION <rocalution:index>`,4.1.0,4.0.1,3.2.2
-      :doc:`rocBLAS <rocblas:index>`,5.2.0,5.2.0,4.4.0
+      :doc:`rocBLAS <rocblas:index>`,5.2.0,5.1.1,4.4.0
-      :doc:`rocFFT <rocfft:index>`,1.0.36,1.0.36,1.0.32
+      :doc:`rocFFT <rocfft:index>`,1.0.36,1.0.35,1.0.32
-      :doc:`rocRAND <rocrand:index>`,4.2.0,4.2.0,3.3.0
+      :doc:`rocRAND <rocrand:index>`,4.2.0,4.1.0,3.3.0
-      :doc:`rocSOLVER <rocsolver:index>`,3.32.0,3.32.0,3.28.0
+      :doc:`rocSOLVER <rocsolver:index>`,3.32.0,3.31.0,3.28.0
-      :doc:`rocSPARSE <rocsparse:index>`,4.2.0,4.2.0,3.4.0
+      :doc:`rocSPARSE <rocsparse:index>`,4.2.0,4.1.0,3.4.0
-      :doc:`rocWMMA <rocwmma:index>`,2.2.0,2.2.0,1.7.0
+      :doc:`rocWMMA <rocwmma:index>`,2.2.0,2.1.0,1.7.0
-      :doc:`Tensile <tensile:src/index>`,4.45.0,4.45.0,4.43.0
+      :doc:`Tensile <tensile:src/index>`,4.44.0,4.44.0,4.43.0
       ,,,
       PRIMITIVES,.. _primitivelibs-support-compatibility-matrix:,,
-      :doc:`hipCUB <hipcub:index>`,4.2.0,4.2.0,3.4.0
+      :doc:`hipCUB <hipcub:index>`,4.2.0,4.1.0,3.4.0
-      :doc:`hipTensor <hiptensor:index>`,2.2.0,2.2.0,1.5.0
+      :doc:`hipTensor <hiptensor:index>`,2.2.0,2.0.0,1.5.0
-      :doc:`rocPRIM <rocprim:index>`,4.2.0,4.2.0,3.4.0
+      :doc:`rocPRIM <rocprim:index>`,4.2.0,4.1.0,3.4.0
-      :doc:`rocThrust <rocthrust:index>`,4.2.0,4.2.0,3.3.0
+      :doc:`rocThrust <rocthrust:index>`,4.2.0,4.1.0,3.3.0
       ,,,
       SUPPORT LIBS,,,
-      `hipother <https://github.com/ROCm/hipother>`_,7.2.53211,7.2.26015,6.4.43482
+      `hipother <https://github.com/ROCm/hipother>`_,7.2.26015,7.1.52802,6.4.43482
-      `rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.1,7.2.0,6.4.0
+      `rocm-core <https://github.com/ROCm/rocm-core>`_,7.2.0,7.1.1,6.4.0
       `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_
       ,,,
       SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix:,,
-      :doc:`AMD SMI <amdsmi:index>`,26.2.2,26.2.1,25.3.0
+      :doc:`AMD SMI <amdsmi:index>`,26.2.1,26.2.0,25.3.0
       :doc:`ROCm Data Center Tool <rdc:index>`,1.2.0,1.2.0,0.3.0
       :doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0
       :doc:`ROCm SMI <rocm_smi_lib:index>`,7.8.0,7.8.0,7.5.0
@@ -125,14 +127,14 @@ compatibility and system requirements.
       ,,,
       PERFORMANCE TOOLS,,,
       :doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,2.6.0,2.6.0,1.4.0
-      :doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.4.0,3.4.0,3.1.0
+      :doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.4.0,3.3.1,3.1.0
-      :doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.3.0,1.0.0
+      :doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.3.0,1.2.1,1.0.0
-      :doc:`ROCProfiler <rocprofiler:index>`,2.0.70201,2.0.70200,2.0.60400
+      :doc:`ROCProfiler <rocprofiler:index>`,2.0.70200,2.0.70101,2.0.60400
-      :doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,1.1.0,1.1.0,0.6.0
+      :doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,1.1.0,1.0.0,0.6.0
-      :doc:`ROCTracer <roctracer:index>`,4.1.70201,4.1.70200,4.1.60400
+      :doc:`ROCTracer <roctracer:index>`,4.1.70200,4.1.70101,4.1.60400
       ,,,
       DEVELOPMENT TOOLS,,,
-      :doc:`HIPIFY <hipify:index>`,22.0.0,22.0.0,19.0.0
+      :doc:`HIPIFY <hipify:index>`,22.0.0,20.0.0,19.0.0
       :doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0
       :doc:`ROCdbgapi <rocdbgapi:index>`,0.77.4,0.77.4,0.77.2
       :doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,16.3.0,16.3.0,15.2.0
@@ -142,22 +144,24 @@ compatibility and system requirements.
       COMPILERS,.. _compilers-support-compatibility-matrix:,,
       `clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A
       :doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1
-      `Flang <https://github.com/ROCm/flang>`_,22.0.0.26084,22.0.0.26014,19.0.0.25133
+      `Flang <https://github.com/ROCm/flang>`_,22.0.0.26014,20.0.025444,19.0.0.25133
-      :doc:`llvm-project <llvm-project:index>`,22.0.0.26084,22.0.0.26014,19.0.0.25133
+      :doc:`llvm-project <llvm-project:index>`,22.0.0.26014,20.0.025444,19.0.0.25133
-      `OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,22.0.0.26084,22.0.0.26014,19.0.0.25133
+      `OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,22.0.0.26014,20.0.025444,19.0.0.25133
       ,,,
       RUNTIMES,.. _runtime-support-compatibility-matrix:,,
-      :doc:`AMD CLR <hip:understand/amd_clr>`,7.2.53211,7.2.26015,6.4.43482
+      :doc:`AMD CLR <hip:understand/amd_clr>`,7.2.26015,7.1.52802,6.4.43482
-      :doc:`HIP <hip:index>`,7.2.53211,7.2.26015,6.4.43482
+      :doc:`HIP <hip:index>`,7.2.26015,7.1.52802,6.4.43482
       `OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0
       :doc:`ROCr Runtime <rocr-runtime:index>`,1.18.0,1.18.0,1.15.0

 .. rubric:: Footnotes

-.. [#os-compatibility] Some operating systems are supported on specific GPUs. For detailed information about operating systems supported on ROCm 7.2.1, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
+.. [#os-compatibility] Some operating systems are supported on specific GPUs. For detailed information about operating systems supported on ROCm 7.2.0, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
-.. [#gpu-compatibility] Some GPUs have limited operating system support. For detailed information about GPUs supporting ROCm 7.2.1, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
+.. [#gpu-compatibility] Some GPUs have limited operating system support. For detailed information about GPUs supporting ROCm 7.2.0, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-gpus>`__, `ROCm 7.1.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
-.. [#dgl_compat] DGL is supported only on ROCm 7.0.0, ROCm 6.4.3, and ROCm 6.4.0.
+.. [#dgl_compat] DGL is only supported on ROCm 7.0.0, 6.4.3 and 6.4.0.
+.. [#llama-cpp_compat] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.
+.. [#flashinfer_compat] FlashInfer is only supported on ROCm 7.1.1 and 6.4.1.
 .. [#mi325x_KVM] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.
 .. [#driver_patch] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
 .. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
@@ -168,13 +172,12 @@ compatibility and system requirements.
 Operating systems, kernel and Glibc versions
 *********************************************

-For detailed information on operating system supported on ROCm 7.2.1 and associated Kernel and Glibc version, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.2.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.2.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
+For detailed information on operating system supported on ROCm 7.2.0 and associated Kernel and Glibc version, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.

 .. note::

    * See `Red Hat Enterprise Linux Release Dates <https://access.redhat.com/articles/3078>`_ to learn about the specific kernel versions supported on Red Hat Enterprise Linux (RHEL).
    * See `List of SUSE Linux Enterprise Server kernel <https://www.suse.com/support/kb/doc/?id=000019587>`_ to learn about the specific kernel version supported on SUSE Linux Enterprise Server (SLES).

 ..
    Footnotes and ref anchors in below historical tables should be appended with "-past-60", to differentiate from the
    footnote references in the above, latest, compatibility matrix. It also allows to easily find & replace.
@@ -182,6 +185,7 @@ For detailed information on operating system supported on ROCm 7.2.1 and associa
|
|||||||
delete the columns you don't need, to build the current compatibility matrix to use in above table. Find & replace all
|
delete the columns you don't need, to build the current compatibility matrix to use in above table. Find & replace all
|
||||||
instances of "-past-60" to make it ready for above table.
|
instances of "-past-60" to make it ready for above table.
|
||||||

.. _past-rocm-compatibility-matrix:

Past versions of ROCm compatibility matrix
.. [#os-compatibility-past-60] Some operating systems are supported on specific GPUs. For detailed information, see :ref:`supported_distributions` and select the required ROCm version for version specific support.

.. [#gpu-compatibility-past-60] Some GPUs have limited operating system support. For detailed information, see :ref:`supported_GPUs` and select the required ROCm version for version specific support.

.. [#tf-mi350-past-60] TensorFlow 2.17.1 is not supported on AMD Instinct MI350 Series GPUs. Use TensorFlow 2.19.1 or 2.18.1 with MI350 Series GPUs instead.

.. [#verl_compat-past-60] verl is only supported on ROCm 7.0.0 and 6.2.0.

.. [#stanford-megatron-lm_compat-past-60] Stanford Megatron-LM is only supported on ROCm 6.3.0.

.. [#dgl_compat-past-60] DGL is only supported on ROCm 7.0.0, 6.4.3 and 6.4.0.

.. [#megablocks_compat-past-60] Megablocks is only supported on ROCm 6.3.0.

.. [#ray_compat-past-60] Ray is only supported on ROCm 7.0.0 and 6.4.1.

.. [#llama-cpp_compat-past-60] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.

.. [#flashinfer_compat-past-60] FlashInfer is only supported on ROCm 7.1.1 and 6.4.1.

.. [#mi325x_KVM-past-60] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.

.. [#driver_patch-past-60] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.

.. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.

docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
:orphan:

.. meta::
   :description: FlashInfer compatibility
   :keywords: GPU, LLM, FlashInfer, deep learning, framework compatibility

.. version-set:: rocm_version latest

********************************************************************************
FlashInfer compatibility
********************************************************************************
`FlashInfer <https://docs.flashinfer.ai/index.html>`__ is a library and kernel generator
for Large Language Models (LLMs) that provides high-performance implementations of
graphics processing unit (GPU) kernels. FlashInfer focuses on LLM serving and inference,
delivering strong performance across diverse scenarios.

FlashInfer features highly efficient attention kernels, load-balanced scheduling, and memory-optimized
techniques, while supporting customized attention variants. It's compatible with ``torch.compile`` and
offers high-performance LLM-specific operators with easy integration through PyTorch and C++ APIs.
.. note::

   The ROCm port of FlashInfer is under active development, and some features are not yet available.
   For the latest feature compatibility matrix, refer to the ``README`` of the
   `https://github.com/ROCm/flashinfer <https://github.com/ROCm/flashinfer>`__ repository.
Support overview
================================================================================

- The ROCm-supported version of FlashInfer is maintained in the official `https://github.com/ROCm/flashinfer
  <https://github.com/ROCm/flashinfer>`__ repository, which differs from the
  `https://github.com/flashinfer-ai/flashinfer <https://github.com/flashinfer-ai/flashinfer>`__
  upstream repository.

- To get started and install FlashInfer on ROCm, use the prebuilt :ref:`Docker images <flashinfer-docker-compat>`,
  which include ROCm, FlashInfer, and all required dependencies.

- See the :doc:`ROCm FlashInfer installation guide <rocm-install-on-linux:install/3rd-party/flashinfer-install>`
  for installation and setup instructions.

- You can also consult the upstream `Installation guide <https://docs.flashinfer.ai/installation.html>`__
  for additional context.
.. _flashinfer-docker-compat:

Compatibility matrix
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `FlashInfer images <https://hub.docker.com/r/rocm/flashinfer/tags>`__
with ROCm backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest available FlashInfer versions from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.
.. list-table::
   :header-rows: 1
   :class: docker-image-compatibility

   * - Docker image
     - ROCm
     - FlashInfer
     - PyTorch
     - Ubuntu
     - Python
     - GPU

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/flashinfer/flashinfer-0.2.5.amd2_rocm7.1.1_ubuntu24.04_py3.12_pytorch2.8/images/sha256-9ab6426750a11dbab9bcddeaccaf492683bfd96a1d60b21dd9fc3a609a98175b"><i class="fab fa-docker fa-lg"></i> rocm/flashinfer</a>

     - `7.1.1 <https://repo.radeon.com/rocm/apt/7.1.1/>`__
     - `v0.2.5 <https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.2.5>`__
     - `2.8.0 <https://github.com/ROCm/pytorch/releases/tag/v2.8.0>`__
     - 24.04
     - `3.12 <https://www.python.org/downloads/release/python-3129/>`__
     - MI325X, MI300X

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/flashinfer/flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7/images/sha256-558914838821c88c557fb6d42cfbc1bdb67d79d19759f37c764a9ee801f93313"><i class="fab fa-docker fa-lg"></i> rocm/flashinfer</a>

     - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
     - `v0.2.5 <https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.2.5>`__
     - `2.7.1 <https://github.com/ROCm/pytorch/releases/tag/v2.7.1>`__
     - 24.04
     - `3.12 <https://www.python.org/downloads/release/python-3129/>`__
     - MI300X
.. _flashinfer-recommendations:

Use cases and recommendations
================================================================================

FlashInfer on ROCm enables you to perform LLM inference for both prefill and decode:
during prefill, your model efficiently processes input prompts to build KV caches
and internal activations; during decode, it generates tokens sequentially based on
prior outputs and context. Use the attention mode supported upstream (Multi-Head
Attention, Grouped-Query Attention, or Multi-Query Attention) that matches your
model configuration.

FlashInfer on ROCm also includes capabilities such as load balancing,
sparse and dense attention optimizations, and single and batch decode, alongside
prefill for high-performance execution on MI300X GPUs.
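To put a number on why the prefill-built KV cache dominates decode-time memory traffic, here is a back-of-envelope sketch. It is pure Python, and the model shape below is an illustrative grouped-query-attention configuration, not a FlashInfer default:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Each layer stores one K and one V tensor of shape
    # (batch, seq_len, kv_heads, head_dim); fp16 = 2 bytes per element.
    return 2 * layers * batch * seq_len * kv_heads * head_dim * bytes_per_elem

# Illustrative 32-layer model with 8 KV heads (GQA) and head_dim 128:
# an 8K-token context caches exactly 1 GiB per sequence.
print(kv_cache_bytes(32, 8, 128, seq_len=8192, batch=1) / 2**30, "GiB")  # → 1.0 GiB
```

Every decode step re-reads this cache, which is why attention modes with fewer KV heads (GQA, MQA) and memory-optimized attention kernels matter for serving throughput.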
For currently supported use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/search.html?q=flashinfer>`__,
where you can search for examples and best practices to optimize your workloads on AMD GPUs.

Previous versions
===============================================================================

See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/flashinfer-history` to find documentation for previous releases
of the ``ROCm/flashinfer`` Docker image.
@@ -56,15 +56,15 @@ between JAX Plugin–PJRT and JAX/JAXLIB.

   * - JAX Plugin-PJRT
     - JAX/JAXLIB
     - ROCm
   * - 0.8.2
     - 0.8.2
     - 7.2.1
   * - 0.8.0
     - 0.8.0
     - 7.2.0
   * - 0.7.1
     - 0.7.1
     - 7.1.1, 7.1.0
   * - 0.6.0
     - 0.6.2, 0.6.0
     - 7.0.2, 7.0.1, 7.0.0

Use cases and recommendations
================================================================================

docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst
:orphan:

.. meta::
   :description: llama.cpp compatibility
   :keywords: GPU, GGML, llama.cpp, deep learning, framework compatibility

.. version-set:: rocm_version latest

********************************************************************************
llama.cpp compatibility
********************************************************************************
`llama.cpp <https://github.com/ggml-org/llama.cpp>`__ is an open-source framework
for Large Language Model (LLM) inference that runs on both central processing units
(CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing
a simple, dependency-free setup.

The framework supports multiple quantization options, from 1.5-bit to 8-bit integers,
to accelerate inference and reduce memory usage. Originally built as a CPU-first library,
llama.cpp is easy to integrate with other programming environments and is widely
adopted across diverse platforms, including consumer devices.
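The memory savings from quantization follow directly from bits per weight. A rough sketch in pure Python; the totals are approximations that ignore the small per-block scale overhead quantization formats add:

```python
def model_size_gib(params_billion, bits_per_weight):
    # Weight storage only: parameter count × bits per weight, in GiB.
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# A 7B-parameter model at common quantization widths:
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {model_size_gib(7, bits):.1f} GiB")
```

Halving the bit width halves the weight footprint, which is what lets models that overflow VRAM at fp16 fit after 4-bit or 2-bit quantization.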
Support overview
================================================================================

- The ROCm-supported version of llama.cpp is maintained in the official `https://github.com/ROCm/llama.cpp
  <https://github.com/ROCm/llama.cpp>`__ repository, which differs from the
  `https://github.com/ggml-org/llama.cpp <https://github.com/ggml-org/llama.cpp>`__ upstream repository.

- To get started and install llama.cpp on ROCm, use the prebuilt :ref:`Docker images <llama-cpp-docker-compat>`,
  which include ROCm, llama.cpp, and all required dependencies.

- See the :doc:`ROCm llama.cpp installation guide <rocm-install-on-linux:install/3rd-party/llama-cpp-install>`
  for installation and setup instructions.

- You can also consult the upstream `Installation guide <https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md>`__
  for additional context.
.. _llama-cpp-docker-compat:

Compatibility matrix
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `llama.cpp images <https://hub.docker.com/r/rocm/llama.cpp/tags>`__
with ROCm backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest available llama.cpp versions from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.

.. important::

   Image tags ending in ``_full``, ``_server``, and ``_light`` provide different entrypoints:

   - Full: This image includes both the main executable file and the tools to convert ``LLaMA`` models into ``ggml`` format and quantize them to 4-bit.
   - Server: This image only includes the server executable file.
   - Light: This image only includes the main executable file.
.. list-table::
   :header-rows: 1
   :class: docker-image-compatibility

   * - Full Docker
     - Server Docker
     - Light Docker
     - llama.cpp
     - ROCm
     - Ubuntu
     - GPU

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6652.amd0_rocm7.0.0_ubuntu24.04_full/images/sha256-a94f0c7a598cc6504ff9e8371c016d7a2f93e69bf54a36c870f9522567201f10"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6652.amd0_rocm7.0.0_ubuntu24.04_server/images/sha256-be175932c3c96e882dfbc7e20e0e834f58c89c2925f48b222837ee929dfc47ee"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6652.amd0_rocm7.0.0_ubuntu24.04_light/images/sha256-d8ba0c70603da502c879b1f8010b439c8e7fa9f6cbdac8bbbbbba97cb41ebc9e"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6652 <https://github.com/ROCm/llama.cpp/tree/release/b6652>`__
     - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
     - 24.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6652.amd0_rocm7.0.0_ubuntu22.04_full/images/sha256-37582168984f25dce636cc7288298e06d94472ea35f65346b3541e6422b678ee"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6652.amd0_rocm7.0.0_ubuntu22.04_server/images/sha256-7e70578e6c3530c6591cc2c26da24a9ee68a20d318e12241de93c83224f83720"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6652.amd0_rocm7.0.0_ubuntu22.04_light/images/sha256-9a5231acf88b4a229677bc2c636ea3fe78a7a80f558bd80910b919855de93ad5"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6652 <https://github.com/ROCm/llama.cpp/tree/release/b6652>`__
     - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
     - 22.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.3_ubuntu24.04_full/images/sha256-5960fc850024a8a76451f9eaadd89b7e59981ae9f393b407310c1ddf18892577"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.3_ubuntu24.04_server/images/sha256-1b79775d9f546065a6aaf9ca426e1dd4ed4de0b8f6ee83687758cc05af6538e6"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.3_ubuntu24.04_light/images/sha256-8f863c4c2857ae42bebd64e4f1a0a1e7cc3ec4503f243e32b4a4dcad070ec361"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
     - `6.4.3 <https://repo.radeon.com/rocm/apt/6.4.3/>`__
     - 24.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.3_ubuntu22.04_full/images/sha256-888879b3ee208f9247076d7984524b8d1701ac72611689e89854a1588bec9867"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.3_ubuntu22.04_server/images/sha256-90e4ff99a66743e33fd00728cd71a768588e5f5ef355aaa196669fe65ac70672"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.3_ubuntu22.04_light/images/sha256-bd447a049939cb99054f8fbf3f2352870fe906a75e2dc3339c845c08b9c53f9b"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
     - `6.4.3 <https://repo.radeon.com/rocm/apt/6.4.3/>`__
     - 22.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.2_ubuntu24.04_full/images/sha256-5b3a1bc4889c1fcade434b937fbf9cc1c22ff7dc0317c130339b0c9238bc88c4"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.2_ubuntu24.04_server/images/sha256-5228ff99d0f627a9032d668f4381b2e80dc1e301adc3e0821f26d8354b175271"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.2_ubuntu24.04_light/images/sha256-b12723b332a826a89b7252dddf868cbe4d1a869562fc4aa4032f59e1a683b968"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
     - `6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`__
     - 24.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.2_ubuntu22.04_full/images/sha256-cd6e21a6a73f59b35dd5309b09dd77654a94d783bf13a55c14eb8dbf8e9c2615"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.2_ubuntu22.04_server/images/sha256-c2b4689ab2c47e6626e8fea22d7a63eb03d47c0fde9f5ef8c9f158d15c423e58"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.2_ubuntu22.04_light/images/sha256-1acc28f29ed87db9cbda629cb29e1989b8219884afe05f9105522be929e94da4"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
     - `6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`__
     - 22.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.1_ubuntu24.04_full/images/sha256-2f8ae8a44510d96d52dea6cb398b224f7edeb7802df7ec488c6f63d206b3cdc9"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.1_ubuntu24.04_server/images/sha256-fece497ff9f4a28b12f645de52766941da8ead8471aa1ea84b61d4b4568e51f2"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.1_ubuntu24.04_light/images/sha256-3e14352fa6f8c6128b23cf9342531c20dbfb522550b626e09d83b260a1947022"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
     - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
     - 24.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.1_ubuntu22.04_full/images/sha256-80763062ef0bec15038c35fd01267f1fc99a5dd171d4b48583cc668b15efad69"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.1_ubuntu22.04_server/images/sha256-db2a6c957555ed83b819bbc54aea884a93192da0fb512dae63d32e0dc4e8ab8f"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b6356_rocm6.4.1_ubuntu22.04_light/images/sha256-c6dbb07cc655fb079d5216e4b77451cb64a9daa0585d23b6fb8b32cb22021197"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
     - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
     - 22.04
     - MI325X, MI300X, MI210

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b5997_rocm6.4.0_ubuntu24.04_full/images/sha256-f78f6c81ab2f8e957469415fe2370a1334fe969c381d1fe46050c85effaee9d5"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b5997_rocm6.4.0_ubuntu24.04_server/images/sha256-275ad9e18f292c26a00a2de840c37917e98737a88a3520bdc35fd3fc5c9a6a9b"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/llama.cpp/llama.cpp-b5997_rocm6.4.0_ubuntu24.04_light/images/sha256-cc324e6faeedf0e400011f07b49d2dc41a16bae257b2b7befa0f4e2e97231320"><i class="fab fa-docker fa-lg"></i> rocm/llama.cpp</a>

     - `b5997 <https://github.com/ROCm/llama.cpp/tree/release/b5997>`__
     - `6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`__
     - 24.04
     - MI300X, MI210
.. _llama-cpp-key-rocm-libraries:

Key ROCm libraries for llama.cpp
================================================================================

llama.cpp functionality on ROCm is determined by its underlying library
dependencies. These ROCm components affect the capabilities, performance, and
feature set available to developers. Ensure you have the required libraries for
your corresponding ROCm version.
.. list-table::
   :header-rows: 1

   * - ROCm library
     - ROCm 7.0.0 version
     - ROCm 6.4.x version
     - Purpose
     - Usage
   * - `hipBLAS <https://github.com/ROCm/hipBLAS>`__
     - 3.0.0
     - 2.4.0
     - Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
       matrix and vector operations.
     - Supports operations such as matrix multiplication, matrix-vector
       products, and tensor contractions. Utilized in both dense and batched
       linear algebra operations.
   * - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`__
     - 1.0.0
     - 0.12.0
     - hipBLASLt is an extension of the hipBLAS library, providing additional
       features like epilogues fused into the matrix multiplication kernel or
       use of integer tensor cores.
     - By setting the ``ROCBLAS_USE_HIPBLASLT`` environment variable, you can dispatch hipBLASLt
       kernels where possible.
   * - `rocWMMA <https://github.com/ROCm/rocWMMA>`__
     - 2.0.0
     - 1.7.0
     - Accelerates warp-level matrix-multiply and matrix-accumulate operations to speed up general matrix
       multiplication (GEMM) and accumulation operations with mixed-precision
       support.
     - Can be used to enhance flash attention performance on AMD GPUs by enabling
       the corresponding flag at compile time.
.. _llama-cpp-uses-recommendations:

Use cases and recommendations
================================================================================

llama.cpp can be applied in a variety of scenarios, particularly when you need to meet one or more of the following requirements:

- Plain C/C++ implementation with no external dependencies
- Support for 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
- Custom HIP (Heterogeneous-compute Interface for Portability) kernels for running large language models (LLMs) on AMD GPUs (graphics processing units)
- CPU (central processing unit) + GPU (graphics processing unit) hybrid inference for partially accelerating models larger than the total available VRAM (video random-access memory)
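For the hybrid CPU + GPU case, the split is a simple capacity calculation: offload as many layers as fit in VRAM and run the rest on the CPU, which is roughly what the llama.cpp ``--n-gpu-layers`` (``-ngl``) option controls. A rough sketch with illustrative sizes:

```python
def gpu_layers(vram_gib, n_layers, model_gib, reserve_gib=2.0):
    # Assume weights are spread evenly across layers and reserve some
    # VRAM for the KV cache, activations, and runtime overhead.
    per_layer_gib = model_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable_gib / per_layer_gib))

# Illustrative: an 80-layer model quantized to ~40 GiB on a 24 GiB GPU
# can offload 44 layers; the remaining 36 stay on the CPU.
print(gpu_layers(24, 80, 40))  # → 44
```

The even-per-layer assumption is a simplification (embedding and output layers differ in size), but it is a reasonable starting point before tuning the offload count empirically.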
llama.cpp is also used in a range of real-world applications, including:

- Games such as `Lucy's Labyrinth <https://github.com/MorganRO8/Lucys_Labyrinth>`__:
  A simple maze game where AI-controlled agents attempt to trick the player.
- Tools such as `Styled Lines <https://marketplace.unity.com/packages/tools/ai-ml-integration/style-text-webgl-ios-stand-alone-llm-llama-cpp-wrapper-292902>`__:
  A proprietary, asynchronous inference wrapper for Unity3D game development, including pre-built mobile and web platform wrappers and a model example.
- Various other AI applications use llama.cpp as their inference engine;
  for a detailed list, see the `user interfaces (UIs) section <https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#description>`__.

For more use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__,
where you can search for llama.cpp examples and best practices to optimize your workloads on AMD GPUs.

- The `Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration <https://rocm.blogs.amd.com/ecosystems-and-partners/llama-cpp/README.html>`__
  blog post outlines how the open-source llama.cpp framework enables efficient LLM inference, including interactive inference with ``llama-cli``,
  server deployment with ``llama-server``, GGUF model preparation and quantization, performance benchmarking, and optimizations tailored for
  AMD Instinct GPUs within the ROCm ecosystem.

Previous versions
===============================================================================

See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/llama-cpp-history` to find documentation for previous releases
of the ``ROCm/llama.cpp`` Docker image.

docs/compatibility/ml-compatibility/megablocks-compatibility.rst
:orphan:

.. meta::
   :description: Megablocks compatibility
   :keywords: GPU, megablocks, deep learning, framework compatibility

.. version-set:: rocm_version latest

********************************************************************************
Megablocks compatibility
********************************************************************************
`Megablocks <https://github.com/databricks/megablocks>`__ is a lightweight library
for mixture-of-experts `(MoE) <https://huggingface.co/blog/moe>`__ training.
The core of the system is efficient "dropless-MoE" and standard MoE layers.
Megablocks is integrated with `https://github.com/stanford-futuredata/Megatron-LM
<https://github.com/stanford-futuredata/Megatron-LM>`__,
where data and pipeline parallel training of MoEs is supported.
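To make "dropless" concrete: a standard MoE layer gives each expert a fixed token capacity and drops whatever overflows it, while a dropless-MoE layer keeps every token. A toy top-1 routing sketch in pure Python, illustrative only and not the Megablocks API:

```python
from collections import Counter

def route_top1(expert_ids, num_experts, capacity):
    """Standard capacity-limited MoE routing: tokens beyond an expert's
    capacity are dropped. Returns (kept, dropped) token indices."""
    load = Counter()
    kept, dropped = [], []
    for tok, expert in enumerate(expert_ids):
        assert 0 <= expert < num_experts
        if load[expert] < capacity:
            load[expert] += 1
            kept.append(tok)
        else:
            dropped.append(tok)  # a dropless-MoE layer would keep these
    return kept, dropped

# 8 tokens, 4 experts, capacity 2: expert 0 is over-subscribed,
# so tokens 2 and 5 are dropped.
kept, dropped = route_top1([0, 0, 0, 1, 2, 0, 3, 1], num_experts=4, capacity=2)
print(dropped)  # → [2, 5]
```

Megablocks avoids this trade-off by reformulating the MoE layer as block-sparse matrix multiplication, so expert capacity does not have to be fixed in advance.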
|
Support overview
|
||||||
|
================================================================================
|
||||||
|
|
||||||
|
- The ROCm-supported version of Megablocks is maintained in the official `https://github.com/ROCm/megablocks
|
||||||
|
<https://github.com/ROCm/megablocks>`__ repository, which differs from the
|
||||||
|
`https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`__ upstream repository.
|
||||||
|
|
||||||
|
- To get started and install Megablocks on ROCm, use the prebuilt :ref:`Docker image <megablocks-docker-compat>`,
|
||||||
|
which includes ROCm, Megablocks, and all required dependencies.
|
||||||
|
|
||||||
|
- See the :doc:`ROCm Megablocks installation guide <rocm-install-on-linux:install/3rd-party/megablocks-install>`
|
||||||
|
for installation and setup instructions.
|
||||||
|
|
||||||
|
- You can also consult the upstream `Installation guide <https://github.com/databricks/megablocks>`__
|
||||||
|
for additional context.

.. _megablocks-docker-compat:

Compatibility matrix
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `Megablocks images <https://hub.docker.com/r/rocm/megablocks/tags>`__
with ROCm backends on Docker Hub. The following Docker image tag and associated
inventories represent the latest available Megablocks version from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.

.. list-table::
   :header-rows: 1
   :class: docker-image-compatibility

   * - Docker image
     - ROCm
     - Megablocks
     - PyTorch
     - Ubuntu
     - Python
     - GPU

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/megablocks/megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-372ff89b96599019b8f5f9db469c84add2529b713456781fa62eb9a148659ab4"><i class="fab fa-docker fa-lg"></i> rocm/megablocks</a>
     - `6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_
     - `0.7.0 <https://github.com/databricks/megablocks/releases/tag/v0.7.0>`_
     - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
     - 24.04
     - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
     - MI300X

Supported models and features with ROCm 6.3.0
================================================================================

This section summarizes the Megablocks features supported by ROCm.

* Distributed Pre-training
* Activation Checkpointing and Recomputation
* Distributed Optimizer
* Mixture-of-Experts
* dropless-Mixture-of-Experts
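A minimal sketch of pulling and launching the validated image from the matrix above. The device, ``--ipc``, and ``--shm-size`` flags are the usual ROCm container settings; the shared-memory size is an assumption you should adjust to your workload.

```shell
# Image tag taken from the compatibility matrix above.
TAG=rocm/megablocks:megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0

docker pull "$TAG"

# Expose the ROCm kernel driver (/dev/kfd) and GPUs (/dev/dri) to the container.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --ipc=host --shm-size 16G \
  --security-opt seccomp=unconfined \
  "$TAG"
```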

.. _megablocks-recommendations:

Use cases and recommendations
================================================================================

* The `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs
  <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`__
  blog post explains how to use the ROCm platform for pre-training with the
  Megablocks framework. It introduces a streamlined approach for training Mixture-of-Experts
  (MoE) models using the Megablocks library on AMD hardware. Focusing on GPT-2, it
  demonstrates how block-sparse computations can enhance scalability and efficiency in MoE
  training. The guide provides step-by-step instructions for setting up the environment,
  including cloning the repository, building the Docker image, and running the training container.
  Additionally, it offers insights into using the ``oscar-1GB.json`` dataset for pre-training
  language models. By leveraging Megablocks and the ROCm platform, you can optimize your MoE
  training workflows for large-scale transformer models.

  It shows how to pre-process datasets and how to begin pre-training on AMD GPUs through:

  * Single-GPU pre-training
  * Multi-GPU pre-training
@@ -399,20 +399,6 @@ with ROCm.

 **Note:** Only official release exists.

-Key features and enhancements for PyTorch 2.9 with ROCm 7.2.1
-================================================================================
-
-- Added Triton 3.6.x performance optimization for reduction, POI, and GEMM kernels.
-- Updated native reduction kernel config for better performance on AMD GPUs.
-- Optimized single-block TopK kernels with warp-level compaction.
-- Optimized Radix Select by caching data in shared memory.
-- Optimized Flex-Attention occupancy for head_dim=128.
-- Enabled the hipSOLVER path for linalg operations: cholesky, lstsq, and gels.
-
 Key features and enhancements for PyTorch 2.9 with ROCm 7.1.1
 ================================================================================
 - Scaled Dot Product Attention (SDPA) upgraded to use AOTriton version 0.11b.
114  docs/compatibility/ml-compatibility/ray-compatibility.rst  Normal file
@@ -0,0 +1,114 @@

:orphan:

.. meta::
   :description: Ray compatibility
   :keywords: GPU, Ray, deep learning, framework compatibility

.. version-set:: rocm_version latest

*******************************************************************************
Ray compatibility
*******************************************************************************

Ray is a unified framework for scaling AI and Python applications from your laptop
to a full cluster, without changing your code. Ray consists of `a core distributed
runtime <https://docs.ray.io/en/latest/ray-core/walkthrough.html>`__ and a set of
`AI libraries <https://docs.ray.io/en/latest/ray-air/getting-started.html>`__ for
simplifying machine learning computations.

Ray is a general-purpose framework that runs many types of workloads efficiently.
Any Python application can be scaled with Ray, without extra infrastructure.
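The laptop-to-cluster scaling path can be sketched with Ray's cluster launcher commands. The head-node IP address and port below are hypothetical placeholders for your own network.

```shell
# On the head node, start a local Ray cluster (default Redis/GCS port 6379).
ray start --head --port=6379

# On each worker node, join the same cluster; the head IP is a placeholder.
ray start --address=192.168.1.10:6379

# Inspect the aggregated CPU/GPU resources from any node.
ray status
```

The same Python program then runs unchanged whether `ray.init()` connects to the single-node or the multi-node cluster.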

Support overview
================================================================================

- The ROCm-supported version of Ray is maintained in the official `https://github.com/ROCm/ray
  <https://github.com/ROCm/ray>`__ repository, which differs from the
  `https://github.com/ray-project/ray <https://github.com/ray-project/ray>`__ upstream repository.

- To get started and install Ray on ROCm, use the prebuilt :ref:`Docker image <ray-docker-compat>`,
  which includes ROCm, Ray, and all required dependencies.

- See the :doc:`ROCm Ray installation guide <rocm-install-on-linux:install/3rd-party/ray-install>`
  for installation and setup instructions.

- You can also consult the upstream `Installation guide <https://docs.ray.io/en/latest/ray-overview/installation.html>`__
  for additional context.

.. _ray-docker-compat:

Compatibility matrix
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `ROCm Ray Docker images <https://hub.docker.com/r/rocm/ray/tags>`__
with ROCm backends on Docker Hub. The following Docker image tags and
associated inventories represent the latest Ray version from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.

.. list-table::
   :header-rows: 1
   :class: docker-image-compatibility

   * - Docker image
     - ROCm
     - Ray
     - PyTorch
     - Ubuntu
     - Python
     - GPU

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/ray/ray-2.51.1_rocm7.0.0_ubuntu22.04_py3.12_pytorch2.9.0/images/sha256-a02f6766b4ba406f88fd7e85707ec86c04b569834d869a08043ec9bcbd672168"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
     - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
     - `2.51.1 <https://github.com/ROCm/ray/tree/release/2.51.1>`__
     - 2.9.0a0+git1c57644
     - 22.04
     - `3.12.12 <https://www.python.org/downloads/release/python-31212/>`__
     - MI300X

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/ray/ray-2.48.0.post0_rocm6.4.1_ubuntu24.04_py3.12_pytorch2.6.0/images/sha256-0d166fe6bdced38338c78eedfb96eff92655fb797da3478a62dd636365133cc0"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
     - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
     - `2.48.0.post0 <https://github.com/ROCm/ray/tree/release/2.48.0.post0>`__
     - 2.6.0+git684f6f2
     - 24.04
     - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
     - MI300X, MI210

Use cases and recommendations
================================================================================

* The `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm
  Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`__
  blog post provides an overview of Volcano Engine Reinforcement Learning (verl)
  for large language models (LLMs) and discusses its benefits in large-scale
  reinforcement learning from human feedback (RLHF). It uses Ray as part of a
  hybrid orchestration engine to schedule and coordinate training and inference
  tasks in parallel, enabling optimized resource utilization and potential overlap
  between these phases. This dynamic resource allocation strategy significantly
  improves overall system efficiency. The blog presents verl's performance results,
  focusing on throughput and convergence accuracy achieved on AMD Instinct™ MI300X
  GPUs. Follow this guide to get started with verl on AMD Instinct GPUs and
  accelerate your RLHF training with ROCm-optimized performance.

* The `Exploring Use Cases for Scalable AI: Implementing Ray with ROCm Support for Efficient ML Workflows
  <https://rocm.blogs.amd.com/artificial-intelligence/rocm-ray/README.html>`__
  blog post describes key use cases such as training and inference for large language models (LLMs),
  model serving, hyperparameter tuning, reinforcement learning, and the orchestration of large-scale
  workloads using Ray in the ROCm environment.

For more use cases and recommendations, see the AMD GPU tabs in the `Accelerator Support
topic <https://docs.ray.io/en/latest/ray-core/scheduling/accelerators.html#accelerator-support>`__
of the Ray core documentation and refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__,
where you can search for Ray examples and best practices to optimize your workloads on AMD GPUs.

Previous versions
===============================================================================

See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/ray-history` to find documentation for previous releases
of the ``ROCm/ray`` Docker image.
@@ -0,0 +1,116 @@

:orphan:

.. meta::
   :description: Stanford Megatron-LM compatibility
   :keywords: Stanford, Megatron-LM, deep learning, framework compatibility

.. version-set:: rocm_version latest

********************************************************************************
Stanford Megatron-LM compatibility
********************************************************************************

Stanford Megatron-LM is a large-scale language model training framework based on
NVIDIA's `Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_.
It is designed to train massive transformer-based language models efficiently through model
and data parallelism.

It provides efficient tensor, pipeline, and sequence-based model parallelism for
pre-training transformer-based language models such as GPT (Decoder Only), BERT
(Encoder Only), and T5 (Encoder-Decoder).

Support overview
================================================================================

- The ROCm-supported version of Stanford Megatron-LM is maintained in the official `https://github.com/ROCm/Stanford-Megatron-LM
  <https://github.com/ROCm/Stanford-Megatron-LM>`__ repository, which differs from the
  `https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`__ upstream repository.

- To get started and install Stanford Megatron-LM on ROCm, use the prebuilt :ref:`Docker image <megatron-lm-docker-compat>`,
  which includes ROCm, Stanford Megatron-LM, and all required dependencies.

- See the :doc:`ROCm Stanford Megatron-LM installation guide <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>`
  for installation and setup instructions.

- You can also consult the upstream `Installation guide <https://github.com/NVIDIA/Megatron-LM>`__
  for additional context.

.. _megatron-lm-docker-compat:

Compatibility matrix
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/stanford-megatron-lm/tags>`_
with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest Stanford Megatron-LM version from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.

.. list-table::
   :header-rows: 1
   :class: docker-image-compatibility

   * - Docker image
     - ROCm
     - Stanford Megatron-LM
     - PyTorch
     - Ubuntu
     - Python
     - GPU

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i> rocm/stanford-megatron-lm</a>
     - `6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_
     - `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
     - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
     - 24.04
     - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
     - MI300X

Supported models and features with ROCm 6.3.0
================================================================================

This section details the models and features that are supported by the ROCm version of Stanford Megatron-LM.

Models:

* BERT
* GPT
* T5
* ICT

Features:

* Distributed Pre-training
* Activation Checkpointing and Recomputation
* Distributed Optimizer
* Mixture-of-Experts

.. _megatron-lm-recommendations:

Use cases and recommendations
================================================================================

The following blog post mentions Megablocks, but you can run Stanford Megatron-LM with the same steps to pre-process datasets on AMD GPUs:

* The `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs
  <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`__
  blog post explains how to use the ROCm platform for pre-training with the
  Megablocks framework. It introduces a streamlined approach for training Mixture-of-Experts
  (MoE) models using the Megablocks library on AMD hardware. Focusing on GPT-2, it
  demonstrates how block-sparse computations can enhance scalability and efficiency in MoE
  training. The guide provides step-by-step instructions for setting up the environment,
  including cloning the repository, building the Docker image, and running the training container.
  Additionally, it offers insights into using the ``oscar-1GB.json`` dataset for pre-training
  language models. By leveraging Megablocks and the ROCm platform, you can optimize your MoE
  training workflows for large-scale transformer models.

  It shows how to pre-process datasets and how to begin pre-training on AMD GPUs through:

  * Single-GPU pre-training
  * Multi-GPU pre-training
118  docs/compatibility/ml-compatibility/verl-compatibility.rst  Normal file
@@ -0,0 +1,118 @@

:orphan:

.. meta::
   :description: verl compatibility
   :keywords: GPU, verl, deep learning, framework compatibility

.. version-set:: rocm_version latest

*******************************************************************************
verl compatibility
*******************************************************************************

Volcano Engine Reinforcement Learning for LLMs (`verl <https://verl.readthedocs.io/en/latest/>`__)
is a reinforcement learning framework designed for large language models (LLMs).
verl offers a scalable, open-source fine-tuning solution by using a hybrid programming model
that makes it easy to define and run complex post-training dataflows efficiently.

Its modular APIs separate computation from data, allowing smooth integration with other frameworks.
It also supports flexible model placement across GPUs for efficient scaling on different cluster sizes.
verl achieves high training and generation throughput by building on existing LLM frameworks.
Its 3D-HybridEngine reduces memory use and communication overhead when switching between training
and inference, improving overall performance.

Support overview
================================================================================

- The ROCm-supported version of verl is maintained in the official `https://github.com/ROCm/verl
  <https://github.com/ROCm/verl>`__ repository, which differs from the
  `https://github.com/volcengine/verl <https://github.com/volcengine/verl>`__ upstream repository.

- To get started and install verl on ROCm, use the prebuilt :ref:`Docker image <verl-docker-compat>`,
  which includes ROCm, verl, and all required dependencies.

- See the :doc:`ROCm verl installation guide <rocm-install-on-linux:install/3rd-party/verl-install>`
  for installation and setup instructions.

- You can also consult the upstream `verl documentation <https://verl.readthedocs.io/en/latest/>`__
  for additional context.

.. _verl-docker-compat:

Compatibility matrix
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `verl Docker images <https://hub.docker.com/r/rocm/verl/tags>`_
with ROCm backends on Docker Hub. The following Docker image tags and associated inventories
represent the latest verl version from the official Docker Hub.
Click |docker-icon| to view the image on Docker Hub.

.. list-table::
   :header-rows: 1
   :class: docker-image-compatibility

   * - Docker image
     - ROCm
     - verl
     - Ubuntu
     - PyTorch
     - Python
     - vLLM
     - GPU

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/verl/verl-0.6.0.amd0_rocm7.0_vllm0.11.0.dev/images/sha256-f70a3ebc94c1f66de42a2fcc3f8a6a8d6d0881eb0e65b6958d7d6d24b3eecb0d"><i class="fab fa-docker fa-lg"></i> rocm/verl</a>
     - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
     - `0.6.0 <https://github.com/volcengine/verl/releases/tag/v0.6.0>`__
     - 22.04
     - `2.9.0 <https://github.com/ROCm/pytorch/tree/release/2.9-rocm7.x-gfx115x>`__
     - `3.12.11 <https://www.python.org/downloads/release/python-31211/>`__
     - `0.11.0 <https://github.com/vllm-project/vllm/releases/tag/v0.11.0>`__
     - MI300X

   * - .. raw:: html

          <a href="https://hub.docker.com/layers/rocm/verl/verl-0.3.0.post0_rocm6.2_vllm0.6.3/images/sha256-cbe423803fd7850448b22444176bee06f4dcf22cd3c94c27732752d3a39b04b2"><i class="fab fa-docker fa-lg"></i> rocm/verl</a>
     - `6.2.0 <https://repo.radeon.com/rocm/apt/6.2/>`__
     - `0.3.0.post0 <https://github.com/volcengine/verl/releases/tag/v0.3.0.post0>`__
     - 20.04
     - `2.5.0 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
     - `3.9.19 <https://www.python.org/downloads/release/python-3919/>`__
     - `0.6.3 <https://github.com/vllm-project/vllm/releases/tag/v0.6.3>`__
     - MI300X

.. _verl-supported_features:

Supported modules with verl on ROCm
===============================================================================

The following GPU-accelerated modules are supported with verl on ROCm:

- ``FSDP``: Training engine
- ``vllm``: Inference engine
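To show how the two engines are selected in practice, here is a hedged sketch of a single-node PPO launch inside the container. The dataset paths and model name are placeholders, and the exact Hydra-style override keys can vary between verl releases, so check the verl documentation for your version.

```shell
# Illustrative only: run inside the rocm/verl container on a ROCm host.
python3 -m verl.trainer.main_ppo \
  data.train_files=$HOME/data/gsm8k/train.parquet \
  data.val_files=$HOME/data/gsm8k/test.parquet \
  actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
  actor_rollout_ref.actor.strategy=fsdp \
  actor_rollout_ref.rollout.name=vllm \
  trainer.n_gpus_per_node=8 \
  trainer.nnodes=1
```

The ``strategy=fsdp`` and ``rollout.name=vllm`` overrides correspond to the two supported modules listed above.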

.. _verl-recommendations:

Use cases and recommendations
================================================================================

* The benefits of verl in large-scale reinforcement learning from human feedback
  (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD
  GPUs with verl and ROCm Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`__
  blog. The blog post outlines how the Volcano Engine Reinforcement Learning
  (verl) framework integrates with the AMD ROCm platform to optimize training on
  AMD Instinct™ GPUs. The guide details the process of building a Docker image and
  setting up single-node and multi-node training environments, and highlights
  performance benchmarks demonstrating improved throughput and convergence accuracy.
  This resource serves as a comprehensive starting point for deploying verl on AMD GPUs,
  facilitating efficient RLHF training workflows.

Previous versions
===============================================================================

See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/verl-history` to find documentation for previous releases
of the ``ROCm/verl`` Docker image.
@@ -66,7 +66,7 @@ architecture.
 * [AMD Instinct MI50/Vega 7nm ISA](https://www.amd.com/system/files/TechDocs/vega-7nm-shader-instruction-set-architecture.pdf)
 * [AMD Instinct MI25/Vega ISA](https://www.amd.com/system/files/TechDocs/vega-shader-instruction-set-architecture.pdf)
 * [AMD GCN3 ISA](https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
-* AMD Vega Architecture White Paper
+* [AMD Vega Architecture White Paper](https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf)

 :::
26  docs/conf.py
@@ -81,7 +81,7 @@ latex_elements = {

 }

 html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "rocm.docs.amd.com")
-html_context = {"docs_header_version": "7.2.1"}
+html_context = {"docs_header_version": "7.1.1"}
 if os.environ.get("READTHEDOCS", "") == "True":
     html_context["READTHEDOCS"] = True
@@ -92,22 +92,28 @@ official_branch = run(["git", "rev-parse", "--abbrev-ref", "HEAD"], capture_outp

 project = "ROCm Documentation"
 project_path = os.path.abspath(".").replace("\\", "/")
 author = "Advanced Micro Devices, Inc."
-copyright = "Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved."
+copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
-version = "7.2.1"
+version = "7.2.0"
-release = "7.2.1"
+release = "7.2.0"
 setting_all_article_info = True
 all_article_info_os = ["linux", "windows"]
 all_article_info_author = ""

 # pages with specific settings
 article_pages = [
-    {"file": "about/release-notes", "os": ["linux"], "date": "2026-03-25"},
+    {"file": "about/release-notes", "os": ["linux"], "date": "2026-01-21"},
     {"file": "release/changelog", "os": ["linux"],},
     {"file": "compatibility/compatibility-matrix", "os": ["linux"]},
     {"file": "compatibility/ml-compatibility/pytorch-compatibility", "os": ["linux"]},
     {"file": "compatibility/ml-compatibility/tensorflow-compatibility", "os": ["linux"]},
     {"file": "compatibility/ml-compatibility/jax-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/verl-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/stanford-megatron-lm-compatibility", "os": ["linux"]},
     {"file": "compatibility/ml-compatibility/dgl-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/megablocks-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/ray-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/llama-cpp-compatibility", "os": ["linux"]},
+    {"file": "compatibility/ml-compatibility/flashinfer-compatibility", "os": ["linux"]},
     {"file": "how-to/deep-learning-rocm", "os": ["linux"]},

     {"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
@@ -162,7 +168,6 @@ article_pages = [

     {"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-v25.5", "os": ["linux"]},
     {"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-v25.9", "os": ["linux"]},
     {"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-v25.11", "os": ["linux"]},
-    {"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-v26.1", "os": ["linux"]},
     {"file": "how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry", "os": ["linux"]},

     {"file": "how-to/rocm-for-ai/fine-tuning/index", "os": ["linux"]},
@@ -202,8 +207,6 @@ article_pages = [

     {"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-25.11", "os": ["linux"]},
     {"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-25.12", "os": ["linux"]},
     {"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-25.13", "os": ["linux"]},
-    {"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-26.1", "os": ["linux"]},
-    {"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-26.2", "os": ["linux"]},

     {"file": "how-to/rocm-for-ai/inference/deploy-your-model", "os": ["linux"]},
@@ -225,8 +228,6 @@ article_pages = [
     {"file": "how-to/tuning-guides/mi300x/workload", "os": ["linux"]},
     {"file": "how-to/system-debugging", "os": ["linux"]},
     {"file": "how-to/gpu-enabled-mpi", "os": ["linux"]},

-    {"file": "reference/rocm-tools", "os": ["linux"],},
 ]

 external_toc_path = "./sphinx/_toc.yml"
@@ -244,7 +245,7 @@ external_projects_current_project = "rocm"
 # external_projects_remote_repository = ""

 html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "https://rocm-stg.amd.com/")
-html_context = {"docs_header_version": "7.2.1"}
+html_context = {"docs_header_version": "7.1.0"}
 if os.environ.get("READTHEDOCS", "") == "True":
     html_context["READTHEDOCS"] = True

@@ -280,6 +281,3 @@ html_context = {

 # Disable figure and table numbering
 numfig = False

-# Uncomment if facing rate limit exceed issue with local build
-external_projects_remote_repository = ""
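The conf.py fragment above gates the canonical base URL and a Read the Docs flag on environment variables. As a minimal sketch of that behavior, the snippet below wraps the same logic in a hypothetical helper (`build_html_settings` is not part of conf.py; it exists only so the two build modes can be exercised side by side):

```python
def build_html_settings(environ):
    """Mimic the conf.py fragment: pick the canonical base URL and
    flag Read the Docs builds. `environ` is any dict-like mapping."""
    html_baseurl = environ.get("READTHEDOCS_CANONICAL_URL", "https://rocm-stg.amd.com/")
    html_context = {"docs_header_version": "7.1.0"}
    if environ.get("READTHEDOCS", "") == "True":
        html_context["READTHEDOCS"] = True
    return html_baseurl, html_context

# Local build: falls back to the staging URL, no RTD flag is set.
local_url, local_ctx = build_html_settings({})

# Read the Docs build: the canonical URL comes from the environment.
rtd_url, rtd_ctx = build_html_settings(
    {"READTHEDOCS": "True", "READTHEDOCS_CANONICAL_URL": "https://rocm.docs.amd.com/"}
)
```

Note that `READTHEDOCS` is compared against the literal string "True", so any other value (including "true") leaves `html_context` without the flag.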
Binary file not shown (image removed; was 9.4 KiB).
@@ -110,7 +110,7 @@ vllm_benchmark:
   - model: DBRX Instruct
     mad_tag: pyt_vllm_dbrx-instruct
     model_repo: databricks/dbrx-instruct
-    url: https://huggingface.co/databricks
+    url: https://huggingface.co/databricks/dbrx-instruct
    precision: float16
   - model: DBRX Instruct FP8
     mad_tag: pyt_vllm_dbrx_fp8
@@ -1,105 +0,0 @@
-docker:
-  pull_tag: rocm/pytorch-xdit:v25.13
-  docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v25.13/images/sha256-81954713070d67bde08595e03f62110c8a3dd66a9ae17a77d611e01f83f0f4ef
-  ROCm: 7.11.0
-whats_new:
-  - "Flux.1 Kontext support"
-  - "Flux.2 Dev support"
-  - "Flux FP8 GEMM support"
-  - "Hybrid FP8 attention support for Wan models"
-components:
-  TheRock:
-    version: 1728a81
-    url: https://github.com/ROCm/TheRock
-  rccl:
-    version: d23d18f
-    url: https://github.com/ROCm/rccl
-  composable_kernel:
-    version: ab0101c
-    url: https://github.com/ROCm/composable_kernel
-  rocm-libraries:
-    version: a2f7c35
-    url: https://github.com/ROCm/rocm-libraries
-  rocm-systems:
-    version: 659737c
-    url: https://github.com/ROCm/rocm-systems
-  torch:
-    version: 91be249
-    url: https://github.com/ROCm/pytorch
-  torchvision:
-    version: b919bd0
-    url: https://github.com/pytorch/vision
-  triton:
-    version: a272dfa
-    url: https://github.com/ROCm/triton
-  accelerate:
-    version: b521400f
-    url: https://github.com/huggingface/accelerate
-  aiter:
-    version: de14bec0
-    url: https://github.com/ROCm/aiter
-  diffusers:
-    version: a1f36ee3e
-    url: https://github.com/huggingface/diffusers
-  xfuser:
-    version: adf2681
-    url: https://github.com/xdit-project/xDiT
-  yunchang:
-    version: 2c9b712
-    url: https://github.com/feifeibear/long-context-attention
-supported_models:
-  - group: Hunyuan Video
-    js_tag: hunyuan
-    models:
-      - model: Hunyuan Video
-        model_repo: tencent/HunyuanVideo
-        revision: refs/pr/18
-        url: https://huggingface.co/tencent/HunyuanVideo
-        github: https://github.com/Tencent-Hunyuan/HunyuanVideo
-        mad_tag: pyt_xdit_hunyuanvideo
-        js_tag: hunyuan_tag
-  - group: Wan-AI
-    js_tag: wan
-    models:
-      - model: Wan2.1
-        model_repo: Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
-        url: https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
-        github: https://github.com/Wan-Video/Wan2.1
-        mad_tag: pyt_xdit_wan_2_1
-        js_tag: wan_21_tag
-      - model: Wan2.2
-        model_repo: Wan-AI/Wan2.2-I2V-A14B-Diffusers
-        url: https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers
-        github: https://github.com/Wan-Video/Wan2.2
-        mad_tag: pyt_xdit_wan_2_2
-        js_tag: wan_22_tag
-  - group: FLUX
-    js_tag: flux
-    models:
-      - model: FLUX.1
-        model_repo: black-forest-labs/FLUX.1-dev
-        url: https://huggingface.co/black-forest-labs/FLUX.1-dev
-        github: https://github.com/black-forest-labs/flux
-        mad_tag: pyt_xdit_flux
-        js_tag: flux_1_tag
-      - model: FLUX.1 Kontext
-        model_repo: black-forest-labs/FLUX.1-Kontext-dev
-        url: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
-        github: https://github.com/black-forest-labs/flux
-        mad_tag: pyt_xdit_flux_kontext
-        js_tag: flux_1_kontext_tag
-      - model: FLUX.2
-        model_repo: black-forest-labs/FLUX.2-dev
-        url: https://huggingface.co/black-forest-labs/FLUX.2-dev
-        github: https://github.com/black-forest-labs/flux2
-        mad_tag: pyt_xdit_flux_2
-        js_tag: flux_2_tag
-  - group: StableDiffusion
-    js_tag: stablediffusion
-    models:
-      - model: stable-diffusion-3.5-large
-        model_repo: stabilityai/stable-diffusion-3.5-large
-        url: https://huggingface.co/stabilityai/stable-diffusion-3.5-large
-        github: https://github.com/Stability-AI/sd3.5
-        mad_tag: pyt_xdit_sd_3_5
-        js_tag: stable_diffusion_3_5_large_tag
@@ -1,311 +0,0 @@
-docker:
-  pull_tag: rocm/pytorch-xdit:v26.2
-  docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v26.2/images/sha256-e2c504af438bb9cf60e3869c499baa5102b3d3f62141b99c49743e755ae44008
-  ROCm: 7.11.0
-whats_new:
-  - "LTX-2 support"
-  - "Flux.2 Klein support"
-  - "Aiter update to support opt_groupnorm, dynamic scaling in FP8 and Sage attention v1"
-components:
-  TheRock:
-    version: 1728a81
-    url: https://github.com/ROCm/TheRock
-  rccl:
-    version: d23d18f
-    url: https://github.com/ROCm/rccl
-  composable_kernel:
-    version: ab0101c
-    url: https://github.com/ROCm/composable_kernel
-  rocm-libraries:
-    version: a2f7c35
-    url: https://github.com/ROCm/rocm-libraries
-  rocm-systems:
-    version: 659737c
-    url: https://github.com/ROCm/rocm-systems
-  torch:
-    version: 91be249
-    url: https://github.com/ROCm/pytorch
-  torchvision:
-    version: b919bd0
-    url: https://github.com/pytorch/vision
-  triton:
-    version: a272dfa
-    url: https://github.com/ROCm/triton
-  accelerate:
-    version: b7f2212
-    url: https://github.com/huggingface/accelerate
-  aiter:
-    version: 42ae0ad
-    url: https://github.com/ROCm/aiter
-  diffusers:
-    version: a3dcd9
-    url: https://github.com/huggingface/diffusers
-  xfuser:
-    version: 635fc29
-    url: https://github.com/xdit-project/xDiT
-  yunchang:
-    version: 631bdfd
-    url: https://github.com/feifeibear/long-context-attention
-supported_models:
-  - group: Hunyuan Video
-    js_tag: hunyuan
-    models:
-      - model: Hunyuan Video
-        model_repo: tencent/HunyuanVideo
-        revision: refs/pr/18
-        url: https://huggingface.co/tencent/HunyuanVideo
-        github: https://github.com/Tencent-Hunyuan/HunyuanVideo
-        mad_tag: pyt_xdit_hunyuanvideo
-        js_tag: hunyuan_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--prompt "In the large cage, two puppies were wagging their tails at each other." \'
-          - '--batch_size 1 \'
-          - '--height 720 --width 1280 \'
-          - '--seed 1168860793 \'
-          - '--num_frames 129 \'
-          - '--num_inference_steps 50 \'
-          - '--warmup_calls 1 \'
-          - '--num_iterations 1 \'
-          - '--ulysses_degree 8 \'
-          - '--enable_tiling --enable_slicing \'
-          - '--guidance_scale 6.0 \'
-          - '--use_torch_compile \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-      - model: Hunyuan Video 1.5
-        model_repo: hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v
-        url: https://huggingface.co/hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v
-        github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
-        mad_tag: pyt_xdit_hunyuanvideo_1_5
-        js_tag: hunyuan_1_5_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--prompt "In the large cage, two puppies were wagging their tails at each other." \'
-          - '--task t2v \'
-          - '--height 720 --width 1280 \'
-          - '--seed 1168860793 \'
-          - '--num_frames 129 \'
-          - '--num_inference_steps 50 \'
-          - '--num_iterations 1 \'
-          - '--ulysses_degree 8 \'
-          - '--enable_tiling --enable_slicing \'
-          - '--use_torch_compile \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-  - group: Wan-AI
-    js_tag: wan
-    models:
-      - model: Wan2.1
-        model_repo: Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
-        url: https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
-        github: https://github.com/Wan-Video/Wan2.1
-        mad_tag: pyt_xdit_wan_2_1
-        js_tag: wan_21_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline''s intricate details and the refreshing atmosphere of the seaside." \'
-          - '--height 720 \'
-          - '--width 1280 \'
-          - '--input_images /app/data/wan_input.jpg \'
-          - '--num_frames 81 \'
-          - '--ulysses_degree 8 \'
-          - '--seed 42 \'
-          - '--num_iterations 1 \'
-          - '--num_inference_steps 40 \'
-          - '--use_torch_compile \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-      - model: Wan2.2
-        model_repo: Wan-AI/Wan2.2-I2V-A14B-Diffusers
-        url: https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers
-        github: https://github.com/Wan-Video/Wan2.2
-        mad_tag: pyt_xdit_wan_2_2
-        js_tag: wan_22_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline''s intricate details and the refreshing atmosphere of the seaside." \'
-          - '--height 720 \'
-          - '--width 1280 \'
-          - '--input_images /app/data/wan_input.jpg \'
-          - '--num_frames 81 \'
-          - '--ulysses_degree 8 \'
-          - '--seed 42 \'
-          - '--num_iterations 1 \'
-          - '--num_inference_steps 40 \'
-          - '--use_torch_compile \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-  - group: FLUX
-    js_tag: flux
-    models:
-      - model: FLUX.1
-        model_repo: black-forest-labs/FLUX.1-dev
-        url: https://huggingface.co/black-forest-labs/FLUX.1-dev
-        github: https://github.com/black-forest-labs/flux
-        mad_tag: pyt_xdit_flux
-        js_tag: flux_1_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--seed 42 \'
-          - '--prompt "A small cat" \'
-          - '--height 1024 \'
-          - '--width 1024 \'
-          - '--num_inference_steps 25 \'
-          - '--max_sequence_length 256 \'
-          - '--warmup_calls 5 \'
-          - '--ulysses_degree 8 \'
-          - '--use_torch_compile \'
-          - '--guidance_scale 0.0 \'
-          - '--num_iterations 50 \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-      - model: FLUX.1 Kontext
-        model_repo: black-forest-labs/FLUX.1-Kontext-dev
-        url: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
-        github: https://github.com/black-forest-labs/flux
-        mad_tag: pyt_xdit_flux_kontext
-        js_tag: flux_1_kontext_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--seed 42 \'
-          - '--prompt "Add a cool hat to the cat" \'
-          - '--height 1024 \'
-          - '--width 1024 \'
-          - '--num_inference_steps 30 \'
-          - '--max_sequence_length 512 \'
-          - '--warmup_calls 5 \'
-          - '--ulysses_degree 8 \'
-          - '--use_torch_compile \'
-          - '--input_images /app/data/flux_cat.png \'
-          - '--guidance_scale 2.5 \'
-          - '--num_iterations 25 \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-      - model: FLUX.2
-        model_repo: black-forest-labs/FLUX.2-dev
-        url: https://huggingface.co/black-forest-labs/FLUX.2-dev
-        github: https://github.com/black-forest-labs/flux2
-        mad_tag: pyt_xdit_flux_2
-        js_tag: flux_2_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--seed 42 \'
-          - '--prompt "Add a cool hat to the cat" \'
-          - '--height 1024 \'
-          - '--width 1024 \'
-          - '--num_inference_steps 50 \'
-          - '--max_sequence_length 512 \'
-          - '--warmup_calls 5 \'
-          - '--ulysses_degree 8 \'
-          - '--use_torch_compile \'
-          - '--input_images /app/data/flux_cat.png \'
-          - '--guidance_scale 4.0 \'
-          - '--num_iterations 25 \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-      - model: FLUX.2 Klein
-        model_repo: black-forest-labs/FLUX.2-klein-9B
-        url: https://huggingface.co/black-forest-labs/FLUX.2-klein-9B
-        github: https://github.com/black-forest-labs/flux2
-        mad_tag: pyt_xdit_flux_2_klein
-        js_tag: flux_2_klein_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--seed 42 \'
-          - '--prompt "A spectacular sunset over the ocean" \'
-          - '--height 2048 \'
-          - '--width 2048 \'
-          - '--num_inference_steps 4 \'
-          - '--warmup_calls 5 \'
-          - '--ulysses_degree 8 \'
-          - '--use_torch_compile \'
-          - '--guidance_scale 1.0 \'
-          - '--num_iterations 25 \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-  - group: StableDiffusion
-    js_tag: stablediffusion
-    models:
-      - model: stable-diffusion-3.5-large
-        model_repo: stabilityai/stable-diffusion-3.5-large
-        url: https://huggingface.co/stabilityai/stable-diffusion-3.5-large
-        github: https://github.com/Stability-AI/sd3.5
-        mad_tag: pyt_xdit_sd_3_5
-        js_tag: stable_diffusion_3_5_large_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--prompt "A capybara holding a sign that reads Hello World" \'
-          - '--num_iterations 50 \'
-          - '--num_inference_steps 28 \'
-          - '--pipefusion_parallel_degree 4 \'
-          - '--use_cfg_parallel \'
-          - '--use_torch_compile \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-  - group: Z-Image
-    js_tag: z_image
-    models:
-      - model: Z-Image Turbo
-        model_repo: Tongyi-MAI/Z-Image-Turbo
-        url: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
-        github: https://github.com/Tongyi-MAI/Z-Image
-        mad_tag: pyt_xdit_z_image_turbo
-        js_tag: z_image_turbo_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--seed 42 \'
-          - '--prompt "A crowded beach" \'
-          - '--height 1088 \'
-          - '--width 1920 \'
-          - '--num_inference_steps 9 \'
-          - '--ulysses_degree 2 \'
-          - '--use_torch_compile \'
-          - '--guidance_scale 0.0 \'
-          - '--num_iterations 50 \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
-  - group: LTX
-    js_tag: ltx
-    models:
-      - model: LTX-2
-        model_repo: Lightricks/LTX-2
-        url: https://huggingface.co/Lightricks/LTX-2
-        github: https://github.com/Lightricks/LTX-2
-        mad_tag: pyt_xdit_ltx2
-        js_tag: ltx2_tag
-        benchmark_command:
-          - mkdir results
-          - 'xdit \'
-          - '--model {model_repo} \'
-          - '--seed 42 \'
-          - '--prompt "Cinematic action packed shot. The man says silently: \"We need to run.\". The camera zooms in on his mouth then immediately screams: \"NOW!\". The camera zooms back out, he turns around and bolts it." \'
-          - '--height 1088 \'
-          - '--width 1920 \'
-          - '--num_inference_steps 40 \'
-          - '--ulysses_degree 8 \'
-          - '--use_torch_compile \'
-          - '--guidance_scale 4.0 \'
-          - '--num_iterations 1 \'
-          - '--attention_backend aiter \'
-          - '--output_directory results'
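The benchmark_command lists above store one shell invocation as YAML fragments: each entry ends in a literal " \\" line continuation and the first entry ("mkdir results") is a separate setup step, with "{model_repo}" left as a placeholder for the harness to substitute. As a minimal sketch of how such a list could be rendered into runnable commands (the helper `render_benchmark_command` is hypothetical, not part of the benchmarking tooling):

```python
def render_benchmark_command(lines, model_repo):
    """Substitute {model_repo} and fold the trailing-backslash fragments
    into one command string; returns (setup_command, benchmark_command)."""
    rendered = []
    for line in lines:
        line = line.format(model_repo=model_repo)
        # Drop the " \" continuation so the fragments can be joined.
        rendered.append(line[:-2] if line.endswith(" \\") else line)
    return rendered[0], " ".join(rendered[1:])

setup, cmd = render_benchmark_command(
    ["mkdir results",
     "xdit \\",
     "--model {model_repo} \\",
     "--seed 42 \\",
     "--output_directory results"],
    "black-forest-labs/FLUX.1-dev",
)
# cmd -> "xdit --model black-forest-labs/FLUX.1-dev --seed 42 --output_directory results"
```

The same folding applies to every entry in the data files, since all of them follow the mkdir-then-xdit shape.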
@@ -1,28 +1,31 @@
|
|||||||
docker:
|
docker:
|
||||||
pull_tag: rocm/pytorch-xdit:v26.3
|
pull_tag: rocm/pytorch-xdit:v25.13
|
||||||
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v26.3/images/sha256-ac78a03d2911bf1b49c001d3be2e8bd745c1bc455cb49ae972825a7986880902
|
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-xdit/v25.13/images/sha256-81954713070d67bde08595e03f62110c8a3dd66a9ae17a77d611e01f83f0f4ef
|
||||||
ROCm: 7.12.0
|
ROCm: 7.11.0
|
||||||
whats_new:
|
whats_new:
|
||||||
- "Qwen-Image support"
|
- "Flux.1 Kontext support"
|
||||||
- "Qwen-Image-Edit support"
|
- "Flux.2 Dev support"
|
||||||
- "Aiter update to support Sage attention v2"
|
- "Flux FP8 GEMM support"
|
||||||
- "xDiT update to support MXFP4 GEMMs in Wan2.2, Wan2.1 and Flux.2"
|
- "Hybrid FP8 attention support for Wan models"
|
||||||
components:
|
components:
|
||||||
TheRock:
|
TheRock:
|
||||||
version: e40a6da
|
version: 1728a81
|
||||||
url: https://github.com/ROCm/TheRock
|
url: https://github.com/ROCm/TheRock
|
||||||
|
rccl:
|
||||||
|
version: d23d18f
|
||||||
|
url: https://github.com/ROCm/rccl
|
||||||
|
composable_kernel:
|
||||||
|
version: ab0101c
|
||||||
|
url: https://github.com/ROCm/composable_kernel
|
||||||
rocm-libraries:
|
rocm-libraries:
|
||||||
version: 9e9e900
|
version: a2f7c35
|
||||||
url: https://github.com/ROCm/rocm-libraries
|
url: https://github.com/ROCm/rocm-libraries
|
||||||
rocm-systems:
|
rocm-systems:
|
||||||
version: ca89a1a
|
version: 659737c
|
||||||
url: https://github.com/ROCm/rocm-systems
|
url: https://github.com/ROCm/rocm-systems
|
||||||
torch:
|
torch:
|
||||||
version: 91be249
|
version: 91be249
|
||||||
url: https://github.com/ROCm/pytorch
|
url: https://github.com/ROCm/pytorch
|
||||||
torchaudio:
|
|
||||||
version: e3c6ee2
|
|
||||||
url: https://github.com/pytorch/audio
|
|
||||||
torchvision:
|
torchvision:
|
||||||
version: b919bd0
|
version: b919bd0
|
||||||
url: https://github.com/pytorch/vision
|
url: https://github.com/pytorch/vision
|
||||||
@@ -30,19 +33,19 @@ docker:
|
|||||||
version: a272dfa
|
version: a272dfa
|
||||||
url: https://github.com/ROCm/triton
|
url: https://github.com/ROCm/triton
|
||||||
accelerate:
|
accelerate:
|
||||||
version: 46ba481
|
version: b521400f
|
||||||
url: https://github.com/huggingface/accelerate
|
url: https://github.com/huggingface/accelerate
|
||||||
aiter:
|
aiter:
|
||||||
version: 82d733f
|
version: de14bec0
|
||||||
url: https://github.com/ROCm/aiter
|
url: https://github.com/ROCm/aiter
|
||||||
diffusers:
|
diffusers:
|
||||||
version: a80b192
|
version: a1f36ee3e
|
||||||
url: https://github.com/huggingface/diffusers
|
url: https://github.com/huggingface/diffusers
|
||||||
xfuser:
|
xfuser:
|
||||||
version: 2882027
|
version: adf2681
|
||||||
url: https://github.com/xdit-project/xDiT
|
url: https://github.com/xdit-project/xDiT
|
||||||
yunchang:
|
yunchang:
|
||||||
version: 631bdfd
|
version: 2c9b712
|
||||||
url: https://github.com/feifeibear/long-context-attention
|
url: https://github.com/feifeibear/long-context-attention
|
||||||
supported_models:
|
supported_models:
|
||||||
- group: Hunyuan Video
|
- group: Hunyuan Video
|
||||||
@@ -55,46 +58,6 @@ docker:
|
|||||||
github: https://github.com/Tencent-Hunyuan/HunyuanVideo
|
github: https://github.com/Tencent-Hunyuan/HunyuanVideo
|
||||||
mad_tag: pyt_xdit_hunyuanvideo
|
mad_tag: pyt_xdit_hunyuanvideo
|
||||||
js_tag: hunyuan_tag
|
js_tag: hunyuan_tag
|
||||||
benchmark_command:
|
|
||||||
- mkdir results
|
|
||||||
- 'xdit \'
|
|
||||||
- '--model {model_repo} \'
|
|
||||||
- '--prompt "In the large cage, two puppies were wagging their tails at each other." \'
|
|
||||||
- '--batch_size 1 \'
|
|
||||||
- '--height 720 --width 1280 \'
|
|
||||||
- '--seed 1168860793 \'
|
|
||||||
- '--num_frames 129 \'
|
|
||||||
- '--num_inference_steps 50 \'
|
|
||||||
- '--warmup_calls 1 \'
|
|
||||||
- '--num_iterations 1 \'
|
|
||||||
- '--ulysses_degree 8 \'
|
|
||||||
- '--enable_tiling --enable_slicing \'
|
|
||||||
- '--guidance_scale 6.0 \'
|
|
||||||
- '--use_torch_compile \'
|
|
||||||
- '--attention_backend aiter \'
|
|
||||||
- '--output_directory results'
|
|
||||||
- model: Hunyuan Video 1.5
|
|
||||||
model_repo: hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v
|
|
||||||
url: https://huggingface.co/hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v
|
|
||||||
github: https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5
|
|
||||||
mad_tag: pyt_xdit_hunyuanvideo_1_5
|
|
||||||
js_tag: hunyuan_1_5_tag
|
|
||||||
benchmark_command:
|
|
||||||
- mkdir results
|
|
||||||
- 'xdit \'
|
|
||||||
- '--model {model_repo} \'
|
|
||||||
- '--prompt "In the large cage, two puppies were wagging their tails at each other." \'
|
|
||||||
- '--task t2v \'
|
|
||||||
- '--height 720 --width 1280 \'
|
|
||||||
- '--seed 1168860793 \'
|
|
||||||
- '--num_frames 129 \'
|
|
||||||
- '--num_inference_steps 50 \'
|
|
||||||
- '--num_iterations 1 \'
|
|
||||||
- '--ulysses_degree 8 \'
|
|
||||||
- '--enable_tiling --enable_slicing \'
|
|
||||||
- '--use_torch_compile \'
|
|
||||||
- '--attention_backend aiter \'
|
|
||||||
- '--output_directory results'
|
|
||||||
- group: Wan-AI
|
- group: Wan-AI
|
||||||
js_tag: wan
|
js_tag: wan
|
||||||
models:
|
models:
|
||||||
@@ -104,44 +67,12 @@ docker:
|
|||||||
github: https://github.com/Wan-Video/Wan2.1
|
github: https://github.com/Wan-Video/Wan2.1
|
||||||
mad_tag: pyt_xdit_wan_2_1
|
mad_tag: pyt_xdit_wan_2_1
|
||||||
js_tag: wan_21_tag
|
js_tag: wan_21_tag
|
||||||
benchmark_command:
|
|
||||||
- mkdir results
|
|
||||||
- 'xdit \'
|
|
||||||
- '--model {model_repo} \'
|
|
||||||
-- '--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline''s intricate details and the refreshing atmosphere of the seaside." \'
-- '--height 720 \'
-- '--width 1280 \'
-- '--input_images /app/data/wan_input.jpg \'
-- '--num_frames 81 \'
-- '--ulysses_degree 8 \'
-- '--seed 42 \'
-- '--num_iterations 1 \'
-- '--num_inference_steps 40 \'
-- '--use_torch_compile \'
-- '--attention_backend aiter \'
-- '--output_directory results'
 - model: Wan2.2
 model_repo: Wan-AI/Wan2.2-I2V-A14B-Diffusers
 url: https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers
 github: https://github.com/Wan-Video/Wan2.2
 mad_tag: pyt_xdit_wan_2_2
 js_tag: wan_22_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline''s intricate details and the refreshing atmosphere of the seaside." \'
-- '--height 720 \'
-- '--width 1280 \'
-- '--input_images /app/data/wan_input.jpg \'
-- '--num_frames 81 \'
-- '--ulysses_degree 8 \'
-- '--seed 42 \'
-- '--num_iterations 1 \'
-- '--num_inference_steps 40 \'
-- '--use_torch_compile \'
-- '--attention_backend aiter \'
-- '--output_directory results'
 - group: FLUX
 js_tag: flux
 models:
@@ -151,93 +82,18 @@ docker:
 github: https://github.com/black-forest-labs/flux
 mad_tag: pyt_xdit_flux
 js_tag: flux_1_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "A small cat" \'
-- '--height 1024 \'
-- '--width 1024 \'
-- '--num_inference_steps 25 \'
-- '--max_sequence_length 256 \'
-- '--warmup_calls 5 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--guidance_scale 0.0 \'
-- '--num_iterations 50 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
 - model: FLUX.1 Kontext
 model_repo: black-forest-labs/FLUX.1-Kontext-dev
 url: https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev
 github: https://github.com/black-forest-labs/flux
 mad_tag: pyt_xdit_flux_kontext
 js_tag: flux_1_kontext_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "Add a cool hat to the cat" \'
-- '--height 1024 \'
-- '--width 1024 \'
-- '--num_inference_steps 30 \'
-- '--max_sequence_length 512 \'
-- '--warmup_calls 5 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--input_images /app/data/flux_cat.png \'
-- '--guidance_scale 2.5 \'
-- '--num_iterations 25 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
 - model: FLUX.2
 model_repo: black-forest-labs/FLUX.2-dev
 url: https://huggingface.co/black-forest-labs/FLUX.2-dev
 github: https://github.com/black-forest-labs/flux2
 mad_tag: pyt_xdit_flux_2
 js_tag: flux_2_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "Add a cool hat to the cat" \'
-- '--height 1024 \'
-- '--width 1024 \'
-- '--num_inference_steps 50 \'
-- '--max_sequence_length 512 \'
-- '--warmup_calls 5 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--input_images /app/data/flux_cat.png \'
-- '--guidance_scale 4.0 \'
-- '--num_iterations 25 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
-- model: FLUX.2 Klein
-model_repo: black-forest-labs/FLUX.2-klein-9B
-url: https://huggingface.co/black-forest-labs/FLUX.2-klein-9B
-github: https://github.com/black-forest-labs/flux2
-mad_tag: pyt_xdit_flux_2_klein
-js_tag: flux_2_klein_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "A spectacular sunset over the ocean" \'
-- '--height 2048 \'
-- '--width 2048 \'
-- '--num_inference_steps 4 \'
-- '--warmup_calls 5 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--guidance_scale 1.0 \'
-- '--num_iterations 25 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
 - group: StableDiffusion
 js_tag: stablediffusion
 models:
@@ -247,108 +103,3 @@ docker:
 github: https://github.com/Stability-AI/sd3.5
 mad_tag: pyt_xdit_sd_3_5
 js_tag: stable_diffusion_3_5_large_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--prompt "A capybara holding a sign that reads Hello World" \'
-- '--num_iterations 50 \'
-- '--num_inference_steps 28 \'
-- '--pipefusion_parallel_degree 4 \'
-- '--use_cfg_parallel \'
-- '--use_torch_compile \'
-- '--attention_backend aiter \'
-- '--output_directory results'
-- group: Z-Image
-js_tag: z_image
-models:
-- model: Z-Image Turbo
-model_repo: Tongyi-MAI/Z-Image-Turbo
-url: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
-github: https://github.com/Tongyi-MAI/Z-Image
-mad_tag: pyt_xdit_z_image_turbo
-js_tag: z_image_turbo_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "A crowded beach" \'
-- '--height 1088 \'
-- '--width 1920 \'
-- '--num_inference_steps 9 \'
-- '--ulysses_degree 2 \'
-- '--use_torch_compile \'
-- '--guidance_scale 0.0 \'
-- '--num_iterations 50 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
-- group: LTX
-js_tag: ltx
-models:
-- model: LTX-2
-model_repo: Lightricks/LTX-2
-url: https://huggingface.co/Lightricks/LTX-2
-github: https://github.com/Lightricks/LTX-2
-mad_tag: pyt_xdit_ltx2
-js_tag: ltx2_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "Cinematic action packed shot. The man says silently: \"We need to run.\". The camera zooms in on his mouth then immediately screams: \"NOW!\". The camera zooms back out, he turns around and bolts it." \'
-- '--height 1088 \'
-- '--width 1920 \'
-- '--num_inference_steps 40 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--guidance_scale 4.0 \'
-- '--num_iterations 1 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
-- group: Qwen-Image
-js_tag: qwen_image
-models:
-- model: Qwen-Image
-model_repo: Qwen/Qwen-Image
-url: https://huggingface.co/Qwen/Qwen-Image
-github: https://github.com/QwenLM/Qwen-Image
-mad_tag: pyt_xdit_qwen_image
-js_tag: qwen_image_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "A cat holding a sign that says hello world" \'
-- '--height 2048 \'
-- '--width 2048 \'
-- '--num_inference_steps 50 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--num_iterations 1 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
-- model: Qwen-Image-Edit
-model_repo: Qwen/Qwen-Image-Edit
-url: https://huggingface.co/Qwen/Qwen-Image-Edit
-github: https://github.com/QwenLM/Qwen-Image
-mad_tag: pyt_xdit_qwen_image_edit
-js_tag: qwen_image_edit_tag
-benchmark_command:
-- mkdir results
-- 'xdit \'
-- '--model {model_repo} \'
-- '--seed 42 \'
-- '--prompt "Add a cool hat to the cat." \'
-- '--negative_prompt "" \'
-- '--input_images /app/data/flux_cat.png \'
-- '--height 2048 \'
-- '--width 2048 \'
-- '--num_inference_steps 50 \'
-- '--ulysses_degree 8 \'
-- '--use_torch_compile \'
-- '--num_iterations 1 \'
-- '--attention_backend aiter \'
-- '--output_directory results'
@@ -1,6 +1,6 @@
 dockers:
-- pull_tag: rocm/jax-training:maxtext-v26.2
+- pull_tag: rocm/jax-training:maxtext-v26.1
-docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v26.2/images/sha256-a89643388487b1e2fc6b6ef7bd3c44378c05d217309c977a1c18c72d05ebcaeb
+docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v26.1/images/sha256-901083bde353fe6362ada3036e452c792b2c96124e5900f4e9b5946c02ff9d6a
 components:
 ROCm: 7.1.1
 JAX: 0.8.2
@@ -15,7 +15,6 @@ model_groups:
 mad_tag: jax_maxtext_train_llama-2-7b
 model_repo: Llama-2-7B
 precision: bf16
-primus_config_name: llama2_7B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_llama2_7b.yml
 gfx942: env_scripts/llama2_7b.yml
@@ -24,21 +23,18 @@ model_groups:
 mad_tag: jax_maxtext_train_llama-2-70b
 model_repo: Llama-2-70B
 precision: bf16
-primus_config_name: llama2_70B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_llama2_70b.yml
 gfx942: env_scripts/llama2_70b.yml
 doc_options: ["single-node", "multi-node"]
-- model: Llama 3 8B
+- model: Llama 3 8B (multi-node)
 mad_tag: jax_maxtext_train_llama-3-8b
-primus_config_name: llama3_8B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_llama3_8b.yml
 gfx942: env_scripts/llama3_8b.yml
 doc_options: ["multi-node"]
-- model: Llama 3 70B
+- model: Llama 3 70B (multi-node)
 mad_tag: jax_maxtext_train_llama-3-70b
-primus_config_name: llama3_70B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_llama3_70b.yml
 gfx942: env_scripts/llama3_70b.yml
@@ -64,7 +60,6 @@ model_groups:
 mad_tag: jax_maxtext_train_llama-3.3-70b
 model_repo: Llama-3.3-70B
 precision: bf16
-primus_config_name: llama3.3_70B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_llama3.3_70b.yml
 gfx942: env_scripts/llama3.3_70b.yml
@@ -76,7 +71,6 @@ model_groups:
 mad_tag: jax_maxtext_train_deepseek-v2-lite-16b
 model_repo: DeepSeek-V2-lite
 precision: bf16
-primus_config_name: deepseek_v2_16B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_deepseek2_16b.yml
 gfx942: env_scripts/deepseek2_16b.yml
@@ -88,7 +82,6 @@ model_groups:
 mad_tag: jax_maxtext_train_mixtral-8x7b
 model_repo: Mixtral-8x7B
 precision: bf16
-primus_config_name: mixtral_8x7B-pretrain.yaml
 multinode_config:
 gfx950: env_scripts/gfx950_mixtral_8x7b.yml
 gfx942: env_scripts/llama3_8x7b.yml
@@ -1,14 +1,14 @@
 docker:
-pull_tag: rocm/primus:v26.2
+pull_tag: rocm/primus:v26.1
 docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d
 components:
-ROCm: 7.2.0
+ROCm: 7.1.0
-PyTorch: 2.10.0+git94c6e04
+PyTorch: 2.10.0.dev20251112+rocm7.1
-Python: "3.12.3"
+Python: "3.10"
-Transformer Engine: 2.8.0.dev0+51f74fa7
+Transformer Engine: 2.6.0.dev0+f141f34b
 Flash Attention: 2.8.3
 hipBLASLt: 34459f66ea
-Triton: 3.5.0
+Triton: 3.4.0
 RCCL: 2.27.7
 model_groups:
 - group: Meta Llama
@@ -1,88 +0,0 @@
-dockers:
-- pull_tag: rocm/jax-training:maxtext-v26.1
-docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v26.1/images/sha256-901083bde353fe6362ada3036e452c792b2c96124e5900f4e9b5946c02ff9d6a
-components:
-ROCm: 7.1.1
-JAX: 0.8.2
-Python: 3.12
-Transformer Engine: 2.8.0.dev0+aec00a7f
-hipBLASLt: 1.2.x
-model_groups:
-- group: Meta Llama
-tag: llama
-models:
-- model: Llama 2 7B
-mad_tag: jax_maxtext_train_llama-2-7b
-model_repo: Llama-2-7B
-precision: bf16
-multinode_config:
-gfx950: env_scripts/gfx950_llama2_7b.yml
-gfx942: env_scripts/llama2_7b.yml
-doc_options: ["single-node", "multi-node"]
-- model: Llama 2 70B
-mad_tag: jax_maxtext_train_llama-2-70b
-model_repo: Llama-2-70B
-precision: bf16
-multinode_config:
-gfx950: env_scripts/gfx950_llama2_70b.yml
-gfx942: env_scripts/llama2_70b.yml
-doc_options: ["single-node", "multi-node"]
-- model: Llama 3 8B (multi-node)
-mad_tag: jax_maxtext_train_llama-3-8b
-multinode_config:
-gfx950: env_scripts/gfx950_llama3_8b.yml
-gfx942: env_scripts/llama3_8b.yml
-doc_options: ["multi-node"]
-- model: Llama 3 70B (multi-node)
-mad_tag: jax_maxtext_train_llama-3-70b
-multinode_config:
-gfx950: env_scripts/gfx950_llama3_70b.yml
-gfx942: env_scripts/llama3_70b.yml
-doc_options: ["multi-node"]
-- model: Llama 3.1 8B
-mad_tag: jax_maxtext_train_llama-3.1-8b
-model_repo: Llama-3.1-8B
-precision: bf16
-doc_options: ["single-node"]
-- model: Llama 3.1 70B
-mad_tag: jax_maxtext_train_llama-3.1-70b
-model_repo: Llama-3.1-70B
-precision: bf16
-doc_options: ["single-node"]
-- model: Llama 3.1 405B (multi-node)
-mad_tag: jax_maxtext_train_llama-3.1-405b
-model_repo: Llama-3.1-405B
-precision: bf16
-multinode_config:
-gfx950: env_scripts/gfx950_llama3_405b.yml
-doc_options: ["multi-node"]
-- model: Llama 3.3 70B
-mad_tag: jax_maxtext_train_llama-3.3-70b
-model_repo: Llama-3.3-70B
-precision: bf16
-multinode_config:
-gfx950: env_scripts/gfx950_llama3.3_70b.yml
-gfx942: env_scripts/llama3.3_70b.yml
-doc_options: ["single-node", "multi-node"]
-- group: DeepSeek
-tag: deepseek
-models:
-- model: DeepSeek-V2-Lite (16B)
-mad_tag: jax_maxtext_train_deepseek-v2-lite-16b
-model_repo: DeepSeek-V2-lite
-precision: bf16
-multinode_config:
-gfx950: env_scripts/gfx950_deepseek2_16b.yml
-gfx942: env_scripts/deepseek2_16b.yml
-doc_options: ["single-node", "multi-node"]
-- group: Mistral AI
-tag: mistral
-models:
-- model: Mixtral 8x7B
-mad_tag: jax_maxtext_train_mixtral-8x7b
-model_repo: Mixtral-8x7B
-precision: bf16
-multinode_config:
-gfx950: env_scripts/gfx950_mixtral_8x7b.yml
-gfx942: env_scripts/llama3_8x7b.yml
-doc_options: ["single-node", "multi-node"]
@@ -1,58 +0,0 @@
-docker:
-pull_tag: rocm/primus:v26.1
-docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d
-components:
-ROCm: 7.1.0
-PyTorch: 2.10.0.dev20251112+rocm7.1
-Python: "3.10"
-Transformer Engine: 2.6.0.dev0+f141f34b
-Flash Attention: 2.8.3
-hipBLASLt: 34459f66ea
-Triton: 3.4.0
-RCCL: 2.27.7
-model_groups:
-- group: Meta Llama
-tag: llama
-models:
-- model: Llama 3.3 70B
-mad_tag: primus_pyt_megatron_lm_train_llama-3.3-70b
-config_name: llama3.3_70B-pretrain.yaml
-- model: Llama 3.1 70B
-mad_tag: primus_pyt_megatron_lm_train_llama-3.1-70b
-config_name: llama3.1_70B-pretrain.yaml
-- model: Llama 3.1 8B
-mad_tag: primus_pyt_megatron_lm_train_llama-3.1-8b
-config_name: llama3.1_8B-pretrain.yaml
-- model: Llama 2 7B
-mad_tag: primus_pyt_megatron_lm_train_llama-2-7b
-config_name: llama2_7B-pretrain.yaml
-- model: Llama 2 70B
-mad_tag: primus_pyt_megatron_lm_train_llama-2-70b
-config_name: llama2_70B-pretrain.yaml
-- group: DeepSeek
-tag: deepseek
-models:
-- model: DeepSeek-V3 (proxy)
-mad_tag: primus_pyt_megatron_lm_train_deepseek-v3-proxy
-config_name: deepseek_v3-pretrain.yaml
-- model: DeepSeek-V2-Lite
-mad_tag: primus_pyt_megatron_lm_train_deepseek-v2-lite-16b
-config_name: deepseek_v2_lite-pretrain.yaml
-- group: Mistral AI
-tag: mistral
-models:
-- model: Mixtral 8x7B
-mad_tag: primus_pyt_megatron_lm_train_mixtral-8x7b
-config_name: mixtral_8x7B_v0.1-pretrain.yaml
-- model: Mixtral 8x22B (proxy)
-mad_tag: primus_pyt_megatron_lm_train_mixtral-8x22b-proxy
-config_name: mixtral_8x22B_v0.1-pretrain.yaml
-- group: Qwen
-tag: qwen
-models:
-- model: Qwen 2.5 7B
-mad_tag: primus_pyt_megatron_lm_train_qwen2.5-7b
-config_name: primus_qwen2.5_7B-pretrain.yaml
-- model: Qwen 2.5 72B
-mad_tag: primus_pyt_megatron_lm_train_qwen2.5-72b
-config_name: qwen2.5_72B-pretrain.yaml
@@ -1,32 +0,0 @@
-docker:
-pull_tag: rocm/primus:v26.1
-docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d
-components:
-ROCm: 7.1.0
-PyTorch: 2.10.0.dev20251112+rocm7.1
-Python: "3.10"
-Transformer Engine: 2.6.0.dev0+f141f34b
-Flash Attention: 2.8.3
-hipBLASLt: 34459f66ea
-model_groups:
-- group: Meta Llama
-tag: llama
-models:
-- model: Llama 3.1 8B
-mad_tag: primus_pyt_train_llama-3.1-8b
-model_repo: Llama-3.1-8B
-url: https://huggingface.co/meta-llama/Llama-3.1-8B
-precision: BF16
-- model: Llama 3.1 70B
-mad_tag: primus_pyt_train_llama-3.1-70b
-model_repo: Llama-3.1-70B
-url: https://huggingface.co/meta-llama/Llama-3.1-70B
-precision: BF16
-- group: DeepSeek
-tag: deepseek
-models:
-- model: DeepSeek V3 16B
-mad_tag: primus_pyt_train_deepseek-v3-16b
-model_repo: DeepSeek-V3
-url: https://huggingface.co/deepseek-ai/DeepSeek-V3
-precision: BF16
@@ -1,14 +1,14 @@
 docker:
-pull_tag: rocm/primus:v26.2
+pull_tag: rocm/primus:v26.1
-docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.2/images/sha256-9148d1bfcd579bf92f44bd89090e0d8c958f149c134b4b34b9674ab559244585
+docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d
 components:
-ROCm: 7.2.0
+ROCm: 7.1.0
-PyTorch: 2.10.0a0+git449b176
+PyTorch: 2.10.0.dev20251112+rocm7.1
-Python: "3.12.3"
+Python: "3.10"
-Transformer Engine: 2.8.0.dev0+51f74fa7
+Transformer Engine: 2.6.0.dev0+f141f34b
 Flash Attention: 2.8.3
-hipBLASLt: 1.2.0-de5c1aebb6
+hipBLASLt: 34459f66ea
-Triton: 3.6.0
+Triton: 3.4.0
 RCCL: 2.27.7
 model_groups:
 - group: Meta Llama
@@ -17,30 +17,18 @@ model_groups:
 - model: Llama 3.3 70B
 mad_tag: primus_pyt_megatron_lm_train_llama-3.3-70b
 config_name: llama3.3_70B-pretrain.yaml
-- model: Llama 3.1 8B
-mad_tag: primus_pyt_megatron_lm_train_llama-3.1-8b
-config_name: llama3.1_8B-pretrain.yaml
 - model: Llama 3.1 70B
 mad_tag: primus_pyt_megatron_lm_train_llama-3.1-70b
 config_name: llama3.1_70B-pretrain.yaml
+- model: Llama 3.1 8B
+mad_tag: primus_pyt_megatron_lm_train_llama-3.1-8b
+config_name: llama3.1_8B-pretrain.yaml
 - model: Llama 2 7B
 mad_tag: primus_pyt_megatron_lm_train_llama-2-7b
 config_name: llama2_7B-pretrain.yaml
 - model: Llama 2 70B
 mad_tag: primus_pyt_megatron_lm_train_llama-2-70b
 config_name: llama2_70B-pretrain.yaml
-- group: AMD Zebra-Llama
-tag: zebra-llama
-models:
-- model: Zebra-Llama 1B
-mad_tag: primus_pyt_megatron_lm_train_zebra-llama-1b
-config_name: zebra_llama_1b-pretrain.yaml
-- model: Zebra-Llama 3B
-mad_tag: primus_pyt_megatron_lm_train_zebra-llama-3b
-config_name: zebra_llama_3b-pretrain.yaml
-- model: Zebra-Llama 8B
-mad_tag: primus_pyt_megatron_lm_train_zebra-llama-8b
-config_name: zebra_llama_8b-pretrain.yaml
 - group: DeepSeek
 tag: deepseek
 models:
@@ -62,11 +50,6 @@ model_groups:
 - group: Qwen
 tag: qwen
 models:
-- model: Qwen 3 32B SFT
-mad_tag: primus_pyt_megatron_lm_train_qwen3-32b-sft
-- model: Qwen 3 32B LoRA
-mad_tag: primus_pyt_megatron_lm_train_qwen3-32b-lora
-config_name: primus_qwen2.5_7B-pretrain.yaml
 - model: Qwen 2.5 7B
 mad_tag: primus_pyt_megatron_lm_train_qwen2.5-7b
 config_name: primus_qwen2.5_7B-pretrain.yaml
@@ -1,15 +1,13 @@
 docker:
-pull_tag: rocm/primus:v26.2
+pull_tag: rocm/primus:v26.1
-docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.2/images/sha256-9148d1bfcd579bf92f44bd89090e0d8c958f149c134b4b34b9674ab559244585
+docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d
 components:
-ROCm: 7.2.0
+ROCm: 7.1.0
-PyTorch: 2.10.0a0+git449b176
+PyTorch: 2.10.0.dev20251112+rocm7.1
-Python: "3.12.3"
+Python: "3.10"
-Transformer Engine: 2.8.0.dev0+51f74fa7
+Transformer Engine: 2.6.0.dev0+f141f34b
 Flash Attention: 2.8.3
-hipBLASLt: 1.2.0-de5c1aebb6
+hipBLASLt: 34459f66ea
-Triton: 3.6.0
-RCCL: 2.27.7
 model_groups:
 - group: Meta Llama
 tag: llama
@@ -1,11 +1,11 @@
 docker:
-pull_tag: rocm/primus:v26.2
+pull_tag: rocm/primus:v26.1
 docker_hub_url: https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d
 components:
-ROCm: 7.2.0
+ROCm: 7.1.0
-PyTorch: 2.10.0+git94c6e04
+PyTorch: 2.10.0.dev20251112+rocm7.1
-Python: "3.12.3"
+Python: "3.10"
-Transformer Engine: 2.8.0.dev0+51f74fa7
+Transformer Engine: 2.6.0.dev0+f141f34b
 Flash Attention: 2.8.3
 hipBLASLt: 34459f66ea
 model_groups:

Binary file not shown. (Removed image, previously 283 KiB.)
@@ -52,6 +52,22 @@ The table below summarizes information about ROCm-enabled deep learning framewor

 <a href="https://github.com/ROCm/jax"><i class="fab fa-github fa-lg"></i></a>

+* - :doc:`verl <../compatibility/ml-compatibility/verl-compatibility>`
+- :doc:`link <rocm-install-on-linux:install/3rd-party/verl-install>`
+-
+- Docker image
+- .. raw:: html
+
+<a href="https://github.com/ROCm/verl"><i class="fab fa-github fa-lg"></i></a>
+
+* - :doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`
+- :doc:`link <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>`
+-
+- Docker image
+- .. raw:: html
+
+<a href="https://github.com/ROCm/Stanford-Megatron-LM"><i class="fab fa-github fa-lg"></i></a>
+
 * - :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`
 - :doc:`link <rocm-install-on-linux:install/3rd-party/dgl-install>`
 -
@@ -60,6 +76,42 @@ The table below summarizes information about ROCm-enabled deep learning framewor

 <a href="https://github.com/ROCm/dgl"><i class="fab fa-github fa-lg"></i></a>

+* - :doc:`Megablocks <../compatibility/ml-compatibility/megablocks-compatibility>`
+- :doc:`link <rocm-install-on-linux:install/3rd-party/megablocks-install>`
+-
+- Docker image
+- .. raw:: html
+
+<a href="https://github.com/ROCm/megablocks"><i class="fab fa-github fa-lg"></i></a>
+
+* - :doc:`Ray <../compatibility/ml-compatibility/ray-compatibility>`
+- :doc:`link <rocm-install-on-linux:install/3rd-party/ray-install>`
+-
+- Docker image
+- Wheels package
+- ROCm Base Docker image
+- .. raw:: html
+
+<a href="https://github.com/ROCm/ray"><i class="fab fa-github fa-lg"></i></a>
+
+* - :doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>`
+- :doc:`link <rocm-install-on-linux:install/3rd-party/llama-cpp-install>`
+-
+- Docker image
+- ROCm Base Docker image
+- .. raw:: html
+
+<a href="https://github.com/ROCm/llama.cpp"><i class="fab fa-github fa-lg"></i></a>
+
+* - :doc:`FlashInfer <../compatibility/ml-compatibility/flashinfer-compatibility>`
+- :doc:`link <rocm-install-on-linux:install/3rd-party/flashinfer-install>`
+-
+- Docker image
+- ROCm Base Docker image
+- .. raw:: html
+
+<a href="https://github.com/ROCm/flashinfer"><i class="fab fa-github fa-lg"></i></a>
+
 Learn how to use your ROCm deep learning environment for training, fine-tuning, inference, and performance optimization
 through the following guides.
@@ -11,7 +11,7 @@ xDiT diffusion inference

 .. caution::

-This documentation does not reflect the latest version of xDiT diffusion
+This documentation does not reflect the latest version of ROCm vLLM
 inference performance documentation. See
 :doc:`/how-to/rocm-for-ai/inference/xdit-diffusion-inference` for the latest
 version.
@@ -1,474 +0,0 @@
|
|||||||
:orphan:

.. meta::
   :description: Learn to validate diffusion model video generation on MI300X, MI350X and MI355X accelerators using
                 prebuilt and optimized Docker images.
   :keywords: xDiT, diffusion, video, video generation, image, image generation, validate, benchmark

************************
xDiT diffusion inference
************************

.. caution::

   This documentation does not reflect the latest version of the xDiT diffusion
   inference performance documentation. See
   :doc:`/how-to/rocm-for-ai/inference/xdit-diffusion-inference` for the latest
   version.

.. _xdit-video-diffusion-2513:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   The `rocm/pytorch-xdit <{{ docker.docker_hub_url }}>`_ Docker image offers
   a prebuilt, optimized environment based on `xDiT
   <https://github.com/xdit-project/xDiT>`_ for benchmarking diffusion model
   video and image generation on AMD Instinct MI355X, MI350X (gfx950), MI325X,
   and MI300X (gfx942) GPUs.

   The image runs a preview version of ROCm using the new `TheRock
   <https://github.com/ROCm/TheRock>`__ build system and includes the following
   components:

   .. dropdown:: Software components - {{ docker.pull_tag.split('-')|last }}

      .. list-table::
         :header-rows: 1

         * - Software component
           - Version

         {% for component_name, component_data in docker.components.items() %}
         * - `{{ component_name }} <{{ component_data.url }}>`_
           - {{ component_data.version }}
         {% endfor %}

   Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark.
   For preview and development releases, see `amdsiloai/pytorch-xdit <https://hub.docker.com/r/amdsiloai/pytorch-xdit>`_.

What's new
==========

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   {% for item in docker.whats_new %}
   * {{ item }}
   {% endfor %}

.. _xdit-video-diffusion-supported-models-2513:

Supported models
================

The following models are supported for inference performance benchmarking.
Some instructions, commands, and recommendations in this documentation might
vary by model -- select one to get started.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row gx-0">
          <div class="col-2 me-1 px-2 model-param-head">Model</div>
          <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            <div class="col-6 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.js_tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row gx-0 pt-1">
          <div class="col-2 me-1 px-2 model-param-head">Variant</div>
          <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
            <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
            <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. note::

         To learn more about your specific model, see the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_
         or visit the `GitHub page <{{ model.github }}>`__. Note that some models require access authorization before use via an
         external license agreement through a third party.

   {% endfor %}
   {% endfor %}

Performance measurements
========================

To evaluate performance, the `Performance results with AMD ROCm software
<https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8543b7e6d-item-9eda09e707-tab>`__
page provides reference throughput and serving measurements for inferencing popular AI models.

.. important::

   The performance data presented in `Performance results with AMD ROCm
   software
   <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8543b7e6d-item-9eda09e707-tab>`__
   only reflects the latest version of this inference benchmarking environment.
   The listed measurements should not be interpreted as the peak performance
   achievable by AMD Instinct GPUs or ROCm software.
System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.

Pull the Docker image
=====================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   For this tutorial, it's recommended to use the latest ``{{ docker.pull_tag }}`` Docker image.
   Pull the image using the following command:

   .. code-block:: shell

      docker pull {{ docker.pull_tag }}
Validate and benchmark
======================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   Once the image has been downloaded, you can follow these steps to
   run benchmarks and generate outputs.

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.js_tag}}

      The following commands are written for {{ model.model }}.
      See :ref:`xdit-video-diffusion-supported-models-2513` to switch to another available model.

   {% endfor %}
   {% endfor %}

Choose your setup method
------------------------

You can either use an existing Hugging Face cache or download the model fresh inside the container.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}
   .. container:: model-doc {{model.js_tag}}

      .. tab-set::

         .. tab-item:: Option 1: Use existing Hugging Face cache

            If you already have models downloaded on your host system, you can mount your existing cache.

            1. Set your Hugging Face cache location.

               .. code-block:: shell

                  export HF_HOME=/your/hf_cache/location

            2. Download the model (if not already cached).

               .. code-block:: shell

                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            3. Launch the container with the mounted cache.

               .. code-block:: shell

                  docker run \
                     -it --rm \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --user root \
                     --device=/dev/kfd \
                     --device=/dev/dri \
                     --group-add video \
                     --ipc=host \
                     --network host \
                     --privileged \
                     --shm-size 128G \
                     --name pytorch-xdit \
                     -e HSA_NO_SCRATCH_RECLAIM=1 \
                     -e OMP_NUM_THREADS=16 \
                     -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                     -e HF_HOME=/app/huggingface_models \
                     -v $HF_HOME:/app/huggingface_models \
                     {{ docker.pull_tag }}

         .. tab-item:: Option 2: Download inside the container

            Use this option if you prefer to keep the container self-contained or don't have an existing cache.

            1. Launch the container.

               .. code-block:: shell

                  docker run \
                     -it --rm \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --user root \
                     --device=/dev/kfd \
                     --device=/dev/dri \
                     --group-add video \
                     --ipc=host \
                     --network host \
                     --privileged \
                     --shm-size 128G \
                     --name pytorch-xdit \
                     -e HSA_NO_SCRATCH_RECLAIM=1 \
                     -e OMP_NUM_THREADS=16 \
                     -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                     {{ docker.pull_tag }}

            2. Inside the container, set the Hugging Face cache location and download the model.

               .. code-block:: shell

                  export HF_HOME=/app/huggingface_models
                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            .. warning::

               Models are downloaded to the container's filesystem and are lost when the container is removed unless you persist the data with a volume.
   {% endfor %}
   {% endfor %}
Run inference
=============

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_25.13-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. tab-set::

         .. tab-item:: MAD-integrated benchmarking

            1. Clone the ROCm Model Automation and Dashboarding (`MAD <https://github.com/ROCm/MAD>`__) repository to a local
               directory and install the required packages on the host machine.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD
                  pip install -r requirements.txt

            2. On the host machine, use this command to run the performance benchmark test on
               the `{{model.model}} <{{ model.url }}>`_ model using one node.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
                  madengine run \
                     --tags {{model.mad_tag}} \
                     --keep-model-dir \
                     --live-output

               MAD launches a Docker container with the name
               ``container_ci-{{model.mad_tag}}``. The throughput and serving reports of the
               model are collected in the following paths: ``{{ model.mad_tag }}_throughput.csv``
               and ``{{ model.mad_tag }}_serving.csv``.

         .. tab-item:: Standalone benchmarking

            To run the benchmarks for {{ model.model }}, use the following command:

            .. code-block:: shell

               {% if model.model == "Hunyuan Video" %}
               cd /app/Hunyuanvideo
               mkdir results

               torchrun --nproc_per_node=8 run.py \
                  --model {{ model.model_repo }} \
                  --prompt "In the large cage, two puppies were wagging their tails at each other." \
                  --height 720 --width 1280 --num_frames 129 \
                  --num_inference_steps 50 --warmup_steps 1 --n_repeats 1 \
                  --ulysses_degree 8 \
                  --enable_tiling --enable_slicing \
                  --use_torch_compile \
                  --bench_output results
               {% endif %}
               {% if model.model == "Wan2.1" %}
               cd /app/Wan
               mkdir results

               torchrun --nproc_per_node=8 /app/Wan/run.py \
                  --task i2v \
                  --height 720 \
                  --width 1280 \
                  --model {{ model.model_repo }} \
                  --img_file_path /app/Wan/i2v_input.JPG \
                  --ulysses_degree 8 \
                  --seed 42 \
                  --num_frames 81 \
                  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
                  --num_repetitions 1 \
                  --num_inference_steps 40 \
                  --use_torch_compile
               {% endif %}
               {% if model.model == "Wan2.2" %}
               cd /app/Wan
               mkdir results

               torchrun --nproc_per_node=8 /app/Wan/run.py \
                  --task i2v \
                  --height 720 \
                  --width 1280 \
                  --model {{ model.model_repo }} \
                  --img_file_path /app/Wan/i2v_input.JPG \
                  --ulysses_degree 8 \
                  --seed 42 \
                  --num_frames 81 \
                  --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
                  --num_repetitions 1 \
                  --num_inference_steps 40 \
                  --use_torch_compile
               {% endif %}
               {% if model.model == "FLUX.1" %}
               cd /app/Flux
               mkdir results

               torchrun --nproc_per_node=8 /app/Flux/run.py \
                  --model {{ model.model_repo }} \
                  --seed 42 \
                  --prompt "A small cat" \
                  --height 1024 \
                  --width 1024 \
                  --num_inference_steps 25 \
                  --max_sequence_length 256 \
                  --warmup_steps 5 \
                  --no_use_resolution_binning \
                  --ulysses_degree 8 \
                  --use_torch_compile \
                  --num_repetitions 50
               {% endif %}
               {% if model.model == "FLUX.1 Kontext" %}
               cd /app/Flux
               mkdir results

               torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
                  --model {{ model.model_repo }} \
                  --seed 42 \
                  --prompt "Add a cool hat to the cat" \
                  --height 1024 \
                  --width 1024 \
                  --num_inference_steps 30 \
                  --max_sequence_length 512 \
                  --warmup_steps 5 \
                  --no_use_resolution_binning \
                  --ulysses_degree 8 \
                  --use_torch_compile \
                  --img_file_path /app/Flux/cat.png \
                  --model_type flux_kontext \
                  --guidance_scale 2.5 \
                  --num_repetitions 25
               {% endif %}
               {% if model.model == "FLUX.2" %}
               cd /app/Flux
               mkdir results

               torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
                  --model {{ model.model_repo }} \
                  --seed 42 \
                  --prompt "Add a cool hat to the cat" \
                  --height 1024 \
                  --width 1024 \
                  --num_inference_steps 50 \
                  --max_sequence_length 512 \
                  --warmup_steps 5 \
                  --no_use_resolution_binning \
                  --ulysses_degree 8 \
                  --use_torch_compile \
                  --img_file_paths /app/Flux/cat.png \
                  --model_type flux2 \
                  --guidance_scale 4.0 \
                  --num_repetitions 25
               {% endif %}
               {% if model.model == "stable-diffusion-3.5-large" %}
               cd /app/StableDiffusion3.5
               mkdir results

               torchrun --nproc_per_node=8 /app/StableDiffusion3.5/run.py \
                  --model {{ model.model_repo }} \
                  --num_inference_steps 28 \
                  --prompt "A capybara holding a sign that reads Hello World" \
                  --use_torch_compile \
                  --pipefusion_parallel_degree 4 \
                  --use_cfg_parallel \
                  --num_repetitions 50 \
                  --dtype torch.float16 \
                  --output_path results
               {% endif %}

            The generated video will be stored under the ``results`` directory. For the actual benchmark step runtimes, see {% if model.model == "Hunyuan Video" %}stdout{% elif model.model in ["Wan2.1", "Wan2.2"] %}``results/outputs/rank0_*.json``{% elif model.model in ["FLUX.1", "FLUX.1 Kontext", "FLUX.2"] %}``results/timing.json``{% elif model.model == "stable-diffusion-3.5-large" %}``benchmark_results.csv``{% endif %}.

            {% if model.model == "FLUX.1" %}You can also use ``run_usp.py``, which implements USP without modifying the default diffusers pipeline.{% endif %}

   {% endfor %}
   {% endfor %}
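The MAD reports named above are plain CSV files, so they are easy to post-process with the standard library. The sketch below is illustrative only: the file name mirrors how MAD names its outputs for a hypothetical ``mad_tag``, and the column names are assumptions, not MAD's documented schema -- inspect the header of the CSV your run actually produces.

```python
import csv
from pathlib import Path

# Hypothetical report written the way MAD names its outputs; the columns
# below are assumptions for illustration, not MAD's documented schema.
report = Path("pyt_xdit_throughput.csv")
report.write_text("model,metric,value\nxdit-demo,images_per_second,1.25\n")

def load_report(path: Path) -> list[dict]:
    """Read a CSV report into a list of per-row dictionaries."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))

rows = load_report(report)
print(rows[0]["metric"], rows[0]["value"])  # -> images_per_second 1.25
```

The same pattern applies to the ``_serving.csv`` report; only the column names differ.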

Previous versions
=================

See
:doc:`/how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-history`
to find documentation for previous releases of xDiT diffusion inference
performance testing.
@@ -1,322 +0,0 @@
:orphan:

.. meta::
   :description: Learn to validate diffusion model video generation on MI300X, MI350X and MI355X accelerators using
                 prebuilt and optimized Docker images.
   :keywords: xDiT, diffusion, video, video generation, image, image generation, validate, benchmark

************************
xDiT diffusion inference
************************

.. caution::

   This documentation does not reflect the latest version of the xDiT diffusion
   inference performance documentation. See
   :doc:`/how-to/rocm-for-ai/inference/xdit-diffusion-inference` for the latest
   version.

.. _xdit-video-diffusion-v261-v261:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   The `rocm/pytorch-xdit <{{ docker.docker_hub_url }}>`_ Docker image offers a prebuilt, optimized environment based on `xDiT <https://github.com/xdit-project/xDiT>`_ for
   benchmarking diffusion model video and image generation on gfx942 and gfx950 series (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X) GPUs.
   The image runs ROCm **{{docker.ROCm}}** (preview) based on `TheRock <https://github.com/ROCm/TheRock>`_
   and includes the following components:

   .. dropdown:: Software components - {{ docker.pull_tag.split('-')|last }}

      .. list-table::
         :header-rows: 1

         * - Software component
           - Version

         {% for component_name, component_data in docker.components.items() %}
         * - `{{ component_name }} <{{ component_data.url }}>`_
           - {{ component_data.version }}
         {% endfor %}

   Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark.
   For preview and development releases, see `amdsiloai/pytorch-xdit <https://hub.docker.com/r/amdsiloai/pytorch-xdit>`_.

What's new
==========

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   {% for item in docker.whats_new %}
   * {{ item }}
   {% endfor %}

.. _xdit-video-diffusion-supported-models-v261:

Supported models
================

The following models are supported for inference performance benchmarking.
Some instructions, commands, and recommendations in this documentation might
vary by model -- select one to get started.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row gx-0">
          <div class="col-2 me-1 px-2 model-param-head">Model</div>
          <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            <div class="col-6 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.js_tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row gx-0 pt-1">
          <div class="col-2 me-1 px-2 model-param-head">Variant</div>
          <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
            <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
            <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. note::

         To learn more about your specific model, see the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_
         or visit the `GitHub page <{{ model.github }}>`__. Note that some models require access authorization before use via an
         external license agreement through a third party.

   {% endfor %}
   {% endfor %}
System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.

Pull the Docker image
=====================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   For this tutorial, it's recommended to use the latest ``{{ docker.pull_tag }}`` Docker image.
   Pull the image using the following command:

   .. code-block:: shell

      docker pull {{ docker.pull_tag }}
Validate and benchmark
======================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   Once the image has been downloaded, you can follow these steps to
   run benchmarks and generate outputs.

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.js_tag}}

      The following commands are written for {{ model.model }}.
      See :ref:`xdit-video-diffusion-supported-models-v261` to switch to another available model.

   {% endfor %}
   {% endfor %}

Choose your setup method
------------------------

You can either use an existing Hugging Face cache or download the model fresh inside the container.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}
   .. container:: model-doc {{model.js_tag}}

      .. tab-set::

         .. tab-item:: Option 1: Use existing Hugging Face cache

            If you already have models downloaded on your host system, you can mount your existing cache.

            1. Set your Hugging Face cache location.

               .. code-block:: shell

                  export HF_HOME=/your/hf_cache/location

            2. Download the model (if not already cached).

               .. code-block:: shell

                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            3. Launch the container with the mounted cache.

               .. code-block:: shell

                  docker run \
                     -it --rm \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --user root \
                     --device=/dev/kfd \
                     --device=/dev/dri \
                     --group-add video \
                     --ipc=host \
                     --network host \
                     --privileged \
                     --shm-size 128G \
                     --name pytorch-xdit \
                     -e HSA_NO_SCRATCH_RECLAIM=1 \
                     -e OMP_NUM_THREADS=16 \
                     -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                     -e HF_HOME=/app/huggingface_models \
                     -v $HF_HOME:/app/huggingface_models \
                     {{ docker.pull_tag }}

         .. tab-item:: Option 2: Download inside the container

            Use this option if you prefer to keep the container self-contained or don't have an existing cache.

            1. Launch the container.

               .. code-block:: shell

                  docker run \
                     -it --rm \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --user root \
                     --device=/dev/kfd \
                     --device=/dev/dri \
                     --group-add video \
                     --ipc=host \
                     --network host \
                     --privileged \
                     --shm-size 128G \
                     --name pytorch-xdit \
                     -e HSA_NO_SCRATCH_RECLAIM=1 \
                     -e OMP_NUM_THREADS=16 \
                     -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                     {{ docker.pull_tag }}

            2. Inside the container, set the Hugging Face cache location and download the model.

               .. code-block:: shell

                  export HF_HOME=/app/huggingface_models
                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            .. warning::

               Models are downloaded to the container's filesystem and are lost when the container is removed unless you persist the data with a volume.
   {% endfor %}
   {% endfor %}
Run inference
=============

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.1-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. tab-set::

         .. tab-item:: MAD-integrated benchmarking

            1. Clone the ROCm Model Automation and Dashboarding (`MAD <https://github.com/ROCm/MAD>`__) repository to a local
               directory and install the required packages on the host machine.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD
                  pip install -r requirements.txt

            2. On the host machine, use this command to run the performance benchmark test on
               the `{{model.model}} <{{ model.url }}>`_ model using one node.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
                  madengine run \
                     --tags {{model.mad_tag}} \
                     --keep-model-dir \
                     --live-output

               MAD launches a Docker container with the name
               ``container_ci-{{model.mad_tag}}``. The throughput and serving reports of the
               model are collected in the following paths: ``{{ model.mad_tag }}_throughput.csv``
               and ``{{ model.mad_tag }}_serving.csv``.

         .. tab-item:: Standalone benchmarking

            To run the benchmarks for {{ model.model }}, use the following command:

            .. code-block:: shell

               {{ model.benchmark_command
               | map('replace', '{model_repo}', model.model_repo)
               | map('trim')
               | join('\n ') }}

            The generated video will be stored under the ``results`` directory.

            {% if model.model == "FLUX.1" %}You can also use ``run_usp.py``, which implements USP without modifying the default diffusers pipeline.{% endif %}

   {% endfor %}
   {% endfor %}
|
|
||||||
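The CSV reports collected by a MAD run can be summarized with standard shell tools. A minimal sketch, assuming a hypothetical ``model,metric,value`` column layout (the real MAD report schema may differ):

```shell
# Stand-in throughput report; the columns here are illustrative,
# not the actual MAD output schema.
cat > demo_throughput.csv <<'EOF'
model,metric,value
flux_1,images_per_second,1.42
flux_1,latency_s,5.63
EOF

# Print each collected metric on its own line.
awk -F, 'NR > 1 { printf "%s = %s\n", $2, $3 }' demo_throughput.csv
```

The same one-liner works for the ``*_serving.csv`` report by swapping the filename.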
Previous versions
=================

See
:doc:`/how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-history`
to find documentation for previous releases of xDiT diffusion inference
performance testing.
@@ -1,320 +0,0 @@
:orphan:

.. meta::
   :description: Learn to validate diffusion model video generation on MI300X, MI350X and MI355X accelerators using
      prebuilt and optimized docker images.
   :keywords: xDiT, diffusion, video, video generation, image, image generation, validate, benchmark

************************
xDiT diffusion inference
************************

.. caution::

   This documentation does not reflect the latest version of the xDiT diffusion
   inference performance documentation. See
   :doc:`/how-to/rocm-for-ai/inference/xdit-diffusion-inference` for the latest
   version.

.. _xdit-video-diffusion-262:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   The `rocm/pytorch-xdit <{{ docker.docker_hub_url }}>`_ Docker image offers a prebuilt, optimized environment based on `xDiT <https://github.com/xdit-project/xDiT>`_ for
   benchmarking diffusion model video and image generation on gfx942 and gfx950 series (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X) GPUs.
   The image runs ROCm **{{docker.ROCm}}** (preview) based on `TheRock <https://github.com/ROCm/TheRock>`_
   and includes the following components:

   .. dropdown:: Software components - {{ docker.pull_tag.split('-')|last }}

      .. list-table::
         :header-rows: 1

         * - Software component
           - Version

         {% for component_name, component_data in docker.components.items() %}
         * - `{{ component_name }} <{{ component_data.url }}>`_
           - {{ component_data.version }}
         {% endfor %}

   Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark.
   For preview and development releases, see `amdsiloai/pytorch-xdit <https://hub.docker.com/r/amdsiloai/pytorch-xdit>`_.
What's new
==========

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   {% for item in docker.whats_new %}
   * {{ item }}
   {% endfor %}

.. _xdit-video-diffusion-262-supported-models:

Supported models
================

The following models are supported for inference performance benchmarking.
Some instructions, commands, and recommendations in this documentation might
vary by model -- select one to get started.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row gx-0">
          <div class="col-2 me-1 px-2 model-param-head">Model</div>
          <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            <div class="col-6 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.js_tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row gx-0 pt-1">
          <div class="col-2 me-1 px-2 model-param-head">Variant</div>
          <div class="row col-10 pe-0">
            {% for model_group in docker.supported_models %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
            <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
            <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.js_tag }}" data-param-group="{{ model_group.js_tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. note::

         To learn more about your specific model, see the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_
         or visit the `GitHub page <{{ model.github }}>`__. Note that some models require access authorization before use via an
         external license agreement through a third party.

   {% endfor %}
   {% endfor %}
System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
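One of the settings the system validation guide covers, NUMA auto-balancing, can be checked directly from the shell. A read-only sketch (ROCm system-tuning guidance generally recommends disabling it for AI workloads; the write is left commented out because it requires root):

```shell
# Read the NUMA auto-balancing knob; "0" means disabled.
numa=$(cat /proc/sys/kernel/numa_balancing 2>/dev/null || echo "unavailable")
echo "kernel.numa_balancing = $numa"

# To disable it (root required):
# sudo sysctl kernel.numa_balancing=0
```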
Pull the Docker image
=====================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   For this tutorial, it's recommended to use the latest ``{{ docker.pull_tag }}`` Docker image.
   Pull the image using the following command:

   .. code-block:: shell

      docker pull {{ docker.pull_tag }}
Validate and benchmark
======================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   Once the image has been downloaded, you can follow these steps to
   run benchmarks and generate outputs.

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.js_tag}}

      The following commands are written for {{ model.model }}.
      See :ref:`xdit-video-diffusion-262-supported-models` to switch to another available model.

   {% endfor %}
   {% endfor %}
Choose your setup method
------------------------

You can either use an existing Hugging Face cache or download the model fresh inside the container.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.js_tag}}

      .. tab-set::

         .. tab-item:: Option 1: Use existing Hugging Face cache

            If you already have models downloaded on your host system, you can mount your existing cache.

            1. Set your Hugging Face cache location.

               .. code-block:: shell

                  export HF_HOME=/your/hf_cache/location

            2. Download the model (if not already cached).

               .. code-block:: shell

                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

            3. Launch the container with mounted cache.

               .. code-block:: shell

                  docker run \
                      -it --rm \
                      --cap-add=SYS_PTRACE \
                      --security-opt seccomp=unconfined \
                      --user root \
                      --device=/dev/kfd \
                      --device=/dev/dri \
                      --group-add video \
                      --ipc=host \
                      --network host \
                      --privileged \
                      --shm-size 128G \
                      --name pytorch-xdit \
                      -e HSA_NO_SCRATCH_RECLAIM=1 \
                      -e OMP_NUM_THREADS=16 \
                      -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                      -e HF_HOME=/app/huggingface_models \
                      -v $HF_HOME:/app/huggingface_models \
                      {{ docker.pull_tag }}
         .. tab-item:: Option 2: Download inside container

            Use this option if you prefer to keep the container self-contained or don't have an existing cache.

            1. Launch the container.

               .. code-block:: shell

                  docker run \
                      -it --rm \
                      --cap-add=SYS_PTRACE \
                      --security-opt seccomp=unconfined \
                      --user root \
                      --device=/dev/kfd \
                      --device=/dev/dri \
                      --group-add video \
                      --ipc=host \
                      --network host \
                      --privileged \
                      --shm-size 128G \
                      --name pytorch-xdit \
                      -e HSA_NO_SCRATCH_RECLAIM=1 \
                      -e OMP_NUM_THREADS=16 \
                      -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
                      {{ docker.pull_tag }}

            2. Inside the container, set the Hugging Face cache location and download the model.

               .. code-block:: shell

                  export HF_HOME=/app/huggingface_models
                  huggingface-cli download {{ model.model_repo }} {% if model.revision %} --revision {{ model.revision }} {% endif %}

               .. warning::

                  Models will be downloaded to the container's filesystem and will be lost when the container is removed unless you persist the data with a volume.

   {% endfor %}
   {% endfor %}
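Before relaunching a container, it can be worth confirming that a model actually landed in the Hugging Face cache. A self-contained sketch using the hub's documented ``models--<org>--<name>`` directory layout (the repo id below is illustrative, and the cached directory is simulated so the check stands on its own):

```shell
# Hugging Face caches repos under $HF_HOME/hub/models--<org>--<name>.
# Simulate a cached repo so the check is self-contained.
export HF_HOME=./hf_demo_cache
repo="org/example-model"        # illustrative repo id
dir="$HF_HOME/hub/models--${repo//\//--}"
mkdir -p "$dir"

# Report whether the repo is present in the cache.
if [ -d "$dir" ]; then
  echo "cached: $repo"
else
  echo "missing: $repo"
fi
```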
Run inference
=============

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/xdit_26.2-inference-models.yaml

   {% set docker = data.docker %}

   {% for model_group in docker.supported_models %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.js_tag }}

      .. tab-set::

         .. tab-item:: MAD-integrated benchmarking

            1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
               directory and install the required packages on the host machine.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD
                  pip install -r requirements.txt

            2. On the host machine, use this command to run the performance benchmark test on
               the `{{model.model}} <{{ model.url }}>`_ model using one node.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
                  madengine run \
                      --tags {{model.mad_tag}} \
                      --keep-model-dir \
                      --live-output

               MAD launches a Docker container with the name
               ``container_ci-{{model.mad_tag}}``. The throughput and serving reports of the
               model are collected in the following paths: ``{{ model.mad_tag }}_throughput.csv``
               and ``{{ model.mad_tag }}_serving.csv``.
         .. tab-item:: Standalone benchmarking

            To run the benchmarks for {{ model.model }}, use the following command:

            .. code-block:: shell

               {{ model.benchmark_command
                  | map('replace', '{model_repo}', model.model_repo)
                  | map('trim')
                  | join('\n ') }}

            The generated content and timing information will be stored under the results directory.

   {% endfor %}
   {% endfor %}
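After a standalone run finishes, the results directory can be scanned for the generated media. A self-contained sketch (the directory name and file extensions are illustrative; a real run populates this directory itself):

```shell
# Stand-in results directory; a real benchmark run creates the media
# and timing files here.
mkdir -p results
touch results/sample_video.mp4 results/timings.json

# List generated videos, newest first.
ls -t results/*.mp4
```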
Previous versions
=================

See
:doc:`/how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/xdit-history`
to find documentation for previous releases of xDiT diffusion inference
performance testing.
@@ -15,35 +15,11 @@ benchmarking, see the version-specific documentation.
     - Components
     - Resources

   * - ``rocm/pytorch-xdit:v25.13`` (latest)
     -
       * TheRock 1728a81
     -
       * :doc:`Documentation <../../xdit-diffusion-inference>`
       * `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-xdit/v25.13/images/sha256-81954713070d67bde08595e03f62110c8a3dd66a9ae17a77d611e01f83f0f4ef>`__

   * - ``rocm/pytorch-xdit:v25.12``
@@ -2,65 +2,547 @@
   :description: Learn how to validate LLM inference performance on MI300X GPUs using AMD MAD and the ROCm vLLM Docker image.
   :keywords: model, MAD, automation, dashboarding, validate

**********************************
vLLM inference performance testing
**********************************

.. _vllm-benchmark-unified-docker-1210:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml

   {% set docker = data.dockers[0] %}

   The `ROCm vLLM Docker <{{ docker.docker_hub_url }}>`_ image offers a
   prebuilt, optimized environment for validating large language model (LLM)
   inference performance on AMD Instinct™ MI355X, MI350X, MI325X and MI300X
   GPUs. This ROCm vLLM Docker image integrates vLLM and PyTorch tailored
   specifically for AMD data center GPUs and includes the following components:

   .. tab-set::

      .. tab-item:: {{ docker.pull_tag }}

         .. list-table::
            :header-rows: 1

            * - Software component
              - Version

            {% for component_name, component_version in docker.components.items() %}
            * - {{ component_name }}
              - {{ component_version }}
            {% endfor %}

   With this Docker image, you can quickly test the :ref:`expected
   inference performance numbers <vllm-benchmark-performance-measurements-1210>` for
   AMD Instinct GPUs.
What's new
==========

The following is a summary of notable changes since the :doc:`previous ROCm/vLLM
Docker release <previous-versions/vllm-history>`.

- Improved performance on Llama 3 MXFP4 through AITER optimizations and improved kernel fusion.

.. _vllm-benchmark-supported-models-1210:

Supported models
================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml

   {% set docker = data.dockers[0] %}
   {% set model_groups = data.model_groups %}

   .. _vllm-benchmark-available-models-1210:

   The following models are supported for inference performance benchmarking
   with vLLM and ROCm. Some instructions, commands, and recommendations in this
   documentation might vary by model -- select one to get started. MXFP4 models
   are only supported on MI355X and MI350X GPUs.

   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row gx-0">
          <div class="col-2 me-1 px-2 model-param-head">Model</div>
          <div class="row col-10 pe-0">
            {% for model_group in model_groups %}
            <div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row gx-0 pt-1">
          <div class="col-2 me-1 px-2 model-param-head">Variant</div>
          <div class="row col-10 pe-0">
            {% for model_group in model_groups %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
            <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
            <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

   .. _vllm-benchmark-vllm-1210:
   {% for model_group in model_groups %}
   {% for model in model_group.models %}

   .. container:: model-doc {{ model.mad_tag }}

      {% if model.precision == "float4" %}
      .. important::

         MXFP4 is supported only on MI355X and MI350X GPUs.
      {% endif %}

      {% if model.mad_tag in ["pyt_vllm_mixtral-8x7b", "pyt_vllm_mixtral-8x7b_fp8", "pyt_vllm_mixtral-8x22b", "pyt_vllm_mixtral-8x22b_fp8", "pyt_vllm_deepseek-r1"] %}
      .. caution::

         There is a known regression with AITER for MoE models such as Mixtral and
         DeepSeek-R1. Consider using the :doc:`previous release
         <previous-versions/vllm-0.11.1-20251103>`
         ``rocm/vllm:rocm7.0.0_vllm_0.11.1_20251103`` for better performance.
      {% endif %}

      .. note::

         See the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_ to learn more about your selected model.
         Some models require access authorization prior to use via an external license agreement through a third party.
         {% if model.precision == "float8" and model.model_repo.startswith("amd") %}
         This model uses FP8 quantization via `AMD Quark <https://quark.docs.amd.com/latest/>`__ for efficient inference on AMD GPUs.
         {% endif %}
         {% if model.precision == "float4" and model.model_repo.startswith("amd") %}
         This model uses FP4 quantization via `AMD Quark <https://quark.docs.amd.com/latest/>`__ for efficient inference on AMD GPUs.
         {% endif %}

   {% endfor %}
   {% endfor %}
.. _vllm-benchmark-performance-measurements-1210:

Performance measurements
========================

To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
page provides reference throughput and serving measurements for inferencing popular AI models.

.. important::

   The performance data presented in
   `Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
   only reflects the latest version of this inference benchmarking environment.
   The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct GPUs or ROCm software.

System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
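The docker commands later in this guide pass ``/dev/kfd`` and ``/dev/dri`` into the container; a quick host-side sanity check that those interfaces exist (they are absent on machines without AMD GPU drivers):

```shell
# /dev/kfd is the ROCm compute interface; /dev/dri holds the render nodes.
for dev in /dev/kfd /dev/dri; do
  if [ -e "$dev" ]; then
    echo "found   $dev"
  else
    echo "missing $dev"
  fi
done
```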
Pull the Docker image
=====================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml

   {% set docker = data.dockers[0] %}

   Download the `ROCm vLLM Docker image <{{ docker.docker_hub_url }}>`_.
   Use the following command to pull the Docker image from Docker Hub.

   .. code-block:: shell

      docker pull {{ docker.pull_tag }}
Benchmarking
============

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml

   {% set docker = data.dockers[0] %}
   {% set model_groups = data.model_groups %}

   Once the setup is complete, choose between two options to reproduce the
   benchmark results:

   .. _vllm-benchmark-mad-1210:

   {% for model_group in model_groups %}
   {% for model in model_group.models %}
   {% set serv_config = model.config.serving %}
   {% set acc_config = model.config.accuracy %}
   {% set ex_config = model.config.ex %}

   .. container:: model-doc {{model.mad_tag}}

      .. tab-set::

         .. tab-item:: MAD-integrated benchmarking

            The following run command is tailored to {{ model.model }}.
            See :ref:`vllm-benchmark-supported-models-1210` to switch to another available model.

            1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
               directory and install the required packages on the host machine.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD
                  pip install -r requirements.txt

            2. On the host machine, use this command to run the performance benchmark test on
               the `{{model.model}} <{{ model.url }}>`_ model using one node with the
               :literal:`{{model.precision}}` data type.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
                  madengine run \
                      --tags {{model.mad_tag}} \
                      --keep-model-dir \
                      --live-output

               MAD launches a Docker container with the name
               ``container_ci-{{model.mad_tag}}``. The throughput and serving reports of the
               model are collected in the following paths: ``{{ model.mad_tag }}_throughput.csv``
               and ``{{ model.mad_tag }}_serving.csv``.

               Although the :ref:`available models
               <vllm-benchmark-available-models-1210>` are preconfigured to collect
               offline throughput and online serving performance data, you can
               also change the benchmarking parameters. See the standalone
               benchmarking tab for more information.

            {% if model.tunableop %}

            .. note::

               For improved performance, consider enabling :ref:`PyTorch TunableOp <mi300x-tunableop>`.
               TunableOp automatically explores different implementations and configurations of certain PyTorch
               operators to find the fastest one for your hardware.

               By default, ``{{model.mad_tag}}`` runs with TunableOp disabled (see
               `<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To enable it, include
               the ``--tunableop on`` argument in your run.

               Enabling TunableOp triggers a two-pass run -- a warm-up followed by the
               performance-collection run.

            {% endif %}
         .. tab-item:: Standalone benchmarking

            The following commands are optimized for {{ model.model }}.
            See :ref:`vllm-benchmark-supported-models-1210` to switch to another available model.

            .. seealso::

               For more information on configuration, see the `config files
               <https://github.com/ROCm/MAD/tree/develop/scripts/vllm/configs>`__
               in the MAD repository. Refer to the `vLLM engine arguments <https://docs.vllm.ai/en/latest/configuration/engine_args.html#engineargs>`__
               for descriptions of available configuration options,
               and `Benchmarking vLLM <https://github.com/vllm-project/vllm/blob/main/benchmarks/README.md>`__ for
               additional benchmarking information.

            .. rubric:: Launch the container

            You can run the vLLM benchmark tool independently by starting the
            `Docker container <{{ docker.docker_hub_url }}>`_ as shown
            in the following snippet.

            .. code-block:: shell

               docker pull {{ docker.pull_tag }}
               docker run -it \
                   --device=/dev/kfd \
                   --device=/dev/dri \
                   --group-add video \
                   --shm-size 16G \
                   --security-opt seccomp=unconfined \
                   --security-opt apparmor=unconfined \
                   --cap-add=SYS_PTRACE \
                   -v $(pwd):/workspace \
                   --env HUGGINGFACE_HUB_CACHE=/workspace \
                   --name test \
                   {{ docker.pull_tag }}
            .. rubric:: Run the inference benchmarks

            .. tab-set::

               .. tab-item:: Latency command

                  Use the following command to start the latency benchmark.

                  .. code-block:: shell

                     model={{ model.model_repo }}
                     tp={{ serv_config.tp }}
                     batch_size=16
                     in={{ serv_config.inp | default(1024) }}
                     out={{ serv_config.out | default(1024) }}
                     dtype={{ serv_config.dtype | default("auto") }}
                     kv_cache_dtype={{ ex_config.kv_cache_dtype | default("auto") }}
                     max_num_seqs={{ ex_config.max_num_seqs | default(1024) }}
                     max_num_batched_tokens={{ ex_config.max_num_batched_tokens }}
                     max_model_len={{ ex_config.max_model_len }}

                     vllm bench latency --model $model \
                         -tp $tp \
                         --batch-size $batch_size \
                         --input-len $in \
                         --output-len $out \
                         --dtype $dtype \
                         --kv-cache-dtype $kv_cache_dtype \
                         --max-num-seqs $max_num_seqs \
                         --max-num-batched-tokens $max_num_batched_tokens \
                         --max-model-len $max_model_len \
                         --output-json ${model}_latency.json
   .. tab-item:: Throughput command

      Use the following command to start the throughput benchmark.

      .. code-block:: shell

         model={{ model.model_repo }}
         tp={{ serv_config.tp }}
         num_prompts={{ model.config.num_prompts | default(1024) }}
         in={{ serv_config.inp | default(1024) }}
         out={{ serv_config.out | default(1024) }}
         dtype={{ serv_config.dtype | default("auto") }}
         kv_cache_dtype={{ ex_config.kv_cache_dtype | default("auto") }}
         max_num_seqs={{ ex_config.max_num_seqs | default(1024) }}
         max_num_batched_tokens={{ ex_config.max_num_batched_tokens }}
         max_model_len={{ ex_config.max_model_len }}

         vllm bench throughput --model $model \
             -tp $tp \
             --num-prompts $num_prompts \
             --input-len $in \
             --output-len $out \
             --dtype $dtype \
             --kv-cache-dtype $kv_cache_dtype \
             --max-num-seqs $max_num_seqs \
             --max-num-batched-tokens $max_num_batched_tokens \
             --max-model-len $max_model_len \
             --trust-remote-code \
             --output-json ${model}_throughput.json \
             --gpu-memory-utilization {{ model.config.gpu_memory_utilization | default(0.9) }}
   .. tab-item:: Serving command

      1. Start the server using the following command:

         .. code-block:: shell

            model={{ model.model_repo }}
            tp={{ serv_config.tp }}
            dtype={{ serv_config.dtype }}
            kv_cache_dtype={{ ex_config.kv_cache_dtype }}
            max_num_seqs=1024
            max_num_batched_tokens={{ ex_config.max_num_batched_tokens }}
            max_model_len={{ ex_config.max_model_len }}

            vllm serve $model \
                -tp $tp \
                --dtype $dtype \
                --kv-cache-dtype $kv_cache_dtype \
                --max-num-seqs $max_num_seqs \
                --max-num-batched-tokens $max_num_batched_tokens \
                --max-model-len $max_model_len \
                --no-enable-prefix-caching \
                --swap-space 16 \
                --disable-log-requests

         Wait until the model has loaded and the server is ready to accept requests.

      2. In another terminal on the same machine, run the benchmark:

         .. code-block:: shell

            # Connect to the container
            docker exec -it test bash

            # Wait for the server to start
            until curl -s http://localhost:8000/v1/models; do sleep 30; done

            # Run the benchmark
            model={{ model.model_repo }}
            max_concurrency=1
            num_prompts=10
            in={{ serv_config.inp | default("1024") }}
            out={{ serv_config.out | default("1024") }}
            vllm bench serve --model $model \
                --percentile-metrics "ttft,tpot,itl,e2el" \
                --dataset-name random \
                --ignore-eos \
                --max-concurrency $max_concurrency \
                --num-prompts $num_prompts \
                --random-input-len $in \
                --random-output-len $out \
                --trust-remote-code \
                --save-result \
                --result-filename ${model}_serving.json
   {% if acc_config %}
   .. tab-item:: Accuracy command

      1. Start the server using the following command:

         .. code-block:: shell

            model={{ model.model_repo }}
            tp={{ acc_config.tp }}
            dtype={{ acc_config.dtype }}
            kv_cache_dtype={{ ex_config.kv_cache_dtype }}
            max_num_seqs=1024
            max_num_batched_tokens={{ ex_config.max_num_batched_tokens }}
            max_model_len={{ ex_config.max_model_len }}

            vllm serve $model \
                -tp $tp \
                --dtype $dtype \
                --kv-cache-dtype $kv_cache_dtype \
                --max-num-seqs $max_num_seqs \
                --max-num-batched-tokens $max_num_batched_tokens \
                --max-model-len $max_model_len \
                --no-enable-prefix-caching \
                --swap-space 16 \
                --disable-log-requests

         Wait until the model has loaded and the server is ready to accept requests.

      2. In another terminal on the same machine, run the benchmark:

         .. code-block:: shell

            # Connect to the container
            docker exec -it test bash

            # Wait for the server to start
            until curl -s http://localhost:8000/v1/models; do sleep 30; done

            # Install lm-eval
            pip install "lm-eval[api]"

            # Run the benchmark
            model={{ acc_config.model }}
            lm_eval --model local-completions \
                --model_args model=$model,max_gen_toks=2048,num_concurrent=256,max_retries=10,base_url=http://localhost:8000/v1/completions \
                --tasks gsm8k --limit 250 --output_path ./tmp
   {% endif %}
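The shell readiness loop used in the serving and accuracy steps (``until curl -s http://localhost:8000/v1/models; do sleep 30; done``) can also be written in Python if you are scripting a benchmark run. The endpoint comes from the snippets above; the timeout and retry counts below are arbitrary choices, not values from the benchmark tooling:

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/v1/models",
                    interval_s: float = 30.0, retries: int = 40) -> bool:
    """Poll the OpenAI-compatible models endpoint until it answers, or give up."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not up yet (connection refused) or transient error.
            pass
        time.sleep(interval_s)
    return False
```

Call ``wait_for_server()`` before launching ``vllm bench serve`` or ``lm_eval`` so the first requests are not sent to a server that is still loading weights.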
.. raw:: html

   <style>
   mjx-container[jax="CHTML"][display="true"] {
       text-align: left;
       margin: 0;
   }
   </style>

.. note::

   Throughput is calculated as:

   - .. math:: throughput\_tot = requests \times (\mathsf{\text{input lengths}} + \mathsf{\text{output lengths}}) / elapsed\_time

   - .. math:: throughput\_gen = requests \times \mathsf{\text{output lengths}} / elapsed\_time
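As a quick sanity check, the two formulas above can be evaluated on sample numbers. The request count, sequence lengths, and elapsed time below are illustrative values only, not measurements:

```python
def throughput(requests: int, input_len: int, output_len: int, elapsed_s: float):
    """Compute total and generation-only throughput in tokens per second."""
    total = requests * (input_len + output_len) / elapsed_s
    generated = requests * output_len / elapsed_s
    return total, generated


# Illustrative numbers -- substitute the values reported by your own run.
tot, gen = throughput(requests=16, input_len=1024, output_len=1024, elapsed_s=100.0)
print(f"throughput_tot={tot:.2f} tok/s, throughput_gen={gen:.2f} tok/s")
# throughput_tot=327.68 tok/s, throughput_gen=163.84 tok/s
```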

{% endfor %}
{% endfor %}

Advanced usage
==============

For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
see the developer's guide at `<https://github.com/ROCm/vllm/blob/documentation/docs/dev-docker/README.md>`__.

.. note::

   If you're using this Docker image on other AMD GPUs, such as the AMD Instinct MI200 Series or Radeon GPUs, add ``export VLLM_ROCM_USE_AITER=0`` to your command, since AITER is only supported on the gfx942 and gfx950 architectures.
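That check can be automated in a launch script. The sketch below is an assumption-laden helper, not part of the Docker image: it assumes ``rocminfo`` prints the target name (for example ``gfx942``) somewhere in its output, and the supported-architecture list is taken from the note above.

```python
import os
import re
import subprocess


def aiter_supported(gfx_arch: str) -> bool:
    # Per the note above, AITER is only supported on gfx942 and gfx950.
    return gfx_arch in ("gfx942", "gfx950")


def detect_gfx_arch() -> str:
    # Assumption: rocminfo output contains the gfx target of the first GPU.
    try:
        out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    except (FileNotFoundError, OSError):
        return ""
    match = re.search(r"gfx[0-9a-f]+", out)
    return match.group(0) if match else ""


if not aiter_supported(detect_gfx_arch()):
    # Disable AITER on unsupported architectures (e.g., MI200 Series or Radeon).
    os.environ["VLLM_ROCM_USE_AITER"] = "0"
```

Note that the environment variable must be set in the process that launches vLLM; when running the commands from this page interactively, ``export VLLM_ROCM_USE_AITER=0`` in the shell is the simpler route.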

Reproducing the Docker image
----------------------------

To reproduce this ROCm-enabled vLLM Docker image release, follow these steps:

1. Clone the `vLLM repository <https://github.com/vllm-project/vllm>`__.

   .. code-block:: shell

      git clone https://github.com/vllm-project/vllm.git
      cd vllm

2. Use the following command to build the image directly from the specified commit.

   .. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml

      {% set docker = data.dockers[0] %}

      .. code-block:: shell

         docker build -f docker/Dockerfile.rocm \
             --build-arg REMOTE_VLLM=1 \
             --build-arg VLLM_REPO=https://github.com/ROCm/vllm \
             --build-arg VLLM_BRANCH="{{ docker.dockerfile.commit }}" \
             -t vllm-rocm .

      .. tip::

         Replace ``vllm-rocm`` with your desired image tag.

Known issues
============

There is a known regression with AITER for MoE models such as Mixtral and
DeepSeek-R1. Consider using the :doc:`previous release
<previous-versions/vllm-0.11.1-20251103>`
(``rocm/vllm:rocm7.0.0_vllm_0.11.1_20251103``) for better performance.

Further reading
===============

- To learn more about the options for latency and throughput benchmark scripts,
  see `<https://github.com/ROCm/vllm/tree/main/benchmarks>`_.

- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.

- To learn more about system settings and management practices to configure your system for
  AMD Instinct MI300X Series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.

- See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
  a brief introduction to vLLM and optimization strategies.

- For application performance optimization strategies for HPC and AI workloads,
  including inference with vLLM, see :doc:`/how-to/rocm-for-ai/inference-optimization/workload`.

- For a list of other ready-made Docker images for AI with ROCm, see the
  `AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================

See :doc:`previous-versions/vllm-history` to find documentation for previous releases
of the ``ROCm/vllm`` Docker image.
@@ -20,7 +20,7 @@ training, fine-tuning, and inference. It leverages popular machine learning fram

- :doc:`LLM inference frameworks <llm-inference-frameworks>`
- :doc:`vLLM inference performance testing <benchmark-docker/vllm>`
- :doc:`PyTorch inference performance testing <benchmark-docker/pytorch-inference>`
@@ -13,10 +13,15 @@ xDiT diffusion inference

{% set docker = data.docker %}

The `rocm/pytorch-xdit <{{ docker.docker_hub_url }}>`_ Docker image offers
a prebuilt, optimized environment based on `xDiT
<https://github.com/xdit-project/xDiT>`_ for benchmarking diffusion model
video and image generation on AMD Instinct MI355X, MI350X (gfx950), MI325X,
and MI300X (gfx942) GPUs.

The image runs a preview version of ROCm using the new `TheRock
<https://github.com/ROCm/TheRock>`__ build system and includes the following
components:

.. dropdown:: Software components - {{ docker.pull_tag.split('-')|last }}
@@ -100,6 +105,22 @@ vary by model -- select one to get started.

{% endfor %}
{% endfor %}

Performance measurements
========================

To evaluate performance, the `Performance results with AMD ROCm software
<https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8543b7e6d-item-9eda09e707-tab>`__
page provides reference throughput and serving measurements for inference with popular AI models.

.. important::

   The performance data presented in `Performance results with AMD ROCm
   software
   <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8543b7e6d-item-9eda09e707-tab>`__
   reflects only the latest version of this inference benchmarking environment.
   The listed measurements should not be interpreted as the peak performance
   achievable by AMD Instinct GPUs or ROCm software.

System validation
=================
@@ -290,13 +311,146 @@ Run inference

To run the benchmarks for {{ model.model }}, use the following command:

.. code-block:: shell

   {% if model.model == "Hunyuan Video" %}
   cd /app/Hunyuanvideo
   mkdir results

   torchrun --nproc_per_node=8 run.py \
       --model {{ model.model_repo }} \
       --prompt "In the large cage, two puppies were wagging their tails at each other." \
       --height 720 --width 1280 --num_frames 129 \
       --num_inference_steps 50 --warmup_steps 1 --n_repeats 1 \
       --ulysses_degree 8 \
       --enable_tiling --enable_slicing \
       --use_torch_compile \
       --bench_output results
   {% endif %}
   {% if model.model == "Wan2.1" %}
   cd /app/Wan
   mkdir results

   torchrun --nproc_per_node=8 /app/Wan/run.py \
       --task i2v \
       --height 720 \
       --width 1280 \
       --model {{ model.model_repo }} \
       --img_file_path /app/Wan/i2v_input.JPG \
       --ulysses_degree 8 \
       --seed 42 \
       --num_frames 81 \
       --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
       --num_repetitions 1 \
       --num_inference_steps 40 \
       --use_torch_compile
   {% endif %}
   {% if model.model == "Wan2.2" %}
   cd /app/Wan
   mkdir results

   torchrun --nproc_per_node=8 /app/Wan/run.py \
       --task i2v \
       --height 720 \
       --width 1280 \
       --model {{ model.model_repo }} \
       --img_file_path /app/Wan/i2v_input.JPG \
       --ulysses_degree 8 \
       --seed 42 \
       --num_frames 81 \
       --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
       --num_repetitions 1 \
       --num_inference_steps 40 \
       --use_torch_compile
   {% endif %}
   {% if model.model == "FLUX.1" %}
   cd /app/Flux
   mkdir results

   torchrun --nproc_per_node=8 /app/Flux/run.py \
       --model {{ model.model_repo }} \
       --seed 42 \
       --prompt "A small cat" \
       --height 1024 \
       --width 1024 \
       --num_inference_steps 25 \
       --max_sequence_length 256 \
       --warmup_steps 5 \
       --no_use_resolution_binning \
       --ulysses_degree 8 \
       --use_torch_compile \
       --num_repetitions 50
   {% endif %}
   {% if model.model == "FLUX.1 Kontext" %}
   cd /app/Flux
   mkdir results

   torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
       --model {{ model.model_repo }} \
       --seed 42 \
       --prompt "Add a cool hat to the cat" \
       --height 1024 \
       --width 1024 \
       --num_inference_steps 30 \
       --max_sequence_length 512 \
       --warmup_steps 5 \
       --no_use_resolution_binning \
       --ulysses_degree 8 \
       --use_torch_compile \
       --img_file_path /app/Flux/cat.png \
       --model_type flux_kontext \
       --guidance_scale 2.5 \
       --num_repetitions 25
   {% endif %}
   {% if model.model == "FLUX.2" %}
   cd /app/Flux
   mkdir results

   torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
       --model {{ model.model_repo }} \
       --seed 42 \
       --prompt "Add a cool hat to the cat" \
       --height 1024 \
       --width 1024 \
       --num_inference_steps 50 \
       --max_sequence_length 512 \
       --warmup_steps 5 \
       --no_use_resolution_binning \
       --ulysses_degree 8 \
       --use_torch_compile \
       --img_file_paths /app/Flux/cat.png \
       --model_type flux2 \
       --guidance_scale 4.0 \
       --num_repetitions 25
   {% endif %}
   {% if model.model == "stable-diffusion-3.5-large" %}
   cd /app/StableDiffusion3.5
   mkdir results

   torchrun --nproc_per_node=8 /app/StableDiffusion3.5/run.py \
       --model {{ model.model_repo }} \
       --num_inference_steps 28 \
       --prompt "A capybara holding a sign that reads Hello World" \
       --use_torch_compile \
       --pipefusion_parallel_degree 4 \
       --use_cfg_parallel \
       --num_repetitions 50 \
       --dtype torch.float16 \
       --output_path results
   {% endif %}

The generated content will be stored under the ``results`` directory. For the actual benchmark step runtimes, see {% if model.model == "Hunyuan Video" %}stdout{% elif model.model in ["Wan2.1", "Wan2.2"] %}``results/outputs/rank0_*.json``{% elif model.model in ["FLUX.1", "FLUX.1 Kontext", "FLUX.2"] %}``results/timing.json``{% elif model.model == "stable-diffusion-3.5-large" %}``benchmark_results.csv``{% endif %}.

{% if model.model == "FLUX.1" %}You may also use ``run_usp.py``, which implements USP without modifying the default diffusers pipeline.{% endif %}

{% endfor %}
{% endfor %}
@@ -304,7 +458,5 @@ Run inference

Previous versions
=================

See :doc:`benchmark-docker/previous-versions/xdit-history` to find documentation for previous releases
of xDiT diffusion inference performance testing.
@@ -2,18 +2,13 @@

:description: How to train a model using JAX MaxText for ROCm.
:keywords: ROCm, AI, LLM, train, jax, torch, Llama, flux, tutorial, docker

******************************************
Training a model with JAX MaxText on ROCm
******************************************

The MaxText for ROCm training Docker image
provides a prebuilt environment for training on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs,
including essential components like JAX, XLA, ROCm libraries, and MaxText utilities.
It includes the following software components:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/jax-maxtext-benchmark-models.yaml
@@ -52,7 +47,7 @@ MaxText on ROCm provides the following key features to train large language

- NANOO FP8 (for MI300X series GPUs) and FP8 (for MI355X and MI350X) quantization support

.. _amd-maxtext-model-support-v26.1:

Supported models
================
@@ -134,7 +129,7 @@ Use the following command to pull the Docker image from Docker Hub.

   docker pull {{ docker.pull_tag }}

.. _amd-maxtext-multi-node-setup-v26.1:

Multi-node configuration
------------------------
@@ -142,7 +137,7 @@ Multi-node configuration

See :doc:`/how-to/rocm-for-ai/system-setup/multi-node-setup` to configure your
environment for multi-node training.

.. _amd-maxtext-get-started-v26.1:

Benchmarking
============
@@ -163,145 +158,11 @@ benchmark results:

.. tab-set::

   {% if model.mad_tag and "single-node" in model.doc_options %}
   .. tab-item:: MAD-integrated benchmarking

      The following run command is tailored to {{ model.model }}.
      See :ref:`amd-maxtext-model-support-v26.1` to switch to another available model.

      1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
         directory and install the required packages on the host machine.
@@ -332,7 +193,7 @@ benchmark results:

   .. tab-item:: Standalone benchmarking

      The following commands are optimized for {{ model.model }}. See
      :ref:`amd-maxtext-model-support-v26.1` to switch to another
      available model. Some instructions and resources might not be
      available for all models and configurations.
@@ -452,7 +313,7 @@ benchmark results:

      [docker_image] (optional)
        The Docker image to use. If not specified, it defaults to
        ``rocm/jax-training:maxtext-v26.1``.

      For example, to run a multi-node training benchmark on {{ model.model }}:
@@ -477,7 +338,7 @@ benchmark results:

   {% else %}

   .. rubric:: Multi-node training

   For multi-node training examples, choose a model from :ref:`amd-maxtext-model-support-v26.1`
   with an available `multi-node training script <https://github.com/ROCm/MAD/tree/develop/scripts/jax-maxtext/env_scripts>`__.

   {% endif %}
   {% endfor %}
@@ -490,6 +351,10 @@ Known issues
|
|||||||
a workaround, turn off input sequence packing (``packing=False``).
|
a workaround, turn off input sequence packing (``packing=False``).
|
||||||
This will be fixed in a future release.
|
This will be fixed in a future release.
|
||||||
|
|
||||||
|
- Docker ``rocm/jax-training:maxtext-v26.1`` does not include `Primus
|
||||||
|
<https://github.com/AMD-AGI/Primus/tree/main>`__. It is planned to be
|
||||||
|
supported in a future release.
|
||||||
|
|
||||||
Further reading
|
Further reading
|
||||||
===============
|
===============
|
||||||
@@ -17,22 +17,6 @@ previous releases of the ``ROCm/jax-training`` Docker image on `Docker Hub <http

      - Components
      - Resources

-   * - 26.2 (latest)
-     -
-       * ROCm 7.1.1
-       * JAX 0.8.2
-     -
-       * :doc:`Documentation <../jax-maxtext>`
-       * `Docker Hub <https://hub.docker.com/layers/rocm/jax-training/maxtext-v26.2/images/sha256-a89643388487b1e2fc6b6ef7bd3c44378c05d217309c977a1c18c72d05ebcaeb>`__
-
-   * - 26.1
-     -
-       * ROCm 7.1.1
-       * JAX 0.8.2
-     -
-       * :doc:`Documentation <jax-maxtext-v26.1>`
-       * `Docker Hub <https://hub.docker.com/layers/rocm/jax-training/maxtext-v26.1/images/sha256-901083bde353fe6362ada3036e452c792b2c96124e5900f4e9b5946c02ff9d6a>`__

    * - 25.11
      -
        * ROCm 7.1.0
@@ -1,380 +0,0 @@
:orphan:

.. meta::
   :description: How to train a model using JAX MaxText for ROCm.
   :keywords: ROCm, AI, LLM, train, jax, torch, Llama, flux, tutorial, docker

******************************************
Training a model with JAX MaxText on ROCm
******************************************

.. caution::

   This documentation does not reflect the latest version of the ROCm JAX MaxText
   training performance documentation. See :doc:`../jax-maxtext` for the latest version.

The MaxText for ROCm training Docker image
provides a prebuilt environment for training on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs,
including essential components like JAX, XLA, ROCm libraries, and MaxText utilities.
It includes the following software components:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/jax-maxtext-v26.1-benchmark-models.yaml

   {% set dockers = data.dockers %}
   .. tab-set::

      {% for docker in dockers %}
      {% set jax_version = docker.components["JAX"] %}

      .. tab-item:: ``{{ docker.pull_tag }}``
         :sync: {{ docker.pull_tag }}

         .. list-table::
            :header-rows: 1

            * - Software component
              - Version

            {% for component_name, component_version in docker.components.items() %}
            * - {{ component_name }}
              - {{ component_version }}

            {% endfor %}
      {% endfor %}

MaxText on ROCm provides the following key features to train large language models efficiently:

- Transformer Engine (TE)

- Flash Attention (FA) 3 -- with or without sequence input packing

- GEMM tuning

- Multi-node support

- NANOO FP8 (for MI300X series GPUs) and FP8 (for MI355X and MI350X) quantization support

.. _amd-maxtext-model-support-v26.1:

Supported models
================

The following models are pre-optimized for performance on AMD Instinct
GPUs. Some instructions, commands, and available training
configurations in this documentation might vary by model -- select one to get
started.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/jax-maxtext-v26.1-benchmark-models.yaml

   {% set model_groups = data.model_groups %}
   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row gx-0">
          <div class="col-2 me-1 px-2 model-param-head">Model</div>
          <div class="row col-10 pe-0">
            {% for model_group in model_groups %}
            <div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row gx-0 pt-1">
          <div class="col-2 me-1 px-2 model-param-head">Variant</div>
          <div class="row col-10 pe-0">
            {% for model_group in model_groups %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
            <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
            <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

.. note::

   Some models, such as Llama 3, require an external license agreement through
   a third party (for example, Meta).

System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
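One of the settings called out above, NUMA auto-balancing, can be spot-checked directly. This is a minimal sketch assuming a Linux host; the full procedure lives in the linked system-optimization guide.

```shell
# Spot-check NUMA auto-balancing; the system-optimization guidance
# recommends disabling it (value 0) for training workloads.
if [ -r /proc/sys/kernel/numa_balancing ]; then
  state="$(cat /proc/sys/kernel/numa_balancing)"
else
  state="unavailable"   # not a Linux host, or /proc is restricted
fi
echo "numa_balancing: ${state}"
```

If the value is 1, the linked guide describes how to disable it persistently.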
Environment setup
=================

This Docker image is optimized for specific model configurations outlined
as follows. Performance can vary for other training workloads, as AMD
doesn't validate configurations and run conditions outside those described.

Pull the Docker image
---------------------

Use the following command to pull the Docker image from Docker Hub.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/jax-maxtext-v26.1-benchmark-models.yaml

   {% set docker = data.dockers[0] %}

   .. code-block:: shell

      docker pull {{ docker.pull_tag }}
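Before launching long benchmark runs, it can help to confirm that the image is actually present locally. A small sketch, not part of the official workflow, that degrades gracefully when Docker is unavailable:

```shell
# Check for the v26.1 image referenced in this guide before starting a run.
image="rocm/jax-training:maxtext-v26.1"
if command -v docker >/dev/null 2>&1; then
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "image present: $image"
  else
    echo "image missing: $image (run: docker pull $image)"
  fi
else
  echo "docker not found on PATH"
fi
```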
.. _amd-maxtext-multi-node-setup-v26.1:

Multi-node configuration
------------------------

See :doc:`/how-to/rocm-for-ai/system-setup/multi-node-setup` to configure your
environment for multi-node training.

.. _amd-maxtext-get-started-v26.1:

Benchmarking
============

Once the setup is complete, choose between two options to reproduce the
benchmark results:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/jax-maxtext-v26.1-benchmark-models.yaml

   .. _vllm-benchmark-mad:

   {% set docker = data.dockers[0] %}
   {% set model_groups = data.model_groups %}
   {% for model_group in model_groups %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.mad_tag}}

      .. tab-set::

         {% if model.mad_tag and "single-node" in model.doc_options %}
         .. tab-item:: MAD-integrated benchmarking

            The following run command is tailored to {{ model.model }}.
            See :ref:`amd-maxtext-model-support-v26.1` to switch to another available model.

            1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
               directory and install the required packages on the host machine.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD
                  pip install -r requirements.txt

            2. Use this command to run the performance benchmark test on the {{ model.model }} model
               using one GPU with the :literal:`{{model.precision}}` data type on the host machine.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
                  madengine run \
                     --tags {{model.mad_tag}} \
                     --keep-model-dir \
                     --live-output \
                     --timeout 28800

               MAD launches a Docker container with the name
               ``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
               model are collected in the following path: ``~/MAD/perf.csv``.
         {% endif %}
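To spot-check the numbers MAD collects, the CSV can be summarized with standard tools. This sketch builds a sample file of the same general shape; the column and tag names are illustrative assumptions, not the exact MAD schema.

```shell
# Create a sample results file and print one summary line per model.
tmpcsv="$(mktemp)"
cat > "$tmpcsv" <<'EOF'
model,performance,metric
jax_maxtext_train_llama-3.1-8b,9500,tokens_per_sec
jax_maxtext_train_llama-3.1-70b,1200,tokens_per_sec
EOF
summary="$(awk -F, 'NR > 1 { printf "%s -> %s %s\n", $1, $2, $3 }' "$tmpcsv")"
echo "$summary"
rm -f "$tmpcsv"
```

Point the same `awk` line at the real ``~/MAD/perf.csv`` once a run has finished, adjusting the field numbers to the actual header.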
         .. tab-item:: Standalone benchmarking

            The following commands are optimized for {{ model.model }}. See
            :ref:`amd-maxtext-model-support-v26.1` to switch to another
            available model. Some instructions and resources might not be
            available for all models and configurations.

            .. rubric:: Download the Docker image and required scripts

            Run the JAX MaxText benchmark tool independently by starting the
            Docker container as shown in the following snippet.

            .. code-block:: shell

               docker pull {{ docker.pull_tag }}

            {% if model.model_repo and "single-node" in model.doc_options %}
            .. rubric:: Single node training

            1. Set up environment variables.

               .. code-block:: shell

                  export MAD_SECRETS_HFTOKEN=<Your Hugging Face token>
                  export HF_HOME=<Location of saved/cached Hugging Face models>

               ``MAD_SECRETS_HFTOKEN`` is your Hugging Face access token to access models, tokenizers, and data.
               See `User access tokens <https://huggingface.co/docs/hub/en/security-tokens>`__.

               ``HF_HOME`` is where ``huggingface_hub`` will store local data. See `huggingface_hub CLI <https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#huggingface-cli-download>`__.
               If you have already downloaded or cached Hugging Face artifacts, set this variable to that path.
               Downloaded files typically get cached to ``~/.cache/huggingface``.

            2. Launch the Docker container.

               .. code-block:: shell

                  docker run -it \
                     --device=/dev/dri \
                     --device=/dev/kfd \
                     --network host \
                     --ipc host \
                     --group-add video \
                     --cap-add=SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --privileged \
                     -v $HOME:$HOME \
                     -v $HOME/.ssh:/root/.ssh \
                     -v $HF_HOME:/hf_cache \
                     -e HF_HOME=/hf_cache \
                     -e MAD_SECRETS_HFTOKEN=$MAD_SECRETS_HFTOKEN \
                     --shm-size 64G \
                     --name training_env \
                     {{ docker.pull_tag }}

            3. In the Docker container, clone the ROCm MAD repository and navigate to the
               benchmark scripts directory at ``MAD/scripts/jax-maxtext``.

               .. code-block:: shell

                  git clone https://github.com/ROCm/MAD
                  cd MAD/scripts/jax-maxtext

            4. Run the setup scripts to install libraries and datasets needed
               for benchmarking.

               .. code-block:: shell

                  ./jax-maxtext_benchmark_setup.sh -m {{ model.model_repo }}

            5. To run the training benchmark without quantization, use the following command:

               .. code-block:: shell

                  ./jax-maxtext_benchmark_report.sh -m {{ model.model_repo }}

               For quantized training, run the script with the appropriate option for your Instinct GPU.

               .. tab-set::

                  .. tab-item:: MI355X and MI350X

                     For ``fp8`` quantized training on MI355X and MI350X GPUs, use the following command:

                     .. code-block:: shell

                        ./jax-maxtext_benchmark_report.sh -m {{ model.model_repo }} -q fp8

                  {% if model.model_repo not in ["Llama-3.1-70B", "Llama-3.3-70B"] %}
                  .. tab-item:: MI325X and MI300X

                     For ``nanoo_fp8`` quantized training on MI300X series GPUs, use the following command:

                     .. code-block:: shell

                        ./jax-maxtext_benchmark_report.sh -m {{ model.model_repo }} -q nanoo_fp8
                  {% endif %}

            {% endif %}
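The tab choice above comes down to a single flag. A small helper, not part of the benchmark scripts, can select it from the GPU architecture using the gfx targets named in this guide:

```shell
# Map a GPU architecture to the quantization flag used by the report script.
pick_quant_flag() {
  case "$1" in
    gfx950) echo "fp8" ;;        # MI355X / MI350X
    gfx942) echo "nanoo_fp8" ;;  # MI325X / MI300X
    *)      echo "none" ;;
  esac
}
qflag="$(pick_quant_flag gfx950)"
echo "./jax-maxtext_benchmark_report.sh -m Llama-3.1-8B -q ${qflag}"
```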
            {% if model.multinode_config and "multi-node" in model.doc_options %}
            .. rubric:: Multi-node training

            The following SLURM scripts will launch the Docker container and
            run the benchmark. Run them outside of any Docker container. The
            unified multi-node benchmark script accepts a configuration file
            that specifies the model and training parameters.

            .. code-block:: shell

               sbatch -N <NUM_NODES> jax_maxtext_multinode_benchmark.sh <config_file.yml> [docker_image]

            <NUM_NODES>
               The number of nodes to use for training (for example, 2, 4,
               8).

            <config_file.yml>
               Path to the YAML configuration file containing model and
               training parameters. Configuration files are available in the
               ``scripts/jax-maxtext/env_scripts/`` directory for different
               models and GPU architectures.

            [docker_image] (optional)
               The Docker image to use. If not specified, it defaults to
               ``rocm/jax-training:maxtext-v26.1``.

            For example, to run a multi-node training benchmark on {{ model.model }}:

            .. tab-set::

               {% if model.multinode_config.gfx950 %}
               .. tab-item:: MI355X and MI350X (gfx950)

                  .. code-block:: bash

                     sbatch -N 4 jax_maxtext_multinode_benchmark.sh {{ model.multinode_config.gfx950 }}
               {% endif %}

               {% if model.multinode_config.gfx942 %}
               .. tab-item:: MI325X and MI300X (gfx942)

                  .. code-block:: bash

                     sbatch -N 4 jax_maxtext_multinode_benchmark.sh {{ model.multinode_config.gfx942 }}
               {% endif %}

            {% else %}
            .. rubric:: Multi-node training

            For multi-node training examples, choose a model from :ref:`amd-maxtext-model-support-v26.1`
            with an available `multi-node training script <https://github.com/ROCm/MAD/tree/develop/scripts/jax-maxtext/env_scripts>`__.
            {% endif %}
   {% endfor %}
   {% endfor %}
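A sweep over node counts can be expressed as a dry run that only prints the sbatch invocations; the config file name below is a placeholder, not a file guaranteed to exist in the repository.

```shell
# Dry run: print one sbatch command per node count without submitting jobs.
config="llama3.1_70b_gfx950.yml"   # placeholder config name
for nodes in 2 4 8; do
  echo "sbatch -N ${nodes} jax_maxtext_multinode_benchmark.sh ${config}"
done
```

Drop the `echo` (and substitute a real config from ``scripts/jax-maxtext/env_scripts/``) to actually submit the jobs.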

Known issues
============

- You might see NaNs in the losses when setting ``packing=True``. As
  a workaround, turn off input sequence packing (``packing=False``).
  This will be fixed in a future release.

- Docker ``rocm/jax-training:maxtext-v26.1`` does not include `Primus
  <https://github.com/AMD-AGI/Primus/tree/main>`__. It is planned to be
  supported in a future release.

Further reading
===============

- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.

- To learn more about system settings and management practices to configure your system for
  AMD Instinct MI300X Series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.

- For a list of other ready-made Docker images for AI with ROCm, see
  `AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.

Previous versions
=================

See :doc:`previous-versions/jax-maxtext-history` to find documentation for previous releases
of the ``ROCm/jax-training`` Docker image.
@@ -7,7 +7,7 @@ Megatron-LM training performance testing version history

 This table lists previous versions of the ROCm Megatron-LM training Docker image for
 training performance testing. For detailed information about available models
 for benchmarking, see the version-specific documentation. You can find tagged
-previous releases of the ``ROCm/primus`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/megatron-lm/tags>`__.
+previous releases of the ``ROCm/megatron-lm`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/megatron-lm/tags>`__.

 .. list-table::
    :header-rows: 1

@@ -16,20 +16,13 @@ previous releases of the ``ROCm/primus`` Docker image on `Docker Hub <https://hu

      - Components
      - Resources

-   * - v26.2 (latest)
+   * - v26.1 (latest)
-     -
-       * ROCm 7.2.0
-       * PyTorch 2.10.0+git94c6e04
-     -
-       * :doc:`Primus Megatron documentation <../primus-megatron>`
-       * `Docker Hub <https://hub.docker.com/layers/rocm/primus/v26.2/images/sha256-9148d1bfcd579bf92f44bd89090e0d8c958f149c134b4b34b9674ab559244585>`__
-
-   * - v26.1
      -
        * ROCm 7.1.0
        * PyTorch 2.10.0.dev20251112+rocm7.1
      -
-       * :doc:`Primus Megatron documentation <primus-megatron-v26.1>`
+       * :doc:`Primus Megatron documentation <../primus-megatron>`
+       * :doc:`Megatron-LM (legacy) documentation <../megatron-lm>`
        * `Docker Hub <https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d>`__

    * - v25.11
@@ -38,7 +31,7 @@ previous releases of the ``ROCm/primus`` Docker image on `Docker Hub <https://hu

        * PyTorch 2.10.0.dev20251112+rocm7.1
      -
        * :doc:`Primus Megatron documentation <primus-megatron-v25.11>`
-       * :doc:`Megatron-LM (legacy) documentation <megatron-lm-v25.11>`
+       * :doc:`Megatron-LM (legacy) documentation <megatron-lm-v25.10>`
        * `Docker Hub <https://hub.docker.com/layers/rocm/primus/v25.11/images/sha256-71aa65a9bfc8e9dd18bce5b68c81caff864f223e9afa75dc1b719671a1f4a3c3>`__

    * - v25.10
File diff suppressed because it is too large
@@ -1,457 +0,0 @@
:orphan:

.. meta::
   :description: How to train a model using PyTorch for ROCm.
   :keywords: ROCm, AI, LLM, train, PyTorch, torch, Llama, flux, tutorial, docker

****************************************
Training a model with Primus and PyTorch
****************************************

.. caution::

   This documentation does not reflect the latest version of the ROCm Primus PyTorch training
   performance benchmark documentation. See :doc:`../primus-pytorch` for the latest version.

`Primus <https://github.com/AMD-AGI/Primus>`__ is a unified and flexible
LLM training framework that streamlines LLM
training on AMD Instinct GPUs using a modular, reproducible configuration paradigm.
Primus now supports the PyTorch torchtitan backend.

.. note::

   For a unified training solution on AMD GPUs with ROCm, the `rocm/pytorch-training
   <https://hub.docker.com/r/rocm/pytorch-training/>`__ Docker Hub registry will be
   deprecated soon in favor of `rocm/primus <https://hub.docker.com/r/rocm/primus>`__.
   The ``rocm/primus`` Docker containers will cover PyTorch training ecosystem frameworks,
   including torchtitan and :doc:`Megatron-LM <primus-megatron>`.

Primus with the PyTorch torchtitan backend is designed to replace the
:doc:`ROCm PyTorch training <pytorch-training>` workflow. See
:doc:`pytorch-training` for steps to run workloads without Primus.

AMD provides a ready-to-use Docker image for MI355X, MI350X, MI325X, and
MI300X GPUs containing essential components for Primus and PyTorch training
with Primus Turbo optimizations.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v26.1-benchmark-models.yaml

   .. tab-set::

      .. tab-item:: {{ data.docker.pull_tag }}
         :sync: {{ data.docker.pull_tag }}

         .. list-table::
            :header-rows: 1

            * - Software component
              - Version

            {% for component_name, component_version in data.docker.components.items() %}
            * - {{ component_name }}
              - {{ component_version }}
            {% endfor %}

.. _amd-primus-pytorch-model-support-v26.01:

Supported models
================

The following models are pre-optimized for performance on the AMD Instinct MI325X and MI300X GPUs.
Some instructions, commands, and training recommendations in this documentation might
vary by model -- select one to get started.

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v26.1-benchmark-models.yaml

   {% set model_groups = data.model_groups %}
   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row gx-0">
          <div class="col-2 me-1 px-2 model-param-head">Model</div>
          <div class="row col-10 pe-0">
            {% for model_group in model_groups %}
            <div class="col-6 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row gx-0 pt-1">
          <div class="col-2 me-1 px-2 model-param-head">Variant</div>
          <div class="row col-10 pe-0">
            {% for model_group in model_groups %}
            {% set models = model_group.models %}
            {% for model in models %}
            {% if models|length % 3 == 0 %}
            <div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% else %}
            <div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% endif %}
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

.. seealso::

   For additional workloads, including Llama 3.3, Llama 3.2, Llama 2, GPT OSS, Qwen, and Flux models,
   see the documentation :doc:`pytorch-training` (without Primus).

.. _amd-primus-pytorch-performance-measurements-v26.01:

System validation
=================

Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.

To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.

This Docker image is optimized for specific model configurations outlined
below. Performance can vary for other training workloads, as AMD
doesn't test configurations and run conditions outside those described.

Pull the Docker image
=====================

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v26.1-benchmark-models.yaml

   Use the following command to pull the Docker image from Docker Hub.

   .. code-block:: shell

      docker pull {{ data.docker.pull_tag }}

Run training
============

Once the setup is complete, choose between the following two workflows to start benchmarking training.
For fine-tuning workloads and multi-node training examples, see :doc:`pytorch-training` (without Primus).
For best performance on MI325X, MI350X, and MI355X GPUs, you might need to
tweak some configurations (such as batch sizes).

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v26.1-benchmark-models.yaml

   {% set docker = data.docker %}
   {% set model_groups = data.model_groups %}

   .. tab-set::

      .. tab-item:: Primus benchmarking

         {% for model_group in model_groups %}
         {% for model in model_group.models %}

         .. container:: model-doc {{ model.mad_tag }}

            The following run commands are tailored to {{ model.model }}.
            See :ref:`amd-primus-pytorch-model-support-v26.01` to switch to another available model.

            .. rubric:: Download the Docker image and required packages

            1. Pull the ``{{ docker.pull_tag }}`` Docker image from Docker Hub.

               .. code-block:: shell

                  docker pull {{ docker.pull_tag }}

            2. Run the Docker container.

               .. code-block:: shell

                  docker run -it \
                     --device /dev/dri \
                     --device /dev/kfd \
                     --network host \
                     --ipc host \
                     --group-add video \
                     --cap-add SYS_PTRACE \
                     --security-opt seccomp=unconfined \
                     --privileged \
                     -v $HOME:$HOME \
                     -v $HOME/.ssh:/root/.ssh \
                     --shm-size 64G \
                     --name training_env \
                     {{ docker.pull_tag }}

               Use these commands if you exit the ``training_env`` container and need to return to it.

               .. code-block:: shell

                  docker start training_env
                  docker exec -it training_env bash

               The Docker container hosts verified commit ``9c529cd4`` of the `Primus
               <https://github.com/AMD-AGI/Primus/tree/9c529cd4a934a68a880ede036c3e97b792e38167/>`__ repository.

            .. rubric:: Prepare training datasets and dependencies

            The following benchmarking examples require downloading models and datasets
            from Hugging Face. To ensure successful access to gated repos, set your
            ``HF_TOKEN``.

            .. code-block:: shell

               export HF_TOKEN=$your_personal_hugging_face_access_token
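Gated downloads tend to fail late in a run, so it can pay to fail fast instead. A minimal guard, not part of Primus; the token value below is a placeholder:

```shell
# Abort early if HF_TOKEN is empty, before a long benchmark starts.
HF_TOKEN="hf_example_token"   # placeholder; export your real token instead
if [ -z "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is not set; gated Hugging Face downloads will fail" >&2
  exit 1
fi
echo "HF_TOKEN is set"
```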
            .. rubric:: Pretraining

            To get started, navigate to the ``Primus`` directory in your container.

            .. code-block::

               cd /workspace/Primus

            Now, to start the pretraining benchmark, use the ``run_pretrain.sh`` script
            included with Primus, with the appropriate options.
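The examples that follow differ only in config path, log file, and precision. The pattern can be previewed as a dry run that echoes the commands instead of executing them (paths follow the MI300X examples in this guide; the echo wrapper is illustrative only):

```shell
# Dry run: print the primus-cli pretrain invocation for each precision.
for precision in BF16 FP8; do
  cfg="examples/torchtitan/configs/MI300X/llama3.1_8B-${precision}-pretrain.yaml"
  echo "bash runner/primus-cli direct -- train pretrain --config ${cfg}"
done
```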
.. rubric:: Benchmarking examples
|
|
||||||
|
|
||||||
.. container:: model-doc primus_pyt_train_llama-3.1-8b
|
|
||||||
|
|
||||||
         Use the following command to train Llama 3.1 8B with BF16 precision using Primus torchtitan.

         .. tab-set::

            .. tab-item:: MI355X and MI350X
               :sync: MI355X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_8B.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI355X/llama3.1_8B-BF16-pretrain.yaml

            .. tab-item:: MI325X
               :sync: MI325X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_8B.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml \
                     --training.local_batch_size 6

            .. tab-item:: MI300X
               :sync: MI300X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_8B.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml

         To train Llama 3.1 8B with FP8 precision, use the following command.

         .. tab-set::

            .. tab-item:: MI355X and MI350X
               :sync: MI355X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_8B_fp8.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI355X/llama3.1_8B-FP8-pretrain.yaml

            .. tab-item:: MI325X
               :sync: MI325X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_8B_fp8.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_8B-FP8-pretrain.yaml \
                     --training.local_batch_size 7

            .. tab-item:: MI300X
               :sync: MI300X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_8B_fp8.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_8B-FP8-pretrain.yaml
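The MI325X commands above override the per-GPU batch size with ``--training.local_batch_size``. As a quick sanity check, the effective global batch per step is the per-GPU value times the data-parallel world size. A minimal sketch — the 8-GPU single-node world size is an assumed example value, not something the commands specify:

```python
# Hedged sketch: effective global batch size for a purely
# data-parallel run. The 8-GPU world size is an assumed
# illustration, not taken from the commands above.
def global_batch_size(local_batch_size: int, world_size: int) -> int:
    return local_batch_size * world_size

print(global_batch_size(6, 8))  # MI325X BF16 override
print(global_batch_size(7, 8))  # MI325X FP8 override
```

With the FP8 override of 7 samples per GPU, the global batch grows accordingly; gradient-accumulation steps, if configured, multiply this further.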
      .. container:: model-doc primus_pyt_train_llama-3.1-70b

         Use the following command to train Llama 3.1 70B with BF16 precision using Primus torchtitan.

         .. tab-set::

            .. tab-item:: MI355X and MI350X
               :sync: MI355X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_70B.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI355X/llama3.1_70B-BF16-pretrain.yaml

            .. tab-item:: MI325X
               :sync: MI325X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_70B.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_70B-BF16-pretrain.yaml \
                     --training.local_batch_size 6

            .. tab-item:: MI300X
               :sync: MI300X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_70B.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_70B-BF16-pretrain.yaml

         To train Llama 3.1 70B with FP8 precision, use the following command.

         .. tab-set::

            .. tab-item:: MI355X and MI350X
               :sync: MI355X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_70B_fp8.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI355X/llama3.1_70B-FP8-pretrain.yaml

            .. tab-item:: MI325X
               :sync: MI325X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_70B_fp8.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_70B-FP8-pretrain.yaml \
                     --training.local_batch_size 5

            .. tab-item:: MI300X
               :sync: MI300X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_llama3.1_70B_fp8.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/llama3.1_70B-FP8-pretrain.yaml
      .. container:: model-doc primus_pyt_train_deepseek-v3-16b

         Use the following command to train DeepSeek V3 16B with BF16 precision using Primus torchtitan.

         .. tab-set::

            .. tab-item:: MI355X and MI350X
               :sync: MI355X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_deepseek_v3_16b.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI355X/deepseek_v3_16b-pretrain.yaml

            .. tab-item:: MI325X
               :sync: MI325X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_deepseek_v3_16b.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/deepseek_v3_16b-pretrain.yaml \
                     --training.local_batch_size 10

            .. tab-item:: MI300X
               :sync: MI300X

               .. code-block:: shell

                  bash runner/primus-cli direct \
                     --log_file /tmp/primus_deepseek_v3_16b.log \
                     -- train pretrain \
                     --config examples/torchtitan/configs/MI300X/deepseek_v3_16b-pretrain.yaml
      {% endfor %}
      {% endfor %}
   .. tab-item:: MAD-integrated benchmarking

      {% for model_group in model_groups %}
      {% for model in model_group.models %}

      .. container:: model-doc {{ model.mad_tag }}

         The following run command is tailored to {{ model.model }}.
         See :ref:`amd-primus-pytorch-model-support-v26.01` to switch to another available model.

         1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
            directory and install the required packages on the host machine.

            .. code-block:: shell

               git clone https://github.com/ROCm/MAD
               cd MAD
               pip install -r requirements.txt

         2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
            using one node with the {{ model.precision }} data type on the host machine.

            .. code-block:: shell

               export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
               madengine run \
                  --tags {{ model.mad_tag }} \
                  --keep-model-dir \
                  --live-output \
                  --timeout 28800

         MAD launches a Docker container with the name
         ``container_ci-{{ model.mad_tag }}``. The latency and throughput reports of the
         model are collected in ``~/MAD/perf.csv``.

      {% endfor %}
      {% endfor %}
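MAD appends one row per benchmark run to ``~/MAD/perf.csv``. A minimal sketch of summarizing that report with the Python standard library — the column names used here (``model``, ``performance``, ``metric``) are assumptions for illustration, so check the header of your own ``perf.csv`` first:

```python
import csv
import io

# Stand-in for the contents of ~/MAD/perf.csv; the real file may
# use different column names, so treat this layout as an assumption.
sample = """model,performance,metric
pyt_train_llama-3.1-8b,1234.5,tokens_per_sec
pyt_train_llama-3.1-8b,1250.0,tokens_per_sec
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Average the performance column across repeated runs of one model.
avg = sum(float(r["performance"]) for r in rows) / len(rows)
print(rows[0]["model"], avg, rows[0]["metric"])
```

Replace the in-memory sample with ``open(os.path.expanduser("~/MAD/perf.csv"))`` to process a real report.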
Further reading
===============

- For an introduction to Primus, see `Primus: A Lightweight, Unified Training
  Framework for Large Models on AMD GPUs <https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html>`__.

- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.

- To learn more about system settings and management practices to configure your system for
  AMD Instinct MI300X Series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.

- For a list of other ready-made Docker images for AI with ROCm, see
  `AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.

Previous versions
=================

See :doc:`pytorch-training-history` to find documentation for previous releases
of the ``ROCm/pytorch-training`` Docker image.
@@ -7,7 +7,7 @@ PyTorch training performance testing version history

This table lists previous versions of the ROCm PyTorch training Docker image for
training performance testing. For detailed information about available models
for benchmarking, see the version-specific documentation. You can find tagged
previous releases of the ``ROCm/pytorch-training`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/pytorch-training/tags>`_.

.. list-table::
   :header-rows: 1

@@ -16,20 +16,13 @@ previous releases of the ``ROCm/primus`` Docker image on `Docker Hub <https://hu

     - Components
     - Resources

   * - v26.1 (latest)
     -
       * ROCm 7.1.0
       * PyTorch 2.10.0.dev20251112+rocm7.1
     -
       * :doc:`Primus PyTorch training documentation <../primus-megatron>`
       * :doc:`PyTorch training (legacy) documentation <../megatron-lm>`
       * `Docker Hub <https://hub.docker.com/layers/rocm/primus/v26.1/images/sha256-4fc8808bdb14117c6af7f38d79c809056e6fdbfd530c1fabbb61d097ddaf820d>`__

   * - v25.11
@@ -47,7 +47,7 @@ Megatron-LM.

     - {{ component_version }}
   {% endfor %}

.. _amd-primus-megatron-lm-model-support-v26.01:

Supported models
================

@@ -65,21 +65,9 @@ might vary by model -- select one to get started.

   <div class="row gx-0">
     <div class="col-2 me-1 px-2 model-param-head">Model</div>
     <div class="row col-10 pe-0">
       {% for model_group in model_groups %}
       <div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
       {% endfor %}
     </div>
   </div>

@@ -120,7 +108,7 @@ To test for optimal performance, consult the recommended :ref:`System health ben

<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.

.. _mi300x-amd-primus-megatron-lm-training-v26.01:

Environment setup
=================

@@ -130,7 +118,7 @@ Environment setup

Use the following instructions to set up the environment, configure the script to train models, and
reproduce the benchmark results on AMD Instinct GPUs.

.. _amd-primus-megatron-lm-requirements-v26.01:

Pull the Docker image

@@ -172,7 +160,7 @@ Pull the Docker image

The Docker container hosts verified commit ``9c529cd4`` of the `Primus
<https://github.com/AMD-AGI/Primus/tree/9c529cd4a934a68a880ede036c3e97b792e38167>`__ repository.

.. _amd-primus-megatron-lm-environment-setup-v26.01:

Configuration
=============

@@ -219,7 +207,7 @@ You can use either mock data or real data for training.

Ensure that the files are accessible inside the Docker container.

.. _amd-primus-megatron-lm-tokenizer-v26.01:

Tokenizer
---------

@@ -232,7 +220,7 @@ right permissions to access the tokenizer for each model.

   # Export your HF_TOKEN in the workspace
   export HF_TOKEN=<your_hftoken>

.. _amd-primus-megatron-lm-run-training-v26.01:

Run training
============
@@ -249,12 +237,14 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

.. code-block:: shell

   pip install -r requirements.txt
   export HSA_NO_SCRATCH_RECLAIM=1
   export NVTE_CK_USES_BWD_V3=1

.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.3-70b

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Llama 3.3 70B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run pre-training for Llama 3.3 70B BF16, run:

@@ -289,7 +279,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Llama 3.1 8B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run pre-training for Llama 3.1 8B FP8, run:

@@ -353,7 +343,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Llama 3.1 70B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run pre-training for Llama 3.1 70B BF16, run:
@@ -367,9 +357,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

            bash runner/primus-cli direct \
               --log_file /tmp/primus_llama3.1_70B.log \
               -- train pretrain \
               --config examples/megatron/configs/MI355X/llama3.1_70B-BF16-pretrain.yaml

      .. tab-item:: MI300X
         :sync: MI325X and MI300X
@@ -429,7 +417,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Llama 2 7B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run pre-training for Llama 2 7B FP8, run:

@@ -493,7 +481,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Llama 2 70B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run pre-training for Llama 2 70B BF16, run:

@@ -528,7 +516,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to DeepSeek-V3.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run training on a single node for DeepSeek-V3 (MoE with expert parallel) BF16 with 3-layer proxy,
   use the following command:
@@ -548,9 +536,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

               --moe_layer_freq 1 \
               --train_iters 50 \
               --micro_batch_size 8 \
               --global_batch_size 64

      .. tab-item:: MI300X
         :sync: MI325X and MI300X
@@ -576,7 +562,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to DeepSeek-V2-Lite.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run training on a single node for DeepSeek-V2-Lite (MoE with expert parallel) BF16,
   use the following command:
@@ -591,11 +577,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

            bash runner/primus-cli direct \
               --log_file /tmp/primus_deepseek_v2_lite.log \
               -- train pretrain \
               --config examples/megatron/configs/MI355X/deepseek_v2_lite-BF16-pretrain.yaml

      .. tab-item:: MI300X
         :sync: MI325X and MI300X
@@ -616,7 +598,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Mixtral 8x7B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run training on a single node for Mixtral 8x7B (MoE with expert parallel),
   use the following command:

@@ -652,7 +634,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Mixtral 8x22B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run training on a single node for Mixtral 8x22B BF16 (MoE with expert parallel) 4-layer proxy,
   use the following command:
@@ -689,83 +671,11 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

               --global_batch_size 16 \
               --train_iters 50

.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-7b

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Qwen 2.5 7B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run training on a single node for Qwen 2.5 7B BF16, use the following
   command:
@@ -830,7 +740,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Qwen 2.5 72B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.01` to switch to another available model.

   To run the training on a single node for Qwen 2.5 72B BF16, use the following command.
@@ -861,112 +771,7 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the
|
|||||||
-- train pretrain \
|
-- train pretrain \
|
||||||
--config examples/megatron/configs/MI300X/qwen2.5_72B-BF16-pretrain.yaml
|
--config examples/megatron/configs/MI300X/qwen2.5_72B-BF16-pretrain.yaml
|
||||||
|
|
||||||
.. container:: model-doc primus_pyt_megatron_lm_train_zebra-llama-1b
|
.. _amd-primus-megatron-multi-node-examples-v26.01:
|
||||||
|
|
||||||
Once setup is complete, run the appropriate training command.
|
|
||||||
The following run commands are tailored to Zebra-Llama 1B.
|
|
||||||
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.
|
|
||||||
|
|
||||||
To run the training on a single node for AMD Zebra-Llama 1B BF16, use the following command.
|
|
||||||
|
|
||||||
.. tab-set::
|
|
||||||
|
|
||||||
.. tab-item:: MI355X and MI350X
|
|
||||||
:sync: MI355X and MI350X
|
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
|
|
||||||
PRIMUS_TRAIN_RUNTIME=legacy bash runner/primus-cli direct \
|
|
||||||
--log_file /tmp/primus_zebra_llama_1B.log \
|
|
||||||
-- train pretrain \
|
|
||||||
--config examples/megatron/configs/MI355X/zebra_llama_1B-pretrain.yaml
|
|
||||||
|
|
||||||
.. tab-item:: MI300X
|
|
||||||
:sync: MI325X and MI300X
|
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
|
|
||||||
# Set the variables for better performance
|
|
||||||
# only on MI325X and MI300X
|
|
||||||
export PRIMUS_TURBO_ATTN_V3_ATOMIC_FP32=1
|
|
||||||
export NVTE_CK_IS_V3_ATOMIC_FP32=1
|
|
||||||
|
|
||||||
PRIMUS_TRAIN_RUNTIME=legacy bash runner/primus-cli direct \
|
|
||||||
--log_file /tmp/primus_zebra_llama_1B.log \
|
|
||||||
-- train pretrain \
|
|
||||||
--config examples/megatron/configs/MI300X/zebra_llama_1B-pretrain.yaml
|
|
||||||
|
|
||||||
.. container:: model-doc primus_pyt_megatron_lm_train_zebra-llama-3b
|
|
||||||
|
|
||||||
Once setup is complete, run the appropriate training command.
|
|
||||||
The following run commands are tailored to Zebra-Llama 3B.
|
|
||||||
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.
|
|
||||||
|
|
||||||
To run the training on a single node for AMD Zebra-Llama 3B BF16, use the following command.
|
|
||||||
|
|
||||||
.. tab-set::
|
|
||||||
|
|
||||||
.. tab-item:: MI355X and MI350X
|
|
||||||
:sync: MI355X and MI350X
|
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
|
|
||||||
PRIMUS_TRAIN_RUNTIME=legacy bash runner/primus-cli direct \
|
|
||||||
--log_file /tmp/primus_zebra_llama_3B.log \
|
|
||||||
-- train pretrain \
|
|
||||||
--config examples/megatron/configs/MI355X/zebra_llama_3B-pretrain.yaml
|
|
||||||
|
|
||||||
.. tab-item:: MI300X
|
|
||||||
:sync: MI325X and MI300X
|
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
|
|
||||||
# Set the variables for better performance
|
|
||||||
# only on MI325X and MI300X
|
|
||||||
export PRIMUS_TURBO_ATTN_V3_ATOMIC_FP32=1
|
|
||||||
export NVTE_CK_IS_V3_ATOMIC_FP32=1
|
|
||||||
|
|
||||||
PRIMUS_TRAIN_RUNTIME=legacy bash runner/primus-cli direct \
|
|
||||||
--log_file /tmp/primus_zebra_llama_3B.log \
|
|
||||||
-- train pretrain \
|
|
||||||
--config examples/megatron/configs/MI300X/zebra_llama_3B-pretrain.yaml
|
|
||||||
|
|
||||||
.. container:: model-doc primus_pyt_megatron_lm_train_zebra-llama-8b

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Zebra-Llama 8B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

   To run the training on a single node for AMD Zebra-Llama 8B BF16, use the following command.

   .. tab-set::

      .. tab-item:: MI355X and MI350X
         :sync: MI355X and MI350X

         .. code-block:: shell

            PRIMUS_TRAIN_RUNTIME=legacy bash runner/primus-cli direct \
               --log_file /tmp/primus_zebra_llama_8B.log \
               -- train pretrain \
               --config examples/megatron/configs/MI355X/zebra_llama_8B-pretrain.yaml

      .. tab-item:: MI300X
         :sync: MI325X and MI300X

         .. code-block:: shell

            # Set these variables for better performance
            # only on MI325X and MI300X
            export PRIMUS_TURBO_ATTN_V3_ATOMIC_FP32=1
            export NVTE_CK_IS_V3_ATOMIC_FP32=1

            PRIMUS_TRAIN_RUNTIME=legacy bash runner/primus-cli direct \
               --log_file /tmp/primus_zebra_llama_8B.log \
               -- train pretrain \
               --config examples/megatron/configs/MI300X/zebra_llama_8B-pretrain.yaml
.. _amd-primus-megatron-multi-node-examples-v26.2:

Multi-node training examples
----------------------------
Use the following steps to set up your environment to launch the multi-node workload:

.. code-block:: shell

   git clone --recurse-submodules https://github.com/AMD-AGI/Primus.git
   cd Primus/
   git checkout 44f780d
   git submodule update --init --recursive

   export DOCKER_IMAGE={{ docker.pull_tag }}
   export HF_TOKEN=<your_HF_token>
   export HSA_NO_SCRATCH_RECLAIM=1
   export NVTE_CK_USES_BWD_V3=1
   export NCCL_IB_HCA=<your_NCCL_IB_HCA>                # specify which RDMA interfaces to use for communication
   export NCCL_SOCKET_IFNAME=<your_NCCL_SOCKET_IFNAME>  # your network interface
   export GLOO_SOCKET_IFNAME=<your_GLOO_SOCKET_IFNAME>  # your network interface
* If ``NCCL_IB_HCA`` and ``NCCL_SOCKET_IFNAME`` are not set, Primus tries to auto-detect them. However, since NICs can vary across clusters, it is recommended to explicitly export the NCCL parameters for your cluster.
* To find your network interface, use ``ip a``.
* To find RDMA interfaces, use ``ibv_devices`` to list all RDMA/IB devices.
* Remember to set ``DOCKER_IMAGE`` and ``HF_TOKEN`` (see :ref:`amd-primus-megatron-lm-tokenizer-v26.2`) as appropriate.
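The interface names used in those exports can be discovered with standard tools. A minimal sketch, assuming a Linux host with sysfs mounted and ``rdma-core`` providing ``ibv_devices``:

```shell
# Candidate names for NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME
ls /sys/class/net

# Candidate devices for NCCL_IB_HCA, when rdma-core is installed
if command -v ibv_devices >/dev/null 2>&1; then
    ibv_devices
else
    echo "ibv_devices not found; install rdma-core to list RDMA devices"
fi
```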
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-8b

   Once setup is complete, run the appropriate training command.
   The following run commands are tailored to Llama 3.1 8B.
   See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

   To train Llama 3.1 8B FP8 on 8 nodes, run:

Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 2 7B.
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

To train Llama 2 7B FP8 on 8 nodes, run:

Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 3.1 70B.
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

To train Llama 3.1 70B FP8 on 8 nodes, run:

Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 2 70B.
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

To train Llama 2 70B FP8 on 8 nodes, run:

Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 3.3 70B.
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

To train Llama 3.3 70B FP8 on 8 nodes, run:

Once setup is complete, run the appropriate training command.
The following run commands are tailored to Mixtral 8x7B.
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

To train Mixtral 8x7B BF16 on 8 nodes, run:

Once setup is complete, run the appropriate training command.
The following run commands are tailored to Qwen2.5 72B.
See :ref:`amd-primus-megatron-lm-model-support-v26.2` to switch to another available model.

To train Qwen2.5 72B FP8 on 8 nodes, run:
      --global_batch_size 512 \
      --recompute_num_layers 80 \
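The global batch size interacts with the per-GPU micro-batch size through gradient accumulation. As a rough sanity check, a hypothetical helper (not part of Primus) for the arithmetic:

```python
def grad_accum_steps(global_batch: int, micro_batch: int, dp_size: int) -> int:
    """Gradient-accumulation steps implied by a global batch size.

    Hypothetical helper for sanity-checking launch flags; Megatron-style
    trainers require the global batch to divide evenly across replicas.
    """
    per_step = micro_batch * dp_size
    if global_batch % per_step:
        raise ValueError("global batch must be divisible by micro_batch * dp_size")
    return global_batch // per_step

# 8 nodes x 8 GPUs (data parallel), micro-batch 1, global batch 512:
print(grad_accum_steps(512, 1, 64))  # → 8
```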
.. _amd-primus-megatron-lm-benchmark-test-vars-v26.2:

Key options
-----------

num_layers
   For using a reduced number of layers, as with proxy models.

Known issues
============

The DeepSeekV3 and Mixtral 8x22B proxy models may exit with an error due to a
memory free issue. However, this does not impact training runs. All iterations
(50 in this case) complete before the exit, and the results are available at
the end.

Further reading
===============
   - {{ component_version }}
{% endfor %}

.. _amd-primus-pytorch-model-support-v26.2:

Supported models
================

For additional workloads, including Llama 3.3, Llama 3.2, Llama 2, GPT OSS, Qwen, and Flux models,
see the :doc:`pytorch-training` documentation (without Primus).

.. _amd-primus-pytorch-performance-measurements-v26.2:

System validation
=================

.. tab-set::
   .. tab-item:: MAD-integrated benchmarking

      {% for model_group in model_groups %}
      {% for model in model_group.models %}

      .. container:: model-doc {{ model.mad_tag }}

         The following run command is tailored to {{ model.model }}.
         See :ref:`amd-primus-pytorch-model-support-v26.2` to switch to another available model.

         1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
            directory and install the required packages on the host machine.

            .. code-block:: shell

               git clone https://github.com/ROCm/MAD
               cd MAD
               pip install -r requirements.txt

         2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
            using one node with the {{ model.precision }} data type on the host machine.

            .. code-block:: shell

               export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
               madengine run \
                  --tags {{ model.mad_tag }} \
                  --keep-model-dir \
                  --live-output \
                  --timeout 28800

         MAD launches a Docker container with the name
         ``container_ci-{{ model.mad_tag }}``. The latency and throughput reports of the
         model are collected in ``~/MAD/perf.csv``.

      {% endfor %}
      {% endfor %}
   .. tab-item:: Primus benchmarking

      {% for model_group in model_groups %}

      .. container:: model-doc {{ model.mad_tag }}

         The following run commands are tailored to {{ model.model }}.
         See :ref:`amd-primus-pytorch-model-support-v26.2` to switch to another available model.

         .. rubric:: Download the Docker image and required packages

               -- train pretrain \
               --config examples/torchtitan/configs/MI355X/llama3.1_8B-BF16-pretrain.yaml
         .. tab-item:: MI325X
            :sync: MI325X

            .. code-block:: shell

               bash runner/primus-cli direct \
                  --log_file /tmp/primus_llama3.1_8B.log \
                  -- train pretrain \
                  --config examples/torchtitan/configs/MI300X/llama3.1_8B-BF16-pretrain.yaml \
                  --training.local_batch_size 6

         .. tab-item:: MI300X
            :sync: MI300X
               -- train pretrain \
               --config examples/torchtitan/configs/MI355X/llama3.1_8B-FP8-pretrain.yaml

         .. tab-item:: MI325X
            :sync: MI325X

            .. code-block:: shell

               bash runner/primus-cli direct \
                  --log_file /tmp/primus_llama3.1_8B_fp8.log \
                  -- train pretrain \
                  --config examples/torchtitan/configs/MI300X/llama3.1_8B-FP8-pretrain.yaml \
                  --training.local_batch_size 7

         .. tab-item:: MI300X
            :sync: MI300X
               -- train pretrain \
               --config examples/torchtitan/configs/MI355X/llama3.1_70B-BF16-pretrain.yaml

         .. tab-item:: MI325X
            :sync: MI325X

            .. code-block:: shell

               bash runner/primus-cli direct \
                  --log_file /tmp/primus_llama3.1_70B.log \
                  -- train pretrain \
                  --config examples/torchtitan/configs/MI300X/llama3.1_70B-BF16-pretrain.yaml \
                  --training.local_batch_size 6

         .. tab-item:: MI300X
            :sync: MI300X
               -- train pretrain \
               --config examples/torchtitan/configs/MI355X/llama3.1_70B-FP8-pretrain.yaml

         .. tab-item:: MI325X
            :sync: MI325X

            .. code-block:: shell

               bash runner/primus-cli direct \
                  --log_file /tmp/primus_llama3.1_70B_fp8.log \
                  -- train pretrain \
                  --config examples/torchtitan/configs/MI300X/llama3.1_70B-FP8-pretrain.yaml \
                  --training.local_batch_size 5

         .. tab-item:: MI300X
            :sync: MI300X
               -- train pretrain \
               --config examples/torchtitan/configs/MI355X/deepseek_v3_16b-pretrain.yaml

         .. tab-item:: MI325X
            :sync: MI325X

            .. code-block:: shell

               bash runner/primus-cli direct \
                  --log_file /tmp/primus_deepseek_v3_16b.log \
                  -- train pretrain \
                  --config examples/torchtitan/configs/MI300X/deepseek_v3_16b-pretrain.yaml \
                  --training.local_batch_size 10

         .. tab-item:: MI300X
            :sync: MI300X
      {% endfor %}
      {% endfor %}
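The results that MAD collects in ``~/MAD/perf.csv`` can be post-processed with any CSV tooling. The exact columns depend on the MAD version, so the sketch below assumes hypothetical ``model`` and ``performance`` column names:

```python
import csv
import io

def summarize_perf(csv_text: str, metric: str = "performance") -> dict:
    """Map each model row to a reported metric (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["model"]: float(row[metric]) for row in reader}

sample = "model,performance\nllama-3.1-8b,123.4\n"
print(summarize_perf(sample))  # {'llama-3.1-8b': 123.4}
```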
Further reading
===============
.. grid-item-card:: AMD RDNA

   * :doc:`AMD RDNA3.5 system optimization <strixhalo>`
   * :doc:`AMD RDNA2 system optimization <w6000-v620>`

.. grid-item-card:: AMD Instinct
.. meta::
   :description: Learn about system settings and performance tuning for AMD Strix Halo (Ryzen AI MAX/MAX+) APUs.
   :keywords: Strix Halo, Ryzen AI MAX, workstation, BIOS, installation, APU, optimization, ROCm

.. _strix-halo-optimization:

==================================
AMD Strix Halo system optimization
==================================

This document provides guidance for optimizing systems powered by AMD Ryzen AI
MAX and MAX+ processors (codenamed Strix Halo). These APUs combine
high-performance CPU cores with integrated RDNA 3.5 graphics and support up to
128 GB of unified LPDDR5X-8000 memory, making them particularly well suited for:

* LLM development and inference systems
* High-performance workstations
* Virtualization hosts running multiple VMs
* GPU compute and parallel processing
* Gaming systems
* Home servers and AI development platforms

The main purpose of this document is to help users utilize Strix Halo APUs to
their full potential through proper system configuration.

.. _memory-settings:

Memory settings
===============
On Strix Halo GPUs (gfx1151), memory access is handled through GPU Virtual Memory
(GPUVM), which provides per-process GPU virtual address spaces (VMIDs) rather
than a separate, discrete VRAM pool.

As a result, memory on Strix Halo is **mapped**, not physically partitioned.
The terms Graphics Address Remapping Table (GART) and Graphics Translation
Table (GTT) describe limits on how much system memory can be mapped into GPU
address spaces and who can use it, rather than distinct types of physical memory.

* **GART**

  Defines the amount of platform address space (system RAM or memory-mapped I/O)
  that can be mapped into the GPU virtual address space used by the kernel driver.
  On systems with physically shared CPU and GPU memory, such as Strix Halo, this
  mapped system memory effectively serves as VRAM for the GPU. GART is typically
  kept relatively small to limit GPU page-table size and is mainly used for
  driver-internal operations.

* **GTT**

  Defines the amount of system RAM that can be mapped into GPU virtual address
  spaces for user processes. This is the memory pool used by applications such
  as PyTorch and other AI/compute workloads. GTT allocations are dynamic and are
  not permanently reserved, allowing the operating system to reclaim memory when
  it is not actively used by the GPU. By default, the GTT limit is set to
  approximately 50% of total system RAM.
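The 50% default can be expressed as a quick calculation. A sketch, assuming the kernel's usual 4 KiB page size (TTM accounts for this limit in pages):

```python
PAGE_SIZE = 4096  # bytes; TTM accounting uses 4 KiB pages

def default_gtt_pages(mem_total_bytes: int) -> int:
    """Approximate default GTT limit (~50% of system RAM), in pages."""
    return (mem_total_bytes // 2) // PAGE_SIZE

# Example: on a 128 GB system, roughly half of RAM is GPU-mappable by default.
print(default_gtt_pages(128 * 1024**3))  # → 16777216
```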
.. note::

   On systems with physically shared CPU and GPU memory such as Strix Halo,
   several terms are often used interchangeably in firmware menus, documentation,
   and community discussions:

   * VRAM
   * Carve-out
   * GART
   * Dedicated GPU memory
   * Firmware-reserved GPU memory

   This document uses the term VRAM from this point onward.

If desired, you can adjust how much memory is preferentially available to the
GPU by:

* Increasing the VRAM size in BIOS, or
* Reducing the configured GTT size so it is smaller than the reserved amount.

If the GTT size is larger than VRAM, the amdgpu driver services VRAM allocations
from GTT (GTT-backed allocations), as implemented in the
`torvalds/linux@759e764 <https://github.com/torvalds/linux/commit/759e764f7d587283b4e0b01ff930faca64370e59>`_
commit.
Because memory is physically shared, there is no performance distinction of the
kind seen on discrete GPUs, where dedicated VRAM is significantly faster than
system memory. Firmware may optionally reserve some memory exclusively for GPU
use, but this provides little benefit for most workloads while permanently
reducing available system memory.

For this reason, AI frameworks typically prefer GTT-backed allocations. GTT
allows large, flexible mappings without permanently reserving memory, resulting
in better overall system utilization on unified memory systems.
Configuring shared memory limits on Linux
-----------------------------------------

The maximum amount of shared GPU-accessible memory can be increased by changing
the kernel **Translation Table Manager (TTM)** page limit. This setting controls
how many system memory pages may be mapped for GPU use and is exposed at:

::

   /sys/module/ttm/parameters/pages_limit

The value is expressed in **pages**, not bytes or gigabytes (GB).
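Converting between pages and GB is straightforward with 4 KiB pages (262144 pages per GiB); the figures below match the ``amd-ttm`` example output later in this section:

```python
PAGE_SIZE = 4096                      # bytes per TTM page
PAGES_PER_GB = 1024**3 // PAGE_SIZE   # 262144 pages per GiB

def gb_to_pages(gb: float) -> int:
    return int(gb * PAGES_PER_GB)

def pages_to_gb(pages: int) -> float:
    return pages / PAGES_PER_GB

print(gb_to_pages(100))                 # → 26214400
print(round(pages_to_gb(16469033), 2))  # → 62.82
```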
.. note::

   AMD recommends keeping the dedicated VRAM reservation in BIOS small
   (for example, 0.5 GB) and increasing the shared (TTM/GTT) limit instead.

A helper utility is available to simplify configuration.

1. Install ``pipx``:

   ::

      sudo apt install pipx
      pipx ensurepath

2. Install the AMD debug tools:

   ::

      pipx install amd-debug-tools

3. Query the current shared memory configuration:

   ::

      amd-ttm

4. Set the usable shared memory (in GB):

   ::

      amd-ttm --set <NUM>

5. Reboot for the changes to take effect.

.. note::

   ``amd-ttm`` converts between pages and GB for you.
**Example with output**

Check the current settings:

::

   amd-ttm
   💻 Current TTM pages limit: 16469033 pages (62.82 GB)
   💻 Total system memory: 125.65 GB

Change the usable shared memory:

::

   ❯ amd-ttm --set 100
   🐧 Successfully set TTM pages limit to 26214400 pages (100.00 GB)
   🐧 Configuration written to /etc/modprobe.d/ttm.conf
   ○ NOTE: You need to reboot for changes to take effect.
   Would you like to reboot the system now? (y/n): y

Revert to kernel defaults:

::

   ❯ amd-ttm --clear
   🐧 Configuration /etc/modprobe.d/ttm.conf removed
   Would you like to reboot the system now? (y/n): y
.. _operating-system-support:

Operating system support
========================

The ROCm compatibility tables can be found at the following links:

- `System requirements (Linux) <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html>`_
- `System requirements (Windows) <https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html>`_

However, Strix Halo has additional kernel version requirements, which are
described in the following section.

Required kernel version
-----------------------
Support for Strix Halo requires specific fixes in the Linux kernel that
update internal limits in the AMD KFD driver to ensure correct queue
creation and memory availability checks. Without these updates, GPU
compute workloads may fail to initialize or behave unpredictably. The
necessary Linux kernel patches have been merged upstream and are
included in Linux kernel 6.18.4 and newer releases.
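A quick way to check a running kernel against the 6.18.4 threshold is to compare version tuples parsed from ``uname -r``. This is only a heuristic sketch: as the compatibility table shows, some distro kernels (for example, Ubuntu OEM 6.14.0-1018 and later) backport the fixes despite a lower version number.

```python
import platform
import re

REQUIRED = (6, 18, 4)  # first upstream kernel with the Strix Halo KFD fixes

def kernel_at_least(release: str, minimum=REQUIRED) -> bool:
    """Compare a ``uname -r``-style string against a minimum kernel version."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return False
    version = tuple(int(g or 0) for g in m.groups())
    return version >= minimum

print(kernel_at_least(platform.release()))
```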
The following commits are required for Strix Halo support:

- `gregkh/linux@7f26af7 <https://github.com/gregkh/linux/commit/7f26af7bf9b76c2c2a1a761aab5803e52be21eea>`_
- `gregkh/linux@7445db6 <https://github.com/gregkh/linux/commit/7445db6a7d5a0242d8214582b480600b266cba9e>`_

The table below reflects compatibility for **AMD-released pre-built ROCm
binaries only**. Distributions that ship **native ROCm packaging** may
provide different support levels.
.. list-table::
   :header-rows: 0
   :widths: 10 90

   * - ❌
     - Unsupported combination
   * - ⚠️
     - Unstable / experimental combination
   * - ✅
     - Stable and supported combination

.. list-table::
   :header-rows: 1
   :widths: 12 14 14 16 14 16 16

   * - ROCm Release
     - Ubuntu 24.04 HWE
     - Ubuntu 24.04 OEM (<= 6.14.0-1017)
     - Ubuntu 24.04 OEM (>= 6.14.0-1018)
     - Ubuntu 26.04 Generic
     - Generic Distro < 6.18.4
     - Generic Distro >= 6.18.4

   * - 7.11.0
     - ⚠️
     - ⚠️
     - ✅
     - ✅
     - ⚠️
     - ✅

   * - 7.10.0
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

   * - 7.9.0
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

   * - 7.2.1
     - ⚠️
     - ⚠️
     - ✅
     - ✅
     - ⚠️
     - ✅

   * - 7.2.0
     - ❌
     - ✅
     - ✅
     - ✅
     - ❌
     - ✅

   * - 7.1.x
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌

   * - 6.4.x
     - ⚠️
     - ⚠️
     - ❌
     - ❌
     - ⚠️
     - ❌
The following distributions include the required fixes in their
native packaging, independent of AMD pre-built binaries:

- Fedora 43
- Ubuntu 26.04
- Arch Linux
:class-body: rocm-card-banner rocm-hue-6
<!-- markdownlint-disable MD051 -->
* [ROCm libraries](./reference/api-libraries.md)
* [ROCm tools, compilers, and runtime API](./reference/rocm-tools.md)
* [GPU hardware specifications](./reference/gpu-arch-specs.rst)
* [Hardware atomics operation support](./reference/gpu-atomics-operation.rst)
* [Environment variables](./reference/env-variables.rst)
* [Data types and precision support](./reference/precision-support.rst)
* [Graph safe support](./reference/graph-safe-support.rst)
* [ROCm glossary](./reference/glossary.rst)
<!-- markdownlint-enable MD051 -->
:::
@@ -74,8 +74,7 @@ Other useful variables
 ROCR-Runtime environment variables
 ==================================

-The following table lists the :doc:`ROCR-Runtime <rocr-runtime:index>`
-environment variables:
+The following table lists the ROCR-Runtime environment variables:

 .. remote-content::
    :repo: ROCm/rocm-systems
@@ -120,11 +119,8 @@ documentation.
 - Performance tuning, kernel selection, logging, and debugging for BLAS
   operations.

-* - :doc:`rocSHMEM <rocshmem:api/env_variables>`
-  - Control the behavior of rocSHMEM.
+* - :doc:`rocSolver <rocsolver:reference/env_variables>`
+  - Control logging of rocSolver.

-* - :doc:`rocSOLVER <rocsolver:reference/env_variables>`
-  - Control logging of rocSOLVER.
-
 * - :doc:`rocSPARSE <rocsparse:reference/env_variables>`
   - Control logging of rocSPARSE.
@@ -1,24 +0,0 @@
.. meta::
   :description: AMD ROCm Glossary
   :keywords: AMD, ROCm, glossary, terminology, device hardware,
              device software, host software, performance

.. _glossary:

********************************************************************************
ROCm glossary
********************************************************************************

This glossary provides concise definitions of key terms and concepts in AMD ROCm
programming. Each entry includes a brief description and a link to detailed
documentation for in-depth information.

The glossary is organized into four sections:

* :doc:`glossary/device-hardware` — Hardware components (for example, Compute
  Units, cores, memory)
* :doc:`glossary/device-software` — Software abstractions (programming model,
  ISA, thread hierarchy)
* :doc:`glossary/host-software` — Development tools (HIP, compilers, libraries,
  profilers)
* :doc:`glossary/performance` — Performance metrics and optimization concepts
@@ -1,254 +0,0 @@
.. meta::
   :description: Device hardware glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, device hardware, compute units, cores, MFMA,
              architecture, register file, cache, HBM

.. _glossary-device-hardware:

************************
Device hardware glossary
************************

This section provides concise definitions of hardware components and architectural
features of AMD GPUs.

.. glossary::
   :sorted:

   AMD device architecture
      AMD device architecture is based on unified, programmable compute
      engines known as :term:`compute units (CUs) <Compute units>`. See
      :ref:`hip:hardware_implementation` for details.

   Compute units
      Compute units (CUs) are the fundamental programmable execution engines
      in AMD GPUs capable of running complex programs. See
      :ref:`hip:compute_unit` for details.

   ALU
      Arithmetic logic units (ALUs) are the primary arithmetic engines that
      execute mathematical and logical operations within
      :term:`compute units <Compute units>`. See :ref:`hip:valu` for details.

   SALU
      Scalar :term:`ALUs <ALU>` (SALUs) operate on a single value per
      :term:`wavefront <Wavefront (Warp)>` and manage all control flow.

   VALU
      Vector :term:`ALUs <ALU>` (VALUs) perform an arithmetic or logical
      operation on data for each :term:`work-item <Work-item (Thread)>` in a
      :term:`wavefront <Wavefront (Warp)>`, enabling data-parallel execution.

   Special function unit
      Special function units (SFUs) accelerate transcendental and reciprocal
      mathematical functions such as ``exp``, ``log``, ``sin``, and ``cos``.
      See :ref:`hip:sfu` for details.

   Load/store unit
      Load/store units (LSUs) handle data transfer between
      :term:`compute units <Compute units>` and the GPU's memory subsystems,
      managing thousands of concurrent memory operations. See :ref:`hip:lsu`
      for details.

   Work-group (Block)
      A work-group (also called a block) is a collection of
      :term:`wavefronts <Wavefront (Warp)>` scheduled together on a single
      :term:`compute unit <Compute units>` that can coordinate through
      :term:`local data share <Local data share>` memory. See
      :ref:`hip:inherent_thread_hierarchy_block` for work-group details.

   Work-item (Thread)
      A work-item (also called a thread) is the smallest unit of execution on
      an AMD GPU and represents a single element of work. See
      :ref:`hip:work-item` for thread hierarchy details.

   Wavefront (Warp)
      A wavefront (also called a warp) is a group of
      :term:`work-items <Work-item (Thread)>` that execute in parallel on a
      single :term:`compute unit <Compute units>`, sharing one
      instruction stream. See :ref:`hip:wavefront` for execution details.

   Wavefront scheduler
      The wavefront scheduler in each :term:`compute unit <Compute units>`
      decides which :term:`wavefront <Wavefront (Warp)>` to execute each
      clock cycle, enabling rapid context switching for latency hiding. See
      :ref:`hip:wave-scheduling` for details.

   Wavefront size
      The wavefront size is the number of
      :term:`work-items <Work-item (Thread)>` that execute together in a
      single :term:`wavefront <Wavefront (Warp)>`. For AMD Instinct GPUs, the
      wavefront size is 64 threads, while AMD Radeon GPUs have a wavefront
      size of 32 threads. See :ref:`hip:wavefront` for details.
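The wavefront-size definition above has a direct scheduling consequence: a work-group is split into as many wavefronts as needed to cover its work-items, and a partially filled wavefront still occupies a full scheduling slot. A minimal Python sketch (the helper name is ours for illustration, not a ROCm API):

```python
import math

def wavefronts_per_workgroup(workgroup_size: int, wavefront_size: int) -> int:
    """Number of wavefronts needed to execute one work-group.

    A partially filled wavefront still occupies a full scheduling slot,
    so the result is rounded up.
    """
    return math.ceil(workgroup_size / wavefront_size)

# A 256-thread block maps to 4 wavefronts on a 64-wide device (Instinct)
# and 8 wavefronts on a 32-wide device (Radeon).
print(wavefronts_per_workgroup(256, 64))  # 4
print(wavefronts_per_workgroup(256, 32))  # 8
print(wavefronts_per_workgroup(100, 64))  # 2 (second wavefront partially filled)
```

This is why block sizes that are a multiple of the wavefront size tend to waste the fewest execution lanes.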
   SIMD core
      SIMD cores are execution lanes that perform scalar and vector arithmetic
      operations inside each :term:`compute unit <Compute units>`. See
      :ref:`hip:cdna_architecture` and :ref:`hip:rdna_architecture` for
      details.

   Matrix cores (MFMA units)
      Matrix cores (MFMA units) are specialized execution units that perform
      large-scale matrix operations in a single instruction, delivering high
      throughput for AI and HPC workloads. See :ref:`hip:mfma_units` for
      details.

   Data movement engine
      Data movement engines (DMEs) are specialized hardware units in AMD
      Instinct MI300 and MI350 series GPUs that accelerate multi-dimensional
      tensor data copies between global memory and on-chip memory. See
      :ref:`hip:dme` for details.

   GFX IP
      GFX IP (Graphics IP) versions are identifiers that specify which
      instruction formats, memory models, and compute features are supported
      by each AMD GPU generation. See :ref:`hip:gfx_ip` for versioning
      information.

   GFX IP major version
      The :term:`GFX IP <GFX IP>` major version represents the GPU's core
      instruction set and architecture. For example, a GFX IP major version
      of ``11`` corresponds to the RDNA3 architecture, influencing driver
      support and available compute features. See :ref:`hip:gfx_ip` for
      versioning information.

   GFX IP minor version
      The :term:`GFX IP <GFX IP>` minor version represents specific variations
      within a :term:`GFX IP <GFX IP>` major version and affects feature sets,
      optimizations, and driver behavior. Different GPU models within the same
      major version can have unique capabilities, impacting performance and
      supported instructions. See :ref:`hip:gfx_ip` for versioning
      information.

   Compute unit versioning
      :term:`Compute units <Compute units>` are versioned with
      :term:`GFX IP <GFX IP>` identifiers that define their microarchitectural
      features and instruction set compatibility. See :ref:`hip:gfx_ip` for
      details.

   Register file
      The register file is the primary on-chip memory store in each
      :term:`compute unit <Compute units>`, holding data between arithmetic
      and memory operations. See :ref:`hip:memory_hierarchy` for details.

   SGPR file
      The :term:`SGPR <SGPR>` file is the
      :term:`register file <Register file>` that holds data used by the
      :term:`scalar ALU <SALU>`.

   VGPR file
      The :term:`VGPR <VGPR>` file is the
      :term:`register file <Register file>` that holds data used by the
      :term:`vector ALU <VALU>`. GPUs with
      :term:`matrix cores <Matrix cores (MFMA units)>` also have
      :term:`AccVGPR <AccVGPR>` files, used specifically for matrix
      instructions.

   L0 instruction cache
      On AMD Radeon GPUs, the level 0 (L0) instruction cache is local to each
      :term:`WGP <WGP>` and thus shared between the WGP's
      :term:`compute units <Compute units>`.

   L0 scalar cache
      On AMD Radeon GPUs, the level 0 (L0) scalar data cache is local to each
      :term:`WGP <WGP>` and thus shared between the WGP's
      :term:`compute units <Compute units>`. It provides the
      :term:`scalar ALU <SALU>` with fast access to recently used data.

   L0 vector cache
      On AMD Radeon GPUs, the level 0 (L0) vector data cache is local to each
      :term:`WGP <WGP>` and thus shared between the WGP's
      :term:`compute units <Compute units>`. It provides the
      :term:`vector ALU <VALU>` with fast access to recently used data.

   L1 instruction cache
      On AMD Instinct GPUs, the level 1 (L1) instruction cache is local to
      each :term:`compute unit <Compute units>`. On AMD Radeon GPUs, the
      L1 instruction cache does not exist as a separate cache level, and
      instructions are stored in the
      :term:`L0 instruction cache <L0 instruction cache>`.

   L1 scalar cache
      On AMD Instinct GPUs, the level 1 (L1) scalar data cache is local to
      each :term:`compute unit <Compute units>`, providing the
      :term:`scalar ALU <SALU>` with fast access to recently used data. On AMD
      Radeon GPUs, the L1 scalar cache does not exist as a separate cache
      level, and recently used scalar data is stored in the
      :term:`L0 scalar cache <L0 scalar cache>`.

   L1 vector cache
      On AMD Instinct GPUs, the level 1 (L1) vector data cache is local to
      each :term:`compute unit <Compute units>`, providing the
      :term:`vector ALU <VALU>` with fast access to recently used data. On AMD
      Radeon GPUs, the L1 vector cache does not exist as a separate cache
      level, and recently used vector data is stored in the
      :term:`L0 vector cache <L0 vector cache>`.

   Graphics L1 cache
      On AMD Radeon GPUs, the read-only graphics level 1 (L1) cache is local
      to groups of :term:`WGPs <WGP>` called shader arrays, providing fast
      access to recently used data. AMD Instinct GPUs do not feature the
      graphics L1 cache.

   L2 cache
      On AMD Instinct MI100 series GPUs, the L2 cache is shared across the
      entire chip, while for all other AMD GPUs the L2 caches are shared by
      the :term:`compute units <Compute units>` on the same :term:`GCD <GCD>`
      or :term:`XCD <XCD>`.

   Infinity Cache (L3 cache)
      On AMD Instinct MI300 and MI350 series GPUs and AMD Radeon GPUs, the
      Infinity Cache is the last-level cache of the cache hierarchy. It is
      shared by all :term:`compute units <Compute units>` and
      :term:`WGPs <WGP>` on the GPU.

   GPU RAM (VRAM)
      GPU RAM, also known as :term:`global memory <Global memory>` in the HIP
      programming model, is the large, high-capacity off-chip memory subsystem
      accessible by all :term:`compute units <Compute units>`, forming the
      foundation of the device's :ref:`memory hierarchy <hip:hbm>`.

   Local data share
      Local data share (LDS) is fast on-chip memory local to each
      :term:`compute unit <Compute units>` and shared among
      :term:`work-items <Work-item (Thread)>` in a
      :term:`work-group <Work-group (Block)>`, enabling efficient coordination
      and data reuse. In the HIP programming model, the LDS is known as shared
      memory. See :ref:`hip:lds` for LDS programming details.

   Registers
      Registers are the lowest level of the memory hierarchy, storing
      per-thread temporary variables and intermediate results. See
      :ref:`hip:memory_hierarchy` for register usage details.

   SGPR
      Scalar general-purpose :term:`registers <Registers>` (SGPRs) hold data
      produced and consumed by a :term:`compute unit <Compute units>`'s
      :term:`scalar ALU <SALU>`.

   VGPR
      Vector general-purpose :term:`registers <Registers>` (VGPRs) hold data
      produced and consumed by a :term:`compute unit <Compute units>`'s
      :term:`vector ALU <VALU>`.

   AccVGPR
      Accumulation General Purpose Vector Registers (AccVGPRs) are a special
      type of :term:`VGPR <VGPR>` used exclusively for matrix operations.

   XCD
      On AMD Instinct MI300 and MI350 series GPUs, the Accelerator Complex Die
      (XCD) contains the GPU's computational elements and lower levels of the
      cache hierarchy. See :doc:`../../conceptual/gpu-arch/mi300` for details.

   GCD
      On AMD Instinct MI100 and MI250 series GPUs and AMD Radeon GPUs, the
      Graphics Compute Die (GCD) contains the GPU's computational elements
      and lower levels of the cache hierarchy. See
      :doc:`../../conceptual/gpu-arch/mi250` for details.

   WGP
      A Workgroup Processor (WGP) is a hardware unit on AMD Radeon GPUs that
      contains two :term:`compute units <Compute units>` and their associated
      resources, enabling efficient scheduling and execution of
      :term:`wavefronts <Wavefront (Warp)>`. See :ref:`hip:rdna_architecture`
      for details.
@@ -1,74 +0,0 @@
|
|||||||
.. meta::
|
|
||||||
:description: Device software glossary for AMD GPUs
|
|
||||||
:keywords: AMD, ROCm, GPU, device software, programming model, AMDGPU,
|
|
||||||
assembly, IR, GFX IP, wavefront, work-group, HIP kernel, thread hierarchy
|
|
||||||
|
|
||||||
.. _glossary-device-software:
|
|
||||||
|
|
||||||
************************
|
|
||||||
Device software glossary
|
|
||||||
************************
|
|
||||||
|
|
||||||
This section provides brief definitions of software abstractions and programming
|
|
||||||
models that run on AMD GPUs.
|
|
||||||
|
|
||||||
.. glossary::
|
|
||||||
:sorted:
|
|
||||||
|
|
||||||
ROCm programming model
|
|
||||||
The ROCm programming model defines how AMD GPUs execute massively
|
|
||||||
parallel programs using hierarchical
|
|
||||||
:term:`work-groups <Work-group (Block)>`, memory scopes, and barrier
|
|
||||||
synchronization. See :ref:`hip:programming_model` for complete details.
|
|
||||||
|
|
||||||
AMDGPU assembly
|
|
||||||
AMDGPU assembly (GFX ISA) is the low-level assembly format for programs
|
|
||||||
running on AMD GPUs, generated by the
|
|
||||||
:term:`ROCm compiler toolchain <HIP compiler>`. See
|
|
||||||
:ref:`hip:amdgpu_assembly` for instruction set details.
|
|
||||||
|
|
||||||
AMDGPU intermediate representation
|
|
||||||
AMDGPU IR is an intermediate representation for GPU code, serving as a
|
|
||||||
virtual instruction set between high-level languages and
|
|
||||||
:term:`architecture-specific assembly <AMDGPU assembly>`. See
|
|
||||||
:ref:`hip:amdgpu_ir` for compilation details.
|
|
||||||
|
|
||||||
LLVM target name
|
|
||||||
The LLVM target name is a string identifier corresponding to a specific
|
|
||||||
:term:`GFX IP <GFX IP>` version that is passed to the
|
|
||||||
:term:`HIP compiler <HIP compiler>` toolchain to specify the target GPU
|
|
||||||
architecture for code generation.
|
|
||||||
See :doc:`llvm-project:reference/rocmcc` for details.
|
|
||||||
|
|
||||||
Grid
|
|
||||||
A grid represents the collection of all
|
|
||||||
:term:`work-groups <Work-group (Block)>` executing a single
|
|
||||||
:term:`kernel <HIP kernel>` across the entire GPU. See
|
|
||||||
:ref:`hip:inherent_thread_hierarchy_grid` for grid execution details.
|
|
||||||
|
|
||||||
HIP kernel
|
|
||||||
A HIP kernel is the unit of GPU code that executes in parallel across
|
|
||||||
many :term:`threads <Work-item (Thread)>`, distributed across the GPU's
|
|
||||||
:term:`compute units <Compute units>`. See :ref:`hip:device_program` for
|
|
||||||
kernel programming details.
|
|
||||||
|
|
||||||
HIP thread hierarchy
|
|
||||||
The thread hierarchy structures parallel work from individual
|
|
||||||
:term:`threads <Work-item (Thread)>` to
|
|
||||||
:term:`blocks <Work-group (Block)>` to :term:`grids <Grid>`, mapping
|
|
||||||
onto hardware from :term:`SIMD lanes <SIMD core>` to
|
|
||||||
:term:`compute units <Compute units>` to the entire GPU. See
|
|
||||||
:ref:`hip:inherent_thread_model` for complete details.
|
|
||||||
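The thread-hierarchy entry above boils down to simple index arithmetic: in a 1-D HIP launch, a thread's global index is ``blockIdx.x * blockDim.x + threadIdx.x``. A small Python sketch that emulates this mapping on the host (for illustration only):

```python
def global_thread_id(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """1-D global index, mirroring blockIdx.x * blockDim.x + threadIdx.x in HIP."""
    return block_idx * block_dim + thread_idx

# Emulate a 1-D grid of 4 blocks of 64 threads covering 256 work-items:
ids = [global_thread_id(b, 64, t) for b in range(4) for t in range(64)]
assert ids == list(range(256))  # every work-item is covered exactly once
print(ids[:3], ids[-1])
```

The same pattern extends to the y and z dimensions independently.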
   HIP memory hierarchy
      The memory hierarchy pairs each
      :term:`thread hierarchy <HIP thread hierarchy>` level with corresponding
      memory scopes, from :term:`private registers <Registers>` to
      :term:`LDS <Local data share>` to :term:`GPU RAM <GPU RAM (VRAM)>`. See
      :ref:`hip:memory_hierarchy` for memory architecture details.

   Global memory
      Global memory is the :term:`device-wide memory <GPU RAM (VRAM)>`
      accessible to all :term:`threads <Work-item (Thread)>`, physically
      implemented as HBM or GDDR. See :ref:`hip:memory_hierarchy` for global
      memory details.
@@ -1,67 +0,0 @@
.. meta::
   :description: Host software glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, host software, HIP, compiler, runtime, libraries,
              profiler, amd-smi

.. _glossary-host-software:

**********************
Host software glossary
**********************

This section provides brief definitions of development tools, compilers,
libraries, and runtime environments for programming AMD GPUs.

.. glossary::
   :sorted:

   ROCm software platform
      ROCm is AMD's GPU software stack, providing compiler
      toolchains, runtime environments, and performance libraries for HPC and
      AI applications. See :doc:`../../what-is-rocm` for a complete component
      overview.

   HIP C++ language extension
      HIP extends the C++ language with additional features designed for
      programming heterogeneous applications. These extensions mostly relate
      to the kernel language, but some can also be applied to host
      functionality. See :doc:`hip:how-to/hip_cpp_language_extensions` for
      language fundamentals.

   AMD SMI
      The ``amd-smi`` command-line utility queries, monitors, and manages
      AMD GPU state, providing hardware information and performance metrics.
      See :doc:`amdsmi:index` for detailed usage.

   HIP runtime API
      The HIP runtime API provides an interface for GPU programming, offering
      functions for memory management, kernel launches, and synchronization.
      See :ref:`hip:hip_runtime_api_how-to` for an API overview.

   HIP compiler
      The HIP compiler, ``amdclang++``, compiles HIP C++ programs into binaries
      that contain both host CPU and device GPU code. See
      :doc:`llvm-project:reference/rocmcc` for compiler flags and options.

   HIP runtime compiler
      The HIP Runtime Compiler (HIPRTC) compiles HIP source code at runtime
      into :term:`AMDGPU <AMDGPU assembly>` binary code objects, enabling
      just-in-time kernel generation, device-specific optimization, and
      dynamic code creation for different GPUs. See
      :ref:`hip:hip_runtime_compiler_how-to` for API details.

   ROCgdb
      ROCgdb is AMD's source-level debugger for HIP and ROCm applications,
      enabling debugging of both host CPU and GPU device code, including
      kernel breakpoints, stepping, and variable inspection. See
      :doc:`rocgdb:index` for usage and command reference.

   rocprofv3
      ``rocprofv3`` is AMD's primary performance analysis tool, providing
      profiling, tracing, and performance counter collection.
      See :ref:`rocprofiler-sdk:using-rocprofv3` for profiling workflows.

   ROCm and LLVM binary utilities
      ROCm and LLVM binary utilities are command-line tools for examining and
      manipulating GPU binaries and code objects. See
      :ref:`hip:binary_utilities` for utility details.
@@ -1,135 +0,0 @@
.. meta::
   :description: Performance glossary for AMD GPUs
   :keywords: AMD, ROCm, GPU, performance, optimization, roofline, bottleneck,
              occupancy, bandwidth, latency hiding, divergence

.. _glossary-performance:

*****************************
Performance analysis glossary
*****************************

This section provides brief definitions of performance analysis concepts and
optimization techniques.

.. glossary::
   :sorted:

   Roofline model
      The roofline model is a visual performance model that determines whether
      a program is :term:`compute-bound <Compute-bound>` or
      :term:`memory-bound <Memory-bound>`. See :ref:`hip:roofline_model` for
      roofline analysis.
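The roofline model above reduces to one formula: attainable throughput is the minimum of the compute roof and the memory roof (bandwidth times arithmetic intensity). A minimal sketch with hypothetical device numbers, not any specific AMD GPU:

```python
def attainable_gflops(arithmetic_intensity: float,
                      peak_gflops: float,
                      peak_bw_gbs: float) -> float:
    """Roofline: performance is capped by the lower of the compute roof
    (peak GFLOP/s) and the memory roof (GB/s x FLOP/byte)."""
    return min(peak_gflops, peak_bw_gbs * arithmetic_intensity)

# Hypothetical device: 100 GFLOP/s peak compute, 10 GB/s peak bandwidth.
print(attainable_gflops(0.25, 100.0, 10.0))  # 2.5  -> memory-bound region
print(attainable_gflops(50.0, 100.0, 10.0))  # 100.0 -> compute-bound region
```

The intensity at which the two roofs meet (here 10 FLOP/byte) is the ridge point that separates memory-bound from compute-bound kernels.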
   Compute-bound
      Compute-bound kernels are limited by the
      :term:`arithmetic bandwidth <Arithmetic bandwidth>` of the GPU's
      :term:`compute units <Compute units>` rather than
      :term:`memory bandwidth <Memory bandwidth>`. See
      :ref:`hip:compute_bound` for compute-bound analysis.

   Memory-bound
      Memory-bound kernels are limited by
      :term:`memory bandwidth <Memory bandwidth>` rather than
      :term:`arithmetic bandwidth <Arithmetic bandwidth>`, typically due to
      low :term:`arithmetic intensity <Arithmetic intensity>`. See
      :ref:`hip:memory_bound` for memory-bound analysis.

   Arithmetic intensity
      Arithmetic intensity is the ratio of arithmetic operations to memory
      operations in a kernel and determines its performance characteristics.
      See :ref:`hip:arithmetic_intensity` for intensity analysis.
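Arithmetic intensity, as defined above, is easy to estimate by hand. For example, a single-precision AXPY update (``y[i] = a * x[i] + y[i]``) performs 2 FLOPs per element while moving 12 bytes (read ``x[i]``, read ``y[i]``, write ``y[i]``), a textbook memory-bound case:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte of memory traffic."""
    return flops / bytes_moved

# Single-precision AXPY: 2 FLOPs (mul + add) per element, 12 bytes moved.
print(arithmetic_intensity(2, 12))  # ~0.167 FLOP/byte, typically memory-bound
```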
   Overhead
      Overhead latency is the time spent with no useful work being done, often
      due to CPU-side bottlenecks or kernel launch delays. See
      :ref:`hip:performance_bottlenecks` for details.

   Little's Law
      Little's Law relates concurrency, latency, and throughput, determining
      how much independent work must be in flight to hide latency. See
      :ref:`hip:littles_law` for latency hiding details.
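Little's Law, as stated above, is just concurrency = latency × throughput. A sketch with hypothetical cycle counts (not measured on any particular GPU):

```python
def required_in_flight(latency_cycles: float, throughput_per_cycle: float) -> float:
    """Little's Law: the number of independent operations that must be in
    flight to keep a pipeline of the given throughput busy despite the
    given per-operation latency."""
    return latency_cycles * throughput_per_cycle

# Hypothetical: 400-cycle memory latency, 2 requests sustained per cycle.
print(required_in_flight(400, 2))  # 800 independent requests must be in flight
```

This is the quantitative reason GPUs keep many wavefronts resident per compute unit.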
   Memory bandwidth
      Memory bandwidth is the maximum rate at which data can be transferred
      between memory hierarchy levels, typically measured in bytes per
      second. See :ref:`hip:memory_bound` for details.

   Arithmetic bandwidth
      Arithmetic bandwidth is the peak rate at which arithmetic work can be
      performed, defining the compute roof in
      :term:`roofline models <Roofline model>`. See :ref:`hip:compute_bound`
      for details.

   Latency hiding
      Latency hiding masks long-latency operations by running many concurrent
      threads, keeping execution pipelines busy. See :ref:`hip:latency_hiding`
      for details.

   Wavefront execution state
      Wavefront execution states (*active*, *stalled*, *eligible*, *selected*)
      describe the scheduling status of
      :term:`wavefronts <Wavefront (Warp)>` on AMD GPUs. See
      :ref:`hip:wavefront_execution` for state definitions.

   Active cycle
      An active cycle is a clock cycle in which a
      :term:`compute unit <Compute units>` has at least one active
      :term:`wavefront <Wavefront (Warp)>` resident. See
      :ref:`hip:wavefront_execution` for details.

   Occupancy
      Occupancy is the ratio of active :term:`wavefronts <Wavefront (Warp)>`
      to the maximum number of wavefronts that can be active on a
      :term:`compute unit <Compute units>`. See :ref:`hip:occupancy` for
      occupancy analysis.
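The occupancy definition above is a simple ratio. A sketch with a hypothetical compute unit that can hold 32 resident wavefronts (the actual limit depends on the architecture and on per-kernel register and LDS usage):

```python
def occupancy(active_wavefronts: int, max_wavefronts: int) -> float:
    """Fraction of the compute unit's wavefront slots that are occupied."""
    return active_wavefronts / max_wavefronts

# Hypothetical CU supporting up to 32 resident wavefronts:
print(occupancy(8, 32))   # 0.25 -> limited latency-hiding capacity
print(occupancy(32, 32))  # 1.0  -> full occupancy
```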
   Pipe utilization
      Pipe utilization measures how effectively a kernel uses the execution
      pipelines within each :term:`compute unit <Compute units>`. See
      :ref:`hip:pipe_utilization` for utilization details.

   Peak rate
      Peak rate is the theoretical maximum throughput at which a hardware
      system can complete work under ideal conditions. See
      :ref:`hip:theoretical_performance_limits` for details.

   Issue efficiency
      Issue efficiency measures how effectively the
      :term:`wavefront scheduler <Wavefront scheduler>` keeps
      execution pipelines busy by issuing instructions. See
      :ref:`hip:issue_efficiency` for efficiency metrics.

   CU utilization
      CU utilization measures the percentage of time that
      :term:`compute units <Compute units>` are actively executing
      instructions. See :ref:`hip:cu_utilization` for utilization analysis.

   Wavefront divergence
      Wavefront divergence occurs when threads within a
      :term:`wavefront <Wavefront (Warp)>` take different execution paths due
      to conditional statements. See :ref:`hip:branch_efficiency` for
      divergence handling details.

   Branch efficiency
      Branch efficiency measures how often all threads within a
      :term:`wavefront <Wavefront (Warp)>` take the same execution path,
      quantifying control-flow uniformity. See :ref:`hip:branch_efficiency`
      for branch analysis.

   Memory coalescing
      Memory coalescing improves :term:`memory bandwidth <Memory bandwidth>`
      by servicing many logical loads or stores with fewer physical memory
      transactions. See :ref:`hip:memory_coalescing_theory` for coalescing
      patterns.
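The coalescing entry above can be made concrete by counting how many aligned memory segments one wavefront's accesses touch. The sketch below assumes 64-byte transactions for illustration; real segment sizes vary by architecture and cache level:

```python
def transactions_needed(addresses, transaction_bytes=64):
    """Count distinct memory transactions for one set of simultaneous
    accesses, assuming each transaction moves one aligned
    `transaction_bytes` segment."""
    return len({addr // transaction_bytes for addr in addresses})

# 16 threads reading consecutive 4-byte words: fully coalesced.
coalesced = [4 * i for i in range(16)]
# The same 16 threads with a 64-byte stride: one transaction per thread.
strided = [64 * i for i in range(16)]
print(transactions_needed(coalesced))  # 1
print(transactions_needed(strided))    # 16
```

The strided pattern moves 16x as many segments for the same useful data, which is exactly the bandwidth loss coalescing avoids.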
   Bank conflict
      A bank conflict occurs when multiple threads simultaneously access
      different addresses in the same :term:`LDS bank <Local data share>`,
      serializing accesses. See :ref:`hip:bank_conflicts_theory` for details.
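The bank-conflict entry above can be modeled by mapping each LDS address to a bank and finding the worst-case collision count. The sketch assumes 32 banks of 4 bytes, a common LDS layout, but check the parameters for your architecture:

```python
from collections import Counter

def max_bank_conflict(addresses, num_banks=32, bank_width=4):
    """Worst-case serialization factor for simultaneous LDS accesses:
    the largest number of distinct addresses that map to a single bank.
    (Accesses to the same address can be broadcast, so duplicates are
    collapsed first.)"""
    banks = Counter((addr // bank_width) % num_banks for addr in set(addresses))
    return max(banks.values())

# Consecutive 4-byte accesses hit 32 different banks: conflict-free.
print(max_bank_conflict([4 * i for i in range(32)]))    # 1
# A 128-byte stride maps every access to bank 0: fully serialized.
print(max_bank_conflict([128 * i for i in range(32)]))  # 32
```

Padding a shared-memory array row by one element is the classic trick to break such stride-induced conflicts.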
   Register pressure
      Register pressure occurs when excessive register demand limits the
      number of active :term:`wavefronts <Wavefront (Warp)>` per
      :term:`compute unit <Compute units>`, reducing
      :term:`occupancy <Occupancy>`. See
      :ref:`hip:register_pressure_theory` for details.
@@ -9,12 +9,6 @@ The following tables provide an overview of the hardware specifications for AMD
 
 For more information about ROCm hardware compatibility, see the ROCm `Compatibility matrix <https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html>`_.
 
-For a description of the terms used in the table, see the
-:ref:`ROCm glossary <glossary>`, or for more detailed information about GPU
-architecture and programming models, see the
-:ref:`specific documents and guides <gpu-arch-documentation>`, or
-:doc:`Understanding the HIP programming model<hip:understand/programming_model>`.
-
 .. tab-set::
 
    .. tab-item:: AMD Instinct GPUs
@@ -1133,3 +1127,125 @@ architecture and programming models, see the
 - 32
 - 11
 - 5
+
+Glossary
+========
+
+For more information about the terms used, see the
+:ref:`specific documents and guides <gpu-arch-documentation>`, or
+:doc:`Understanding the HIP programming model <hip:understand/programming_model>`.
+
+**LLVM target name**
+
+Argument to pass to clang in ``--offload-arch`` to compile code for the given
+architecture.
+
+**VRAM**
+
+Amount of memory available on the GPU.
+
+**Compute Units**
+
+Number of compute units on the GPU.
+
+**Wavefront Size**
+
+Number of work items that execute in parallel on a single compute unit. This
+is equivalent to the warp size in HIP.
+
+**LDS**
+
+The Local Data Share (LDS) is a low-latency, high-bandwidth scratch pad
+memory. It is local to the compute units, and can be shared by all work items
+in a work group. In HIP, the LDS can be used for shared memory, which is
+shared by all threads in a block.
+
+**L3 Cache (CDNA/GCN only)**
+
+Size of the level 3 cache. Shared by all compute units on the same GPU. Caches
+data and instructions. Similar to the Infinity Cache on RDNA architectures.
+
+**Infinity Cache (RDNA only)**
+
+Size of the Infinity Cache. Shared by all compute units on the same GPU. Caches
+data and instructions. Similar to the L3 cache on CDNA/GCN architectures.
+
+**L2 Cache**
+
+Size of the level 2 cache. Shared by all compute units on the same GCD. Caches
+data and instructions.
+
+**Graphics L1 Cache (RDNA only)**
+
+An additional cache level that only exists in RDNA architectures. Local to a
+shader array.
+
+**L1 Vector Cache (CDNA/GCN only)**
+
+Size of the level 1 vector data cache. Local to a compute unit. This is the L0
+vector cache in RDNA architectures.
+
+**L1 Scalar Cache (CDNA/GCN only)**
+
+Size of the level 1 scalar data cache. Usually shared by several compute
+units. This is the L0 scalar cache in RDNA architectures.
+
+**L1 Instruction Cache (CDNA/GCN only)**
+
+Size of the level 1 instruction cache. Usually shared by several compute
+units. This is the L0 instruction cache in RDNA architectures.
+
+**L0 Vector Cache (RDNA only)**
+
+Size of the level 0 vector data cache. Local to a compute unit. This is the L1
+vector cache in CDNA/GCN architectures.
+
+**L0 Scalar Cache (RDNA only)**
+
+Size of the level 0 scalar data cache. Usually shared by several compute
+units. This is the L1 scalar cache in CDNA/GCN architectures.
+
+**L0 Instruction Cache (RDNA only)**
+
+Size of the level 0 instruction cache. Usually shared by several compute
+units. This is the L1 instruction cache in CDNA/GCN architectures.
+
+**VGPR File**
+
+Size of the Vector General Purpose Register (VGPR) file. It holds data used in
+vector instructions.
+GPUs with matrix cores also have AccVGPRs, which are Accumulation General
+Purpose Vector Registers, used specifically in matrix instructions.
+
+**SGPR File**
+
+Size of the Scalar General Purpose Register (SGPR) file. Holds data used in
+scalar instructions.
+
+**GFXIP**
+
+GFXIP (Graphics IP) is a versioning system used by AMD to identify the GPU
+architecture and its instruction set. It helps categorize different generations
|
||||||
|
of GPUs and their feature sets.
|
||||||
|
|
||||||
|
**GFXIP major version**
|
||||||
|
|
||||||
|
Defines the GPU's core instruction set and architecture, which determines
|
||||||
|
compatibility with software stacks such as HIP and OpenCL. For example, a GFXIP
|
||||||
|
11 major version corresponds to the RDNA 3 (Navi 3x) architecture, influencing
|
||||||
|
driver support and available compute features.
|
||||||
|
|
||||||
|
**GFXIP minor version**
|
||||||
|
|
||||||
|
Represents specific variations within a GFXIP major version and affects feature sets,
|
||||||
|
optimizations, and driver behavior in software stacks such as HIP and OpenCL. Different
|
||||||
|
GPU models within the same major version can have unique capabilities, impacting
|
||||||
|
performance and supported instructions.
|
||||||
|
|
||||||
|
**GCD**
|
||||||
|
|
||||||
|
Graphics Compute Die.
|
||||||
|
|
||||||
|
**XCD**
|
||||||
|
|
||||||
|
Accelerator Complex Die.
|
||||||
|
|||||||
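As a concrete illustration of the **LLVM target name** entry above, the architecture is passed to the compiler through ``--offload-arch``. This is a sketch, not part of the original glossary: ``gfx90a`` and ``saxpy.cpp`` are assumed example names; substitute the target name reported for your own GPU.

```shell
# Example only: gfx90a is an assumed target. Query your device's
# LLVM target name first, for instance with:
#   rocminfo | grep gfx
# (or with the amdgpu-arch utility shipped with ROCm LLVM).

# Compile a HIP source for that architecture with the hipcc driver:
hipcc --offload-arch=gfx90a saxpy.cpp -o saxpy

# Or drive clang directly, which hipcc wraps:
clang++ -x hip --offload-arch=gfx90a saxpy.cpp -o saxpy
```

Passing ``--offload-arch`` more than once builds device code for several architectures into one binary.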
@@ -6,7 +6,7 @@
 algebra, AMD">
 </head>

-# ROCm tools, compilers, and runtime API
+# ROCm tools, compilers, and runtimes

 ::::{grid} 1 2 2 2
 :gutter: 3
@@ -59,12 +59,14 @@
 * [FLANG](https://github.com/ROCm/flang/)
 :::

-:::{grid-item-card} Runtime API
+:::{grid-item-card} Runtimes
 :class-body: rocm-card-banner rocm-hue-12

 (runtimes)=

+* {doc}`AMD Compute Language Runtime (CLR) <hip:understand/amd_clr>`
 * {doc}`HIP <hip:index>`
+* {doc}`ROCR-Runtime <rocr-runtime:index>`
 :::

 ::::
@@ -10,7 +10,6 @@

 | Version | Release date |
 | ------- | ------------ |
-| [7.2.1](https://rocm.docs.amd.com/en/docs-7.2.1/) | March 25, 2026 |
 | [7.2.0](https://rocm.docs.amd.com/en/docs-7.2.0/) | January 21, 2026 |
 | [7.1.1](https://rocm.docs.amd.com/en/docs-7.1.1/) | November 26, 2025 |
 | [7.1.0](https://rocm.docs.amd.com/en/docs-7.1.0/) | October 30, 2025 |
@@ -35,8 +35,20 @@ subtrees:
   title: TensorFlow compatibility
 - file: compatibility/ml-compatibility/jax-compatibility.rst
   title: JAX compatibility
+- file: compatibility/ml-compatibility/verl-compatibility.rst
+  title: verl compatibility
+- file: compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
+  title: Stanford Megatron-LM compatibility
 - file: compatibility/ml-compatibility/dgl-compatibility.rst
   title: DGL compatibility
+- file: compatibility/ml-compatibility/megablocks-compatibility.rst
+  title: Megablocks compatibility
+- file: compatibility/ml-compatibility/ray-compatibility.rst
+  title: Ray compatibility
+- file: compatibility/ml-compatibility/llama-cpp-compatibility.rst
+  title: llama.cpp compatibility
+- file: compatibility/ml-compatibility/flashinfer-compatibility.rst
+  title: FlashInfer compatibility
 - file: how-to/build-rocm.rst
   title: Build ROCm from source

@@ -65,14 +77,14 @@ subtrees:
   title: Train a model with Primus and Megatron-LM
   entries:
   - file: how-to/rocm-for-ai/training/benchmark-docker/megatron-lm.rst
-    title: Train a model with Megatron-LM (legacy)
+    title: Train a model with Megatron-LM
 - file: how-to/rocm-for-ai/training/benchmark-docker/primus-pytorch.rst
   title: Train a model with Primus and PyTorch
   entries:
   - file: how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.rst
-    title: Train a model with PyTorch (legacy)
+    title: Train a model with PyTorch
   - file: how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.rst
-    title: Train a model with Primus and JAX MaxText
+    title: Train a model with JAX MaxText
 - file: how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry
   title: Train a model with LLM Foundry
 - file: how-to/rocm-for-ai/training/scale-model-training.rst
@@ -102,7 +114,7 @@ subtrees:
 - file: how-to/rocm-for-ai/inference/llm-inference-frameworks.rst
   title: LLM inference frameworks
 - file: how-to/rocm-for-ai/inference/benchmark-docker/vllm.rst
-  title: vLLM inference
+  title: vLLM inference performance testing
 - file: how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference.rst
   title: PyTorch inference performance testing
 - file: how-to/rocm-for-ai/inference/benchmark-docker/sglang.rst
@@ -211,7 +223,7 @@ subtrees:
 - file: reference/api-libraries.md
   title: ROCm libraries
 - file: reference/rocm-tools.md
-  title: ROCm tools, compilers, and runtime API
+  title: ROCm tools, compilers, and runtimes
 - file: reference/gpu-arch-specs.rst
 - file: reference/gpu-atomics-operation.rst
 - file: reference/env-variables.rst
@@ -220,18 +232,6 @@ subtrees:
   title: Data types and precision support
 - file: reference/graph-safe-support.rst
   title: Graph safe support
-- file: reference/glossary.rst
-  title: ROCm glossary
-  subtrees:
-  - entries:
-    - file: reference/glossary/device-hardware.rst
-      title: Device hardware
-    - file: reference/glossary/device-software.rst
-      title: Device software
-    - file: reference/glossary/host-software.rst
-      title: Host software
-    - file: reference/glossary/performance.rst
-      title: Performance

 - caption: Contribute
   entries:
@@ -1,4 +1,4 @@
-rocm-docs-core==1.33.1
+rocm-docs-core==1.31.3
 sphinx-reredirects
 sphinx-sitemap
 sphinxcontrib.datatemplates==0.11.0
@@ -37,7 +37,7 @@ click==8.3.1
     # sphinx-external-toc
 comm==0.2.3
     # via ipykernel
-cryptography==46.0.6
+cryptography==46.0.3
     # via pyjwt
 debugpy==1.8.19
     # via ipykernel
@@ -156,7 +156,7 @@ pydata-sphinx-theme==0.15.4
     # sphinx-book-theme
 pygithub==2.8.1
     # via rocm-docs-core
-pygments==2.20.0
+pygments==2.19.2
     # via
     #   accessible-pygments
     #   ipython
@@ -184,11 +184,11 @@ referencing==0.37.0
     # via
     #   jsonschema
     #   jsonschema-specifications
-requests==2.33.0
+requests==2.32.5
     # via
     #   pygithub
     #   sphinx
-rocm-docs-core==1.33.1
+rocm-docs-core==1.31.3
     # via -r requirements.in
 rpds-py==0.30.0
     # via
@@ -10,13 +10,13 @@ ROCm is a software stack, composed primarily of open-source software, that
 provides the tools for programming AMD Graphics Processing Units (GPUs), from
 low-level kernels to high-level end-user applications.

-.. image:: data/rocm-software-stack-7_2_1.png
+.. image:: data/rocm-software-stack-7_0_0.jpg
    :width: 800
    :alt: AMD's ROCm software stack and enabling technologies.
    :align: center

 Specifically, ROCm provides the tools for
-:doc:`HIP <hip:index>`,
+:doc:`HIP (Heterogeneous-computing Interface for Portability) <hip:index>`,
 OpenCL and OpenMP. These include compilers, libraries for high-level
 functions, debuggers, profilers and runtimes.
@@ -143,14 +143,16 @@ Compilers
 .. csv-table::
    :header: "Component", "Description"

-   ":doc:`HIPCC <hipcc:index>`", "Compiler driver utility that calls Clang and passes the appropriate include and library options for the target compiler and HIP infrastructure"
+   ":doc:`HIPCC <hipcc:index>`", "Compiler driver utility that calls Clang or NVCC and passes the appropriate include and library options for the target compiler and HIP infrastructure"
    ":doc:`ROCm compilers <llvm-project:index>`", "ROCm LLVM compiler infrastructure"
    "`FLANG <https://github.com/ROCm/flang/>`_", "An out-of-tree Fortran compiler targeting LLVM"

-Runtime API
+Runtimes
 -----------------------------------------------

 .. csv-table::
    :header: "Component", "Description"

-   ":doc:`HIP <hip:index>`", "HIP is a C++ runtime API and kernel language for AMD GPUs"
+   ":doc:`AMD Compute Language Runtime (CLR) <hip:understand/amd_clr>`", "Contains source code for AMD's compute language runtimes: HIP and OpenCL"
+   ":doc:`HIP <hip:index>`", "AMD's GPU programming language extension and the GPU runtime"
+   ":doc:`ROCR-Runtime <rocr-runtime:index>`", "User-mode API interfaces and libraries necessary for host applications to launch compute kernels on available HSA ROCm kernel agents"