Compare commits

..

20 Commits

Author SHA1 Message Date
peterjunpark
ffa6360979 [docs/7.10] post-install - improve exports (#5783) 2025-12-16 16:25:41 -05:00
peterjunpark
b22c24b949 [docs/7.10.0] Radeon cards: suggest amdgpu over RSL driver (#5770) 2025-12-12 17:19:04 -05:00
peterjunpark
2563bd2a22 [docs/7.10.0] selector: responsive css for narrow viewports (#5769)
Fix element overflow on narrow/mobile screens.

Fix incorrect warning in matrix.py extension
2025-12-12 15:20:09 -05:00
peterjunpark
a04bb5b714 clean up ./.venv/... --> .venv/... (#5767) 2025-12-12 00:21:09 -05:00
peterjunpark
4c1f7d402f [docs/7.10.0] Fix typo in venv command and installation prerequisites (#5766)
* fmt

* Fix installation prerequisites and source venv typo
2025-12-12 00:06:32 -05:00
peterjunpark
e88e961519 [docs/7.10.0] selector.py: Make ids even more unique for selected content (#5765) 2025-12-11 17:47:58 -05:00
peterjunpark
5f6018027b fix conf.py (#5762) 2025-12-11 17:03:50 -05:00
Peter Park
2e614775ab Update docs for TheRock 7.10 release 2025-12-11 16:51:22 -05:00
peterjunpark
c92c9d2b57 [docs/7.9.0] Update banner msg (#5704)
update
2025-11-26 14:36:10 -05:00
peterjunpark
1a2e4f1cc0 [docs/7.9.0] Fix rocm-cmake github link due to non-existent tag (#5684) 2025-11-20 13:57:27 -05:00
peterjunpark
e76255db98 [docs/7.9.0] Add xDiT diffusion inference doc (#5676) 2025-11-18 16:56:00 -05:00
Alex Xu
cdbcad6930 add preview announcement 2025-11-13 15:57:09 -05:00
Alex Xu
dc19d266b4 update rocm-docs-core to show preview banner 2025-11-12 13:53:45 -05:00
Alex Xu
8584fa3b25 update rocm-docs-core to 1.29.0
(cherry picked from commit 39de859bd1)
2025-11-10 14:12:28 -05:00
peterjunpark
163d46ca5c [docs/7.9.0] Fix xref in Ubuntu prerequsites and RST heading overline (#5569) 2025-10-24 16:11:28 -04:00
peterjunpark
56f93e72de [docs/7.9.0] Use "generic" rocm-docs-core theme (#5568)
* use "generic" rocm-docs-core theme to tweak header

* restore "nav_secondary_items"
2025-10-24 11:04:12 -04:00
peterjunpark
f004891485 [docs/7.9.0] Note rocprofiler-sdk is Instinct only. Reorg some files to match docs/7.0.x. (#5563)
* move versions.md and compat-matrix to match prod; note rocprofiler-sdk is instinct only

* update href in versions.md
2025-10-23 12:32:20 -04:00
peterjunpark
3c61d4fb05 [docs/7.9.0] Add build from source overview page / Point to therocm-7.9.0 in components list (#5546)
* Update links to components to point the `therock-7.9.0` ref

* Add build from source page

* lint: fix caps and update .wordlist.txt

* add link to "development manuals" list

* add links to TheRock's development guide and fix step 4

* wording and fmt

* fix spacing

* fix fmt

* Fix documentation linting errors

* fix spacing
2025-10-21 12:41:56 -04:00
peterjunpark
39783dcee0 [docs/7.9.0] Fix GPU marketing names in 7.9.0 release.md and compatibility matrix / Add SD3.5 ComfyUI example (#5552)
* Add SD3.5 example to comfyui doc

* Fix Ryzen AI Max (PRO) SKU names

* Add names in multi line format
2025-10-21 11:02:48 -04:00
peterjunpark
c965b7e98e Add custom version history (#5551) 2025-10-20 18:11:38 -04:00
238 changed files with 20042 additions and 7299 deletions

View File

@@ -37,7 +37,6 @@ parameters:
- libdrm-dev
- libelf-dev
- libnuma-dev
- libsimde-dev
- ninja-build
- pkg-config
- name: rocmDependencies

View File

@@ -70,7 +70,7 @@ parameters:
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rccl_build_${{ job.target }}
timeoutInMinutes: 120
timeoutInMinutes: 90
variables:
- group: common
- template: /.azuredevops/variables-global.yml

View File

@@ -17,14 +17,8 @@ parameters:
- libdw-dev
- libglfw3-dev
- libmsgpack-dev
- libomp-dev
- libopencv-dev
- libtbb-dev
- libtiff-dev
- libva-amdgpu-dev
- libavcodec-dev
- libavformat-dev
- libavutil-dev
- ninja-build
- python3-pip
- name: rocmDependencies
@@ -44,9 +38,7 @@ parameters:
- hipTensor
- llvm-project
- MIOpen
- MIVisionX
- rocBLAS
- rocDecode
- rocFFT
- rocJPEG
- rocPRIM
@@ -58,7 +50,6 @@ parameters:
- rocSPARSE
- rocThrust
- rocWMMA
- rpp
- name: rocmTestDependencies
type: object
default:
@@ -75,10 +66,7 @@ parameters:
- hipSPARSE
- hipTensor
- llvm-project
- MIOpen
- MIVisionX
- rocBLAS
- rocDecode
- rocFFT
- rocminfo
- rocPRIM
@@ -92,7 +80,6 @@ parameters:
- rocThrust
- roctracer
- rocWMMA
- rpp
- name: jobMatrix
type: object

View File

@@ -43,14 +43,9 @@ parameters:
- ninja-build
- python3-pip
- python3-venv
- googletest
- libgtest-dev
- libgmock-dev
- libboost-filesystem-dev
- name: pipModules
type: object
default:
- msgpack
- joblib
- "packaging>=22.0"
- pytest
@@ -152,13 +147,6 @@ jobs:
echo "##vso[task.prependpath]$USER_BASE/bin"
echo "##vso[task.setvariable variable=PytestCmakePath]$USER_BASE/share/Pytest/cmake"
displayName: Set cmake configure paths
- task: Bash@3
displayName: Add ROCm binaries to PATH
inputs:
targetType: inline
script: |
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/bin"
echo "##vso[task.prependpath]$(Agent.BuildDirectory)/rocm/llvm/bin"
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}

View File

@@ -1,63 +0,0 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: cli11Version
type: string
default: ''
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt}
- { os: almalinux8, packageManager: dnf}
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: cli11_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: Bash@3
displayName: Clone cli11 ${{ parameters.cli11Version }}
inputs:
targetType: inline
script: git clone https://github.com/CLIUtils/CLI11.git -b ${{ parameters.cli11Version }}
workingDirectory: $(Agent.BuildDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeBuildDir: $(Agent.BuildDirectory)/CLI11/build
cmakeSourceDir: $(Agent.BuildDirectory)/CLI11
useAmdclang: false
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}

View File

@@ -1,66 +0,0 @@
parameters:
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
- name: yamlcppVersion
type: string
default: ''
- name: aptPackages
type: object
default:
- cmake
- git
- ninja-build
- name: jobMatrix
type: object
default:
buildJobs:
- { os: ubuntu2204, packageManager: apt}
- { os: almalinux8, packageManager: dnf}
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: yamlcpp_${{ job.os }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool:
vmImage: 'ubuntu-22.04'
${{ if eq(job.os, 'almalinux8') }}:
container:
image: rocmexternalcicd.azurecr.io/manylinux228:latest
endpoint: ContainerService3
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- task: Bash@3
displayName: Clone yaml-cpp ${{ parameters.yamlcppVersion }}
inputs:
targetType: inline
script: git clone https://github.com/jbeder/yaml-cpp.git -b ${{ parameters.yamlcppVersion }}
workingDirectory: $(Agent.BuildDirectory)
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
cmakeBuildDir: $(Agent.BuildDirectory)/yaml-cpp/build
cmakeSourceDir: $(Agent.BuildDirectory)/yaml-cpp
useAmdclang: false
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DYAML_CPP_BUILD_TOOLS=OFF
-DYAML_BUILD_SHARED_LIBS=OFF
-DYAML_CPP_INSTALL=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
os: ${{ job.os }}

View File

@@ -1,23 +0,0 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: cli11Version
type: string
default: "main"
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_DEPENDENCIES_PATH }}/cli11.yml
parameters:
cli11Version: ${{ parameters.cli11Version }}

View File

@@ -1,24 +0,0 @@
variables:
- group: common
- template: /.azuredevops/variables-global.yml
parameters:
- name: yamlcppVersion
type: string
default: "0.8.0"
resources:
repositories:
- repository: pipelines_repo
type: github
endpoint: ROCm
name: ROCm/ROCm
trigger: none
pr: none
jobs:
- template: ${{ variables.CI_DEPENDENCIES_PATH }}/yamlcpp.yml
parameters:
yamlcppVersion: ${{ parameters.yamlcppVersion }}

View File

@@ -3,11 +3,8 @@ ABI
ACE
ACEs
ACS
AccVGPR
AccVGPRs
AITER
ALU
AllReduce
AMD
AMDGPU
AMDGPUs
@@ -22,28 +19,32 @@ APIC
APIs
APU
APUs
ASAN
ASIC
ASICs
ASan
ASAN
ASm
ATI
atomicRMW
AccVGPR
AccVGPRs
AddressSanitizer
AlexNet
AllReduce
Andrej
Arb
Autocast
BARs
BatchNorm
BKC
BLAS
BMC
BabelStream
BatchNorm
Bitwise
Blit
Blockwise
Bluefield
Bootloader
BrainFloat
Broadcom
CAS
CCD
@@ -62,7 +63,6 @@ CPF
CPP
CPU
CPUs
Cron
CSC
CSDATA
CSE
@@ -73,8 +73,8 @@ CTests
CU
CUDA
CUs
CXX
CX
CXX
Cavium
CentOS
ChatGPT
@@ -87,35 +87,32 @@ Concretized
Conda
ConnectX
CountOnes
Cron
CuPy
da
Dashboarding
Dataloading
DBRX
DDR
DF
DGEMM
DGL
DGLGraph
dGPU
dGPUs
DIMM
DKMS
DL
DMA
DOMContentLoaded
DNN
DNNL
DOMContentLoaded
DPM
DRI
DW
DWORD
Dashboarding
Dask
DataFrame
DataLoader
DataParallel
Dataloading
Debian
decompositions
DeepSeek
DeepSpeed
Dependabot
@@ -123,24 +120,20 @@ Deprecations
DevCap
DirectX
Disaggregated
disaggregated
Dockerfile
Dockerized
Doxygen
dropless
ELMo
ENDPGM
EPYC
ESXi
EoS
etcd
fas
FBGEMM
FIFOs
FFT
FFTs
FFmpeg
FHS
FIFOs
FIXME
FMA
FP
@@ -149,8 +142,8 @@ Filesystem
FindDb
Flang
FlashAttention
FlashInfers
FlashInfer
FlashInfers
FluxBenchmark
Fortran
Fuyu
@@ -167,16 +160,12 @@ GDS
GEMM
GEMMs
GFLOPS
GFortran
GFXIP
GFortran
GGUF
Gemma
GiB
GIM
GL
Glibc
GLXT
Gloo
GMI
GPG
GPGPU
@@ -185,23 +174,25 @@ GPT
GPU
GPU's
GPUs
Graphbolt
GraphSage
GRBM
GRE
Gemma
GenAI
GenZ
GiB
GitHub
Gitpod
Glibc
Gloo
GraphSage
Graphbolt
HBM
HCA
HGX
HIPCC
hipDataType
HIPExtension
HIPIFY
HIPification
hipification
HIPify
HPC
HPCG
@@ -211,11 +202,12 @@ HSA
HW
HWE
HWS
HX
Haswell
Higgs
href
Hyperparameters
Huggingface
HunyuanVideo
Hyperparameters
IB
ICD
ICT
@@ -224,13 +216,10 @@ IDE
IDEs
IFWI
IMDb
IncDec
instrSize
interpolators
IOMMU
IOP
IOPS
IOPM
IOPS
IOV
IRQ
ISA
@@ -238,6 +227,7 @@ ISV
ISVs
ITL
ImageNet
IncDec
InfiniBand
Inlines
IntelliSense
@@ -246,8 +236,8 @@ Intersphinx
Intra
Ioffe
JAX's
Jinja
JSON
Jinja
Jupyter
KFD
KFDTest
@@ -255,10 +245,10 @@ KMD
KV
KVM
Karpathy's
KiB
Kineto
Keras
Khronos
KiB
Kineto
LAPACK
LCLK
LDS
@@ -268,21 +258,20 @@ LLVM
LM
LRU
LSAN
LSTMs
LSan
LTS
LSTMs
LteAll
LanguageCrossEntropy
LoRA
LteAll
MECO
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
MIOpen's
MIOpenGEMM
MIVisionX
MLM
MMA
@@ -293,9 +282,9 @@ MNIST
MPI
MPT
MSVC
mul
MVAPICH
MVFFR
MXFP
Makefile
Makefiles
ManyLinux
@@ -308,16 +297,16 @@ Megatron
Mellanox
Mellanox's
Meta's
MiB
Miniconda
MirroredStrategy
Mixtral
MosaicML
MoEs
Mooncake
MosaicML
Mpops
Multicore
Multithreaded
MXFP
MyEnvironment
MyST
NANOO
@@ -325,24 +314,27 @@ NBIO
NBIOs
NCCL
NCF
NCS
NFS
NIC
NICs
NLI
NLP
NN
NOP
NPKit
NPS
NSP
NUMA
NVCC
NVIDIA
NVLink
NVPTX
NaN
Nano
Navi
Noncoherently
NoReturn
Noncoherently
NousResearch's
NumPy
OAM
@@ -369,13 +361,10 @@ OpenVX
OpenXLA
Optim
Oversubscription
PagedAttention
Pallas
PCC
PCI
PCIe
PEFT
perf
PEQT
PIL
PILImage
@@ -385,6 +374,8 @@ PRNG
PRs
PaLM
Pageable
PagedAttention
Pallas
PeerDirect
PerfDb
Perfetto
@@ -397,8 +388,8 @@ Pretraining
Primus
Profiler's
PyPi
Pytest
PyTorch
Pytest
Qcycles
Qwen
RAII
@@ -409,16 +400,16 @@ RDC's
RDMA
RDNA
README
Recomputation
RHEL
RLHF
RMW
RNN
RNNs
ROC
ROCProfiler
ROCT
ROCTx
ROCTracer
ROCTx
ROCclr
ROCdbgapi
ROCgdb
@@ -433,8 +424,10 @@ RPP
RST
RW
Radeon
Recomputation
RelWithDebInfo
Req
ResNet
Rickle
RoCE
Runfile
@@ -442,7 +435,6 @@ Ryzen
SALU
SBIOS
SCA
ScaledGEMM
SDK
SDKs
SDMA
@@ -461,7 +453,6 @@ SKU
SKUs
SLES
SLURM
Slurm
SMEM
SMFMA
SMI
@@ -472,17 +463,17 @@ SRAM
SRAMECC
SVD
SWE
ScaledGEMM
SerDes
ShareGPT
Shlens
simd
Skylake
Slurm
Softmax
Spack
SplitK
Supermicro
Szegedy
TagRAM
TCA
TCC
TCCs
@@ -490,36 +481,33 @@ TCI
TCIU
TCP
TCR
TVM
TheRock
THREADGROUPS
threadgroups
TensorRT
TensorFloat
TF
TFLOPS
THREADGROUPS
TP
TPS
TPU
TPUs
TSME
TVM
TagRAM
Tagram
Taichi
Taichi's
Tagram
TensileLite
TensorBoard
TensorFloat
TensorFlow
TensorParallel
TensorRT
TheRock
ToC
TopK
TorchAudio
torchaudio
TorchElastic
TorchMIGraphX
torchrec
TorchScript
TorchServe
torchserve
torchtext
TorchVision
TransferBench
TrapStatus
@@ -531,17 +519,18 @@ UE
UIF
UMC
USM
USM
UTCL
UTCL
UTIL
UTIL
UltraChat
Uncached
Unittests
Unhandled
unwindowed
Unittests
VALU
VBIOS
VCN
verl's
VGPR
VGPRs
VM
@@ -552,6 +541,7 @@ VSIX
VSkipped
Vanhoucke
Vulkan
WDAG
WGP
WGPs
WR
@@ -560,7 +550,6 @@ WikiText
Wojna
Workgroups
Writebacks
xcc
XCD
XCDs
XGBoost
@@ -580,8 +569,8 @@ ZeRO
ZenDNN
accuracies
activations
addr
addEventListener
addr
ade
ai
alloc
@@ -592,6 +581,7 @@ amdgpu
api
aten
atmi
atomicRMW
atomics
autogenerated
autotune
@@ -608,17 +598,16 @@ bilinear
bitcode
bitsandbytes
bitwise
Bitwise
blit
bootloader
boson
bosons
br
BrainFloat
btn
buildable
bursty
bzip
cTDP
cacheable
carveout
cd
@@ -633,6 +622,7 @@ cmd
coalescable
codename
collater
comfyui
comgr
compat
completers
@@ -649,16 +639,18 @@ copyable
cpp
csn
cuBLAS
cuda
cuDNN
cudnn
cuFFT
cuLIB
cuRAND
cuSOLVER
cuSPARSE
cuda
cudnn
customizations
cTDP
dGPU
dGPUs
da
dataset
datasets
dataspace
@@ -668,8 +660,9 @@ datatypes
dbgapi
de
deallocation
debuggability
debian
debuggability
decompositions
deepseek
denoise
denoised
@@ -684,10 +677,12 @@ devicelibs
devsel
dgl
dimensionality
disaggregated
disambiguates
distro
distros
dkms
dropless
dtype
eb
el
@@ -700,10 +695,14 @@ endpgm
enqueue
env
epilog
etcd
etcetera
ethernet
exascale
executables
fam
fam
fas
ffmpeg
filesystem
forEach
@@ -730,8 +729,8 @@ heterogenous
hipBLAS
hipBLASLt
hipBLASLt's
hipblaslt
hipCUB
hipDataType
hipFFT
hipFORT
hipLIB
@@ -742,10 +741,12 @@ hipSPARSELt
hipTensor
hipamd
hipblas
hipblaslt
hipcc
hipcub
hipfft
hipfort
hipification
hipify
hipsolver
hipsparse
@@ -754,6 +755,7 @@ hostname
hotspotting
hpc
hpp
href
hsa
hsakmt
hyperparameter
@@ -769,7 +771,9 @@ init
initializer
inlining
installable
instrSize
interop
interpolators
interprocedural
intra
intrinsics
@@ -812,23 +816,21 @@ mjx
mkdir
mlirmiopen
mtypes
mul
mutex
mvffr
namespace
namespaces
nanoGPT
NCS
NOP
NVLink
num
numref
ocl
ol
opencl
opencv
openmp
openssl
optimizers
ol
os
oversubscription
pageable
@@ -839,6 +841,7 @@ param
parameterization
passthrough
pe
perf
perfcounter
performant
perl
@@ -871,8 +874,6 @@ pseudorandom
px
py
pytorch
recommender
recommenders
quantile
quantizer
quasirandom
@@ -884,8 +885,10 @@ radeon
rccl
rdc
rdma
redhat
reStructuredText
recommender
recommenders
redhat
redirections
refactorization
reformats
@@ -899,7 +902,6 @@ rescaling
reusability
rhel
rl
RLHF
roadmap
roc
rocAL
@@ -926,8 +928,8 @@ rocm
rocminfo
rocprim
rocprof
rocprofv
rocprofiler
rocprofv
rocr
rocrand
rocsolver
@@ -938,7 +940,6 @@ rst
runtime
runtimes
ryzen
ResNet
sL
scalability
scalable
@@ -953,6 +954,7 @@ sglang
shader
sharding
sigmoid
simd
sles
sm
smi
@@ -973,6 +975,7 @@ submodule
submodules
subnet
supercomputing
suse
symlink
symlinks
sys
@@ -982,6 +985,7 @@ td
tensorfloat
tf
th
threadgroups
tokenization
tokenize
tokenized
@@ -991,17 +995,21 @@ toolchain
toolchains
toolset
toolsets
torchaudio
torchrec
torchserve
torchtext
torchtitan
torchvision
tp
tqdm
tracebacks
txt
TopK
uarch
ubuntu
uncached
udev
uncacheable
uncached
uncorrectable
underoptimized
unhandled
@@ -1010,12 +1018,11 @@ unmapped
unsqueeze
unstacking
unswitching
untar
untrusted
untuned
unwindowed
upvote
USM
UTCL
UTIL
utils
vL
variational
@@ -1027,6 +1034,7 @@ vectorized
vectorizer
vectorizes
verl
verl's
virtualize
virtualized
vjxb
@@ -1045,9 +1053,12 @@ writeback
writebacks
wrreq
wzo
xargs
xDiT
xGMI
xPacked
xargs
xcc
xdit
xz
yaml
ysvmadyb

View File

@@ -26,10 +26,16 @@ source software compilers, debuggers, and libraries. ROCm is fully integrated in
## Getting and Building ROCm from Source
Please use [TheRock](https://github.com/ROCm/TheRock/tree/release/therock-7.9) build system to build ROCm from source.
Please use [TheRock](https://github.com/ROCm/TheRock) build system to build ROCm from source.
## ROCm documentation
This repository contains the [manifest file](https://gerrit.googlesource.com/git-repo/+/HEAD/docs/manifest-format.md)
for ROCm releases, changelogs, and release information.
The `default.xml` file contains information for all repositories and the associated commit used to build
the current ROCm release; `default.xml` follows the manifest format defined by the [git-repo](https://gerrit.googlesource.com/git-repo/) tool.
Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
`develop` branch of our repositories contains content for the next ROCm release.
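
A minimal usage sketch of the workflow this README describes, assuming the standard git-repo tool and that default.xml sits at the root of ROCm/ROCm (the URL and flag values are illustrative, not taken from this diff):

    # Install the repo launcher first: https://gerrit.googlesource.com/git-repo/
    mkdir rocm-src && cd rocm-src
    # Point repo at the repository hosting default.xml (illustrative URL).
    repo init -u https://github.com/ROCm/ROCm.git -m default.xml
    # Fetch every project pinned in the manifest; -j mirrors the manifest's sync-j.
    repo sync -j4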

File diff suppressed because it is too large.

default.xml (new file, 67 lines)
View File

@@ -0,0 +1,67 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<default revision="refs/tags/rocm-7.0.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />
<!--list of projects for ROCm-->
<project name="ROCK-Kernel-Driver" />
<project name="ROCR-Runtime" />
<project name="amdsmi" />
<project name="aqlprofile" />
<project name="rdc" />
<project name="rocm_bandwidth_test" />
<project name="rocm_smi_lib" />
<project name="rocm-core" />
<project name="rocm-examples" />
<project name="rocminfo" />
<project name="rocprofiler" />
<project name="rocprofiler-register" />
<project name="rocprofiler-sdk" />
<project name="rocprofiler-compute" />
<project name="rocprofiler-systems" />
<project name="roctracer" />
<!--HIP Projects-->
<project name="hip" />
<project name="hip-tests" />
<project name="HIPIFY" />
<project name="clr" />
<project name="hipother" />
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
<project name="half" />
<project name="llvm-project" />
<project name="spirv-llvm-translator" />
<!-- gdb projects -->
<project name="ROCdbgapi" />
<project name="ROCgdb" />
<project name="rocr_debug_agent" />
<!-- ROCm Libraries -->
<project groups="mathlibs" name="AMDMIGraphX" />
<project groups="mathlibs" name="MIVisionX" />
<project groups="mathlibs" name="ROCmValidationSuite" />
<project groups="mathlibs" name="composable_kernel" />
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="hipfort" />
<project groups="mathlibs" name="rccl" />
<project groups="mathlibs" name="rocAL" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocDecode" />
<project groups="mathlibs" name="rocJPEG" />
<!-- The following components have been migrated to rocm-libraries:
hipBLAS-common hipBLAS hipBLASLt hipCUB
hipFFT hipRAND hipSPARSE hipSPARSELt
MIOpen rocBLAS rocFFT rocPRIM rocRAND
rocSPARSE rocThrust Tensile -->
<project groups="mathlibs" name="rocm-libraries" />
<project groups="mathlibs" name="rocPyDecode" />
<project groups="mathlibs" name="rocSHMEM" />
<project groups="mathlibs" name="rocWMMA" />
<project groups="mathlibs" name="rocm-cmake" />
<project groups="mathlibs" name="rpp" />
<project groups="mathlibs" name="TransferBench" />
<!-- Projects for OpenMP-Extras -->
<project name="aomp" path="openmp-extras/aomp" />
<project name="aomp-extras" path="openmp-extras/aomp-extras" />
<project name="flang" path="openmp-extras/flang" />
</manifest>
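
The groups="mathlibs" attribute above also allows a partial checkout. A hedged sketch, reusing the illustrative URL from the earlier example (repo's -g flag restricts the sync to the named manifest groups):

    # Check out only the projects tagged with the mathlibs group.
    repo init -u https://github.com/ROCm/ROCm.git -m default.xml -g mathlibs
    repo sync -j4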

View File

@@ -0,0 +1,111 @@
****************************************
ROCm |ROCM_VERSION| compatibility matrix
****************************************
To plan your ROCm |ROCM_VERSION| installation, use the following selector to
view ROCm compatibility and system requirements information for your AMD
hardware configuration. For installation instructions, see
:doc:`/install/rocm`.
.. include:: ./includes/selector.rst
----
Hardware, software, and firmware requirements
=============================================
ROCm depends on a coordinated stack of compatible firmware, driver, and user
space components. Maintaining version alignment between these layers ensures
expected GPU operation and performance, especially for AMD data center products.
Future preview releases will expand hardware and operating system coverage.
ROCm 7.10.0 primarily enables support for compute workloads. Future releases
will support mixed workloads (compute and graphics).
.. selected:: os=ubuntu os=rhel os=sles
.. selected:: fam=radeon-pro fam=radeon
If you're interested in testing AMD Radeon™ GPUs with preview support for
graphics use cases with AMD ROCm 7.10.0, install Radeon Software for Linux
version 25.30.1 from `Linux Drivers for AMD Radeon and Radeon PRO
Graphics <https://www.amd.com/en/support/download/linux-drivers.html>`__.
.. selected:: fam=ryzen
If you're interested in testing AMD Ryzen™ APUs with preview support for
graphics use cases with AMD ROCm 7.10.0, use the inbox graphics drivers of
Ubuntu 24.04.3.
.. include:: ./includes/system-instinct.rst
.. include:: ./includes/system-radeon-pro.rst
.. include:: ./includes/system-radeon.rst
.. include:: ./includes/system-ryzen.rst
----
.. _rocm-compat-frameworks:
Deep learning frameworks
========================
ROCm |ROCM_VERSION| provides optimized support for popular deep learning
frameworks. The following table lists supported frameworks and their supported
versions.
.. _rocm-compat-pytorch:
.. matrix::
.. matrix-head::
.. matrix-row::
.. matrix-cell:: Framework
:header:
.. matrix-cell:: Supported versions
:header:
.. matrix-row::
:show-when: fam=instinct fam=ryzen
.. matrix-cell:: PyTorch
.. matrix-cell:: 2.9.1, 2.8.0, 2.7.1
:show-when: os=ubuntu os=rhel os=sles
.. matrix-cell:: 2.9.1
:show-when: os=windows
.. matrix-row::
:show-when: fam=radeon fam=radeon-pro
.. matrix-cell:: PyTorch
.. matrix-cell:: 2.9.1
For installation instructions, see :ref:`pip-install-pytorch`.
.. _rocm-compat-python:
.. note::
ROCm |ROCM_VERSION| is compatible with Python versions **3.11**, **3.12**,
and **3.13**.
----
ROCm Core SDK components
========================
The following table lists core components included in the ROCm |ROCM_VERSION|
release. Expect future releases in this stream to expand the list of
components.
.. include:: ./includes/core-sdk-components-linux.rst
.. include:: ./includes/core-sdk-components-windows.rst

View File

@@ -0,0 +1,214 @@
.. matrix::
:show-when: os=ubuntu os=rhel os=sles
.. matrix-head::
.. matrix-row::
:header:
.. matrix-cell:: Component group
.. matrix-cell:: Component name
.. matrix-row::
.. matrix-cell:: Runtimes and compilers
:rowspan: 5
.. matrix-cell::
`HIP <https://github.com/ROCm/rocm-systems/tree/therock-7.10/projects/hip>`__
.. matrix-row::
.. matrix-cell::
`HIPIFY <https://github.com/ROCm/HIPIFY/tree/therock-7.10>`__
.. matrix-row::
.. matrix-cell::
`LLVM <https://github.com/ROCm/llvm-project/tree/therock-7.10>`__
.. matrix-row::
.. matrix-cell::
`ROCr Runtime <https://github.com/ROCm/rocm-systems/tree/therock-7.10/projects/rocr-runtime>`__
.. matrix-row::
.. matrix-cell::
`SPIRV-LLVM-Translator <https://github.com/ROCm/SPIRV-LLVM-Translator/tree/therock-7.10>`__
.. matrix-row::
.. matrix-cell:: Control and monitoring tools
:rowspan: 2
.. matrix-cell::
`AMD SMI <https://github.com/ROCm/amdsmi/tree/release/therock-7.10>`__
.. matrix-row::
.. matrix-cell::
`rocminfo <https://github.com/ROCm/rocm-systems/tree/therock-7.10/projects/rocminfo>`__
.. matrix-row::
.. matrix-cell:: Profiling and debugging tools
:rowspan: 3
:show-when: fam=instinct
.. matrix-cell:: Profiling and debugging tools
:rowspan: 2
:show-when: fam=radeon-pro fam=radeon fam=ryzen
.. matrix-row::
.. matrix-cell::
`ROCm Compute Profiler (rocprofiler-compute) <https://github.com/ROCm/rocm-systems/tree/therock-7.10/projects/rocprofiler-compute>`__
.. matrix-row::
:show-when: fam=instinct
.. matrix-cell::
`ROCprofiler-SDK <https://github.com/ROCm/rocm-systems/tree/therock-7.10/projects/rocprofiler-sdk>`__
.. matrix-row::
.. matrix-cell:: Math and compute libraries
:rowspan: 18
:show-when: fam=instinct
.. matrix-cell:: Math and compute libraries
:rowspan: 17
:show-when: fam=radeon-pro fam=radeon fam=ryzen
.. matrix-cell::
`rocBLAS <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocblas>`__
.. matrix-row::
.. matrix-cell::
`hipBLAS <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipblas>`__
.. matrix-row::
.. matrix-cell::
`hipBLASLt <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipblaslt>`__
.. matrix-row::
.. matrix-cell::
`rocFFT <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocfft>`__
.. matrix-row::
.. matrix-cell::
`hipFFT <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipfft>`__
.. matrix-row::
.. matrix-cell::
`rocRAND <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocrand>`__
.. matrix-row::
.. matrix-cell::
`hipRAND <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hiprand>`__
.. matrix-row::
.. matrix-cell::
`rocSOLVER <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocsolver>`__
.. matrix-row::
.. matrix-cell::
`hipSOLVER <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipsolver>`__
.. matrix-row::
.. matrix-cell::
`rocSPARSE <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocsparse>`__
.. matrix-row::
.. matrix-cell::
`hipSPARSE <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipsparse>`__
.. matrix-row::
.. matrix-cell::
`hipSPARSELt <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipsparselt>`__
.. matrix-row::
.. matrix-cell::
`rocPRIM <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocprim>`__
.. matrix-row::
.. matrix-cell::
`rocThrust <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocthrust>`__
.. matrix-row::
.. matrix-cell::
`hipCUB <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipcub>`__
.. matrix-row::
.. matrix-cell::
`rocWMMA <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocwmma>`__
.. matrix-row::
.. matrix-cell::
`Composable Kernel <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/composablekernel>`__ (partial, limited support)
.. matrix-row::
:show-when: fam=instinct
.. matrix-cell::
`MIOpen <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/miopen>`__
.. matrix-row::
.. matrix-cell:: Communication libraries
.. matrix-cell::
`RCCL <https://github.com/ROCm/rccl/tree/release/therock-7.10>`__
.. matrix-row::
.. matrix-cell:: Support libraries
.. matrix-cell::
`ROCm CMake <https://github.com/ROCm/rocm-cmake/tree/therock-7.10>`__

View File

@@ -0,0 +1,139 @@
.. matrix::
:show-when: os=windows
.. matrix-head::
.. matrix-row::
:header:
.. matrix-cell:: Component group
.. matrix-cell:: Component name
.. matrix-row::
.. matrix-cell:: Runtimes and compilers
:rowspan: 3
.. matrix-cell::
`HIP <https://github.com/ROCm/rocm-systems/tree/therock-7.10/projects/hip>`__
.. matrix-row::
.. matrix-cell::
`HIPIFY <https://github.com/ROCm/HIPIFY/tree/therock-7.10>`__
.. matrix-row::
.. matrix-cell::
`LLVM <https://github.com/ROCm/llvm-project/tree/therock-7.10>`__
.. matrix-row::
.. matrix-cell:: Control and monitoring tools
.. matrix-cell:: hipinfo
.. matrix-row::
.. matrix-cell:: Math and compute libraries
:rowspan: 15
.. matrix-cell::
`rocBLAS <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocblas>`__
.. matrix-row::
.. matrix-cell::
`hipBLAS <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipblas>`__
.. matrix-row::
.. matrix-cell::
`hipBLASLt <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipblaslt>`__
.. matrix-row::
.. matrix-cell::
`rocFFT <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocfft>`__
.. matrix-row::
.. matrix-cell::
`hipFFT <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipfft>`__
.. matrix-row::
.. matrix-cell::
`rocRAND <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocrand>`__
.. matrix-row::
.. matrix-cell::
`hipRAND <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hiprand>`__
.. matrix-row::
.. matrix-cell::
`rocSOLVER <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocsolver>`__
.. matrix-row::
.. matrix-cell::
`hipSOLVER <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipsolver>`__
.. matrix-row::
.. matrix-cell::
`rocSPARSE <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocsparse>`__
.. matrix-row::
.. matrix-cell::
`hipSPARSE <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipsparse>`__
.. matrix-row::
.. matrix-cell::
`rocPRIM <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocprim>`__
.. matrix-row::
.. matrix-cell::
`rocThrust <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocthrust>`__
.. matrix-row::
.. matrix-cell::
`hipCUB <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/hipcub>`__
.. matrix-row::
.. matrix-cell::
`rocWMMA <https://github.com/ROCm/rocm-libraries/tree/therock-7.10/projects/rocwmma>`__
.. matrix-row::
.. matrix-cell:: Support libraries
.. matrix-cell::
`ROCm CMake <https://github.com/ROCm/rocm-cmake/tree/therock-7.10>`__

View File

@@ -0,0 +1,128 @@
.. selector:: AMD device family
:key: fam
.. selector-option:: Instinct
:value: instinct
:width: 3
.. selector-option:: Radeon PRO
:value: radeon-pro
:width: 3
.. selector-option:: Radeon RX
:value: radeon
:width: 3
.. selector-option:: Ryzen AI
:value: ryzen
:width: 3
.. selector:: Instinct GPU
:key: gfx
:show-when: fam=instinct
.. selector-info:: https://www.amd.com/en/products/accelerators/instinct.html
.. selector-option:: Instinct MI355X<br>Instinct MI350X
:value: 950
:width: 4
.. selector-option:: Instinct MI325X<br>Instinct MI300X<br>Instinct MI300A
:value: 942
:width: 4
.. selector-option:: Instinct MI250X<br>Instinct MI250<br>Instinct MI210
:value: 90a
:width: 4
.. selector:: Radeon PRO GPU
:key: gfx
:show-when: fam=radeon-pro
.. selector-info:: https://www.amd.com/en/products/graphics/workstations.html
.. selector-option:: Radeon PRO W7900D<br>Radeon PRO W7900<br>Radeon PRO W7800 48GB<br>Radeon PRO W7800
:value: 1100
:width: 6
.. selector-option:: Radeon PRO W7700
:value: 1101
:width: 6
.. selector:: Radeon RX GPU
:key: gfx
:show-when: fam=radeon
.. selector-info:: https://www.amd.com/en/products/graphics/desktops/radeon.html
.. selector-option:: Radeon RX 7900 XTX<br>Radeon RX 7900 XT<br>Radeon RX 7900 GRE
:value: 1100
.. selector-option:: Radeon RX 7800 XT<br>Radeon RX 7700 XT
:value: 1101
.. selector:: Ryzen AI APU
:key: gfx
:show-when: fam=ryzen
.. selector-info:: https://www.amd.com/en/products/processors/workstations/mobile.html
.. selector-option:: Ryzen AI Max+ PRO 395<br>Ryzen AI Max PRO 390, 385, 380<br>Ryzen AI Max+ 395<br>Ryzen AI Max 390, 385
:value: 1151
:width: 6
.. selector-option:: Ryzen AI 9 HX 375<br>Ryzen AI 9 HX 370<br>Ryzen AI 9 365
:value: 1150
:width: 6
.. selector:: Operating system
:key: os
:show-when: fam=instinct
.. selector-option:: Ubuntu
:value: ubuntu
:icon: fab fa-ubuntu fa-lg
:width: 4
.. selector-option:: RHEL
:value: rhel
:icon: fab fa-redhat fa-lg
:width: 4
.. selector-option:: SLES
:value: sles
:icon: fab fa-suse fa-lg
:width: 4
.. selector:: Operating system
:key: os
:show-when: fam=radeon-pro fam=radeon
.. selector-option:: Ubuntu
:value: ubuntu
:icon: fab fa-ubuntu fa-lg
:width: 4
.. selector-option:: RHEL
:value: rhel
:icon: fab fa-redhat fa-lg
:width: 4
.. selector-option:: Windows
:value: windows
:icon: fab fa-windows fa-lg
:width: 4
.. selector:: Operating system
:key: os
:show-when: fam=ryzen
.. selector-option:: Ubuntu
:value: ubuntu
:icon: fab fa-ubuntu fa-lg
:width: 6
.. selector-option:: Windows
:value: windows
:icon: fab fa-windows fa-lg
:width: 6

View File

@@ -0,0 +1,147 @@
.. matrix::
:show-when: fam=instinct
.. matrix-row::
:show-when: gfx=950
.. matrix-cell:: AMD Instinct MI350 Series
:header:
.. matrix-cell::
`Instinct MI355X <https://www.amd.com/en/products/accelerators/instinct/mi350/mi355x.html>`__
`Instinct MI350X <https://www.amd.com/en/products/accelerators/instinct/mi350/mi350x.html>`__
.. matrix-row::
:show-when: gfx=942
.. matrix-cell:: AMD Instinct MI300 Series
:header:
.. matrix-cell::
`Instinct MI325X <https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html>`__
`Instinct MI300X <https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html>`__
`Instinct MI300A <https://www.amd.com/en/products/accelerators/instinct/mi300/mi300a.html>`__
.. matrix-row::
:show-when: gfx=90a
.. matrix-cell:: AMD Instinct MI200 Series
:header:
.. matrix-cell::
`Instinct MI250X <https://www.amd.com/en/products/accelerators/instinct/mi200/mi250x.html>`__
`Instinct MI250 <https://www.amd.com/en/products/accelerators/instinct/mi200/mi250.html>`__
`Instinct MI210 <https://www.amd.com/en/products/accelerators/instinct/mi200/mi210.html>`__
.. matrix-row::
.. matrix-cell:: Architecture
:header:
.. matrix-cell:: CDNA 4
:show-when: gfx=950
.. matrix-cell:: CDNA 3
:show-when: gfx=942
.. matrix-cell:: CDNA 2
:show-when: gfx=90a
.. matrix-row::
.. matrix-cell:: LLVM target
:header:
.. matrix-cell:: gfx950
:show-when: gfx=950
.. matrix-cell:: gfx942
:show-when: gfx=942
.. matrix-cell:: gfx90a
:show-when: gfx=90a
.. matrix-row::
:show-when: os=ubuntu
.. matrix-cell:: Supported Ubuntu versions
:header:
.. matrix-cell::
Ubuntu 24.04.3 (GA kernel: 6.8)
Ubuntu 22.04.5 (GA kernel: 5.15)
.. matrix-row::
:show-when: os=rhel
.. matrix-cell:: Supported Red Hat Enterprise Linux versions
:header:
.. matrix-cell::
RHEL 10.1 (kernel: 6.12.0-124)
RHEL 10.0 (kernel: 6.12.0-55)
RHEL 9.7 (kernel: 5.14.0-611)
RHEL 9.6 (kernel: 5.14.0-570)
RHEL 8.10 (kernel: 4.18.0-553)
.. matrix-row::
:show-when: os=sles
.. matrix-cell:: Supported SUSE Linux Enterprise Server version
:header:
.. matrix-cell:: SLES 15.7 (kernel: 6.4.0-150700.51)
.. matrix-row::
.. matrix-cell:: Supported AMD GPU Driver (amdgpu) versions
:header:
.. matrix-cell::
`30.20.0 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.20.0/>`__,
`30.10.2 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10.2/>`__,
`30.10.1 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10.1/>`__,
`30.10.0 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10/>`__
.. matrix-row::
.. matrix-cell:: Supported PLDM bundle (firmware) versions
:header:
.. matrix-cell:: 01.25.15.04, 01.25.13.09
:show-when: gfx=950
.. matrix-cell::
:show-when: gfx=942
**MI325X** 01.25.04.02, 01.25.03.03
**MI300X** 01.25.05.00 (or later), 01.25.03.12
**MI300A** BKC 26, 25
.. matrix-cell::
:show-when: gfx=90a
**MI250X** IFWI 47 (or later)
**MI250** Maintenance update 5 with IFWI 75 (or later)
**MI210** Maintenance update 5 with IFWI 75 (or later)

View File

@@ -0,0 +1,111 @@
.. matrix::
:show-when: fam=radeon-pro
.. matrix-row::
:show-when: gfx=1100
.. matrix-cell:: AMD Radeon PRO W7000 Series
:header:
.. matrix-cell::
`Radeon PRO W7900D <https://www.amd.com/en/support/downloads/drivers.html/graphics/radeon-pro/radeon-pro-w7000-series/amd-radeon-pro-w7900d.html>`__
`Radeon PRO W7900 <https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7900.html>`__
`Radeon PRO W7800 48GB <https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7800-48gb.html>`__
`Radeon PRO W7800 <https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7800.html>`__
.. matrix-row::
:show-when: gfx=1101
.. matrix-cell:: AMD Radeon PRO W7000 Series
:header:
.. matrix-cell::
`Radeon PRO W7700 <https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7700.html>`__
.. matrix-row::
.. matrix-cell:: Architecture
:header:
.. matrix-cell:: RDNA 3
.. matrix-row::
.. matrix-cell:: LLVM target
:header:
.. matrix-cell:: gfx1101
:show-when: gfx=1101
.. matrix-cell:: gfx1100
:show-when: gfx=1100
.. matrix-row::
:show-when: os=ubuntu
.. matrix-cell:: Supported Ubuntu versions
:header:
.. matrix-cell::
24.04.3 (GA kernel: 6.8)
22.04.5 (GA kernel: 5.15)
.. matrix-row::
:show-when: os=rhel
.. matrix-cell:: Supported RHEL versions
:header:
.. matrix-cell:: 10.1, 10.0
.. matrix-row::
:show-when: os=windows
.. matrix-cell:: Supported Windows version
:header:
.. matrix-cell:: Windows 11 25H2
.. matrix-row::
.. matrix-cell:: Supported AMD GPU Driver (amdgpu) versions
:header:
.. matrix-cell::
`30.20.0 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.20.0/>`__,
`30.10.2 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10.2/>`__,
`30.10.1 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10.1/>`__,
`30.10.0 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10/>`__
.. matrix-row::
:show-when: os=ubuntu os=rhel os=sles
.. matrix-cell:: Supported Radeon Software for Linux version
:header:
.. matrix-cell::
`25.30.1 <https://www.amd.com/en/support/download/linux-drivers.html#linux-for-radeon-pro>`__
.. matrix-row::
:show-when: os=windows
.. matrix-cell:: Supported Adrenalin Driver version
:header:
.. matrix-cell::
`25.11.1 <https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-11-1.html>`__
(generally recommended)
`25.20.01.17 <https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-7-1-1.html>`__
(recommended for ComfyUI)

View File

@@ -0,0 +1,112 @@
.. matrix::
:show-when: fam=radeon
.. matrix-row::
:show-when: gfx=1100
.. matrix-cell:: AMD Radeon RX 7000 Series
:header:
.. matrix-cell::
`Radeon RX 7900 XTX <https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7900xtx.html>`__
`Radeon RX 7900 XT <https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7900xt.html>`__
`Radeon RX 7900 GRE <https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7900-gre.html>`__
.. matrix-row::
:show-when: gfx=1101
.. matrix-cell:: AMD Radeon RX 7000 Series
:header:
.. matrix-cell::
`Radeon RX 7800 XT <https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7800-xt.html>`__
`Radeon RX 7700 XT <https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7700-xt.html>`__
.. matrix-row::
.. matrix-cell:: Architecture
:header:
.. matrix-cell:: RDNA 3
:show-when: gfx=1101 gfx=1100
.. matrix-row::
.. matrix-cell:: LLVM target
:header:
.. matrix-cell:: gfx1100
:show-when: gfx=1100
.. matrix-cell:: gfx1101
:show-when: gfx=1101
.. matrix-row::
:show-when: os=ubuntu
.. matrix-cell:: Supported Ubuntu versions
:header:
.. matrix-cell::
24.04.3 (GA kernel: 6.8)
22.04.5 (GA kernel: 5.15)
.. matrix-row::
:show-when: os=rhel
.. matrix-cell:: Supported RHEL versions
:header:
.. matrix-cell:: 10.1, 10.0
.. matrix-row::
:show-when: os=windows
.. matrix-cell:: Supported Windows version
:header:
.. matrix-cell:: Windows 11 25H2
.. matrix-row::
.. matrix-cell:: Supported AMD GPU Driver (amdgpu) versions
:header:
.. matrix-cell::
`30.20.0 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.20.0/>`__,
`30.10.2 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10.2/>`__,
`30.10.1 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10.1/>`__,
`30.10.0 <https://instinct.docs.amd.com/projects/amdgpu-docs/en/docs-30.10/>`__
.. matrix-row::
:show-when: os=ubuntu os=rhel os=sles
.. matrix-cell:: Supported Radeon Software for Linux version
:header:
.. matrix-cell::
`25.30.1 <https://www.amd.com/en/support/download/linux-drivers.html#linux-for-radeon-pro>`__
.. matrix-row::
:show-when: os=windows
.. matrix-cell:: Supported Adrenalin Driver version
:header:
.. matrix-cell::
`25.11.1 <https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-11-1.html>`__
(generally recommended)
`25.20.01.17 <https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-7-1-1.html>`__
(recommended for ComfyUI)

View File

@@ -0,0 +1,103 @@
.. matrix::
:show-when: fam=ryzen
.. matrix-row::
:show-when: gfx=1151
.. matrix-cell:: AMD Ryzen AI Max PRO 300 Series
:header:
.. matrix-cell::
`Ryzen AI Max+ PRO 395 <https://www.amd.com/en/products/processors/laptop/ryzen-pro/ai-max-pro-300-series/amd-ryzen-ai-max-plus-pro-395.html>`__
`Ryzen AI Max PRO 390 <https://www.amd.com/en/products/processors/laptop/ryzen-pro/ai-max-pro-300-series/amd-ryzen-ai-max-pro-390.html>`__
`Ryzen AI Max PRO 385 <https://www.amd.com/en/products/processors/laptop/ryzen-pro/ai-max-pro-300-series/amd-ryzen-ai-max-pro-385.html>`__
`Ryzen AI Max PRO 380 <https://www.amd.com/en/products/processors/laptop/ryzen-pro/ai-max-pro-300-series/amd-ryzen-ai-max-pro-380.html>`__
.. matrix-row::
:show-when: gfx=1151
.. matrix-cell:: AMD Ryzen AI Max 300 Series
:header:
.. matrix-cell::
`Ryzen AI Max+ 395 <https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-plus-395.html>`__
`Ryzen AI Max 390 <https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-390.html>`__
`Ryzen AI Max 385 <https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-max-385.html>`__
.. matrix-row::
:show-when: gfx=1150
.. matrix-cell:: AMD Ryzen AI 300 Series
:header:
.. matrix-cell::
`Ryzen AI 9 HX 375 <https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-9-hx-375.html>`__
`Ryzen AI 9 HX 370 <https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-9-hx-370.html>`__
`Ryzen AI 9 365 <https://www.amd.com/en/products/processors/laptop/ryzen/ai-300-series/amd-ryzen-ai-9-365.html>`__
.. matrix-row::
.. matrix-cell:: Architecture
:header:
.. matrix-cell:: RDNA 3.5
.. matrix-row::
.. matrix-cell:: LLVM target
:header:
.. matrix-cell:: gfx1151
:show-when: gfx=1151
.. matrix-cell:: gfx1150
:show-when: gfx=1150
.. matrix-row::
:show-when: os=ubuntu
.. matrix-cell:: Supported Ubuntu versions
:header:
.. matrix-cell:: 24.04.3 (HWE kernel: 6.14)
.. matrix-row::
:show-when: os=windows
.. matrix-cell:: Supported Windows version
:header:
.. matrix-cell:: Windows 11 25H2
.. matrix-row::
:show-when: os=ubuntu
.. matrix-cell:: Supported kernel driver version
:header:
.. matrix-cell:: Inbox kernel driver in Ubuntu 24.04.3
.. matrix-row::
:show-when: os=windows
.. matrix-cell:: Supported Adrenalin Driver version
:header:
.. matrix-cell::
`25.11.1 <https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-11-1.html>`__
(generally recommended)
`25.20.01.17 <https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-7-1-1.html>`__
(recommended for ComfyUI)

View File

@@ -9,8 +9,8 @@ import shutil
import sys
from pathlib import Path
ROCM_VERSION = "7.9.0"
GA_DATE = "2025-10-20"
ROCM_VERSION = "7.10.0"
GA_DATE = "2025-12-11"
DOCS_DIR = Path(__file__).parent.resolve()
ROOT_DIR = DOCS_DIR.parent
@@ -102,46 +102,40 @@ html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "rocm.docs.amd.com")
html_context = {}
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
project = "ROCm documentation"
project_path = str(DOCS_DIR).replace("\\", "/")
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) %Y Advanced Micro Devices, Inc. All rights reserved."
version = ROCM_VERSION
release = ROCM_VERSION
setting_all_article_info = True
setting_all_article_info = False
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{"file": "about/release-notes", "date": GA_DATE},
{"file": "about/release-notes", "date": GA_DATE, "os": ["linux", "windows"]},
]
external_toc_path = "./sphinx/_toc.yml"
# Register Sphinx extensions and static assets
sys.path.append(str(DOCS_DIR / "extension"))
# html_static_path = ["sphinx/static/css", "extension/how-to/rocm-for-ai/inference"]
# html_css_files = [
# "rocm_custom.css",
# "rocm_rn.css",
# "dynamic_picker.css",
# "vllm-benchmark.css",
# ]
templates_path = ["extension/rocm_docs_custom/templates", "extension/templates"]
extensions = [
"rocm_docs",
"rocm_docs_custom.selector",
"rocm_docs_custom.table",
"rocm_docs_custom.matrix",
"rocm_docs_custom.icon",
# "sphinxcontrib.datatemplates",
# "sphinx_reredirects",
# "sphinx_sitemap",
# "sphinxcontrib.datatemplates",
# "version-ref",
# "csv-to-list-table",
]
templates_path = ["extension/rocm_docs_custom/templates"]
html_static_path = ["sphinx/static"]
html_js_files = ["setup-toc-install-headings.js"]
# compatibility_matrix_file = str(
# DOCS_DIR / "compatibility/compatibility-matrix-historical-6.0.csv"
@@ -150,17 +144,33 @@ extensions = [
external_projects_current_project = "rocm"
html_theme = "rocm_docs_theme"
html_theme_options = {
"flavor": "rocm-docs-home",
"announcement": f"This is ROCm {ROCM_VERSION} technology preview release documentation. For the latest production stream release, refer to <a id='rocm-banner' href='https://rocm.docs.amd.com/en/latest/'>ROCm documentation</a>.",
"flavor": "generic",
"header_title": f"ROCm™ {ROCM_VERSION} Preview",
"header_link": f"https://rocm.docs.amd.com/en/{ROCM_VERSION}-preview/index.html",
"version_list_link": f"https://rocm.docs.amd.com/en/{ROCM_VERSION}-preview/release/versions.html",
"nav_secondary_items": {
"GitHub": "https://github.com/ROCm/ROCm",
"Community": "https://github.com/ROCm/ROCm/discussions",
"Blogs": "https://rocm.blogs.amd.com/",
"Instinct™ Docs": "https://instinct.docs.amd.com/",
"Support": "https://github.com/ROCm/ROCm/issues/new/choose",
},
"link_main_doc": False,
"secondary_sidebar_items": {
"**": ["page-toc"],
"compatibility/compatibility-matrix": ["selector-toc2"],
"install/rocm": ["selector-toc2"],
"install/compatibility-matrix": ["selector-toc2"],
}
"rocm-for-ai/pytorch-comfyui": ["selector-toc2"],
},
}
html_title = f"AMD ROCm {ROCM_VERSION} preview"
numfig = False
rst_prolog = f"""
.. |ROCM_VERSION| replace:: {ROCM_VERSION}
"""
suppress_warnings = ["autosectionlabel.*"]
# html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "https://rocm-stg.amd.com/")

Binary image files changed (previews not shown): one image updated from 586 KiB to 167 KiB, one image unchanged in size at 47 KiB, and three images added at 778 KiB, 39 KiB, and 171 KiB.

View File

@@ -1,16 +1,47 @@
dockers:
- pull_tag: rocm/jax-training:maxtext-v25.9
- pull_tag: rocm/jax-training:maxtext-v25.7-jax060
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
components:
ROCm: 7.0.0
JAX: 0.6.2
Python: 3.10.18
Transformer Engine: 2.2.0.dev0+c91bac54
ROCm: 6.4.1
JAX: 0.6.0
Python: 3.10.12
Transformer Engine: 2.1.0+90d703dd
hipBLASLt: 1.1.0-499ece1c21
- pull_tag: rocm/jax-training:maxtext-v25.7
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
components:
ROCm: 6.4.1
JAX: 0.5.0
Python: 3.10.12
Transformer Engine: 2.1.0+90d703dd
hipBLASLt: 1.x.x
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.3 70B
mad_tag: jax_maxtext_train_llama-3.3-70b
model_repo: Llama-3.3-70B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3.1 8B
mad_tag: jax_maxtext_train_llama-3.1-8b
model_repo: Llama-3.1-8B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3.1 70B
mad_tag: jax_maxtext_train_llama-3.1-70b
model_repo: Llama-3.1-70B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3 8B
mad_tag: jax_maxtext_train_llama-3-8b
multinode_training_script: llama3_8b_multinode.sh
doc_options: ["multi-node"]
- model: Llama 3 70B
mad_tag: jax_maxtext_train_llama-3-70b
multinode_training_script: llama3_70b_multinode.sh
doc_options: ["multi-node"]
- model: Llama 2 7B
mad_tag: jax_maxtext_train_llama-2-7b
model_repo: Llama-2-7B
@@ -23,29 +54,6 @@ model_groups:
precision: bf16
multinode_training_script: llama2_70b_multinode.sh
doc_options: ["single-node", "multi-node"]
- model: Llama 3 8B (multi-node)
mad_tag: jax_maxtext_train_llama-3-8b
multinode_training_script: llama3_8b_multinode.sh
doc_options: ["multi-node"]
- model: Llama 3 70B (multi-node)
mad_tag: jax_maxtext_train_llama-3-70b
multinode_training_script: llama3_70b_multinode.sh
doc_options: ["multi-node"]
- model: Llama 3.1 8B
mad_tag: jax_maxtext_train_llama-3.1-8b
model_repo: Llama-3.1-8B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3.1 70B
mad_tag: jax_maxtext_train_llama-3.1-70b
model_repo: Llama-3.1-70B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3.3 70B
mad_tag: jax_maxtext_train_llama-3.3-70b
model_repo: Llama-3.3-70B
precision: bf16
doc_options: ["single-node"]
- group: DeepSeek
tag: deepseek
models:

View File

@@ -1,21 +1,14 @@
dockers:
MI355X and MI350X:
pull_tag: rocm/megatron-lm:v25.9_gfx950
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.9_gfx950/images/sha256-1a198be32f49efd66d0ff82066b44bd99b3e6b04c8e0e9b36b2c481e13bff7b6
components: &docker_components
ROCm: 7.0.0
Primus: aab4234
PyTorch: 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7
- pull_tag: rocm/megatron-lm:v25.8_py310
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.8_py310/images/sha256-50fc824361054e445e86d5d88d5f58817f61f8ec83ad4a7e43ea38bbc4a142c0
components:
ROCm: 6.4.3
PyTorch: 2.8.0a0+gitd06a406
Python: "3.10"
Transformer Engine: 2.2.0.dev0+54dd2bdc
Flash Attention: 2.8.3
hipBLASLt: 911283acd1
Triton: 3.4.0+rocm7.0.0.git56765e8c
RCCL: 2.26.6
MI325X and MI300X:
pull_tag: rocm/megatron-lm:v25.9_gfx942
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.9_gfx942/images/sha256-df6ab8f45b4b9ceb100fb24e19b2019a364e351ee3b324dbe54466a1d67f8357
components: *docker_components
hipBLASLt: d1b517fc7a
Triton: 3.3.0
RCCL: 2.22.3
model_groups:
- group: Meta Llama
tag: llama
@@ -26,6 +19,8 @@ model_groups:
mad_tag: pyt_megatron_lm_train_llama-3.1-8b
- model: Llama 3.1 70B
mad_tag: pyt_megatron_lm_train_llama-3.1-70b
- model: Llama 3.1 70B (proxy)
mad_tag: pyt_megatron_lm_train_llama-3.1-70b-proxy
- model: Llama 2 7B
mad_tag: pyt_megatron_lm_train_llama-2-7b
- model: Llama 2 70B

View File

@@ -1,72 +0,0 @@
dockers:
- pull_tag: rocm/jax-training:maxtext-v25.7-jax060
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
components:
ROCm: 6.4.1
JAX: 0.6.0
Python: 3.10.12
Transformer Engine: 2.1.0+90d703dd
hipBLASLt: 1.1.0-499ece1c21
- pull_tag: rocm/jax-training:maxtext-v25.7
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
components:
ROCm: 6.4.1
JAX: 0.5.0
Python: 3.10.12
Transformer Engine: 2.1.0+90d703dd
hipBLASLt: 1.x.x
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.3 70B
mad_tag: jax_maxtext_train_llama-3.3-70b
model_repo: Llama-3.3-70B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3.1 8B
mad_tag: jax_maxtext_train_llama-3.1-8b
model_repo: Llama-3.1-8B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3.1 70B
mad_tag: jax_maxtext_train_llama-3.1-70b
model_repo: Llama-3.1-70B
precision: bf16
doc_options: ["single-node"]
- model: Llama 3 8B
mad_tag: jax_maxtext_train_llama-3-8b
multinode_training_script: llama3_8b_multinode.sh
doc_options: ["multi-node"]
- model: Llama 3 70B
mad_tag: jax_maxtext_train_llama-3-70b
multinode_training_script: llama3_70b_multinode.sh
doc_options: ["multi-node"]
- model: Llama 2 7B
mad_tag: jax_maxtext_train_llama-2-7b
model_repo: Llama-2-7B
precision: bf16
multinode_training_script: llama2_7b_multinode.sh
doc_options: ["single-node", "multi-node"]
- model: Llama 2 70B
mad_tag: jax_maxtext_train_llama-2-70b
model_repo: Llama-2-70B
precision: bf16
multinode_training_script: llama2_70b_multinode.sh
doc_options: ["single-node", "multi-node"]
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-V2-Lite (16B)
mad_tag: jax_maxtext_train_deepseek-v2-lite-16b
model_repo: DeepSeek-V2-lite
precision: bf16
doc_options: ["single-node"]
- group: Mistral AI
tag: mistral
models:
- model: Mixtral 8x7B
mad_tag: jax_maxtext_train_mixtral-8x7b
model_repo: Mixtral-8x7B
precision: bf16
doc_options: ["single-node"]

View File

@@ -1,48 +0,0 @@
dockers:
- pull_tag: rocm/megatron-lm:v25.8_py310
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.8_py310/images/sha256-50fc824361054e445e86d5d88d5f58817f61f8ec83ad4a7e43ea38bbc4a142c0
components:
ROCm: 6.4.3
PyTorch: 2.8.0a0+gitd06a406
Python: "3.10"
Transformer Engine: 2.2.0.dev0+54dd2bdc
hipBLASLt: d1b517fc7a
Triton: 3.3.0
RCCL: 2.22.3
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.3 70B
mad_tag: pyt_megatron_lm_train_llama-3.3-70b
- model: Llama 3.1 8B
mad_tag: pyt_megatron_lm_train_llama-3.1-8b
- model: Llama 3.1 70B
mad_tag: pyt_megatron_lm_train_llama-3.1-70b
- model: Llama 3.1 70B (proxy)
mad_tag: pyt_megatron_lm_train_llama-3.1-70b-proxy
- model: Llama 2 7B
mad_tag: pyt_megatron_lm_train_llama-2-7b
- model: Llama 2 70B
mad_tag: pyt_megatron_lm_train_llama-2-70b
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-V3 (proxy)
mad_tag: pyt_megatron_lm_train_deepseek-v3-proxy
- model: DeepSeek-V2-Lite
mad_tag: pyt_megatron_lm_train_deepseek-v2-lite-16b
- group: Mistral AI
tag: mistral
models:
- model: Mixtral 8x7B
mad_tag: pyt_megatron_lm_train_mixtral-8x7b
- model: Mixtral 8x22B (proxy)
mad_tag: pyt_megatron_lm_train_mixtral-8x22b-proxy
- group: Qwen
tag: qwen
models:
- model: Qwen 2.5 7B
mad_tag: pyt_megatron_lm_train_qwen2.5-7b
- model: Qwen 2.5 72B
mad_tag: pyt_megatron_lm_train_qwen2.5-72b

View File

@@ -1,58 +0,0 @@
dockers:
- pull_tag: rocm/megatron-lm:v25.8_py310
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.8_py310/images/sha256-50fc824361054e445e86d5d88d5f58817f61f8ec83ad4a7e43ea38bbc4a142c0
components:
ROCm: 6.4.3
Primus: 927a717
PyTorch: 2.8.0a0+gitd06a406
Python: "3.10"
Transformer Engine: 2.2.0.dev0+54dd2bdc
hipBLASLt: d1b517fc7a
Triton: 3.3.0
RCCL: 2.22.3
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.3 70B
mad_tag: primus_pyt_megatron_lm_train_llama-3.3-70b
config_name: llama3.3_70B-pretrain.yaml
- model: Llama 3.1 70B
mad_tag: primus_pyt_megatron_lm_train_llama-3.1-70b
config_name: llama3.1_70B-pretrain.yaml
- model: Llama 3.1 8B
mad_tag: primus_pyt_megatron_lm_train_llama-3.1-8b
config_name: llama3.1_8B-pretrain.yaml
- model: Llama 2 7B
mad_tag: primus_pyt_megatron_lm_train_llama-2-7b
config_name: llama2_7B-pretrain.yaml
- model: Llama 2 70B
mad_tag: primus_pyt_megatron_lm_train_llama-2-70b
config_name: llama2_70B-pretrain.yaml
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-V3 (proxy)
mad_tag: primus_pyt_megatron_lm_train_deepseek-v3-proxy
config_name: deepseek_v3-pretrain.yaml
- model: DeepSeek-V2-Lite
mad_tag: primus_pyt_megatron_lm_train_deepseek-v2-lite-16b
config_name: deepseek_v2_lite-pretrain.yaml
- group: Mistral AI
tag: mistral
models:
- model: Mixtral 8x7B
mad_tag: primus_pyt_megatron_lm_train_mixtral-8x7b
config_name: mixtral_8x7B_v0.1-pretrain.yaml
- model: Mixtral 8x22B (proxy)
mad_tag: primus_pyt_megatron_lm_train_mixtral-8x22b-proxy
config_name: mixtral_8x22B_v0.1-pretrain.yaml
- group: Qwen
tag: qwen
models:
- model: Qwen 2.5 7B
mad_tag: primus_pyt_megatron_lm_train_qwen2.5-7b
config_name: primus_qwen2.5_7B-pretrain.yaml
- model: Qwen 2.5 72B
mad_tag: primus_pyt_megatron_lm_train_qwen2.5-72b
config_name: qwen2.5_72B-pretrain.yaml

View File

@@ -1,24 +0,0 @@
dockers:
- pull_tag: rocm/pytorch-training:v25.8
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-training/v25.8/images/sha256-5082ae01d73fec6972b0d84e5dad78c0926820dcf3c19f301d6c8eb892e573c5
components:
ROCm: 6.4.3
PyTorch: 2.8.0a0+gitd06a406
Python: 3.10.18
Transformer Engine: 2.2.0.dev0+a1e66aae
Flash Attention: 3.0.0.post1
hipBLASLt: 1.1.0-d1b517fc7a
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.1 8B
mad_tag: primus_pyt_train_llama-3.1-8b
model_repo: Llama-3.1-8B
url: https://huggingface.co/meta-llama/Llama-3.1-8B
precision: BF16
- model: Llama 3.1 70B
mad_tag: primus_pyt_train_llama-3.1-70b
model_repo: Llama-3.1-70B
url: https://huggingface.co/meta-llama/Llama-3.1-70B
precision: BF16

View File

@@ -1,178 +0,0 @@
dockers:
- pull_tag: rocm/pytorch-training:v25.8
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-training/v25.8/images/sha256-5082ae01d73fec6972b0d84e5dad78c0926820dcf3c19f301d6c8eb892e573c5
components:
ROCm: 6.4.3
PyTorch: 2.8.0a0+gitd06a406
Python: 3.10.18
Transformer Engine: 2.2.0.dev0+a1e66aae
Flash Attention: 3.0.0.post1
hipBLASLt: 1.1.0-d1b517fc7a
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 4 Scout 17B-16E
mad_tag: pyt_train_llama-4-scout-17b-16e
model_repo: Llama-4-17B_16E
url: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Llama 3.3 70B
mad_tag: pyt_train_llama-3.3-70b
model_repo: Llama-3.3-70B
url: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct
precision: BF16
training_modes: [finetune_fw, finetune_lora, finetune_qlora]
- model: Llama 3.2 1B
mad_tag: pyt_train_llama-3.2-1b
model_repo: Llama-3.2-1B
url: https://huggingface.co/meta-llama/Llama-3.2-1B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Llama 3.2 3B
mad_tag: pyt_train_llama-3.2-3b
model_repo: Llama-3.2-3B
url: https://huggingface.co/meta-llama/Llama-3.2-3B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Llama 3.2 Vision 11B
mad_tag: pyt_train_llama-3.2-vision-11b
model_repo: Llama-3.2-Vision-11B
url: https://huggingface.co/meta-llama/Llama-3.2-11B-Vision
precision: BF16
training_modes: [finetune_fw]
- model: Llama 3.2 Vision 90B
mad_tag: pyt_train_llama-3.2-vision-90b
model_repo: Llama-3.2-Vision-90B
url: https://huggingface.co/meta-llama/Llama-3.2-90B-Vision
precision: BF16
training_modes: [finetune_fw]
- model: Llama 3.1 8B
mad_tag: pyt_train_llama-3.1-8b
model_repo: Llama-3.1-8B
url: https://huggingface.co/meta-llama/Llama-3.1-8B
precision: BF16
training_modes: [pretrain, finetune_fw, finetune_lora, HF_pretrain]
- model: Llama 3.1 70B
mad_tag: pyt_train_llama-3.1-70b
model_repo: Llama-3.1-70B
url: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
precision: BF16
training_modes: [pretrain, finetune_fw, finetune_lora]
- model: Llama 3.1 405B
mad_tag: pyt_train_llama-3.1-405b
model_repo: Llama-3.1-405B
url: https://huggingface.co/meta-llama/Llama-3.1-405B
precision: BF16
training_modes: [finetune_qlora]
- model: Llama 3 8B
mad_tag: pyt_train_llama-3-8b
model_repo: Llama-3-8B
url: https://huggingface.co/meta-llama/Meta-Llama-3-8B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Llama 3 70B
mad_tag: pyt_train_llama-3-70b
model_repo: Llama-3-70B
url: https://huggingface.co/meta-llama/Meta-Llama-3-70B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Llama 2 7B
mad_tag: pyt_train_llama-2-7b
model_repo: Llama-2-7B
url: https://github.com/meta-llama/llama-models/tree/main/models/llama2
precision: BF16
training_modes: [finetune_fw, finetune_lora, finetune_qlora]
- model: Llama 2 13B
mad_tag: pyt_train_llama-2-13b
model_repo: Llama-2-13B
url: https://github.com/meta-llama/llama-models/tree/main/models/llama2
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Llama 2 70B
mad_tag: pyt_train_llama-2-70b
model_repo: Llama-2-70B
url: https://github.com/meta-llama/llama-models/tree/main/models/llama2
precision: BF16
training_modes: [finetune_lora, finetune_qlora]
- group: OpenAI
tag: openai
models:
- model: GPT OSS 20B
mad_tag: pyt_train_gpt_oss_20b
model_repo: GPT-OSS-20B
url: https://huggingface.co/openai/gpt-oss-20b
precision: BF16
training_modes: [HF_finetune_lora]
- model: GPT OSS 120B
mad_tag: pyt_train_gpt_oss_120b
model_repo: GPT-OSS-120B
url: https://huggingface.co/openai/gpt-oss-120b
precision: BF16
training_modes: [HF_finetune_lora]
- group: Qwen
tag: qwen
models:
- model: Qwen 3 8B
mad_tag: pyt_train_qwen3-8b
model_repo: Qwen3-8B
url: https://huggingface.co/Qwen/Qwen3-8B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Qwen 3 32B
mad_tag: pyt_train_qwen3-32b
        model_repo: Qwen3-32B
url: https://huggingface.co/Qwen/Qwen3-32B
precision: BF16
training_modes: [finetune_lora]
- model: Qwen 2.5 32B
mad_tag: pyt_train_qwen2.5-32b
model_repo: Qwen2.5-32B
url: https://huggingface.co/Qwen/Qwen2.5-32B
precision: BF16
training_modes: [finetune_lora]
- model: Qwen 2.5 72B
mad_tag: pyt_train_qwen2.5-72b
model_repo: Qwen2.5-72B
url: https://huggingface.co/Qwen/Qwen2.5-72B
precision: BF16
training_modes: [finetune_lora]
- model: Qwen 2 1.5B
mad_tag: pyt_train_qwen2-1.5b
model_repo: Qwen2-1.5B
url: https://huggingface.co/Qwen/Qwen2-1.5B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- model: Qwen 2 7B
mad_tag: pyt_train_qwen2-7b
model_repo: Qwen2-7B
url: https://huggingface.co/Qwen/Qwen2-7B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- group: Stable Diffusion
tag: sd
models:
- model: Stable Diffusion XL
mad_tag: pyt_huggingface_stable_diffusion_xl_2k_lora_finetuning
model_repo: SDXL
url: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
precision: BF16
training_modes: [finetune_lora]
- group: Flux
tag: flux
models:
- model: FLUX.1-dev
mad_tag: pyt_train_flux
model_repo: Flux
url: https://huggingface.co/black-forest-labs/FLUX.1-dev
precision: BF16
training_modes: [pretrain]
- group: NCF
tag: ncf
models:
- model: NCF
mad_tag: pyt_ncf_training
model_repo:
url: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Recommendation/NCF
precision: FP32

View File

@@ -1,22 +1,15 @@
dockers:
MI355X and MI350X:
pull_tag: rocm/primus:v25.9_gfx950
docker_hub_url: https://hub.docker.com/layers/rocm/primus/v25.9_gfx950/images/sha256-1a198be32f49efd66d0ff82066b44bd99b3e6b04c8e0e9b36b2c481e13bff7b6
components: &docker_components
ROCm: 7.0.0
Primus: 0.3.0
Primus Turbo: 0.1.1
PyTorch: 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7
- pull_tag: rocm/megatron-lm:v25.8_py310
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.8_py310/images/sha256-50fc824361054e445e86d5d88d5f58817f61f8ec83ad4a7e43ea38bbc4a142c0
components:
ROCm: 6.4.3
Primus: 927a717
PyTorch: 2.8.0a0+gitd06a406
Python: "3.10"
Transformer Engine: 2.2.0.dev0+54dd2bdc
Flash Attention: 2.8.3
hipBLASLt: 911283acd1
Triton: 3.4.0+rocm7.0.0.git56765e8c
RCCL: 2.26.6
MI325X and MI300X:
pull_tag: rocm/primus:v25.9_gfx942
docker_hub_url: https://hub.docker.com/layers/rocm/primus/v25.9_gfx942/images/sha256-df6ab8f45b4b9ceb100fb24e19b2019a364e351ee3b324dbe54466a1d67f8357
components: *docker_components
hipBLASLt: d1b517fc7a
Triton: 3.3.0
RCCL: 2.22.3
model_groups:
- group: Meta Llama
tag: llama

View File

@@ -1,39 +1,24 @@
dockers:
MI355X and MI350X:
pull_tag: rocm/primus:v25.9_gfx950
docker_hub_url: https://hub.docker.com/layers/rocm/primus/v25.9_gfx950/images/sha256-1a198be32f49efd66d0ff82066b44bd99b3e6b04c8e0e9b36b2c481e13bff7b6
components: &docker_components
ROCm: 7.0.0
Primus: 0.3.0
Primus Turbo: 0.1.1
PyTorch: 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7
Python: "3.10"
Transformer Engine: 2.2.0.dev0+54dd2bdc
Flash Attention: 2.8.3
hipBLASLt: 911283acd1
Triton: 3.4.0+rocm7.0.0.git56765e8c
RCCL: 2.26.6
MI325X and MI300X:
pull_tag: rocm/primus:v25.9_gfx942
docker_hub_url: https://hub.docker.com/layers/rocm/primus/v25.9_gfx942/images/sha256-df6ab8f45b4b9ceb100fb24e19b2019a364e351ee3b324dbe54466a1d67f8357
components: *docker_components
- pull_tag: rocm/pytorch-training:v25.8
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-training/v25.8/images/sha256-5082ae01d73fec6972b0d84e5dad78c0926820dcf3c19f301d6c8eb892e573c5
components:
ROCm: 6.4.3
PyTorch: 2.8.0a0+gitd06a406
Python: 3.10.18
Transformer Engine: 2.2.0.dev0+a1e66aae
Flash Attention: 3.0.0.post1
hipBLASLt: 1.1.0-d1b517fc7a
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.1 8B
mad_tag: primus_pyt_train_llama-3.1-8b
model_repo: meta-llama/Llama-3.1-8B
model_repo: Llama-3.1-8B
url: https://huggingface.co/meta-llama/Llama-3.1-8B
precision: BF16
config_file:
bf16: "./llama3_8b_fsdp_bf16.toml"
fp8: "./llama3_8b_fsdp_fp8.toml"
- model: Llama 3.1 70B
mad_tag: primus_pyt_train_llama-3.1-70b
model_repo: meta-llama/Llama-3.1-70B
model_repo: Llama-3.1-70B
url: https://huggingface.co/meta-llama/Llama-3.1-70B
precision: BF16
config_file:
bf16: "./llama3_70b_fsdp_bf16.toml"
fp8: "./llama3_70b_fsdp_fp8.toml"

View File

@@ -1,21 +1,13 @@
dockers:
MI355X and MI350X:
pull_tag: rocm/pytorch-training:v25.9_gfx950
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-training/v25.9_gfx950/images/sha256-1a198be32f49efd66d0ff82066b44bd99b3e6b04c8e0e9b36b2c481e13bff7b6
components: &docker_components
ROCm: 7.0.0
Primus: aab4234
PyTorch: 2.9.0.dev20250821+rocm7.0.0.lw.git125803b7
Python: "3.10"
Transformer Engine: 2.2.0.dev0+54dd2bdc
Flash Attention: 2.8.3
hipBLASLt: 911283acd1
Triton: 3.4.0+rocm7.0.0.git56765e8c
RCCL: 2.26.6
MI325X and MI300X:
pull_tag: rocm/pytorch-training:v25.9_gfx942
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-training/v25.9_gfx942/images/sha256-df6ab8f45b4b9ceb100fb24e19b2019a364e351ee3b324dbe54466a1d67f8357
components: *docker_components
- pull_tag: rocm/pytorch-training:v25.8
docker_hub_url: https://hub.docker.com/layers/rocm/pytorch-training/v25.8/images/sha256-5082ae01d73fec6972b0d84e5dad78c0926820dcf3c19f301d6c8eb892e573c5
components:
ROCm: 6.4.3
PyTorch: 2.8.0a0+gitd06a406
Python: 3.10.18
Transformer Engine: 2.2.0.dev0+a1e66aae
Flash Attention: 3.0.0.post1
hipBLASLt: 1.1.0-d1b517fc7a
model_groups:
- group: Meta Llama
tag: llama
@@ -166,7 +158,7 @@ model_groups:
model_repo: SDXL
url: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
precision: BF16
training_modes: [posttrain-p]
training_modes: [finetune_lora]
- group: Flux
tag: flux
models:
@@ -175,7 +167,7 @@ model_groups:
model_repo: Flux
url: https://huggingface.co/black-forest-labs/FLUX.1-dev
precision: BF16
training_modes: [posttrain-p]
training_modes: [pretrain]
- group: NCF
tag: ncf
models:

View File

@@ -1,4 +1,4 @@
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X Series,MI300A,MI350X Series
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X series,MI300A,MI350X series
32 bit atomicAdd,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS
32 bit atomicSub,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS
32 bit atomicMin,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS

View File

@@ -1,4 +1,4 @@
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X Series,MI300A,MI350X Series
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X series,MI300A,MI350X series
32 bit atomicAdd,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS
32 bit atomicSub,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS
32 bit atomicMin,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS,✅ CAS

View File

@@ -1,4 +1,4 @@
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X Series,MI300A,MI350X Series
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X series,MI300A,MI350X series
32 bit atomicAdd,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native
32 bit atomicSub,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native
32 bit atomicMin,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native

View File

@@ -1,4 +1,4 @@
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X Series,MI300A,MI350X Series
Atomic,MI100,MI200 PCIe,MI200 A+A,MI300X series,MI300A,MI350X series
32 bit atomicAdd,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native
32 bit atomicSub,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native
32 bit atomicMin,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native,✅ Native

Binary file not shown (image; before: 108 KiB, after: 1.2 MiB)

Binary file not shown (image; before: 340 KiB)

Binary file not shown (image; before: 346 KiB)

Binary file not shown (image; before: 1.1 MiB)

Binary file not shown (image; before: 350 KiB)

View File

@@ -0,0 +1,289 @@
from pathlib import Path
from docutils import nodes
from docutils.parsers.rst import directives
from sphinx.util.docutils import SphinxDirective
from .utils import kv_to_data_attr, logger
class CustomTable(nodes.General, nodes.Element):
"""Bootstrap-flavoured table container."""
@staticmethod
def visit_html(translator, node):
show_when_attr = kv_to_data_attr("show-when", node.get("show-when", ""))
classes = ["rocm-docs-table", "table"]
classes.extend(node.get("classes", []))
class_attr = " ".join(classes)
table_id = node.get("id") or ""
attrs = []
if table_id:
attrs.append(f'id="{table_id}"')
attrs.append(f'class="{class_attr}"')
if show_when_attr:
attrs.append(show_when_attr)
attrs_str = " ".join(attrs)
translator.body.append(f"<!-- start custom-table --><table {attrs_str}>")
caption = node.get("caption", "")
if caption:
translator.body.append(f"<caption>{caption}</caption>")
translator._in_matrix_body = False # internal state flag
@staticmethod
def depart_html(translator, node):
# Close an open <tbody> if present
if getattr(translator, "_in_matrix_body", False):
translator.body.append("</tbody>")
translator._in_matrix_body = False
translator.body.append("</table><!-- end custom-table -->")
class CustomTableDirective(SphinxDirective):
""".. matrix:: Optional caption"""
required_arguments = 0
optional_arguments = 1
final_argument_whitespace = True
has_content = True
option_spec = {
"id": directives.unchanged,
"class": directives.class_option,
"show-when": directives.unchanged,
}
def run(self):
node = CustomTable()
node["caption"] = self.arguments[0] if self.arguments else ""
node["id"] = self.options.get("id", "")
node["classes"] = self.options.get("class", [])
node["show-when"] = self.options.get("show-when", "")
self.state.nested_parse(self.content, self.content_offset, node)
return [node]
class CustomTableHead(nodes.General, nodes.Element):
"""A table header section (renders <thead>).</thead>"""
@staticmethod
def visit_html(translator, node):
translator.body.append("<!-- start table head --><thead>")
@staticmethod
def depart_html(translator, node):
translator.body.append("</thead><!-- end table head -->")
class CustomTableHeadDirective(SphinxDirective):
""".. matrix-head::"""
has_content = True
def run(self):
node = CustomTableHead()
self.state.nested_parse(self.content, self.content_offset, node)
return [node]
class CustomTableRow(nodes.General, nodes.Element):
"""A table row (<tr> inside <thead> or <tbody>)."""
@staticmethod
def visit_html(translator, node):
# handle automatic <tbody> opening for body rows
if not node.get("header-row", False):
if not getattr(translator, "_in_matrix_body", False):
translator.body.append("<!-- start tbody --><tbody>")
translator._in_matrix_body = True
show_when_attr = kv_to_data_attr("show-when", node.get("show-when", ""))
disable_when_attr = kv_to_data_attr(
"disable-when", node.get("disable-when", "")
)
classes = " ".join(node.get("classes", []))
attrs = []
if classes:
attrs.append(f'class="{classes}"')
if show_when_attr:
attrs.append(show_when_attr)
if disable_when_attr:
attrs.append(disable_when_attr)
attrs_str = "" if not attrs else " " + " ".join(attrs)
translator.body.append(f"<!-- start custom-table row --><tr{attrs_str}>")
@staticmethod
def depart_html(translator, node):
translator.body.append("</tr><!-- end custom-table row -->")
class CustomTableRowDirective(SphinxDirective):
""".. matrix-row::"""
required_arguments = 0
final_argument_whitespace = True
has_content = True
option_spec = {
"class": directives.class_option,
"show-when": directives.unchanged,
"disable-when": directives.unchanged,
"header": directives.flag,
}
def run(self):
node = CustomTableRow()
node["classes"] = self.options.get("class", [])
node["show-when"] = self.options.get("show-when", "")
node["disable-when"] = self.options.get("disable-when", "")
node["header-row"] = self.options.get("header", False) is not False
# Parse nested cells
self.state.nested_parse(self.content, self.content_offset, node)
# Inherit header status if inside matrix-head
parent_in_head = any(
isinstance(p, CustomTableHead)
for p in self.state.parent.traverse(include_self=True)
)
if parent_in_head:
node["header-row"] = True
# Mark all cells as headers if this is a header row
if node["header-row"]:
for cell in node.findall(CustomTableCell):
if "header" not in cell:
cell["header"] = True
# Sanity check
parent = getattr(self.state, "parent", None)
if not parent or not any(
isinstance(p, (CustomTable, CustomTableHead))
for p in parent.traverse(include_self=True)
):
logger.warning(
"'.. matrix-row::' at line %s should be nested under a '.. matrix::'.",
self.lineno,
location=(self.env.docname, self.lineno),
)
return [node]
class CustomTableCell(nodes.General, nodes.Element):
"""A table cell (<td> or <th>)."""
@staticmethod
def visit_html(translator, node):
is_header = bool(node.get("header", False))
tag = "th" if is_header else "td"
classes = " ".join(node.get("classes", []))
colspan = node.get("colspan", 1)
rowspan = node.get("rowspan", 1)
show_when_attr = kv_to_data_attr("show-when", node.get("show-when", ""))
attrs = []
if classes:
attrs.append(f'class="{classes}"')
if colspan and colspan > 1:
attrs.append(f'colspan="{colspan}"')
if rowspan and rowspan > 1:
attrs.append(f'rowspan="{rowspan}"')
if show_when_attr:
attrs.append(show_when_attr)
attrs_str = "" if not attrs else " " + " ".join(attrs)
translator.body.append(f"<{tag}{attrs_str}>")
@staticmethod
def depart_html(translator, node):
tag = "th" if node.get("header", False) else "td"
translator.body.append(f"</{tag}>")
class CustomTableCellDirective(SphinxDirective):
""".. matrix-cell::"""
required_arguments = 0
optional_arguments = 1
final_argument_whitespace = True
has_content = True
option_spec = {
"header": directives.flag,
"class": directives.class_option,
"colspan": directives.nonnegative_int,
"rowspan": directives.nonnegative_int,
"show-when": directives.unchanged,
}
def run(self):
label = self.arguments[0] if self.arguments else ""
node = CustomTableCell()
# Explicit :header: always wins
explicit_header = self.options.get("header", False) is not False
# Detect if the parent row (matrix-row) or one of its ancestors
# (matrix-head) marks this as a header section.
parent_header_row = False
parent_node = getattr(self.state, "parent", None)
if parent_node:
for ancestor in parent_node.traverse(include_self=True):
if isinstance(ancestor, CustomTableRow) and ancestor.get("header-row", False):
parent_header_row = True
break
if isinstance(ancestor, CustomTableHead):
parent_header_row = True
break
node["header"] = explicit_header or parent_header_row
node["classes"] = self.options.get("class", [])
node["colspan"] = self.options.get("colspan", 1)
node["rowspan"] = self.options.get("rowspan", 1)
node["show-when"] = self.options.get("show-when", "")
if self.content:
self.state.nested_parse(self.content, self.content_offset, node)
elif label:
node += nodes.Text(label)
# Sanity check nesting
if not parent_node or not any(
isinstance(p, CustomTableRow)
for p in parent_node.traverse(include_self=True)
):
logger.warning(
"'.. matrix-cell::' at line %s should be nested under a '.. matrix-row::' directive",
self.lineno,
location=(self.env.docname, self.lineno),
)
return [node]
def setup(app):
app.add_node(CustomTable, html=(CustomTable.visit_html, CustomTable.depart_html))
app.add_node(CustomTableHead, html=(CustomTableHead.visit_html, CustomTableHead.depart_html))
app.add_node(CustomTableRow, html=(CustomTableRow.visit_html, CustomTableRow.depart_html))
app.add_node(CustomTableCell, html=(CustomTableCell.visit_html, CustomTableCell.depart_html))
app.add_directive("matrix", CustomTableDirective)
app.add_directive("matrix-head", CustomTableHeadDirective)
app.add_directive("matrix-row", CustomTableRowDirective)
app.add_directive("matrix-cell", CustomTableCellDirective)
static_assets_dir = Path(__file__).parent / "static"
app.config.html_static_path.append(str(static_assets_dir))
app.add_css_file("table.css")
return {
"version": "1.1",
"parallel_read_safe": True,
"parallel_write_safe": True,
}
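
Taken together, these directives compose as follows. A minimal authoring sketch, assuming hypothetical table content and an os=ubuntu key supplied by the selector extension; the directive and option names come from the option_spec definitions above:

.. matrix:: Example caption
   :id: example-matrix
   :show-when: os=ubuntu

   .. matrix-head::

      .. matrix-row::

         .. matrix-cell:: Atomic

         .. matrix-cell:: MI300X series

   .. matrix-row::

      .. matrix-cell:: 32 bit atomicAdd

      .. matrix-cell:: ✅ CAS

Rows outside a matrix-head land in an automatically opened <tbody>, and every cell under matrix-head is emitted as a <th>.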

View File

@@ -1,20 +1,10 @@
from sphinx.util.docutils import SphinxDirective, directives, nodes
from pathlib import Path
from .utils import kv_to_data_attr, normalize_key
import random
import string
from .utils import kv_to_data_attr, normalize_key, logger
class SelectorGroup(nodes.General, nodes.Element):
"""
A row within a selector container.
rST usage:
.. selector-group:: Heading
:key:
:show-when: os=ubuntu (list of key=value pairs separated by spaces)
:heading-width: 4 (defaults to 6)
A row or dropdown within a selector container.
"""
@staticmethod
@@ -23,32 +13,46 @@ class SelectorGroup(nodes.General, nodes.Element):
key = node["key"]
show_when_attr = kv_to_data_attr("show-when", node["show-when"])
heading_width = node["heading-width"]
icon = node["icon"]
list_mode = node.get("list", False)
icon_html = ""
if icon:
icon_html = f'<i class="rocm-docs-selector-icon {icon}"></i>'
# Standard tile mode
info_nodes = list(node.findall(SelectorInfo))
info_link = info_nodes[0]["link"] if info_nodes else None
info_icon = info_nodes[0]["icon"] if info_nodes else None
info_icon_html = ""
if info_link:
info_icon_html = f"""
<a href="{info_link}" target="_blank">
<i class="rocm-docs-selector-icon {info_icon}"></i>
</a>
"""
translator.body.append(
"<!-- start selector-group row -->"
f"""
<div id="{nodes.make_id(label)}" class="rocm-docs-selector-group row gx-0 pt-2"
<div id="{nodes.make_id(label)}"
class="rocm-docs-selector-group row gx-0 pt-2"
data-selector-key="{key}"
{show_when_attr}
role="radiogroup"
{'role="radiogroup"' if list_mode else ""}
aria-label="{label}"
>
<div class="col-{heading_width} me-1 px-2 rocm-docs-selector-group-heading">
<span class="rocm-docs-selector-group-heading-text">{label}{icon_html}</span>
<span class="rocm-docs-selector-group-heading-text">{label}{info_icon_html}</span>
</div>
<div class="row col-{12 - heading_width} pe-0">
{f'<select class="form-select rocm-docs-selector-dropdown-list" aria-label="{label}">' if list_mode else ""}
""".strip()
)
@staticmethod
def depart_html(translator, _):
def depart_html(translator, node):
list_mode = node.get("list", False)
translator.body.append(
"""
f"""
{"</select>" if list_mode else ""}
</div>
</div>
"""
@@ -57,14 +61,14 @@ class SelectorGroup(nodes.General, nodes.Element):
class SelectorGroupDirective(SphinxDirective):
required_arguments = 1 # tile text
required_arguments = 1 # title text
final_argument_whitespace = True
has_content = True
option_spec = {
"key": directives.unchanged,
"show-when": directives.unchanged,
"heading-width": directives.nonnegative_int,
"icon": directives.unchanged,
"list": directives.flag,
}
def run(self):
@@ -74,73 +78,121 @@ class SelectorGroupDirective(SphinxDirective):
node["key"] = normalize_key(self.options.get("key", label))
node["show-when"] = self.options.get("show-when", "")
node["heading-width"] = self.options.get("heading-width", 3)
node["icon"] = self.options.get("icon")
node["list"] = "list" in self.options
# Parse nested content
# Parse nested content (selector-info + selector-option)
self.state.nested_parse(self.content, self.content_offset, node)
# Find all SelectorOption descendants
option_nodes = list(node.findall(SelectorOption))
if option_nodes:
# Set the group key on all options
for opt in option_nodes:
opt["group_key"] = node["key"]
opt["list"] = node["list"]
# Find all options marked as default
# Default marking
default_options = [opt for opt in option_nodes if opt["default"]]
if default_options:
# Multiple options marked :default: - only keep first as default
for i, opt in enumerate(default_options):
if i > 0:
opt["default"] = False
else:
# No explicit default - make first option default
if not default_options:
option_nodes[0]["default"] = True
return [node]
class SelectorInfo(nodes.General, nodes.Element):
"""
Represents an informational icon/link associated with a selector group.
Appears as a clickable icon in the selector group heading.
rST usage:
.. selector:: AMD EPYC Server CPU
:key: cpu
.. selector-info:: https://www.amd.com/en/products/processors/server/epyc.html
:icon: fa-solid fa-circle-info fa-lg
.. selector-option:: EPYC 9005 (5th gen.)
:value: 9005
"""
@staticmethod
def visit_html(translator, node):
# Do nothing — rendering handled by SelectorGroup
pass
@staticmethod
def depart_html(translator, node):
# Do nothing — prevent NotImplementedError
pass
class SelectorInfoDirective(SphinxDirective):
required_arguments = 1 # link URL
final_argument_whitespace = True
has_content = False
option_spec = {"icon": directives.unchanged}
def run(self):
node = SelectorInfo()
node["link"] = self.arguments[0]
node["icon"] = self.options.get("icon", "fa-solid fa-circle-info fa-lg")
parent = getattr(self.state, "parent", None)
if not parent or not any(isinstance(p, SelectorGroup) for p in parent.traverse(include_self=True)):
logger.warning(
f"'.. selector-info::' at line {self.lineno} should be nested under a '.. selector::' directive",
location=(self.env.docname, self.lineno),
)
return [node]
class SelectorOption(nodes.General, nodes.Element):
"""
A selectable tile within a selector group.
rST usage:
.. selector-option::
A selectable tile or list-item option within a selector group.
"""
@staticmethod
def visit_html(translator, node):
label = node["label"]
value = node["value"]
show_when_attr = kv_to_data_attr("show-when", node["show-when"])
disable_when_attr = kv_to_data_attr("disable-when", node["disable-when"])
default = node["default"]
width = node["width"]
group_key = node.get("group_key", "")
list_mode = node.get("list", False)
if list_mode:
selected_attr = " selected" if default else ""
translator.body.append(
f'<option value="{value}"{selected_attr} {show_when_attr} {disable_when_attr}>{label}</option>'
)
return
default_class = "rocm-docs-selector-option-default" if default else ""
translator.body.append(
"<!-- start selector-option tile -->"
f"""
<div class="rocm-docs-selector-option {default_class} col-{width} px-2"
data-selector-key="{group_key}"
data-selector-key="{node.get('group_key', '')}"
data-selector-value="{value}"
{show_when_attr}
{disable_when_attr}
tabindex="0"
role="radio"
aria-checked="false"
>
<span>{label}</span>
""".strip()
)
@staticmethod
def depart_html(translator, node):
list_mode = node.get("list", False)
if list_mode:
return # no closing tag needed for <option>
icon = node["icon"]
if icon:
translator.body.append(f'<i class="rocm-docs-selector-icon {icon}"></i>')
translator.body.append("</div>" "<!-- end selector-option tile -->")
translator.body.append("</div><!-- end selector-option tile -->")
class SelectorOptionDirective(SphinxDirective):
@@ -148,6 +200,7 @@ class SelectorOptionDirective(SphinxDirective):
final_argument_whitespace = True
option_spec = {
"value": directives.unchanged,
"show-when": directives.unchanged,
"disable-when": directives.unchanged,
"default": directives.flag,
"width": directives.nonnegative_int,
@@ -160,22 +213,31 @@ class SelectorOptionDirective(SphinxDirective):
node = SelectorOption()
node["label"] = label
node["value"] = normalize_key(self.options.get("value", label))
# node["show-when"] = self.options.get("show-when", "")
node["show-when"] = self.options.get("show-when", "")
node["disable-when"] = self.options.get("disable-when", "")
node["default"] = self.options.get("default", False) is not False
node["width"] = self.options.get("width", 6)
node["icon"] = self.options.get("icon")
# Content replaces label if provided
if self.content:
self.state.nested_parse(self.content, self.content_offset, node)
else:
node += nodes.Text(label)
# if self.content:
# self.state.nested_parse(self.content, self.content_offset, node)
# else:
# node += nodes.Text(label)
parent = getattr(self.state, "parent", None)
if not parent or not any(isinstance(p, SelectorGroup) for p in parent.traverse(include_self=True)):
logger.warning(
f"'.. selector-option::' at line {self.lineno} should be nested under a '.. selector::' directive",
location=(self.env.docname, self.lineno),
)
return [node]
class SelectedContent(nodes.General, nodes.Element):
"""
A container to hold conditional content.
A container to hold documentation content to be shown conditionally.
rST usage::
@@ -185,20 +247,21 @@ class SelectedContent(nodes.General, nodes.Element):
@staticmethod
def visit_html(translator, node):
show_when_attr = kv_to_data_attr("show-when", node["show-when"])
show_when = node.get("show-when", "")
show_when_attr = kv_to_data_attr("show-when", show_when)
classes = " ".join(node.get("class", []))
heading = node.get("heading", "")
heading_level = node.get("heading-level") or (SelectedContent._get_depth(node) + 1)
heading_level = min(heading_level, 6)
# default to <h2>
heading_level = min(node.get("heading-level") or 2, 6) # maximum depth is <h6>
id_attr = ""
heading_elem = ""
combined_show_when = node.get("combined-show-when", show_when)
if heading:
# HACK to fix secondary sidebar observer
suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=3))
id_attr = nodes.make_id(f"{heading}-{suffix}")
id_attr = nodes.make_id(f"{heading}-{combined_show_when}")
heading_elem = (
f'<h{heading_level} id="{id_attr}" class="rocm-docs-custom-heading">'
f'<h{heading_level} class="rocm-docs-custom-heading">'
f'{heading}<a class="headerlink" href="#{id_attr}" title="Link to this heading">#</a>'
f'</h{heading_level}>'
)
@@ -206,24 +269,22 @@ class SelectedContent(nodes.General, nodes.Element):
translator.body.append(
f"""
<!-- start selected-content -->
<div class="rocm-docs-selected-content {classes}" {show_when_attr} aria-hidden="true">
<{"section" if heading else "div"}
id="{id_attr}"
class="rocm-docs-selected-content {classes}"
{show_when_attr}
aria-hidden="true">
{heading_elem}
""".strip()
)
@staticmethod
def depart_html(translator, _):
translator.body.append("</div><!-- end selected-content -->")
def depart_html(translator, node):
heading = node.get("heading", "")
@staticmethod
def _get_depth(node):
depth = 1
parent = node.parent
while parent is not None:
if isinstance(parent, SelectedContent) and parent.get("heading"):
depth += 1
parent = getattr(parent, "parent", None)
return depth
translator.body.append(f"""
</{"section" if heading else "div"}>
<!-- end selected-content -->""")
class SelectedContentDirective(SphinxDirective):
@@ -242,9 +303,20 @@ class SelectedContentDirective(SphinxDirective):
node["show-when"] = self.arguments[0]
node["id"] = self.options.get("id", "")
node["class"] = self.options.get("class", "")
node["heading"] = self.options.get("heading", "") # empty string if none
node["heading"] = self.options.get("heading", "")
node["heading-level"] = self.options.get("heading-level", None)
# Collect parent show-whens (if nested)
# to create a completely unique id
parent_show_whens = []
for ancestor in self.state.parent.traverse(include_self=True):
if isinstance(ancestor, SelectedContent) and "show-when" in ancestor:
parent_show_whens.append(ancestor["show-when"])
# Compose combined show-when chain
combined_show_when = "+".join(parent_show_whens + [node["show-when"]])
node["combined-show-when"] = combined_show_when
# Parse nested content
self.state.nested_parse(self.content, self.content_offset, node)
return [node]
@@ -255,6 +327,10 @@ def setup(app):
SelectorGroup,
html=(SelectorGroup.visit_html, SelectorGroup.depart_html),
)
app.add_node(
SelectorInfo,
html=(SelectorInfo.visit_html, SelectorInfo.depart_html),
)
app.add_node(
SelectorOption,
html=(SelectorOption.visit_html, SelectorOption.depart_html),
@@ -265,13 +341,14 @@ def setup(app):
)
app.add_directive("selector", SelectorGroupDirective)
app.add_directive("selector-info", SelectorInfoDirective)
app.add_directive("selector-option", SelectorOptionDirective)
app.add_directive("selected-content", SelectedContentDirective)
app.add_directive("selected", SelectedContentDirective)
static_assets_dir = Path(__file__).parent / "static"
static_assets_dir = Path(__file__).parent / "static" / "selector"
app.config.html_static_path.append(str(static_assets_dir))
app.add_css_file("selector.css")
app.add_js_file("selector.js", type="module", defer="defer")
return {"version": "1.0", "parallel_read_safe": True}
return {"version": "1.1", "parallel_read_safe": True}

View File

@@ -1,300 +0,0 @@
const READY_EVENT = "ROCmDocsSelectorsReady";
const STATE_CHANGE_EVENT = "ROCmDocsSelectorStateChanged";
const GROUP_QUERY = ".rocm-docs-selector-group";
const OPTION_QUERY = ".rocm-docs-selector-option";
const COND_QUERY = "[data-show-when]";
const TOC2_OPTIONS_LIST_QUERY = ".rocm-docs-selector-toc2-options";
const TOC2_CONTENTS_LIST_QUERY = ".rocm-docs-selector-toc2-contents";
const HEADING_QUERY = ".rocm-docs-selected-content h1,h2,h3,h4,h5,h6[id]";
const isDefaultOption = (elem) =>
elem.classList.contains("rocm-docs-selector-option-default");
const DISABLED_CLASS = "rocm-docs-disabled";
const disable = (elem) => {
elem.classList.add(DISABLED_CLASS);
elem.setAttribute("aria-disabled", "true");
elem.setAttribute("tabindex", "-1");
};
// const enable = (elem) => {
// elem.classList.remove(DISABLED_CLASS);
// elem.setAttribute("aria-disabled", "false");
// elem.setAttribute("tabindex", "0");
// };
const HIDDEN_CLASS = "rocm-docs-hidden";
const hide = (elem) => {
elem.classList.add(HIDDEN_CLASS);
elem.setAttribute("aria-hidden", "true");
};
const show = (elem) => {
elem.classList.remove(HIDDEN_CLASS);
elem.setAttribute("aria-hidden", "false");
};
const SELECTED_CLASS = "rocm-docs-selected";
const select = (elem) => {
elem.classList.add(SELECTED_CLASS);
elem.setAttribute("aria-checked", "true");
};
const deselect = (elem) => {
elem.classList.remove(SELECTED_CLASS);
elem.setAttribute("aria-checked", "false");
};
const state = {};
function getState() {
return { ...state };
}
function setState(updates) {
const previousState = getState();
Object.assign(state, updates);
const event = new CustomEvent(STATE_CHANGE_EVENT, {
detail: {
previousState,
currentState: getState(),
changes: updates,
},
});
document.dispatchEvent(event);
}
function validateOptionElem(optionElem) {
const key = optionElem.dataset.selectorKey;
const value = optionElem.dataset.selectorValue;
const errors = [];
if (!key) errors.push("Missing 'data-selector-key'");
if (!value) errors.push("Missing 'data-selector-value'");
if (errors.length === 0) return;
const label = optionElem.textContent.trim() || "<unnamed option>";
console.error(
`[ROCmDocsSelector] Invalid selector option '${label}': ${
errors.join(", ")
}!`,
);
disable(optionElem);
}
function handleOptionSelect(e) {
const option = e.currentTarget;
const parentGroup = option.closest(GROUP_QUERY);
const siblingOptions = parentGroup.querySelectorAll(OPTION_QUERY);
siblingOptions.forEach((elem) => deselect(elem));
select(option);
// Update global state
const key = option.dataset.selectorKey;
const value = option.dataset.selectorValue;
if (key && value) setState({ [key]: value });
updateVisibility();
}
function shouldBeShown(elem) {
const conditionsData = elem.dataset.showWhen;
if (!conditionsData) return true; // Default visible
try {
const conditions = JSON.parse(conditionsData);
// Ensure it's an object
if (typeof conditions !== "object" || Array.isArray(conditions)) {
console.warn(
"[ROCmDocsSelector] Invalid 'show-when' format (must be key/value object):",
conditionsData,
);
return true;
}
for (const [key, value] of Object.entries(conditions)) {
const currentValue = state[key];
if (currentValue === undefined) return false;
if (Array.isArray(value)) {
if (!value.includes(currentValue)) return false;
continue;
}
if (state[key] !== value) {
return false;
}
}
return true;
} catch (err) {
console.error(
"[ROCmDocsSelector] Couldn't parse 'show-when' conditions:",
err,
);
return true;
}
}
function updateTOC2OptionsList() {
const tocOptionsList = document.querySelector(TOC2_OPTIONS_LIST_QUERY);
if (!tocOptionsList) return;
// Clear previous entries
tocOptionsList.innerHTML = "";
// Get only visible selector groups
const groups = Array.from(document.querySelectorAll(GROUP_QUERY)).filter(
(g) => g.offsetParent !== null
);
if (groups.length === 0) {
const li = document.createElement("li");
li.className =
"nav-item toc-entry toc-h3 rocm-docs-selector-toc2-item empty";
const span = document.createElement("span");
span.textContent = "(no visible selectors)";
li.appendChild(span);
tocOptionsList.appendChild(li);
return;
}
groups.forEach((group) => {
// ✅ Find group heading span
const headingSpan = group.querySelector(
".rocm-docs-selector-group-heading-text"
);
const headingText = headingSpan
? headingSpan.textContent.trim()
: "(Unnamed Selector)";
// Find currently selected option
const selectedOption = group.querySelector(`.${SELECTED_CLASS}`);
const optionText = selectedOption
? selectedOption.textContent.trim()
: "(none selected)";
// Build list item
const li = document.createElement("li");
li.className = "nav-item toc-entry toc-h3 rocm-docs-selector-toc2-item";
const link = document.createElement("a");
link.className = "nav-link";
link.href = `#${group.id}`;
link.innerHTML = `<strong>${headingText}</strong>: ${optionText}`;
li.appendChild(link);
tocOptionsList.appendChild(li);
});
}
function updateTOC2ContentsList() {
const tocOptionsList = document.querySelector(TOC2_OPTIONS_LIST_QUERY);
const tocContentsList = document.querySelector(TOC2_CONTENTS_LIST_QUERY);
if (!tocContentsList || !tocOptionsList) return;
const visibleHeaders = [...document.querySelectorAll(HEADING_QUERY)]
.filter((h) => h.offsetParent !== null); // only visible headings
tocContentsList
.querySelectorAll("li.toc-entry.rocm-docs-selector-toc2-item")
.forEach((node) => node.remove());
if (visibleHeaders.length === 0) return;
let lastH2Li = null;
visibleHeaders.forEach((h) => {
const level = parseInt(h.tagName.substring(1), 10);
const li = document.createElement("li");
li.className = `nav-item toc-entry toc-${h.tagName.toLowerCase()} rocm-docs-selector-toc2-item`;
const a = document.createElement("a");
a.className = "reference internal nav-link";
const section = h.closest("section");
const fallbackId = section ? section.id : "";
a.href = h.id ? `#${h.id}` : fallbackId ? `#${fallbackId}` : "#";
a.textContent = h.cloneNode(true).childNodes[0].textContent.trim();
li.appendChild(a);
// Nest logic: h3+ belong to last h2's inner list
if (level === 2) {
tocContentsList.appendChild(li);
lastH2Li = li;
} else if (level === 3 && lastH2Li) {
// ensure nested UL exists
let innerUl = lastH2Li.querySelector("ul");
if (!innerUl) {
innerUl = document.createElement("ul");
innerUl.className = "nav section-nav flex-column";
lastH2Li.appendChild(innerUl);
}
innerUl.appendChild(li);
} else {
tocContentsList.appendChild(li);
}
});
}
function updateVisibility() {
document.querySelectorAll(COND_QUERY).forEach((elem) => {
if (shouldBeShown(elem)) {
show(elem);
} else {
hide(elem);
}
});
updateTOC2OptionsList();
updateTOC2ContentsList();
}
function init() {
const selectorOptions = document.querySelectorAll(OPTION_QUERY);
const initialState = {};
// Attach listeners and gather defaults
selectorOptions.forEach((option) => {
validateOptionElem(option);
option.addEventListener("click", handleOptionSelect);
option.addEventListener("keydown", (e) => {
if (e.key === "Enter" || e.key === " ") {
e.preventDefault();
handleOptionSelect(e);
}
});
if (isDefaultOption(option)) {
select(option);
const { selectorKey: key, selectorValue: value } = option.dataset;
if (key && value) {
initialState[key] = value;
}
}
});
setState(initialState);
updateVisibility();
document.dispatchEvent(new CustomEvent(READY_EVENT));
}
function domReady(callback) {
if (document.readyState !== "loading") {
callback();
} else {
document.addEventListener("DOMContentLoaded", callback, { once: true });
}
}
// window.rocmDocsSelector = {
// setState,
// getState,
// };
// Initialize when DOM is ready
domReady(init);

View File

@@ -0,0 +1,235 @@
const GROUP_QUERY = ".rocm-docs-selector-group";
const SELECTED_CLASS = "rocm-docs-selected";
const TOC2_OPTIONS_LIST_QUERY = ".rocm-docs-selector-toc2-options";
const TOC2_CONTENTS_LIST_QUERY = ".rocm-docs-selector-toc2-contents";
const HEADING_QUERY =
  ".rocm-docs-selected-content :is(h1, h2, h3, h4, h5, h6)"; // scope every heading level to selected content
const TOC_ITEM_CLASS = "rocm-docs-selector-toc2-item";
const EMPTY_ITEM_CLASS = "empty";
let optionsTocInitialized = false;
function isVisible(el) {
return !!(el && el.offsetParent !== null);
}
function getUniqueGroups(groups) {
const seen = new Set();
return groups.filter((group) => {
// Use group ID as primary identity; fallback to heading text
const headingSpan = group.querySelector(
".rocm-docs-selector-group-heading-text"
);
const headingText = headingSpan
? headingSpan.textContent.trim()
: "(Unnamed Selector)";
const identifier = group.id ? `id:${group.id}` : `heading:${headingText}`;
if (seen.has(identifier)) return false;
seen.add(identifier);
return true;
});
}
function initTOC2OptionsList() {
const tocOptionsList = document.querySelector(TOC2_OPTIONS_LIST_QUERY);
if (!tocOptionsList) return;
tocOptionsList.innerHTML = "";
let groups = Array.from(document.querySelectorAll(GROUP_QUERY)).filter(isVisible);
groups = getUniqueGroups(groups);
if (groups.length === 0) {
const li = document.createElement("li");
li.className = `nav-item toc-entry toc-h3 ${TOC_ITEM_CLASS} ${EMPTY_ITEM_CLASS}`;
const span = document.createElement("span");
span.textContent = "(no visible selectors)";
li.appendChild(span);
tocOptionsList.appendChild(li);
optionsTocInitialized = true;
return;
}
groups.forEach((group) => {
const headingSpan = group.querySelector(
".rocm-docs-selector-group-heading-text"
);
const headingText = headingSpan
? headingSpan.textContent.trim()
: "(Unnamed Selector)";
const li = document.createElement("li");
li.className = `nav-item toc-entry toc-h3 ${TOC_ITEM_CLASS}`;
li.dataset.groupId = group.id || "";
const link = document.createElement("a");
link.className = "nav-link";
link.href = group.id ? `#${group.id}` : "#";
link.dataset.headingText = headingText;
const selectedOption = group.querySelector(`.${SELECTED_CLASS}`);
let optionText = "(none selected)";
if (selectedOption) {
const clone = selectedOption.cloneNode(true);
clone.querySelectorAll("i, svg").forEach((el) => el.remove());
optionText = clone.innerHTML.trim();
}
link.innerHTML = `<strong>${headingText}</strong>: ${optionText}`;
li.appendChild(link);
tocOptionsList.appendChild(li);
});
optionsTocInitialized = true;
}
export function updateTOC2OptionsList() {
const tocOptionsList = document.querySelector(TOC2_OPTIONS_LIST_QUERY);
if (!tocOptionsList) return;
let visibleGroups = Array.from(document.querySelectorAll(GROUP_QUERY)).filter(
isVisible
);
visibleGroups = getUniqueGroups(visibleGroups);
// Always rebuild fresh (simpler, avoids state drift)
tocOptionsList.innerHTML = "";
if (visibleGroups.length === 0) {
const li = document.createElement("li");
li.className = `nav-item toc-entry toc-h3 ${TOC_ITEM_CLASS} ${EMPTY_ITEM_CLASS}`;
const span = document.createElement("span");
span.textContent = "(no visible selectors)";
li.appendChild(span);
tocOptionsList.appendChild(li);
return;
}
visibleGroups.forEach((group) => {
const headingSpan = group.querySelector(
".rocm-docs-selector-group-heading-text"
);
const headingText = headingSpan
? headingSpan.textContent.trim()
: "(Unnamed Selector)";
const li = document.createElement("li");
li.className = `nav-item toc-entry toc-h3 ${TOC_ITEM_CLASS}`;
li.dataset.groupId = group.id || "";
const link = document.createElement("a");
link.className = "nav-link";
link.href = group.id ? `#${group.id}` : "#";
link.dataset.headingText = headingText;
const selectedOption = group.querySelector(`.${SELECTED_CLASS}`);
let optionText = "(none selected)";
if (selectedOption) {
const clone = selectedOption.cloneNode(true);
clone.querySelectorAll("i, svg").forEach((el) => el.remove());
optionText = clone.innerHTML.trim();
}
link.innerHTML = `<strong>${headingText}</strong>: ${optionText}`;
li.appendChild(link);
tocOptionsList.appendChild(li);
});
}
let contentsTocInitialized = false;
function initTOC2ContentsList() {
const tocContentsList = document.querySelector(TOC2_CONTENTS_LIST_QUERY);
if (!tocContentsList) return;
// Remove any previous dynamic items (idempotent init)
tocContentsList
.querySelectorAll(`li.toc-entry.${TOC_ITEM_CLASS}`)
.forEach((node) => node.remove());
const headings = Array.from(document.querySelectorAll(HEADING_QUERY));
if (headings.length === 0) {
contentsTocInitialized = true;
return;
}
const lastLiByLevel = {};
headings.forEach((h) => {
const level = parseInt(h.tagName.substring(1), 10);
if (Number.isNaN(level) || level < 2 || level > 6) return;
const li = document.createElement("li");
li.className = `nav-item toc-entry toc-${h.tagName.toLowerCase()} ` +
TOC_ITEM_CLASS;
const a = document.createElement("a");
a.className = "reference internal nav-link";
const section = h.closest("section");
const targetId = h.id || (section ? section.id : "");
a.href = targetId ? `#${targetId}` : "#";
// Use only the text from the heading (ignore headerlink icon etc.)
const clone = h.cloneNode(true);
const firstTextNode = clone.childNodes.length > 0
? clone.childNodes[0].textContent
: "";
a.textContent = (firstTextNode || "").trim();
li.dataset.targetId = targetId;
li.appendChild(a);
// Nest under closest previous shallower heading
let parentUl = null;
for (let parentLevel = level - 1; parentLevel >= 2; parentLevel -= 1) {
const parentLi = lastLiByLevel[parentLevel];
if (parentLi) {
parentUl = parentLi.querySelector("ul");
if (!parentUl) {
parentUl = document.createElement("ul");
parentUl.className = "nav section-nav flex-column";
parentLi.appendChild(parentUl);
}
break;
}
}
if (parentUl) {
parentUl.appendChild(li);
} else {
tocContentsList.appendChild(li);
}
lastLiByLevel[level] = li;
for (let deeper = level + 1; deeper <= 6; deeper += 1) {
delete lastLiByLevel[deeper];
}
});
contentsTocInitialized = true;
}
export function updateTOC2ContentsList() {
const tocOptionsList = document.querySelector(TOC2_OPTIONS_LIST_QUERY);
const tocContentsList = document.querySelector(TOC2_CONTENTS_LIST_QUERY);
if (!tocContentsList || !tocOptionsList) return;
if (!contentsTocInitialized) {
initTOC2ContentsList();
}
tocContentsList
.querySelectorAll(`li.toc-entry.${TOC_ITEM_CLASS}`)
.forEach((li) => {
const targetId = li.dataset.targetId;
if (!targetId) {
li.style.display = "none";
return;
}
const target = document.getElementById(targetId);
const visible = target && target.offsetParent !== null;
li.style.display = visible ? "" : "none";
});
}

View File

@@ -7,6 +7,11 @@ html {
--rocm-docs-selector-option-hover-color: var(--pst-color-link-hover);
--rocm-docs-selector-option-selected-color: var(--pst-color-primary);
--rocm-docs-selector-tile-padding: 0.2rem;
--rocm-docs-selector-tile-gap: 0.5rem;
--rocm-docs-selector-focus-ring: 2px solid var(
--rocm-docs-selector-accent-color
);
--rocm-docs-selector-focus-offset: 2px;
}
html[data-theme="light"] {
@@ -21,15 +26,37 @@ html[data-theme="dark"] {
--rocm-docs-selector-shadow-hover: 0 2px 8px rgba(0, 0, 0, 0.4);
}
/* Selector container */
.rocm-docs-selector-container {
padding: 0 0 1rem 0;
/* Avoid odd sizing interactions with Bootstrap width utilities */
.rocm-docs-selector-group,
.rocm-docs-selector-group * {
box-sizing: border-box;
}
/* Selector group heading when one of its options is hovered */
.rocm-docs-selector-group:has(.rocm-docs-selector-option:hover)
.rocm-docs-selector-group-heading {
border-right-color: var(--rocm-docs-selector-option-hover-color);
/* Hide selectors during initialization to prevent FOUC */
.rocm-docs-selector-group:not(.rocm-docs-selector-initialized) {
visibility: hidden;
}
/* Smooth fade-in when ready */
.rocm-docs-selector-group.rocm-docs-selector-initialized {
visibility: visible;
animation: rocm-docs-selector-fade-in 0.2s ease-in;
}
@keyframes rocm-docs-selector-fade-in {
from {
opacity: 0;
}
to {
opacity: 1;
}
}
@supports selector(.x:has(.y)) {
.rocm-docs-selector-group:has(.rocm-docs-selector-option:hover)
.rocm-docs-selector-group-heading {
border-right-color: var(--rocm-docs-selector-option-hover-color);
}
}
/* Selector group heading box */
@@ -39,7 +66,7 @@ html[data-theme="dark"] {
font-weight: 600;
border-right: solid 3px var(--rocm-docs-selector-accent-color);
border-radius: var(--rocm-docs-selector-border-radius);
transition: border-right-color 0.25s ease;
transition: border-color 0.25s ease;
box-shadow: var(--rocm-docs-selector-shadow);
}
@@ -57,16 +84,25 @@ html[data-theme="dark"] {
justify-content: space-between;
align-items: center;
gap: 0.5rem;
background-color: var(--rocm-docs-selector-bg-color);
padding: var(--rocm-docs-selector-tile-padding);
border: solid 2px var(--rocm-docs-selector-border-color);
cursor: pointer;
transition: all 0.2 ease;
border-radius: var(--rocm-docs-selector-border-radius);
box-shadow: var(--rocm-docs-selector-shadow);
cursor: pointer;
user-select: none;
transition:
background-color 0.2s ease,
color 0.2s ease,
transform 0.2s ease,
box-shadow 0.2s ease,
border-color 0.2s ease;
}
/* Selector option when hovered */
/* Hover (not disabled) */
.rocm-docs-selector-option:hover:not(.rocm-docs-disabled) {
background-color: var(--rocm-docs-selector-option-hover-color);
color: var(--rocm-docs-selector-fg-color);
@@ -74,30 +110,94 @@ html[data-theme="dark"] {
box-shadow: var(--rocm-docs-selector-shadow-hover);
}
.rocm-docs-selector-option:focus:not(.rocm-docs-disabled) {
z-index: 69;
/* Accessible keyboard focus */
.rocm-docs-selector-option:focus-visible:not(.rocm-docs-disabled) {
outline: var(--rocm-docs-selector-focus-ring);
outline-offset: var(--rocm-docs-selector-focus-offset);
}
/* Selector option when selected */
/* Keep it above neighbors if it gets an outline/box-shadow */
.rocm-docs-selector-option:focus:not(.rocm-docs-disabled) {
position: relative;
z-index: 1;
}
/* Selected */
.rocm-docs-selector-option.rocm-docs-selected {
background-color: var(--rocm-docs-selector-option-selected-color);
color: var(--rocm-docs-selector-fg-color);
}
/* Prevent hover effect on selected */
/* Prevent hover lift on selected (keeps it steady) */
.rocm-docs-selector-option.rocm-docs-selected:hover {
transform: none;
}
/* Selector option when disabled */
/* Disabled */
.rocm-docs-selector-option.rocm-docs-disabled {
background-color: var(--rocm-docs-selector-border-color);
color: var(--rocm-docs-selector-fg-color);
cursor: not-allowed;
pointer-events: none;
opacity: 0.75;
}
/* Hidden state */
.rocm-docs-hidden {
display: none;
}
/* Put selected option summary in secondary page-level TOC on a new line */
.rocm-docs-selector-toc2-item .nav-link span {
display: block;
padding-left: 1rem;
}
@media (max-width: 768px) {
.rocm-docs-selector-group {
row-gap: var(--rocm-docs-selector-tile-gap);
}
}
@media (max-width: 576px) {
.rocm-docs-selector-option {
flex: 0 0 100%;
max-width: 100%;
}
}
@media (max-width: 440px) {
.rocm-docs-selector-group-heading {
flex: 0 0 100%;
max-width: 100%;
    /* When stacked, put the accent on the bottom */
border-right: 0;
border-bottom: solid 3px var(--rocm-docs-selector-accent-color);
}
.rocm-docs-selector-group.row .row {
flex: 0 0 100%;
max-width: 100%;
}
.rocm-docs-selector-option {
flex: 0 0 100%;
max-width: 100%;
}
}
/* Motion reduction */
@media (prefers-reduced-motion: reduce) {
.rocm-docs-selector-group.rocm-docs-selector-initialized {
animation: none;
}
.rocm-docs-selector-option {
transition: none;
}
.rocm-docs-selector-option:hover:not(.rocm-docs-disabled) {
transform: none;
}
}
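
The rocm-docs-disabled rules above pair with the disable-when option handled in selector.js; a hypothetical rST fragment (the os=windows value is illustrative only):

.. selector:: Install method
   :key: method

   .. selector-option:: Package manager
      :value: pkg
      :default:

   .. selector-option:: Docker
      :value: docker
      :disable-when: os=windows

While the os selector is set to windows, the Docker tile carries the rocm-docs-disabled class and aria-disabled="true", and reconcileGroupSelections moves the group back to a valid default.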

View File

@@ -0,0 +1,329 @@
import { domReady, logDebug } from "./utils.js";
import {
updateTOC2ContentsList,
updateTOC2OptionsList,
} from "./selector-toc.js";
const GROUP_QUERY = ".rocm-docs-selector-group";
const OPTION_QUERY = ".rocm-docs-selector-option";
const COND_QUERY = "[data-show-when],[data-disable-when]";
const DEFAULT_OPTION_CLASS = "rocm-docs-selector-option-default";
const DISABLED_CLASS = "rocm-docs-disabled";
const HIDDEN_CLASS = "rocm-docs-hidden";
const SELECTED_CLASS = "rocm-docs-selected";
// Toggle helpers -------------------------------------------------------------
const isDefaultOption = (elem) => elem.classList.contains(DEFAULT_OPTION_CLASS);
const disable = (elem) => {
elem.classList.add(DISABLED_CLASS);
elem.setAttribute("aria-disabled", "true");
elem.setAttribute("tabindex", "-1");
};
const enable = (elem) => {
elem.classList.remove(DISABLED_CLASS);
elem.setAttribute("aria-disabled", "false");
elem.setAttribute("tabindex", "0");
};
const hide = (elem) => {
elem.classList.add(HIDDEN_CLASS);
elem.setAttribute("aria-hidden", "true");
};
const show = (elem) => {
elem.classList.remove(HIDDEN_CLASS);
elem.setAttribute("aria-hidden", "false");
};
const select = (elem) => {
elem.classList.add(SELECTED_CLASS);
elem.setAttribute("aria-checked", "true");
};
const deselect = (elem) => {
elem.classList.remove(SELECTED_CLASS);
elem.setAttribute("aria-checked", "false");
};
// Global selector state ------------------------------------------------------
const state = {};
function getState() {
return { ...state };
}
function setState(updates) {
Object.assign(state, updates);
logDebug("State updated:", state);
}
// Condition handling ---------------------------------------------------------
/**
* Safely parse JSON-encoded conditions from a data-* attribute.
* Expects a key/value object, where values may be strings or arrays of strings.
*/
function parseConditions(attrName, raw) {
if (!raw) return null;
try {
const conditions = JSON.parse(raw);
if (typeof conditions !== "object" || Array.isArray(conditions)) {
console.warn(
`[ROCmDocsSelector] Invalid '${attrName}' format ` +
"(must be a key/value object):",
raw,
);
return null;
}
return conditions;
} catch (err) {
console.error(
`[ROCmDocsSelector] Couldn't parse '${attrName}' conditions:`,
err,
);
return null;
}
}
/**
* Return true iff all conditions match the current state.
* - Values can be a string or an array of strings.
* - A condition with an undefined state key is treated as not matching.
*/
function matchesConditions(conditions, currentState) {
for (const [key, expected] of Object.entries(conditions)) {
const actual = currentState[key];
// If no value yet, this condition does not match.
if (actual === undefined) return false;
if (Array.isArray(expected)) {
if (!expected.includes(actual)) return false;
} else if (actual !== expected) {
return false;
}
}
return true;
}
function shouldBeDisabled(elem) {
const raw = elem.dataset.disableWhen;
if (!raw) return false; // no conditions => never disabled
const conditions = parseConditions("disable-when", raw);
if (!conditions) {
console.warn(
"[ROCmDocsSelector] Invalid 'show-when' conditions; " +
"hiding affected element.",
);
return false;
}
return matchesConditions(conditions, state);
}
function shouldBeShown(elem) {
const raw = elem.dataset.showWhen;
if (!raw) return true; // no conditions => always visible
const conditions = parseConditions("show-when", raw);
if (!conditions) return true;
return matchesConditions(conditions, state);
}
// Event handlers -------------------------------------------------------------
function handleOptionSelect(e) {
const option = e.currentTarget;
// Ignore interaction with disabled or already selected options
if (
option.classList.contains(DISABLED_CLASS) ||
option.classList.contains(SELECTED_CLASS)
) {
return;
}
const { selectorKey: key, selectorValue: value } = option.dataset;
if (!key || !value) return;
// Update all selectors sharing the same key
const allOptions = document.querySelectorAll(
`${OPTION_QUERY}[data-selector-key="${key}"]`,
);
allOptions.forEach((opt) => {
if (opt.dataset.selectorValue === value) {
select(opt);
} else {
deselect(opt);
}
});
// Update global state
setState({ [key]: value });
// Re-run visibility rules and TOC sync
updateVisibility();
}
function handleOptionKeydown(e) {
if (e.key === "Enter" || e.key === " ") {
e.preventDefault();
handleOptionSelect(e);
}
}
// Visibility / enablement update --------------------------------------------
// Ensure each selector group always has a valid selected option.
// If the current selection becomes disabled/hidden due to another selector's
// change, automatically pick a replacement.
function reconcileGroupSelections() {
const currentState = getState();
const updates = {};
document.querySelectorAll(GROUP_QUERY).forEach((group) => {
// Skip groups that are themselves hidden
if (group.classList.contains(HIDDEN_CLASS)) return;
const options = Array.from(group.querySelectorAll(OPTION_QUERY));
if (!options.length) return;
const groupKey = group.dataset.selectorKey ||
options[0].dataset.selectorKey;
if (!groupKey) return;
// Options that are both enabled and visible
const enabledVisible = options.filter(
(opt) =>
!opt.classList.contains(DISABLED_CLASS) &&
!opt.classList.contains(HIDDEN_CLASS),
);
if (!enabledVisible.length) {
// No valid options left; just clear visual selection.
options.forEach(deselect);
return;
}
const currentlySelected = options.find((opt) =>
opt.classList.contains(SELECTED_CLASS)
);
const selectedStillValid = currentlySelected &&
enabledVisible.includes(currentlySelected);
if (selectedStillValid) {
const selectedValue = currentlySelected.dataset.selectorValue;
if (selectedValue && currentState[groupKey] !== selectedValue) {
updates[groupKey] = selectedValue;
}
return;
}
// Need a new selection: prefer a default option, otherwise the first
// enabled+visible option in DOM order.
const replacement = enabledVisible.find(isDefaultOption) ||
enabledVisible[0];
if (!replacement) return;
options.forEach(deselect);
select(replacement);
const newValue = replacement.dataset.selectorValue;
if (newValue && currentState[groupKey] !== newValue) {
updates[groupKey] = newValue;
}
});
const changedKeys = Object.keys(updates);
if (changedKeys.length > 0) {
setState(updates);
return true;
}
return false;
}
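// Illustrative walkthrough: if a state change (say, os -> "windows") disables
// the currently selected install-method option, reconcileGroupSelections()
// deselects it, falls back to the group's default option (or the first
// enabled+visible one in DOM order), and reports the change so the
// show/disable rules run again.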
let isUpdatingVisibility = false;
function updateVisibility() {
// Prevent re-entrancy if something triggers updateVisibility
// while it is already running.
if (isUpdatingVisibility) return;
isUpdatingVisibility = true;
try {
let stateChanged = false;
let iterations = 0;
// We may need multiple passes: reconciling selections can change the
// global state, which in turn affects show/disable conditions.
do {
document.querySelectorAll(COND_QUERY).forEach((elem) => {
// Show/hide only if element has show-when
if (elem.dataset.showWhen !== undefined) {
if (shouldBeShown(elem)) {
show(elem);
} else {
hide(elem);
}
}
// Enable/disable only if element has disable-when
if (elem.dataset.disableWhen !== undefined) {
if (shouldBeDisabled(elem)) {
disable(elem);
} else {
enable(elem);
}
}
});
stateChanged = reconcileGroupSelections();
iterations += 1;
// Hard stop to avoid infinite loops in case of conflicting rules.
} while (stateChanged && iterations < 5);
updateTOC2OptionsList();
updateTOC2ContentsList();
} finally {
isUpdatingVisibility = false;
}
}
// Initialization -------------------------------------------------------------
domReady(() => {
const selectorOptions = document.querySelectorAll(OPTION_QUERY);
const initialState = {};
// Attach listeners and gather defaults
selectorOptions.forEach((option) => {
option.addEventListener("click", handleOptionSelect);
option.addEventListener("keydown", handleOptionKeydown);
if (isDefaultOption(option)) {
select(option);
const { selectorKey: key, selectorValue: value } = option.dataset;
if (key && value && initialState[key] === undefined) {
initialState[key] = value;
}
}
});
setState(initialState);
updateVisibility();
// Mark all selector groups as initialized to make them visible
document.querySelectorAll(GROUP_QUERY).forEach((group) => {
group.classList.add("rocm-docs-selector-initialized");
});
});

View File

@@ -0,0 +1,14 @@
export function domReady(callback) {
if (document.readyState !== "loading") {
callback();
} else {
document.addEventListener("DOMContentLoaded", callback, { once: true });
}
}
const DEBUG = true;
export const logDebug = (...args) => {
if (DEBUG) {
console.debug("[ROCmDocsSelector]", ...args);
}
};
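// Usage sketch (module path is illustrative):
//   import { domReady, logDebug } from "./utils.js";
//   domReady(() => logDebug("selector ready"));
// Set DEBUG to false to silence the console.debug output.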

View File

@@ -1,9 +0,0 @@
from pathlib import Path
def setup(app):
static_assets_dir = Path(__file__).parent / "static"
app.config.html_static_path.append(str(static_assets_dir))
app.add_css_file("table.css")
return {"version": "1.0", "parallel_read_safe": True}

View File

@@ -1,6 +1,7 @@
<!-- Summary of selected options in secondary TOC -->
<div class="tocsection onthispage">
<i class="fa-solid fa-filter"></i>
Options
<i class="fa-solid fa-computer"></i>
Installation environment
</div>
<nav class="page-toc rocm-docs-selector-toc2">
<ul
@@ -8,6 +9,7 @@
>
</ul>
</nav>
<!-- Summary of page contents in secondary TOC -->
<div class="page-toc tocsection onthispage">
<i class="fa-solid fa-list"></i>
Contents

View File

@@ -1,5 +1,6 @@
import json
import html
from sphinx.util import logging
def normalize_key(key):
return key.replace(" ", "_").lower().strip()
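# Example: normalize_key("Install Method") -> "install_method"
# (spaces become underscores, then the result is lowercased and stripped).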
@@ -28,3 +29,6 @@ def kv_to_data_attr(name, kv_str, separator="="):
return f'data-{name}="{html.escape(json.dumps(pairs))}"' if pairs else ""
logger = logging.getLogger(__name__)

View File

@@ -1,366 +0,0 @@
:orphan:
.. meta::
:description: How to train a model using JAX MaxText for ROCm.
:keywords: ROCm, AI, LLM, train, jax, torch, Llama, flux, tutorial, docker
******************************************
Training a model with JAX MaxText on ROCm
******************************************
.. caution::
This documentation does not reflect the latest version of ROCm JAX MaxText
training performance documentation. See :doc:`../jax-maxtext` for the latest version.
MaxText is a high-performance, open-source framework built on the Google JAX
machine learning library to train LLMs at scale. The MaxText framework for
ROCm is an optimized fork of the upstream
`MaxText repository <https://github.com/AI-Hypercomputer/maxtext>`__, enabling efficient AI workloads
on AMD Instinct MI300X series GPUs.
The MaxText for ROCm training Docker image
provides a prebuilt environment for training on AMD Instinct MI300X and MI325X GPUs,
including essential components like JAX, XLA, ROCm libraries, and MaxText utilities.
It includes the following software components:
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/jax-maxtext-benchmark-models.yaml
{% set dockers = data.dockers %}
.. tab-set::
{% for docker in dockers %}
{% set jax_version = docker.components["JAX"] %}
.. tab-item:: ``{{ docker.pull_tag }}``
:sync: {{ docker.pull_tag }}
.. list-table::
:header-rows: 1
* - Software component
- Version
{% for component_name, component_version in docker.components.items() %}
* - {{ component_name }}
- {{ component_version }}
{% endfor %}
{% if jax_version == "0.6.0" %}
.. note::
Shardy is a new configuration in JAX 0.6.0. You might get related errors if it's
not configured correctly. For now, you can turn it off by setting
``shardy=False`` during the training run. You can also follow the `migration
guide <https://docs.jax.dev/en/latest/shardy_jax_migration.html>`__ to enable
it.
{% endif %}
{% endfor %}
MaxText on ROCm provides the following key features to train large language models efficiently:
- Transformer Engine (TE)
- Flash Attention (FA) 3 -- with or without sequence input packing
- GEMM tuning
- Multi-node support
- NANOO FP8 quantization support
.. _amd-maxtext-model-support-v257:
Supported models
================
The following models are pre-optimized for performance on AMD Instinct MI300
series GPUs. Some instructions, commands, and available training
configurations in this documentation might vary by model -- select one to get
started.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/jax-maxtext-benchmark-models.yaml
{% set model_groups = data.model_groups %}
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row gx-0">
<div class="col-2 me-1 px-2 model-param-head">Model</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
<div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
<div class="row gx-0 pt-1">
<div class="col-2 me-1 px-2 model-param-head">Variant</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if models|length % 3 == 0 %}
<div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% else %}
<div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
.. note::
Some models, such as Llama 3, require an external license agreement through
a third party (for example, Meta).
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
Environment setup
=================
This Docker image is optimized for specific model configurations outlined
as follows. Performance can vary for other training workloads, as AMD
doesn't validate configurations and run conditions outside those described.
Pull the Docker image
---------------------
Use the following command to pull the Docker image from Docker Hub.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/jax-maxtext-benchmark-models.yaml
{% set dockers = data.dockers %}
.. tab-set::
{% for docker in dockers %}
{% set jax_version = docker.components["JAX"] %}
.. tab-item:: JAX {{ jax_version }}
:sync: {{ docker.pull_tag }}
.. code-block:: shell
docker pull {{ docker.pull_tag }}
{% endfor %}
.. _amd-maxtext-multi-node-setup-v257:
Multi-node configuration
------------------------
See :doc:`/how-to/rocm-for-ai/system-setup/multi-node-setup` to configure your
environment for multi-node training.
.. _amd-maxtext-get-started-v257:
Benchmarking
============
Once the setup is complete, choose between two options to reproduce the
benchmark results:
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/jax-maxtext-benchmark-models.yaml
.. _vllm-benchmark-mad:
{% set dockers = data.dockers %}
{% set model_groups = data.model_groups %}
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{model.mad_tag}}
.. tab-set::
{% if model.mad_tag and "single-node" in model.doc_options %}
.. tab-item:: MAD-integrated benchmarking
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
2. Use this command to run the performance benchmark test on the {{ model.model }} model
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
madengine run \
--tags {{model.mad_tag}} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in ``~/MAD/perf.csv``.
{% endif %}
.. tab-item:: Standalone benchmarking
.. rubric:: Download the Docker image and required scripts
Run the JAX MaxText benchmark tool independently by starting the
Docker container as shown in the following snippet.
.. tab-set::
{% for docker in dockers %}
{% set jax_version = docker.components["JAX"] %}
.. tab-item:: JAX {{ jax_version }}
:sync: {{ docker.pull_tag }}
.. code-block:: shell
docker pull {{ docker.pull_tag }}
{% endfor %}
{% if model.model_repo and "single-node" in model.doc_options %}
.. rubric:: Single node training
1. Set up environment variables.
.. code-block:: shell
export MAD_SECRETS_HFTOKEN=<Your Hugging Face token>
export HF_HOME=<Location of saved/cached Hugging Face models>
``MAD_SECRETS_HFTOKEN`` is your Hugging Face access token to access models, tokenizers, and data.
See `User access tokens <https://huggingface.co/docs/hub/en/security-tokens>`__.
``HF_HOME`` is where ``huggingface_hub`` will store local data. See `huggingface_hub CLI <https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#huggingface-cli-download>`__.
If you already have downloaded or cached Hugging Face artifacts, set this variable to that path.
Downloaded files typically get cached to ``~/.cache/huggingface``.
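For example, to reuse the default cache location (path shown for illustration):
.. code-block:: shell
export HF_HOME=$HOME/.cache/huggingface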
2. Launch the Docker container.
.. tab-set::
{% for docker in dockers %}
{% set jax_version = docker.components["JAX"] %}
.. tab-item:: JAX {{ jax_version }}
:sync: {{ docker.pull_tag }}
.. code-block:: shell
docker run -it \
--device=/dev/dri \
--device=/dev/kfd \
--network host \
--ipc host \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
-v $HOME:$HOME \
-v $HOME/.ssh:/root/.ssh \
-v $HF_HOME:/hf_cache \
-e HF_HOME=/hf_cache \
-e MAD_SECRETS_HFTOKEN=$MAD_SECRETS_HFTOKEN \
--shm-size 64G \
--name training_env \
{{ docker.pull_tag }}
{% endfor %}
3. In the Docker container, clone the ROCm MAD repository and navigate to the
benchmark scripts directory at ``MAD/scripts/jax-maxtext``.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD/scripts/jax-maxtext
4. Run the setup scripts to install libraries and datasets needed
for benchmarking.
.. code-block:: shell
./jax-maxtext_benchmark_setup.sh -m {{ model.model_repo }}
5. To run the training benchmark without quantization, use the following command:
.. code-block:: shell
./jax-maxtext_benchmark_report.sh -m {{ model.model_repo }}
For quantized training, use the following command:
.. code-block:: shell
./jax-maxtext_benchmark_report.sh -m {{ model.model_repo }} -q nanoo_fp8
{% endif %}
{% if model.multinode_training_script and "multi-node" in model.doc_options %}
.. rubric:: Multi-node training
The following examples use SLURM to run on multiple nodes.
.. note::
The following scripts will launch the Docker container and run the
benchmark. Run them outside of any Docker container.
1. Make sure ``$HF_HOME`` is set before running the test. See
`ROCm benchmarking <https://github.com/ROCm/MAD/blob/develop/scripts/jax-maxtext/gpu-rocm/readme.md>`__
for more details on downloading the Llama models before running the
benchmark.
2. To run multi-node training for {{ model.model }},
use the
`multi-node training script <https://github.com/ROCm/MAD/blob/develop/scripts/jax-maxtext/gpu-rocm/{{ model.multinode_training_script }}>`__
under the ``scripts/jax-maxtext/gpu-rocm/`` directory.
3. Run the multi-node training benchmark script.
.. code-block:: shell
sbatch -N <num_nodes> {{ model.multinode_training_script }}
{% else %}
.. rubric:: Multi-node training
For multi-node training examples, choose a model from :ref:`amd-maxtext-model-support-v257`
with an available `multi-node training script <https://github.com/ROCm/MAD/tree/develop/scripts/jax-maxtext/gpu-rocm>`__.
{% endif %}
{% endfor %}
{% endfor %}
Further reading
===============
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
AMD Instinct MI300X series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================
See :doc:`jax-maxtext-history` to find documentation for previous releases
of the ``ROCm/jax-training`` Docker image.

View File

@@ -1,667 +0,0 @@
:orphan:
.. meta::
:description: How to train a model using Megatron-LM for ROCm.
:keywords: ROCm, AI, LLM, train, Megatron-LM, megatron, Llama, tutorial, docker, torch
********************************************
Training a model with Primus and Megatron-LM
********************************************
.. caution::
This documentation does not reflect the latest version of ROCm Megatron-LM
training performance documentation. See :doc:`../primus-megatron` for the latest version.
`Primus <https://github.com/AMD-AGI/Primus>`__ is a unified and flexible
LLM training framework designed to streamline LLM training
on AMD Instinct GPUs using a modular, reproducible configuration paradigm.
Primus is backend-agnostic and supports multiple training engines -- including Megatron.
.. note::
Primus with Megatron is designed to replace the :doc:`ROCm Megatron-LM training <../megatron-lm>` workflow.
To learn how to migrate workloads from Megatron-LM to Primus with Megatron,
see :doc:`megatron-lm-primus-migration-guide`.
For ease of use, AMD provides a ready-to-use Docker image for MI300 series GPUs
containing essential components for Primus and Megatron-LM. This Docker image is powered by
Primus Turbo performance optimizations; this release adds support for Primus Turbo
with optimized attention and grouped GEMM kernels.
.. note::
This Docker environment is based on Python 3.10 and Ubuntu 22.04. For an alternative environment with
Python 3.12 and Ubuntu 24.04, see the :doc:`previous ROCm Megatron-LM v25.6 Docker release <megatron-lm-v25.6>`.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.8-benchmark-models.yaml
{% set dockers = data.dockers %}
{% set docker = dockers[0] %}
.. list-table::
:header-rows: 1
* - Software component
- Version
{% for component_name, component_version in docker.components.items() %}
* - {{ component_name }}
- {{ component_version }}
{% endfor %}
.. _amd-primus-megatron-lm-model-support:
Supported models
================
The following models are pre-optimized for performance on AMD Instinct MI300X series GPUs.
Some instructions, commands, and training examples in this documentation might
vary by model -- select one to get started.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.8-benchmark-models.yaml
{% set model_groups = data.model_groups %}
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row gx-0">
<div class="col-2 me-1 px-2 model-param-head">Model</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
<div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
<div class="row gx-0 pt-1">
<div class="col-2 me-1 px-2 model-param-head">Variant</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if models|length % 3 == 0 %}
<div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% else %}
<div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
.. note::
Some models, such as Llama, require an external license agreement through
a third party (for example, Meta).
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
.. _mi300x-amd-primus-megatron-lm-training:
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.8-benchmark-models.yaml
{% set dockers = data.dockers %}
{% set docker = dockers[0] %}
Environment setup
=================
Use the following instructions to set up the environment, configure the script to train models, and
reproduce the benchmark results on MI300X series GPUs with the ``{{ docker.pull_tag }}`` image.
.. _amd-primus-megatron-lm-requirements:
Download the Docker image
-------------------------
1. Use the following command to pull the Docker image from Docker Hub.
.. code-block:: shell
docker pull {{ docker.pull_tag }}
2. Launch the Docker container.
.. code-block:: shell
docker run -it \
--device /dev/dri \
--device /dev/kfd \
--device /dev/infiniband \
--network host --ipc host \
--group-add video \
--cap-add SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
-v $HOME:$HOME \
--shm-size 128G \
--name primus_training_env \
{{ docker.pull_tag }}
3. Use these commands if you exit the ``primus_training_env`` container and need to return to it.
.. code-block:: shell
docker start primus_training_env
docker exec -it primus_training_env bash
The Docker container hosts verified commit ``927a717`` of the `Primus
<https://github.com/AMD-AGI/Primus/tree/927a71702784347a311ca48fd45f0f308c6ef6dd>`__ repository.
.. _amd-primus-megatron-lm-environment-setup:
Configuration
=============
Primus defines a training configuration in YAML for each model in
`examples/megatron/configs <https://github.com/AMD-AGI/Primus/tree/927a71702784347a311ca48fd45f0f308c6ef6dd/examples/megatron/configs>`__.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.8-benchmark-models.yaml
{% set model_groups = data.model_groups %}
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{ model.mad_tag }}
To update training parameters for {{ model.model }}, you can update ``examples/megatron/configs/{{ model.config_name }}``.
Note that training configuration YAML files for other models follow this naming convention.
{% endfor %}
{% endfor %}
.. note::
See :ref:`Key options <amd-primus-megatron-lm-benchmark-test-vars>` for more information on configuration options.
Dataset options
---------------
You can use either mock data or real data for training.
* Mock data can be useful for testing and validation. Use the ``mock_data`` field to toggle between mock and real data. The default
value is ``true`` (enabled).
.. code-block:: yaml
mock_data: true
* If you're using a real dataset, update the ``train_data_path`` field to point to the location of your dataset.
.. code-block:: yaml
mock_data: false
train_data_path: /path/to/your/dataset
Ensure that the files are accessible inside the Docker container.
.. _amd-primus-megatron-lm-tokenizer:
Tokenizer
---------
Set the ``HF_TOKEN`` environment variable to a token with the
right permissions to access the tokenizer for each model.
.. code-block:: bash
# Export your HF_TOKEN in the workspace
export HF_TOKEN=<your_hftoken>
.. note::
In Primus, each model uses a tokenizer from Hugging Face. For example, the Llama
3.1 8B model uses ``tokenizer_model: meta-llama/Llama-3.1-8B`` and
``tokenizer_type: Llama3Tokenizer`` defined in the `llama3.1-8B model
<https://github.com/AMD-AGI/Primus/blob/927a71702784347a311ca48fd45f0f308c6ef6dd/examples/megatron/configs/llama3.1_8B-pretrain.yaml>`__
definition.
.. _amd-primus-megatron-lm-run-training:
Run training
============
Use the following example commands to set up the environment, configure
:ref:`key options <amd-primus-megatron-lm-benchmark-test-vars>`, and run training on
MI300X series GPUs with the AMD Megatron-LM environment.
Single node training
--------------------
To run training on a single node, navigate to ``/workspace/Primus`` and use the following setup commands:
.. code-block:: shell
pip install -r requirements.txt
export HSA_NO_SCRATCH_RECLAIM=1
export NVTE_CK_USES_BWD_V3=1
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.3-70b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 3.3 70B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run pre-training for Llama 3.3 70B BF16, run:
.. code-block:: shell
EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
bash ./examples/run_pretrain.sh \
--micro_batch_size 2 \
--global_batch_size 16 \
--train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-8b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 3.1 8B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run pre-training for Llama 3.1 8B FP8, run:
.. code-block:: shell
EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
bash ./examples/run_pretrain.sh \
--train_iters 50 \
--fp8 hybrid
For Llama 3.1 8B BF16, use the following command:
.. code-block:: shell
EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
bash ./examples/run_pretrain.sh --train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-70b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 3.1 70B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run pre-training for Llama 3.1 70B BF16, run:
.. code-block:: shell
EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
bash ./examples/run_pretrain.sh \
--train_iters 50
To run the training on a single node for Llama 3.1 70B FP8 with proxy, use the following command:
.. code-block:: shell
EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
bash ./examples/run_pretrain.sh \
--train_iters 50 \
--num_layers 40 \
--fp8 hybrid
.. note::
Use two or more nodes to run the *full* Llama 70B model with FP8 precision.
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-7b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 2 7B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run pre-training for Llama 2 7B FP8, run:
.. code-block:: shell
EXP=examples/megatron/configs/llama2_7B-pretrain.yaml \
bash ./examples/run_pretrain.sh \
--train_iters 50 \
--fp8 hybrid
To run pre-training for Llama 2 7B BF16, run:
.. code-block:: shell
EXP=examples/megatron/configs/llama2_7B-pretrain.yaml \
bash ./examples/run_pretrain.sh --train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-70b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Llama 2 70B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run pre-training for Llama 2 70B BF16, run:
.. code-block:: shell
EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
bash ./examples/run_pretrain.sh --train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_deepseek-v3-proxy
Once setup is complete, run the appropriate training command.
The following run commands are tailored to DeepSeek-V3.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run training on a single node for DeepSeek-V3 (MoE with expert parallel) with 3-layer proxy,
use the following command:
.. code-block:: shell
EXP=examples/megatron/configs/deepseek_v3-pretrain.yaml \
bash examples/run_pretrain.sh \
--num_layers 3 \
--moe_layer_freq 1 \
--train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_deepseek-v2-lite-16b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to DeepSeek-V2-Lite.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run training on a single node for DeepSeek-V2-Lite (MoE with expert parallel),
use the following command:
.. code-block:: shell
EXP=examples/megatron/configs/deepseek_v2_lite-pretrain.yaml \
bash examples/run_pretrain.sh \
--global_batch_size 256 \
--train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x7b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Mixtral 8x7B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run training on a single node for Mixtral 8x7B (MoE with expert parallel),
use the following command:
.. code-block:: shell
EXP=examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml \
bash examples/run_pretrain.sh --train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x22b-proxy
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Mixtral 8x22B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run training on a single node for Mixtral 8x22B (MoE with expert parallel) with 4-layer proxy,
use the following command:
.. code-block:: shell
EXP=examples/megatron/configs/mixtral_8x22B_v0.1-pretrain.yaml \
bash examples/run_pretrain.sh \
--num_layers 4 \
--pipeline_model_parallel_size 1 \
--micro_batch_size 1 \
--global_batch_size 16 \
--train_iters 50
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-7b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Qwen 2.5 7B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run training on a single node for Qwen 2.5 7B BF16, use the following
command:
.. code-block:: shell
EXP=examples/megatron/configs/qwen2.5_7B-pretrain.yaml \
bash examples/run_pretrain.sh --train_iters 50
For FP8, use the following command.
.. code-block:: shell
EXP=examples/megatron/configs/qwen2.5_7B-pretrain.yaml \
bash examples/run_pretrain.sh \
--train_iters 50 \
--fp8 hybrid
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-72b
Once setup is complete, run the appropriate training command.
The following run commands are tailored to Qwen 2.5 72B.
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
To run the training on a single node for Qwen 2.5 72B BF16, use the following command.
.. code-block:: shell
EXP=examples/megatron/configs/qwen2.5_72B-pretrain.yaml \
bash examples/run_pretrain.sh --train_iters 50
.. _amd-primus-megatron-multi-node-examples:
Multi-node training examples
----------------------------
Refer to :doc:`/how-to/rocm-for-ai/system-setup/multi-node-setup` to configure your environment for multi-node
training.
To run training on multiple nodes, you can use the
`run_slurm_pretrain.sh <https://github.com/AMD-AGI/Primus/blob/927a71702784347a311ca48fd45f0f308c6ef6dd/examples/run_slurm_pretrain.sh>`__
script to launch the multi-node workload. Use the following steps to set up your environment:
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.8-benchmark-models.yaml
{% set dockers = data.dockers %}
{% set docker = dockers[0] %}
.. code-block:: shell
cd /workspace/Primus/
export DOCKER_IMAGE={{ docker.pull_tag }}
export HF_TOKEN=<your_HF_token>
export HSA_NO_SCRATCH_RECLAIM=1
export NVTE_CK_USES_BWD_V3=1
export NCCL_IB_HCA=<your_NCCL_IB_HCA> # specify which RDMA interfaces to use for communication
export NCCL_SOCKET_IFNAME=<your_NCCL_SOCKET_IFNAME> # your Network Interface
export GLOO_SOCKET_IFNAME=<your_GLOO_SOCKET_IFNAME> # your Network Interface
export NCCL_IB_GID_INDEX=3 # Set InfiniBand GID index for NCCL communication. Default is 3 for ROCE
.. note::
* Make sure the correct network drivers are installed on the nodes. If running inside Docker, either install the drivers inside the Docker container or pass the network drivers through from the host when creating the container.
* If ``NCCL_IB_HCA`` and ``NCCL_SOCKET_IFNAME`` are not set, Primus will try to auto-detect them. However, since NICs can vary across clusters, it is encouraged to explicitly export the NCCL parameters for your cluster.
* To find your network interface, you can use ``ip a``.
* To find RDMA interfaces, you can use ``ibv_devices`` to get the list of all the RDMA/IB devices.
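For example, you can inspect the available interfaces before exporting the variables above. This is only an illustrative check; interface names vary by cluster.
.. code-block:: shell
ip a           # list network interfaces (NCCL_SOCKET_IFNAME, GLOO_SOCKET_IFNAME)
ibv_devices    # list RDMA/IB devices (NCCL_IB_HCA)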
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.3-70b
To train Llama 3.3 70B FP8 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 1 \
--global_batch_size 256 \
--recompute_num_layers 80 \
--fp8 hybrid
To train Llama 3.3 70B BF16 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 1 \
--global_batch_size 256 \
--recompute_num_layers 12
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-8b
To train Llama 3.1 8B FP8 on 8 nodes, run:
.. code-block:: shell
# Adjust the training parameters. For example, use `global_batch_size = 8 * single_node_batch_size` for 8 nodes in this case
NNODES=8 EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
bash ./examples/run_slurm_pretrain.sh \
--global_batch_size 1024 \
--fp8 hybrid
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-70b
To train Llama 3.1 70B FP8 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 1 \
--global_batch_size 256 \
--recompute_num_layers 80 \
--fp8 hybrid
To train Llama 3.1 70B BF16 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 1 \
--global_batch_size 256 \
--recompute_num_layers 12
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-7b
To train Llama 2 7B FP8 on 8 nodes, run:
.. code-block:: shell
# Adjust the training parameters. For example, use `global_batch_size = 8 * single_node_batch_size` for 8 nodes in this case
NNODES=8 EXP=examples/megatron/configs/llama2_7B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh --global_batch_size 2048 --fp8 hybrid
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-70b
To train Llama 2 70B FP8 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 2 \
--global_batch_size 256 \
--recompute_num_layers 80 \
--fp8 hybrid
To train Llama 2 70B BF16 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
bash ./examples/run_slurm_pretrain.sh \
--micro_batch_size 2 \
--global_batch_size 1536 \
--recompute_num_layers 12
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x7b
To train Mixtral 8x7B BF16 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 2 \
--global_batch_size 256
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-72b
To train Qwen2.5 72B FP8 on 8 nodes, run:
.. code-block:: shell
NNODES=8 EXP=examples/megatron/configs/qwen2.5_72B-pretrain.yaml \
bash examples/run_slurm_pretrain.sh \
--micro_batch_size 4 \
--global_batch_size 256 \
--recompute_num_layers 80 \
--fp8 hybrid
.. _amd-primus-megatron-lm-benchmark-test-vars:
Key options
-----------
The following are the key options to take note of:
fp8
``hybrid`` enables FP8 GEMMs.
use_torch_fsdp2
``use_torch_fsdp2: 1`` enables torch fsdp-v2. If FSDP is enabled,
set ``use_distributed_optimizer`` and ``overlap_param_gather`` to ``false``.
profile
To enable PyTorch profiling, set these parameters:
.. code-block:: yaml
profile: true
use_pytorch_profiler: true
profile_step_end: 7
profile_step_start: 6
train_iters
The total number of iterations (default: 50).
mock_data
``true`` by default.
micro_batch_size
Micro batch size.
global_batch_size
Global batch size.
recompute_granularity
For activation checkpointing.
num_layers
For using a reduced number of layers as with proxy models.
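For example, a short smoke-test run that combines several of these options might look like the following. The values are illustrative only.
.. code-block:: shell
EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
bash ./examples/run_pretrain.sh \
--train_iters 10 \
--micro_batch_size 1 \
--global_batch_size 128 \
--fp8 hybrid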
Further reading
===============
- For an introduction to Primus, see `Primus: A Lightweight, Unified Training
Framework for Large Models on AMD GPUs <https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html>`__.
- To learn more about system settings and management practices to configure your system for
AMD Instinct MI300X series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================
See :doc:`megatron-lm-history` to find documentation for previous releases
of the ``ROCm/megatron-lm`` Docker image.
This training environment now uses Primus with Megatron as the primary
configuration. Limited support for the legacy ROCm Megatron-LM is still
available; see the :doc:`../megatron-lm` documentation.

View File

@@ -1,312 +0,0 @@
:orphan:
.. meta::
:description: How to train a model using PyTorch for ROCm.
:keywords: ROCm, AI, LLM, train, PyTorch, torch, Llama, flux, tutorial, docker
****************************************
Training a model with Primus and PyTorch
****************************************
.. caution::
This documentation does not reflect the latest version of ROCm Primus PyTorch training
performance benchmark documentation. See :doc:`../primus-pytorch` for the latest version.
`Primus <https://github.com/AMD-AGI/Primus>`__ is a unified and flexible
LLM training framework designed to streamline LLM training
on AMD Instinct GPUs using a modular, reproducible configuration paradigm.
Primus now supports the PyTorch torchtitan backend.
.. note::
Primus with the PyTorch torchtitan backend is designed to replace the :doc:`ROCm PyTorch training <../pytorch-training>` workflow.
See :doc:`../pytorch-training` to see steps to run workloads without Primus.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v25.8-benchmark-models.yaml
{% set dockers = data.dockers %}
{% set docker = dockers[0] %}
For ease of use, AMD provides a ready-to-use Docker image -- ``{{
docker.pull_tag }}`` -- for MI300X series GPUs containing essential
components for Primus and PyTorch training with
Primus Turbo optimizations.
.. list-table::
:header-rows: 1
* - Software component
- Version
{% for component_name, component_version in docker.components.items() %}
* - {{ component_name }}
- {{ component_version }}
{% endfor %}
.. _amd-primus-pytorch-model-support-v258:
Supported models
================
The following models are pre-optimized for performance on the AMD Instinct MI325X and MI300X GPUs.
Some instructions, commands, and training recommendations in this documentation might
vary by model -- select one to get started.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v25.8-benchmark-models.yaml
{% set unified_docker = data.dockers[0] %}
{% set model_groups = data.model_groups %}
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row gx-0" style="display: none;">
<div class="col-2 me-1 px-2 model-param-head">Model</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
<div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
<div class="row gx-0 pt-1">
<div class="col-2 me-1 px-2 model-param-head">Model</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if models|length % 3 == 0 %}
<div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% else %}
<div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
.. seealso::
For additional workloads, including Llama 3.3, Llama 3.2, Llama 2, GPT OSS, Qwen, and Flux models,
see the :doc:`../pytorch-training` documentation (without Primus).
.. _amd-primus-pytorch-performance-measurements-v258:
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
This Docker image is optimized for specific model configurations outlined
below. Performance can vary for other training workloads, as AMD
doesn't test configurations and run conditions outside those described.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-pytorch-v25.8-benchmark-models.yaml
{% set unified_docker = data.dockers[0] %}
Pull the Docker image
=====================
Use the following command to pull the `Docker image <{{ unified_docker.docker_hub_url }}>`_ from Docker Hub.
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
Run training
============
{% set model_groups = data.model_groups %}
Once the setup is complete, choose between the following two workflows to start benchmarking training.
For fine-tuning workloads and multi-node training examples, see :doc:`../pytorch-training` (without Primus).
.. tab-set::
.. tab-item:: MAD-integrated benchmarking
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{ model.mad_tag }}
The following run command is tailored to {{ model.model }}.
See :ref:`amd-primus-pytorch-model-support-v258` to switch to another available model.
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
using one node with the {{ model.precision }} data type on the host machine.
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
madengine run \
--tags {{ model.mad_tag }} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{ model.mad_tag }}``. The latency and throughput reports of the
model are collected in ``~/MAD/perf.csv``.
{% endfor %}
{% endfor %}
.. tab-item:: Standalone benchmarking
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{ model.mad_tag }}
The following run commands are tailored to {{ model.model }}.
See :ref:`amd-primus-pytorch-model-support-v258` to switch to another available model.
.. rubric:: Download the Docker image and required packages
1. Use the following command to pull the Docker image from Docker Hub.
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
2. Run the Docker container.
.. code-block:: shell
docker run -it \
--device /dev/dri \
--device /dev/kfd \
--network host \
--ipc host \
--group-add video \
--cap-add SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
-v $HOME:$HOME \
-v $HOME/.ssh:/root/.ssh \
--shm-size 64G \
--name training_env \
{{ unified_docker.pull_tag }}
Use these commands if you exit the ``training_env`` container and need to return to it.
.. code-block:: shell
docker start training_env
docker exec -it training_env bash
3. In the Docker container, clone the `<https://github.com/ROCm/MAD>`__
repository and navigate to the benchmark scripts directory
``/workspace/MAD/scripts/pytorch_train``.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD/scripts/pytorch_train
.. rubric:: Prepare training datasets and dependencies
1. The following benchmarking examples require downloading models and datasets
from Hugging Face. To ensure successful access to gated repos, set your
``HF_TOKEN``.
.. code-block:: shell
export HF_TOKEN=$your_personal_hugging_face_access_token
2. Run the setup script to install libraries and datasets needed for benchmarking.
.. code-block:: shell
./pytorch_benchmark_setup.sh
.. rubric:: Pretraining
To start the pretraining benchmark, use the following command with the
appropriate options. See the following list of options and their descriptions.
.. code-block:: shell
./pytorch_benchmark_report.sh -t pretrain \
-m {{ model.model_repo }} \
-p $datatype \
-s $sequence_length
.. list-table::
:header-rows: 1
* - Name
- Options
- Description
{% for mode in available_modes %}
* - {% if loop.first %}``$training_mode``{% endif %}
- ``{{ mode }}``
- {{ training_mode_descs[mode] }}
{% endfor %}
* - ``$datatype``
- ``BF16``{% if model.mad_tag == "primus_pyt_train_llama-3.1-8b" %} or ``FP8``{% endif %}
- Currently, only Llama 3.1 8B supports FP8 precision.
* - ``$sequence_length``
- Sequence length for the language model.
- Between 2048 and 8192. 8192 by default.
.. rubric:: Benchmarking examples
Use the following command to train {{ model.model }} with BF16 precision using Primus torchtitan.
.. code-block:: shell
./pytorch_benchmark_report.sh -m {{ model.model_repo }}
To train {{ model.model }} with FP8 precision, use the following command.
.. code-block:: shell
./pytorch_benchmark_report.sh -m {{ model.model_repo }} -p FP8
{% endfor %}
{% endfor %}
Further reading
===============
- For an introduction to Primus, see `Primus: A Lightweight, Unified Training
Framework for Large Models on AMD GPUs <https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html>`__.
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
AMD Instinct MI300X series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================
See :doc:`pytorch-training-history` to find documentation for previous releases
of the ``ROCm/pytorch-training`` Docker image.

View File

@@ -1,588 +0,0 @@
:orphan:
.. meta::
:description: How to train a model using PyTorch for ROCm.
:keywords: ROCm, AI, LLM, train, PyTorch, torch, Llama, flux, tutorial, docker
**************************************
Training a model with PyTorch on ROCm
**************************************
.. caution::
This documentation does not reflect the latest version of ROCm PyTorch training
performance benchmark documentation. See :doc:`../pytorch-training` for the latest version.
PyTorch is an open-source machine learning framework that is widely used for
model training with GPU-optimized components for transformer-based models.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/pytorch-training-v25.8-benchmark-models.yaml
{% set dockers = data.dockers %}
{% set docker = dockers[0] %}
The `PyTorch for ROCm training Docker <{{ docker.docker_hub_url }}>`__
(``{{ docker.pull_tag }}``) image provides a prebuilt optimized environment for fine-tuning and pretraining a
model on AMD Instinct MI325X and MI300X GPUs. It includes the following software components to accelerate
training workloads:
.. list-table::
:header-rows: 1
* - Software component
- Version
{% for component_name, component_version in docker.components.items() %}
* - {{ component_name }}
- {{ component_version }}
{% endfor %}
.. _amd-pytorch-training-model-support:
Supported models
================
The following models are pre-optimized for performance on the AMD Instinct MI325X and MI300X GPUs.
Some instructions, commands, and training recommendations in this documentation might
vary by model -- select one to get started.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/pytorch-training-v25.8-benchmark-models.yaml
{% set unified_docker = data.dockers[0] %}
{% set model_groups = data.model_groups %}
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row gx-0">
<div class="col-2 me-1 px-2 model-param-head">Model</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
<div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
<div class="row gx-0 pt-1">
<div class="col-2 me-1 px-2 model-param-head">Variant</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if models|length % 3 == 0 %}
<div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% else %}
<div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
.. _amd-pytorch-training-supported-training-modes:
The following table lists supported training modes per model.
.. dropdown:: Supported training modes
.. list-table::
:header-rows: 1
* - Model
- Supported training modes
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if model.training_modes %}
* - {{ model.model }}
- ``{{ model.training_modes | join('``, ``') }}``
{% endif %}
{% endfor %}
{% endfor %}
.. note::
Some model and fine-tuning combinations are not listed. This is
because the `upstream torchtune repository <https://github.com/pytorch/torchtune>`__
doesn't provide default YAML configurations for them.
For advanced usage, you can create a custom configuration to enable
unlisted fine-tuning methods by using an existing file in the
``/workspace/torchtune/recipes/configs`` directory as a template.
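For instance, a custom configuration could be seeded from an existing recipe file (the file names here are illustrative):
.. code-block:: shell
cp /workspace/torchtune/recipes/configs/llama3_1/8B_full.yaml my_custom_config.yaml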
.. _amd-pytorch-training-performance-measurements:
Performance measurements
========================
To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`_
page provides reference throughput and latency measurements for training
popular AI models.
.. note::
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`_
should not be interpreted as the peak performance achievable by AMD
Instinct MI325X and MI300X GPUs or ROCm software.
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
This Docker image is optimized for specific model configurations outlined
below. Performance can vary for other training workloads, as AMD
doesn't test configurations and run conditions outside those described.
Run training
============
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/pytorch-training-v25.8-benchmark-models.yaml
{% set unified_docker = data.dockers[0] %}
{% set model_groups = data.model_groups %}
Once the setup is complete, choose between two options to start benchmarking training:
.. tab-set::
.. tab-item:: MAD-integrated benchmarking
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{ model.mad_tag }}
The following run command is tailored to {{ model.model }}.
See :ref:`amd-pytorch-training-model-support` to switch to another available model.
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
using one node with the {{ model.precision }} data type on the host machine.
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
madengine run \
--tags {{ model.mad_tag }} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{ model.mad_tag }}``. The latency and throughput reports of the
model are collected in ``~/MAD/perf.csv``.
{% endfor %}
{% endfor %}
.. tab-item:: Standalone benchmarking
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{ model.mad_tag }}
The following commands are tailored to {{ model.model }}.
See :ref:`amd-pytorch-training-model-support` to switch to another available model.
{% endfor %}
{% endfor %}
.. rubric:: Download the Docker image and required packages
1. Use the following command to pull the Docker image from Docker Hub.
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
2. Run the Docker container.
.. code-block:: shell
docker run -it \
--device /dev/dri \
--device /dev/kfd \
--network host \
--ipc host \
--group-add video \
--cap-add SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
-v $HOME:$HOME \
-v $HOME/.ssh:/root/.ssh \
--shm-size 64G \
--name training_env \
{{ unified_docker.pull_tag }}
Use these commands if you exit the ``training_env`` container and need to return to it.
.. code-block:: shell
docker start training_env
docker exec -it training_env bash
3. In the Docker container, clone the `<https://github.com/ROCm/MAD>`__
repository and navigate to the benchmark scripts directory
``/workspace/MAD/scripts/pytorch_train``.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD/scripts/pytorch_train
.. rubric:: Prepare training datasets and dependencies
1. The following benchmarking examples require downloading models and datasets
from Hugging Face. To ensure successful access to gated repos, set your
``HF_TOKEN``.
.. code-block:: shell
export HF_TOKEN=$your_personal_hugging_face_access_token
2. Run the setup script to install libraries and datasets needed for benchmarking.
.. code-block:: shell
./pytorch_benchmark_setup.sh
.. container:: model-doc pyt_train_llama-3.1-8b
``pytorch_benchmark_setup.sh`` installs the following libraries for Llama 3.1 8B:
.. list-table::
:header-rows: 1
* - Library
- Reference
* - ``accelerate``
- `Hugging Face Accelerate <https://huggingface.co/docs/accelerate/en/index>`_
* - ``datasets``
- `Hugging Face Datasets <https://huggingface.co/docs/datasets/v3.2.0/en/index>`_ 3.2.0
.. container:: model-doc pyt_train_llama-3.1-70b
``pytorch_benchmark_setup.sh`` installs the following libraries for Llama 3.1 70B:
.. list-table::
:header-rows: 1
* - Library
- Reference
* - ``datasets``
- `Hugging Face Datasets <https://huggingface.co/docs/datasets/v3.2.0/en/index>`_ 3.2.0
* - ``torchdata``
- `TorchData <https://meta-pytorch.org/data/beta/index.html#torchdata>`__
* - ``tomli``
- `Tomli <https://pypi.org/project/tomli/>`__
* - ``tiktoken``
- `tiktoken <https://github.com/openai/tiktoken>`__
* - ``blobfile``
- `blobfile <https://pypi.org/project/blobfile/>`__
* - ``tabulate``
- `tabulate <https://pypi.org/project/tabulate/>`__
* - ``wandb``
- `Weights & Biases <https://github.com/wandb/wandb>`__
* - ``sentencepiece``
- `SentencePiece <https://github.com/google/sentencepiece>`__ 0.2.0
* - ``tensorboard``
- `TensorBoard <https://www.tensorflow.org/tensorboard>`__ 2.18.0
.. container:: model-doc pyt_train_flux
``pytorch_benchmark_setup.sh`` installs the following libraries for FLUX:
.. list-table::
:header-rows: 1
* - Library
- Reference
* - ``accelerate``
- `Hugging Face Accelerate <https://huggingface.co/docs/accelerate/en/index>`_
* - ``datasets``
- `Hugging Face Datasets <https://huggingface.co/docs/datasets/v3.2.0/en/index>`__ 3.2.0
* - ``sentencepiece``
- `SentencePiece <https://github.com/google/sentencepiece>`__ 0.2.0
* - ``tensorboard``
- `TensorBoard <https://www.tensorflow.org/tensorboard>`__ 2.18.0
* - ``csvkit``
- `csvkit <https://csvkit.readthedocs.io/en/latest/>`__ 2.0.1
* - ``deepspeed``
- `DeepSpeed <https://github.com/deepspeedai/DeepSpeed>`__ 0.16.2
* - ``diffusers``
- `Hugging Face Diffusers <https://huggingface.co/docs/diffusers/en/index>`__ 0.31.0
* - ``GitPython``
- `GitPython <https://github.com/gitpython-developers/GitPython>`__ 3.1.44
* - ``opencv-python-headless``
- `opencv-python-headless <https://pypi.org/project/opencv-python-headless/>`__ 4.10.0.84
* - ``peft``
- `PEFT <https://huggingface.co/docs/peft/en/index>`__ 0.14.0
* - ``protobuf``
- `Protocol Buffers <https://github.com/protocolbuffers/protobuf>`__ 5.29.2
* - ``pytest``
- `PyTest <https://docs.pytest.org/en/stable/>`__ 8.3.4
* - ``python-dotenv``
- `python-dotenv <https://pypi.org/project/python-dotenv/>`__ 1.0.1
* - ``seaborn``
- `Seaborn <https://seaborn.pydata.org/>`__ 0.13.2
* - ``transformers``
- `Transformers <https://huggingface.co/docs/transformers/en/index>`__ 4.47.0
``pytorch_benchmark_setup.sh`` downloads the following datasets from Hugging Face:
* `bghira/pseudo-camera-10k <https://huggingface.co/datasets/bghira/pseudo-camera-10k>`__
{% for model_group in model_groups %}
{% for model in model_group.models %}
{% set training_modes = model.training_modes %}
{% set training_mode_descs = {
"pretrain": "Benchmark pre-training.",
"HF_pretrain": "Llama 3.1 8B pre-training with FP8 precision."
} %}
{% set available_modes = training_modes | select("in", ["pretrain", "HF_pretrain"]) | list %}
{% if available_modes %}
.. container:: model-doc {{ model.mad_tag }}
.. rubric:: Pre-training
To start the pre-training benchmark, use the following command with the
appropriate options, described in the table below.
.. code-block:: shell
./pytorch_benchmark_report.sh -t {% if available_modes | length == 1 %}{{ available_modes[0] }}{% else %}$training_mode{% endif %} \
-m {{ model.model_repo }} \
-p $datatype \
-s $sequence_length
{% if model.mad_tag == "pyt_train_flux" %}
.. container:: model-doc {{ model.mad_tag }}
.. note::
Currently, FLUX models are not supported out-of-the-box on {{ unified_docker.pull_tag }}.
To use FLUX, refer to the ``rocm/pytorch-training`` Docker image: :doc:`pytorch-training-v25.6`.
Occasionally, downloading the FLUX dataset might fail. If this happens,
manually download it from Hugging Face at
`black-forest-labs/FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_
and save it to ``/workspace/FluxBenchmark``. This ensures that the test script can access
the required dataset.
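As a sketch, you can fetch the repository with the Hugging Face CLI,
reusing the repo name and target path named above (your ``HF_TOKEN`` must
grant access to the gated repo):
.. code-block:: shell
huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir /workspace/FluxBenchmark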
{% endif %}
.. list-table::
:header-rows: 1
* - Name
- Options
- Description
{% for mode in available_modes %}
* - {% if loop.first %}``$training_mode``{% endif %}
- ``{{ mode }}``
- {{ training_mode_descs[mode] }}
{% endfor %}
* - ``$datatype``
- ``BF16``{% if model.mad_tag == "pyt_train_llama-3.1-8b" %} or ``FP8``{% endif %}
- Only Llama 3.1 8B supports FP8 precision.
* - ``$sequence_length``
- Between 2048 and 8192. 8192 by default.
- Sequence length for the language model.
{% endif %}
{% set training_mode_descs = {
"finetune_fw": "Full weight fine-tuning (BF16 and FP8 supported).",
"finetune_lora": "LoRA fine-tuning (BF16 supported).",
"finetune_qlora": "QLoRA fine-tuning (BF16 supported).",
"HF_finetune_lora": "LoRA fine-tuning with Hugging Face PEFT.",
} %}
{% set available_modes = training_modes | select("in", ["finetune_fw", "finetune_lora", "finetune_qlora", "HF_finetune_lora"]) | list %}
{% if available_modes %}
.. container:: model-doc {{ model.mad_tag }}
.. rubric:: Fine-tuning
To start the fine-tuning benchmark, use the following command with the
appropriate options, described in the table below.
See :ref:`supported training modes <amd-pytorch-training-supported-training-modes>`.
.. code-block:: shell
./pytorch_benchmark_report.sh -t $training_mode \
-m {{ model.model_repo }} \
-p $datatype \
-s $sequence_length
.. list-table::
:header-rows: 1
* - Name
- Options
- Description
{% for mode in available_modes %}
* - {% if loop.first %}``$training_mode``{% endif %}
- ``{{ mode }}``
- {{ training_mode_descs[mode] }}
{% endfor %}
* - ``$datatype``
- ``BF16``{% if "finetune_fw" in available_modes %} or ``FP8``{% endif %}
- All models support BF16.{% if "finetune_fw" in available_modes %} FP8 is only available for full weight fine-tuning.{% endif %}
* - ``$sequence_length``
- Between 2048 and 16384.
- Sequence length for the language model.
{% if model.mad_tag in ["pyt_train_llama3.2-vision-11b", "pyt_train_llama-3.2-vision-90b"] %}
.. note::
For LoRA and QLoRA support with vision models (Llama 3.2 11B and 90B),
use the following torchtune commit for compatibility:
.. code-block:: shell
git checkout 48192e23188b1fc524dd6d127725ceb2348e7f0e
{% elif model.mad_tag in ["pyt_train_llama-2-7b", "pyt_train_llama-2-13b", "pyt_train_llama-2-70b"] %}
.. note::
You might encounter the following error with Llama 2: ``ValueError: seq_len (16384) of
input tensor should be smaller than max_seq_len (4096)``.
This error indicates that an input sequence is longer than the model's maximum context window.
Ensure your tokenized input does not exceed the model's ``max_seq_len`` (4096
tokens in this case). You can resolve this by truncating the input or splitting
it into smaller chunks before passing it to the model.
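For example, a full weight fine-tuning run capped at Llama 2's context
window might look like this (a sketch; ``$model_repo`` is a placeholder for
the Llama 2 variant you're benchmarking):
.. code-block:: shell
./pytorch_benchmark_report.sh -t finetune_fw -m $model_repo -p BF16 -s 4096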
Note on reproducibility: The results in this guide are based on
commit ``b4c98ac`` of the upstream
`torchtune <https://github.com/pytorch/torchtune>`__ repository. For the
latest updates, you can use the main branch.
{% endif %}
{% endif %}
{% endfor %}
{% endfor %}
.. rubric:: Benchmarking examples
For examples of benchmarking commands, see the `benchmarking examples
<https://github.com/ROCm/MAD/tree/develop/benchmark/pytorch_train#benchmarking-examples>`__ in the MAD repository.
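As an illustration, a Llama 3.1 8B pre-training run with FP8 precision
might be invoked as follows (a sketch; replace ``$model_repo`` with the
repo tag for your model):
.. code-block:: shell
./pytorch_benchmark_report.sh -t HF_pretrain -m $model_repo -p FP8 -s 8192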
.. _amd-pytorch-training-multinode-examples:
Multi-node training
-------------------
Refer to :doc:`/how-to/rocm-for-ai/system-setup/multi-node-setup` to configure your environment for multi-node
training. See :ref:`rocm-for-ai-multi-node-setup-pyt-train-example` for example Slurm run commands.
Pre-training
~~~~~~~~~~~~
Multi-node training with torchtitan is supported. The provided Slurm script is pre-configured for Llama 3 70B.
To launch the training job on a Slurm cluster for Llama 3 70B, run the following commands from the MAD repository.
.. code-block:: shell
# In the MAD repository
cd scripts/pytorch_train
sbatch run_slurm_train.sh
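After submission, you can monitor the job with standard Slurm commands (a
sketch; replace ``<job_id>`` with the ID reported by ``sbatch``):
.. code-block:: shell
squeue -u $USER
sacct -j <job_id> --format=JobID,State,Elapsed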
Fine-tuning
~~~~~~~~~~~
Multi-node training with torchtune is supported. The provided Slurm script is pre-configured for Llama 3.3 70B.
To launch the training job on a Slurm cluster for Llama 3.3 70B, run the following commands from the MAD repository.
.. code-block:: shell
huggingface-cli login # Get access to HF Llama model space
huggingface-cli download meta-llama/Llama-3.3-70B-Instruct --local-dir ./models/Llama-3.3-70B-Instruct # Download the Llama 3.3 model locally
# In the MAD repository
cd scripts/pytorch_train
sbatch Torchtune_Multinode.sh
.. note::
Information regarding benchmark setup:
* By default, Llama 3.3 70B is fine-tuned using ``alpaca_dataset``.
* You can adjust the torchtune `YAML configuration file
<https://github.com/pytorch/torchtune/blob/main/recipes/configs/llama3_3/70B_full_multinode.yaml>`__
if you're using a different model.
* The number of nodes and other parameters can be tuned in the Slurm script ``Torchtune_Multinode.sh``.
* Set the ``mounting_paths`` inside the Slurm script.
Once the run is finished, you can find the log files in the ``result_torchtune/`` directory.
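For example, to follow a run's progress from another shell (a sketch; the
log file name depends on the Slurm job ID):
.. code-block:: shell
ls result_torchtune/
tail -f result_torchtune/<log_file>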
Further reading
===============
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
AMD Instinct MI300X series GPUs, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================
See :doc:`pytorch-training-history` to find documentation for previous releases
of the ``ROCm/pytorch-training`` Docker image.

View File

@@ -2,16 +2,16 @@
:description: Learn what ROCm is AMD's open software stack for GPU programming, including runtimes, compilers, libraries, and tools for Linux and Windows.
:keywords: ROCm, AMD, GPU computing, ROCm Core SDK, ROCm components, TheRock, ROCm architecture, HPC, AI, machine learning, ROCm runtime
**********************
AMD ROCm 7.9.0 preview
**********************
*******************************
AMD ROCm |ROCM_VERSION| preview
*******************************
AMD ROCm is an open, modular, and high-performance GPU software ecosystem
built collaboratively with the community, maintained transparently, and
optimized for consistent, scalable performance across data centers, desktops,
and edge devices.
AMD ROCm is an open, modular, and high-performance GPU software ecosystem
built collaboratively with the community, maintained transparently, and
optimized for consistent, scalable performance across data centers,
workstations, and edge devices.
ROCm 7.9.0 is a technology preview release built with
ROCm |ROCM_VERSION| is a technology preview release built with
`TheRock <https://github.com/ROCm/TheRock>`__, AMD's new open build and release
system.
This preview introduces a new modular build workflow that will become standard
@@ -34,11 +34,12 @@ frameworks such as PyTorch.
* **Open source** -- Transparent development driven by community feedback
* **Cross-platform** -- Supports Linux and Windows environments
* **Comprehensive** -- End-to-end toolchain from compilers to libraries
* **Performance-focused** -- Tuned for AMD Instinct™, AMD Radeon™, and AMD Ryzen™ architectures
* **Performance-focused** -- Tuned for AMD Instinct™, AMD Radeon™, and AMD Ryzen™ devices
.. image:: data/rocm-ontology.png
:align: center
:alt: ROCm software ecosystem and components illustration
:width: 1000
ROCm supports AMD GPU architectures spanning data center, workstation, and APU
categories. TheRock enables a unified ROCm userspace experience across
@@ -59,8 +60,8 @@ What's changing
ROCm is evolving to improve flexibility, maintainability, and
use-case alignment.
* **Leaner core** -- The Core SDK focuses on essential runtime and development components.
* **Use case-specific expansions** -- Optional domain-specific SDKs for AI, data science, and HPC.
* **Leaner core** -- The Core SDK focuses on essential runtime and development components.
* **Use case-specific expansions** -- Optional domain-specific SDKs for AI, data science, and HPC.
* **Modular installation** -- Install only the components required for your workflow.
This approach streamlines installation, reduces footprint, and accelerates
@@ -76,6 +77,7 @@ computing.
.. image:: data/rocm-sdk-arch.png
:align: center
:alt: ROCm Core SDK internal architecture illustration
:width: 1000
TheRock's infrastructure keeps these components modular, consistent, and easy
to integrate across configurations.

View File

@@ -0,0 +1,118 @@
.. meta::
:description: Learn how to build the ROCm Core SDK from source using TheRock. Includes references to environment setup guides for Ubuntu 24.04 and Windows 11, plus links to official instructions and compatibility guidance.
:keywords: AMD ROCm Core SDK, build ROCm from source, TheRock, ROCm SDK, Ubuntu 24.04, Windows 11, ROCm environment setup, AMD GPU compute, ROCm compatibility, ROCm build guide
***********************************
Build the ROCm Core SDK from source
***********************************
You can build the ROCm Core SDK from source using the open-source unified build
system `TheRock <https://github.com/ROCm/TheRock>`__. To learn more about the
motivation and architecture behind this system, see `ROCm Technology Preview:
ROCm Core SDK and TheRock Build System
<https://rocm.blogs.amd.com/software-tools-optimization/therock/README.html>`__.
This page consists mainly of key references to `TheRock's README
<https://github.com/ROCm/TheRock?tab=readme-ov-file#building-from-source>`__
and `supporting development manuals
<https://github.com/ROCm/TheRock/blob/main/README.md#development-manuals>`__
which provide up-to-date build instructions and guidance for supported
platforms. See `TheRock Development Guide
<https://github.com/ROCm/TheRock/blob/main/docs/development/development_guide.md#therock-development-guide>`__
to learn about the overall build architecture.
.. tip::
Building from source is recommended only if you need custom builds or are
contributing to ROCm development.
For most users, installing from official AMD releases is faster and easier.
See :doc:`/install/rocm` for installation instructions.
Prerequisites
=============
A successful build depends on a correctly configured environment. Before you
begin, ensure your system meets all hardware and software requirements. Review the
following resources:
* `Environment setup guide
<https://github.com/ROCm/TheRock/blob/main/docs/environment_setup_guide.md>`__
* :doc:`/compatibility/compatibility-matrix`
High-level build process
========================
For an overview of the build architecture, start with TheRock's `development guide
<https://github.com/ROCm/TheRock/blob/main/docs/development/development_guide.md>`__.
While specific commands vary by platform, the general workflow for building
from source involves these stages:
1. Clone the repository and install build dependencies. See :ref:`build-from-src-plat-setup`.
2. Configure the build: Use CMake feature flags to configure the build. This
step allows you to target specific platforms, build subsets of ROCm Core SDK
components, and toggle component features. See `Build configuration
<https://github.com/ROCm/TheRock/blob/main/README.md#build-configuration>`__
for optional and required build flags.
3. Execute the build: Run the build command to compile the source code. This
can be a time- and resource-intensive process. See `CMake build usage
<https://github.com/ROCm/TheRock/blob/main/README.md#cmake-build-usage>`__. A minimal sketch of these stages follows this list.
4. After a successful build, the outputs are available for use in downstream
tasks. To learn more about build outputs, see the relevant `TheRock
documentation <https://github.com/ROCm/TheRock/blob/main/docs/development/artifacts.md>`__.
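The following is a minimal sketch of stages 1 through 3 on Linux. The
``fetch_sources.py`` helper and the ``THEROCK_AMDGPU_FAMILIES`` flag follow
TheRock's README at the time of writing; treat them as assumptions and
confirm the exact flags in the linked build configuration documentation.
.. code-block:: bash
# Stage 1: clone the repository and fetch subproject sources
git clone https://github.com/ROCm/TheRock.git
cd TheRock
python ./build_tools/fetch_sources.py
# Stage 2: configure for a single GPU family (illustrative value)
cmake -B build -GNinja . -DTHEROCK_AMDGPU_FAMILIES=gfx110X-dgpu
# Stage 3: compile
cmake --build build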
Common post-build tasks include:
* Using ``build/dist/rocm``: When the build completes, you should have a
build of ROCm in the ``build/dist/rocm/`` directory (see the environment
sketch after this list). See `Using installed
tarballs <https://github.com/ROCm/TheRock/blob/main/RELEASES.md#using-installed-tarballs>`__
for more information.
* Building Python packages: Prepare the build artifacts for distribution as
Python packages. See `Building Python packages
<https://github.com/ROCm/TheRock/blob/main/docs/packaging/python_packaging.md#building-packages>`__.
* Building PyTorch: Build a compatible PyTorch version against ROCm wheels.
See the `PyTorch build instructions
<https://github.com/ROCm/TheRock/tree/main/external-builds/pytorch#build-instructions>`__.
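For instance, to try out a finished build without installing anything
system-wide, you can point your environment at ``build/dist/rocm`` (a
sketch, assuming a Linux shell and a completed build):
.. code-block:: bash
export ROCM_PATH=$PWD/build/dist/rocm
export PATH=$ROCM_PATH/bin:$PATH
export LD_LIBRARY_PATH=$ROCM_PATH/lib
rocminfo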
.. _build-from-src-plat-setup:
Platform-specific setup
=======================
ManyLinux
---------
On Linux, it's recommended to build with ManyLinux to produce binaries that are
portable across Ubuntu and other Linux distributions. To learn more about what
a ROCm ManyLinux build entails, see `ManyLinux builds
<https://github.com/ROCm/TheRock/blob/main/docs/design/manylinux_builds.md>`__.
Refer to `ManyLinux x86_64
<https://github.com/ROCm/TheRock/blob/main/docs/environment_setup_guide.md#manylinux-x84-64>`__
in the environment setup guide.
Ubuntu 24.04
------------
TheRock provides detailed instructions and scripts for preparing an Ubuntu
24.04 system, including installing necessary packages via apt.
Refer to `Setup — Ubuntu 24.04 <https://github.com/ROCm/TheRock?tab=readme-ov-file#setup---ubuntu-2404>`__
in the TheRock repository for guidance.
Windows 11
----------
For setup instructions on Windows 11 using Visual Studio 2022, see
`Setup — Windows 11
<https://github.com/ROCm/TheRock?tab=readme-ov-file#setup---windows-11-vs-2022>`__
in the TheRock repository.
.. seealso::
For details on supported configurations, known issues, and other
Windows-specific considerations, review the `Windows support
<https://github.com/ROCm/TheRock/blob/main/docs/development/windows_support.md>`__
documentation.

File diff suppressed because it is too large

View File

@@ -1,21 +0,0 @@
1. Register your Enterprise Linux.
```bash
subscription-manager register --username <username> --password <password>
```
2. Update your Enterprise Linux.
```bash
sudo dnf update --releasever=10.0 --exclude=\*release\*
```
3. Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,27 +0,0 @@
1. Register your Enterprise Linux.
```bash
subscription-manager register --username <username> --password <password>
```
2. Update your Enterprise Linux.
```bash
sudo dnf update --releasever=10.0 --exclude=\*release\*
```
3. Install Python 3.12 or 3.13.
```bash
sudo dnf install python3.12 python3.12-pip
```
4. Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,22 +0,0 @@
1. Register your Enterprise Linux.
```bash
subscription-manager register --username <username> --password <password>
sudo subscription-manager attach --auto
```
2. Update your Enterprise Linux.
```bash
sudo dnf update --releasever=9.6 --exclude=\*release\*
```
3. Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,28 +0,0 @@
1. Register your Enterprise Linux.
```bash
subscription-manager register --username <username> --password <password>
sudo subscription-manager attach --auto
```
2. Update your Enterprise Linux.
```bash
sudo dnf update --releasever=9.6 --exclude=\*release\*
```
3. Install Python 3.11, 3.12 or 3.13.
```bash
sudo dnf install python3.11 python3.11-pip
```
4. Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,15 +0,0 @@
1. Install Python 3.11.
```bash
sudo apt install python3.11 python3.11-venv
```
2. Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,15 +0,0 @@
1. Install Python 3.12 or 3.13.
```bash
sudo apt install python3.12 python3.12-venv
```
2. Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,9 +0,0 @@
Configure permissions for GPU access.
```bash
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
```
```{note}
To apply all settings, reboot your system.
```

View File

@@ -1,6 +0,0 @@
1. Remove any existing HIP SDK installations and other
conflicting AMD graphics software.
2. Install the [Adrenalin Driver version
25.9.2](https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-9-2.html).
For instructions, see [Install AMD Software: Adrenalin Edition](https://www.amd.com/en/resources/support-articles/faqs/RSX2-INSTALL.html).

View File

@@ -1,8 +0,0 @@
1. Remove any existing HIP SDK installations and other
conflicting AMD graphics software.
2. Install the [Adrenalin Driver version
25.9.2](https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-9-2.html).
For instructions, see [Install AMD Software: Adrenalin Edition](https://www.amd.com/en/resources/support-articles/faqs/RSX2-INSTALL.html).
3. Install a supported Python version: 3.11, 3.12, or 3.13.

View File

@@ -1,6 +0,0 @@
For information about driver compatibility, see the {doc}`compatibility-matrix`.
For information about the AMD GPU driver installation, see the
[RHEL native
installation](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-rhel.html)
in the AMD Instinct Data Center GPU Documentation.

View File

@@ -1,6 +0,0 @@
For information about driver compatibility, see the {doc}`compatibility-matrix`.
For information about the AMD GPU driver installation, see the
[Ubuntu native
installation](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-ubuntu.html)
in the AMD Instinct Data Center GPU Documentation.

View File

@@ -1,20 +0,0 @@
1. Create the installation directory. For example:
```bash
mkdir therock-tarball && cd therock-tarball
```
```{note}
Subsequent commands assume you're working with the
`therock-tarball` directory.
If you choose a different directory name, adjust the
subsequent commands accordingly.
```
2. Download and unpack the tarball.
```bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx1151-7.9.0rc1.tar.gz
mkdir install
tar -xf *.tar.gz -C install
```

View File

@@ -1,20 +0,0 @@
1. Create the installation directory. For example:
```bash
mkdir therock-tarball && cd therock-tarball
```
```{note}
Subsequent commands assume you're working with the
`therock-tarball` directory.
If you choose a different directory name, adjust the
subsequent commands accordingly.
```
2. Download and unpack the tarball.
```bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx94X-dcgpu-7.9.0rc1.tar.gz
mkdir install
tar -xf *.tar.gz -C install
```

View File

@@ -1,20 +0,0 @@
1. Create the installation directory. For example:
```bash
mkdir therock-tarball && cd therock-tarball
```
```{note}
Subsequent commands assume you're working with the
`therock-tarball` directory.
If you choose a different directory name, adjust the
subsequent commands accordingly.
```
2. Download and unpack the tarball.
```bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx950-dcgpu-7.9.0rc1.tar.gz
mkdir install
tar -xf *.tar.gz -C install
```

View File

@@ -1,12 +0,0 @@
1. Set up your Python virtual environment.
```bash
python3.12 -m venv .venv
source .venv/bin/activate
```
2. Install ROCm wheels packages.
```bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ "rocm[libraries,devel]"
```

View File

@@ -1,12 +0,0 @@
1. Set up your Python virtual environment.
```bash
python3.11 -m venv .venv
source .venv/bin/activate
```
2. Install ROCm wheels packages.
```bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx94X-dcgpu/ "rocm[libraries,devel]"
```

View File

@@ -1,12 +0,0 @@
1. Set up your Python virtual environment.
```bash
python3.12 -m venv .venv
source .venv/bin/activate
```
2. Install ROCm wheels packages.
```bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx94X-dcgpu/ "rocm[libraries,devel]"
```

View File

@@ -1,12 +0,0 @@
1. Set up your Python virtual environment.
```bash
python3.11 -m venv .venv
source .venv/bin/activate
```
2. Install ROCm wheels packages.
```bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx950-dcgpu/ "rocm[libraries,devel]"
```

View File

@@ -1,12 +0,0 @@
1. Set up your Python virtual environment.
```bash
python3.12 -m venv .venv
source .venv/bin/activate
```
2. Install ROCm wheels packages.
```bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx950-dcgpu/ "rocm[libraries,devel]"
```

View File

@@ -1,47 +0,0 @@
```{important}
- Do not copy/replace the ROCm-SDK compiler and runtime DLLs to
`System32` as this can cause conflicts.
- Disable the following Windows security features as they
can interfere with ROCm functionality:
- Turn off WDAG (Windows Defender Application Guard)
- Control Panel > Programs > Programs and Features > Turn Windows features on or off > **Deselect** "Microsoft Defender Application Guard"
- Turn off SAC (Smart App Control)
- Settings > Privacy & security > Windows Security > App & browser control > Smart App Control settings > **Off**
```
1. Create the installation directory in `C:\TheRock\build`.
```{note}
Subsequent commands assume you're working with the
`C:\TheRock\build` directory.
```
2. Download the tarball and extract the contents to
`C:\TheRock\build`.
- Download link: [https://repo.amd.com/rocm/tarball/therock-dist-windows-gfx1151-7.9.0rc1.tar.gz](https://repo.amd.com/rocm/tarball/therock-dist-windows-gfx1151-7.9.0rc1.tar.gz)
3. Set the following environment variables using the command
prompt as an administrator:
```bat
setx HIP_DEVICE_LIB_PATH "C:\TheRock\build\lib\llvm\amdgcn\bitcode" /M
setx HIP_PATH "C:\TheRock\build" /M
setx HIP_PLATFORM "amd" /M
setx LLVM_PATH "C:\TheRock\build\lib\llvm" /M
```
4. Add the following paths into PATH environment variable using your system settings GUI.
- `C:\TheRock\build\bin`
- `C:\TheRock\build\lib\llvm\bin`
5. Open a new command prompt window for the environment variables to take effect. Run `set`
to see the list of active variables.
```bat
set
```

View File

@@ -1,12 +0,0 @@
1. Set up your Python virtual environment.
```bash
python3.12 -m venv .venv
.venv\Scripts\activate
```
2. Install ROCm wheels packages.
```bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ "rocm[libraries,devel]"
```

View File

@@ -1,53 +0,0 @@
1. Configure ROCm PATH. Make sure you're in the `therock-tarball` directory before proceeding.
```bash
export ROCM_PATH=$PWD
export PATH=$PATH:$ROCM_PATH/install/bin
```
2. Configure `LD_LIBRARY_PATH`.
```bash
export LD_LIBRARY_PATH=$ROCM_PATH/install/lib
```
3. Verify the ROCm installation.
```bash
rocminfo
amd-smi
```
```{eval-rst}
.. dropdown:: Example output of ``rocminfo``
.. code-block:: shell-session
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.18
Runtime Ext Version: 1.14
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Uuid: CPU-XX
Marketing Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Vendor Name: CPU
[output truncated]
```

View File

@@ -1,72 +0,0 @@
1. Verify the ROCm installation.
```bash
rocminfo
amd-smi
```
```{eval-rst}
.. dropdown:: Example output of ``rocminfo``
.. code-block:: shell-session
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.18
Runtime Ext Version: 1.14
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Uuid: CPU-XX
Marketing Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Vendor Name: CPU
[output truncated]
```
2. Inspect your ROCm installation in your Python environment.
```bash
pip freeze | grep rocm
which rocm-sdk
ls .venv/bin
```
3. Test your ROCm installation.
```bash
rocm-sdk targets
rocm-sdk path --cmake
rocm-sdk path --bin
rocm-sdk path --root
rocm-sdk test
```
To learn more about the `rocm-sdk` tool and to see example expected outputs,
see [Using ROCm Python packages
(TheRock)](https://github.com/ROCm/TheRock/blob/main/RELEASES.md#using-rocm-python-packages).
````{tip}
If you need to deactivate your Python virtual environment when finished,
run:
```shell
deactivate
```
````

View File

@@ -1,72 +0,0 @@
1. Verify the ROCm installation.
```bash
rocminfo
amd-smi
```
```{eval-rst}
.. dropdown:: Example output of ``rocminfo``
.. code-block:: shell-session
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.18
Runtime Ext Version: 1.14
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Uuid: CPU-XX
Marketing Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Vendor Name: CPU
[output truncated]
```
2. Inspect your ROCm installation in your Python environment.
```bash
pip freeze | grep rocm
which rocm-sdk
ls .venv/bin
```
3. Test your ROCm installation.
```bash
rocm-sdk targets
rocm-sdk path --cmake
rocm-sdk path --bin
rocm-sdk path --root
rocm-sdk test
```
To learn more about the `rocm-sdk` tool and to see example expected outputs,
see [Using ROCm Python packages
(TheRock)](https://github.com/ROCm/TheRock/blob/main/RELEASES.md#using-rocm-python-packages).
````{tip}
If you need to deactivate your Python virtual environment when finished,
run:
```shell
deactivate
```
````

View File

@@ -1,21 +0,0 @@
Verify the ROCm installation.
```shell
hipinfo
```
```{eval-rst}
.. dropdown:: Example output of ``hipinfo``
.. code-block:: shell-session
--------------------------------------------------------------------------------
device# 0
Name: AMD Radeon(TM) 8060S Graphics
pciBusID: 197
pciDeviceID: 0
pciDomainID: 0
multiProcessorCount: 20
[output truncated]
```

View File

@@ -1,48 +0,0 @@
1. Verify the ROCm installation.
```bash
hipinfo
```
```{eval-rst}
.. dropdown:: Example output of ``hipinfo``
.. code-block:: shell-session
--------------------------------------------------------------------------------
device# 0
Name: AMD Radeon(TM) 8060S Graphics
pciBusID: 197
pciDeviceID: 0
pciDomainID: 0
multiProcessorCount: 20
[output truncated]
```
2. Inspect your ROCm installation in your Python environment.
```bash
pip freeze
where rocm-sdk
dir .venv\Scripts
```
3. Test your ROCm installation.
```bash
rocm-sdk test
```
To learn more about the `rocm-sdk` tool and to see example expected outputs,
see [Using ROCm Python packages
(TheRock)](https://github.com/ROCm/TheRock/blob/main/RELEASES.md#using-rocm-python-packages).
````{tip}
If you need to deactivate your Python virtual environment when finished,
run:
```shell
deactivate
```
````

View File

@@ -1,5 +0,0 @@
To uninstall ROCm, remove your installation directory.
```bash
sudo rm -rf therock-tarball
```

View File

@@ -1,11 +0,0 @@
1. Clear the pip cache.
```bash
sudo rm -rf ~/.cache/pip
```
2. Remove your local Python virtual environment.
```bash
sudo rm -rf .venv
```

View File

@@ -1,11 +0,0 @@
1. Clear the pip cache.
```bash
pip cache purge
```
2. Remove your local Python virtual environment.
```bash
rmdir /s /q .venv
```

View File

@@ -1,18 +0,0 @@
1. Delete the `C:\TheRock\build` and its contents.
2. Delete the environment variables. For example, using PowerShell as an administrator:
```powershell
[Environment]::SetEnvironmentVariable("HIP_PATH", $null, "Machine")
[Environment]::SetEnvironmentVariable("HIP_DEVICE_LIB_PATH", $null, "Machine")
[Environment]::SetEnvironmentVariable("HIP_PLATFORM", $null, "Machine")
[Environment]::SetEnvironmentVariable("LLVM_PATH", $null, "Machine")
```
3. Remove the following paths from your PATH environment variable using your system settings GUI.
- `C:\TheRock\build\bin`
- `C:\TheRock\build\lib\llvm\bin`
4. If you want to uninstall the Adrenalin driver, see [Uninstall AMD Software](https://www.amd.com/en/resources/support-articles/faqs/RSX2-UNINSTALL.html).

View File

@@ -1,8 +0,0 @@
Docker images often only include a minimal set of installations, meaning some
essential packages might be missing. When installing ROCm within a Docker
container, you might need to install additional packages for a successful
installation. Use the following commands to install the prerequisite packages.
```bash
dnf install sudo libatomic
```

View File

@@ -1,9 +0,0 @@
Docker images often only include a minimal set of installations, meaning some
essential packages might be missing. When installing ROCm within a Docker
container, you might need to install additional packages for a successful
installation. Use the following commands to install the prerequisite packages.
```bash
apt update
apt install sudo wget
```

View File

@@ -0,0 +1,395 @@
Installation
============
.. dropdown:: Installation environment
:animate: fade-in-slide-down
:icon: desktop-download
:chevron: down-up
.. include:: ./includes/selector.rst
Before getting started, make sure you've completed the :ref:`rocm-prerequisites`.
For information about supported operating systems and compatible AMD devices,
see the :doc:`Compatibility matrix </compatibility/compatibility-matrix>`.
.. selected:: os=windows
.. caution::
Do not copy/replace the ROCm compiler and runtime DLLs to System32 as
this can cause conflicts.
.. selected:: fam=instinct fam=radeon-pro fam=radeon
.. selected:: os=ubuntu
:heading: Install kernel driver
:heading-level: 3
For information about AMD GPU Driver (amdgpu) compatibility, see the
:doc:`Compatibility matrix </compatibility/compatibility-matrix>`.
For instructions on installing the AMD GPU Driver (amdgpu), see `Ubuntu native
installation
<https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-ubuntu.html>`__
in the AMD Instinct Data Center GPU Documentation.
.. selected:: os=rhel
:heading: Install kernel driver
:heading-level: 3
For information about AMD GPU Driver (amdgpu) compatibility, see the
:doc:`Compatibility matrix </compatibility/compatibility-matrix>`.
For instructions on installing the AMD GPU Driver (amdgpu), see `RHEL native
installation
<https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-rhel.html>`__
in the AMD Instinct Data Center GPU Documentation.
.. selected:: os=sles
:heading: Install kernel driver
:heading-level: 3
For information about AMD GPU Driver (amdgpu) compatibility, see the
:doc:`Compatibility matrix </compatibility/compatibility-matrix>`.
For instructions on installing the AMD GPU Driver (amdgpu), see `SLES native
installation
<https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/install/detailed-install/package-manager/package-manager-sles.html>`__
in the AMD Instinct Data Center GPU Documentation.
.. selected:: fam=ryzen
.. selected:: os=ubuntu os=rhel os=sles
:heading: About the kernel driver
:heading-level: 3
Supported Ryzen AI APUs require the inbox kernel driver included with
Ubuntu 24.04.3.
Install ROCm
------------
Use the following instructions to install the ROCm Core SDK on your system.
.. selected:: i=pip
:heading: Set up your Python virtual environment
:heading-level: 4
Create and activate the Python virtual environment where you'll install
ROCm packages.
.. selected:: os=ubuntu
.. selected:: os-version=24
For example, to create and activate a Python 3.12 virtual environment,
run the following command:
.. code-block:: bash
python3.12 -m venv .venv
source .venv/bin/activate
.. selected:: os-version=22
For example, to create and activate a Python 3.11 virtual environment,
run the following command:
.. code-block:: bash
python3.11 -m venv .venv
source .venv/bin/activate
.. selected:: os=rhel
.. selected:: os-version=10.1 os-version=10.0
For example, to create and activate a Python 3.12 virtual environment,
run the following command:
.. code-block:: bash
python3.12 -m venv .venv
source .venv/bin/activate
.. selected:: os-version=9.7 os-version=9.6
For example, to create and activate a Python 3.11 virtual environment,
run the following command:
.. code-block:: bash
python3.11 -m venv .venv
source .venv/bin/activate
.. selected:: os-version=8
For example, to create and activate a Python 3.11 virtual environment,
run the following command:
.. code-block:: bash
python3.11 -m venv .venv
source .venv/bin/activate
.. selected:: os=sles
For example, to create and activate a Python 3.11 virtual environment,
run the following command:
.. code-block:: bash
python3.11 -m venv .venv
source .venv/bin/activate
.. selected:: os=windows
For example, to create and activate a Python 3.12 virtual environment,
run the following command:
.. code-block:: bat
python3.12 -m venv .venv
.venv\Scripts\activate
.. selected:: i=pip
:heading: Install ROCm wheel packages
:heading-level: 4
.. selected:: gfx=950
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx950`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx950-dcgpu/ "rocm[libraries,devel]"
.. selected:: gfx=942
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx942`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx94X-dcgpu/ "rocm[libraries,devel]"
.. selected:: gfx=90a
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx90a`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx90X-dcgpu/ "rocm[libraries,devel]"
.. selected:: gfx=1100
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx1100`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx110X-dgpu/ "rocm[libraries,devel]"
.. selected:: gfx=1101
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx1101`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx110X-dgpu/ "rocm[libraries,devel]"
.. selected:: gfx=1150
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx1150`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ "rocm[libraries,devel]"
.. selected:: gfx=1151
Use pip to install the ROCm Core SDK libraries and development tools for
your ``gfx1151`` GPU.
Run the following command:
.. code-block:: bash
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ "rocm[libraries,devel]"
.. selected:: i=tar
:heading: Create the installation directory
:heading-level: 4
.. selected:: os=ubuntu os=rhel os=sles
Run the following command in your desired location to create your
installation directory:
.. code-block:: bash
mkdir therock-tarball && cd therock-tarball
.. important::
Subsequent commands assume you're working with the ``therock-tarball``
directory. If you choose a different directory name, adjust the
commands accordingly.
.. selected:: os=windows
Create the installation directory in ``C:\TheRock\build``.
.. important::
Subsequent commands assume you're working with the
``C:\TheRock\build`` directory. If you choose a different directory
name, adjust the commands accordingly.
.. selected:: i=tar
:heading: Download and unpack the tarball
:heading-level: 4
.. selected:: os=ubuntu os=rhel os=sles
.. selected:: gfx=950
Use the following commands to download and untar the ROCm tarball for
your ``gfx950`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx950-dcgpu-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: gfx=942
Use the following commands to download and untar the ROCm tarball for
your ``gfx942`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx94X-dcgpu-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: gfx=90a
Use the following commands to download and untar the ROCm tarball for
your ``gfx90a`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx90X-dcgpu-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: gfx=1100
Use the following commands to download and untar the ROCm tarball for
your ``gfx1100`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx110X-dgpu-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: gfx=1101
Use the following commands to download and untar the ROCm tarball for
your ``gfx1101`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx110X-dgpu-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: gfx=1150
Use the following commands to download and untar the ROCm tarball for
your ``gfx1150`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx1150-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: gfx=1151
Use the following commands to download and untar the ROCm tarball for
your ``gfx1151`` GPU.
Run the following commands:
.. code-block:: bash
wget https://repo.amd.com/rocm/tarball/therock-dist-linux-gfx1151-7.10.0.tar.gz
mkdir install
tar -xf *.tar.gz -C install
.. selected:: os=windows
.. selected:: gfx=1100
Download the tarball for your ``gfx1100`` GPU and extract the contents
to ``C:\TheRock\build``.
- Download link: `therock-dist-windows-gfx110X-dgpu-7.10.0.tar.gz
<https://repo.amd.com/rocm/tarball/therock-dist-windows-gfx110X-dgpu-7.10.0.tar.gz>`__
.. selected:: gfx=1101
Download the tarball for your ``gfx1101`` GPU and extract the contents
to ``C:\TheRock\build``.
- Download link: `therock-dist-windows-gfx110X-dgpu-7.10.0.tar.gz
<https://repo.amd.com/rocm/tarball/therock-dist-windows-gfx110X-dgpu-7.10.0.tar.gz>`__
.. selected:: gfx=1150
Download the tarball for your ``gfx1150`` GPU and extract the contents
to ``C:\TheRock\build``.
- Download link: `therock-dist-windows-gfx1150-7.10.0.tar.gz
<https://repo.amd.com/rocm/tarball/therock-dist-windows-gfx1150-7.10.0.tar.gz>`__
.. selected:: gfx=1151
Download the tarball for your ``gfx1151`` GPU and extract the contents
to ``C:\TheRock\build``.
- Download link: `therock-dist-windows-gfx1151-7.10.0.tar.gz
<https://repo.amd.com/rocm/tarball/therock-dist-windows-gfx1151-7.10.0.tar.gz>`__

View File

@@ -0,0 +1,316 @@
Post-installation
=================
.. dropdown:: Installation environment
:animate: fade-in-slide-down
:icon: desktop-download
:chevron: down-up
.. include:: ./includes/selector.rst
After installing the ROCm Core SDK |ROCM_VERSION| -- see :ref:`rocm-install` --
complete these post-installation steps to finish configuring your system
and validate the installation.
.. selected:: i=tar
.. selected:: os=ubuntu os=rhel os=sles
:heading: Configure your environment
:heading-level: 3
Configure environment variables so that ROCm libraries and tools are
available either to all users on the system or only to your user account.
.. _rocm-post-install-system-wide:
.. tab-set::
.. tab-item:: System-wide setup
Create a profile script so that all users inherit the ROCm
environment variables when they start a shell session. Make sure
you're in the ``therock-tarball`` directory before proceeding.
.. code-block:: bash
# Configure ROCm PATH. Make sure you're in the therock-tarball directory before proceeding.
ROCM_INSTALL_PATH=$(pwd)/install
sudo tee /etc/profile.d/set-rocm-env.sh << EOF
export ROCM_PATH=$ROCM_INSTALL_PATH
export PATH=\$PATH:\$ROCM_PATH/bin
export LD_LIBRARY_PATH=\$ROCM_PATH/lib
EOF
sudo chmod +x /etc/profile.d/set-rocm-env.sh
source /etc/profile.d/set-rocm-env.sh
.. tab-item:: User setup
Configure the ROCm environment for your user by updating your shell
configuration file.
1. Add the following to your shell configuration file
(``~/.bashrc``, ``~/.profile``). Make sure you're in the
``therock-tarball`` directory before proceeding.
.. code-block:: bash
# Configure ROCm PATH. Make sure you're in the therock-tarball directory before proceeding.
export ROCM_PATH=$PWD/install
export PATH=$PATH:$ROCM_PATH/bin
export LD_LIBRARY_PATH=$ROCM_PATH/lib
2. After modifying your shell configuration, apply the change to
your current session by sourcing your updated shell
configuration file.
.. tab-set::
.. tab-item:: .bashrc
.. code-block:: bash
source ~/.bashrc
.. tab-item:: .profile
.. code-block:: bash
source ~/.profile
.. selected:: os=windows
:heading: Configure your environment
:heading-level: 3
Configure environment variables so that ROCm libraries and tools are
available on your Windows system.
1. Set the following environment variables using the command
prompt as an administrator:
.. code-block:: bat
setx HIP_DEVICE_LIB_PATH "C:\TheRock\build\lib\llvm\amdgcn\bitcode" /M
setx HIP_PATH "C:\TheRock\build" /M
setx HIP_PLATFORM "amd" /M
setx LLVM_PATH "C:\TheRock\build\lib\llvm" /M
2. Add the following paths to the PATH environment variable using your system settings GUI.
- ``C:\TheRock\build\bin``
- ``C:\TheRock\build\lib\llvm\bin``
3. Open a new command prompt window for the environment variables to take effect. Run ``set``
to see the list of active variables.
.. code-block:: bat
set
.. selected:: os=ubuntu os=rhel os=sles
:heading: Verify your installation
:heading-level: 3
Use the following ROCm tools to verify that the ROCm stack is correctly
installed and that your AMD GPU is visible to the system.
1. Use ``rocminfo`` to list detected AMD GPUs and confirm that the ROCm
runtimes and drivers are correctly installed and loaded.
.. code-block:: bash
rocminfo
.. dropdown:: Example output of ``rocminfo``
:animate: fade-in-slide-down
:color: success
:icon: note
:chevron: down-up
.. code-block:: shell-session
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.18
Runtime Ext Version: 1.14
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Uuid: CPU-XX
Marketing Name: AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S
Vendor Name: CPU
... [output truncated]
2. Use the AMD SMI CLI ``amd-smi`` to validate system information.
.. code-block:: bash
amd-smi version
.. dropdown:: Example output of ``amd-smi version``
:animate: fade-in-slide-down
:color: success
:icon: note
:chevron: down-up
.. code-block:: shell-session
AMDSMI Tool: 26.1.0+cd50d9e0 | AMDSMI Library version: 26.1.0 | ROCm version: 7.10.0 | amdgpu version: 6.16.6 | amd_hsmp version: N/A
.. selected:: i=pip
3. Inspect your installation in your Python environment and confirm that
ROCm packages, including the ``rocm-sdk`` CLI, are available.
.. code-block:: bash
pip freeze | grep rocm
which rocm-sdk
ls .venv/bin
.. selected:: os=windows
:heading: Verify your installation
:heading-level: 3
Use the following ROCm tools to verify that the ROCm stack is correctly
installed and that your AMD GPU is visible to the system.
.. selected:: i=pip
1. Use ``hipinfo`` to list detected AMD GPUs and confirm that the ROCm
runtimes and drivers are correctly installed and loaded.
.. code-block:: bash
hipinfo
.. dropdown:: Example output of ``hipinfo``
:animate: fade-in-slide-down
:color: success
:icon: note
:chevron: down-up
.. code-block:: shell-session
--------------------------------------------------------------------------------
device# 0
Name: AMD Radeon(TM) 8060S Graphics
pciBusID: 197
pciDeviceID: 0
pciDomainID: 0
multiProcessorCount: 20
... [output truncated]
2. Inspect your installation in your Python environment and confirm that
ROCm packages, including the ``rocm-sdk`` CLI, are available.
.. code-block:: bash
pip freeze
where rocm-sdk
dir .venv\Scripts
.. selected:: i=tar
Use ``hipinfo`` to list detected AMD GPUs and confirm that the ROCm
runtimes and drivers are correctly installed and loaded.
.. code-block:: bash
hipinfo
.. dropdown:: Example output of ``hipinfo``
:animate: fade-in-slide-down
:color: success
:icon: note
:chevron: down-up
.. code-block:: shell-session
--------------------------------------------------------------------------------
device# 0
Name: AMD Radeon(TM) 8060S Graphics
pciBusID: 197
pciDeviceID: 0
pciDomainID: 0
multiProcessorCount: 20
... [output truncated]
.. selected:: os=ubuntu os=rhel os=sles
.. selected:: i=pip
:heading: Test your installation
:heading-level: 3
Run the following commands from your Python virtual environment to confirm
that the ROCm SDK is correctly configured and that basic checks complete
successfully.
.. code-block:: bash
rocm-sdk targets
rocm-sdk path --cmake
rocm-sdk path --bin
rocm-sdk path --root
To learn more about the ``rocm-sdk`` tool and to see example expected
outputs, see `Using ROCm Python packages (TheRock)
<https://github.com/ROCm/TheRock/blob/main/RELEASES.md#using-rocm-python-packages>`__.
.. selected:: i=tar
:heading: Test your installation
:heading-level: 3
Run the ``test_hip_api`` tool to verify that the HIP runtime can access
your GPU and execute a simple workload.
.. code-block:: bash
test_hip_api
.. selected:: os=windows
.. selected:: i=pip
:heading: Test your installation
:heading-level: 3
Run the following commands from your Python virtual environment to confirm
that the ROCm SDK is correctly configured and that basic checks complete
successfully.
.. code-block:: bash
rocm-sdk test
To learn more about the ``rocm-sdk`` tool and to see example expected
outputs, see `Using ROCm Python packages (TheRock)
<https://github.com/ROCm/TheRock/blob/main/RELEASES.md#using-rocm-python-packages>`__.
.. selected:: i=pip
.. tip::
If you need to deactivate your Python virtual environment when finished, run:
.. code-block::
deactivate

View File

@@ -0,0 +1,301 @@
Prerequisites
=============
Before installing the ROCm Core SDK |ROCM_VERSION|, ensure your system meets
all prerequisites. This includes installing the required dependencies and
configuring permissions for GPU access. To confirm that your system is
supported, see the :doc:`Compatibility matrix
</compatibility/compatibility-matrix>`.
.. selected:: os=ubuntu os=rhel os=sles
.. dropdown:: Install essential packages for Docker containers
:animate: fade-in-slide-down
:color: info
:icon: tools
:chevron: down-up
Docker images often include only a minimal set of installations, so some
essential packages might be missing. When installing ROCm within a Docker
container, you might need to install additional packages for a successful
installation.
If applicable, run the following command to install essential packages:
.. selected:: os=ubuntu
.. code-block:: bash
apt update
apt install sudo wget python3 libatomic1
.. selected:: os=rhel
.. selected:: os-version=10.1 os-version=10.0 os-version=9.7 os-version=9.6
.. code-block:: bash
dnf install sudo wget libatomic
.. selected:: os-version=8
.. code-block:: bash
dnf install sudo wget libatomic python3
.. selected:: os=sles
.. code-block:: bash
zypper install sudo libatomic1 libgfortran5 wget SUSEConnect python3
.. selected:: os=windows
1. Remove any existing HIP SDK for Windows installations and other
conflicting AMD graphics software.
2. Install the Adrenalin Driver for Windows.
* For general use cases, use the Adrenalin Driver version 25.11.1. For
details and the download link, see `AMD Software: Adrenalin
Edition 25.11.1
<https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-25-11-1.html>`__.
* If you intend to run :ref:`ComfyUI workloads
<install-comfyui-windows>`, use driver version 25.20.01.17. For
details and the download link, see `AMD Software: PyTorch on Windows
Edition 7.1.1
<https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-7-1-1.html>`__.
3. Disable the following Windows security features as they can interfere
with ROCm functionality:
* Turn off WDAG (Windows Defender Application Guard)
* Control Panel > Programs > Programs and Features > Turn Windows
features on or off > **Clear** "Microsoft Defender Application
Guard"
* Turn off SAC (Smart App Control)
* Settings > Privacy & security > Windows Security > App & browser
control > Smart App Control settings > **Off**
.. selected:: os=rhel
:heading: Register your Red Hat Enterprise Linux system
:heading-level: 3
Register your Red Hat Enterprise Linux (RHEL) system to enable access to Red
Hat repositories and ensure you're able to download and install packages.
Run the following command to register your system:
.. selected:: os-version=10.1 os-version=10.0
.. code-block:: bash
subscription-manager register --username <username> --password <password>
.. selected:: os-version=9.7 os-version=9.6 os-version=8
.. code-block:: bash
subscription-manager register --username <username> --password <password>
subscription-manager attach --auto
.. selected:: os=sles
:heading: Register your SUSE Linux Enterprise Server system
:heading-level: 3
Register your SUSE Linux Enterprise Server (SLES) system to enable access to
SUSE repositories and ensure you're able to download and install packages.
Run the following command to register your system:
.. code-block:: bash
sudo SUSEConnect -r <REGCODE>
.. selected:: os=rhel
:heading: Update your system
:heading-level: 3
After registering your system, update RHEL to the latest packages. This is
particularly important for newer hardware on older versions of RHEL.
Run the following command to update your system:
.. selected:: os-version=10.1
.. code-block:: bash
sudo dnf update --releasever=10.1 --exclude=\*release\*
.. selected:: os-version=10.0
.. code-block:: bash
sudo dnf update --releasever=10.0 --exclude=\*release\*
.. selected:: os-version=9.7
.. code-block:: bash
sudo dnf update --releasever=9.7 --exclude=\*release\*
.. selected:: os-version=9.6
.. code-block:: bash
sudo dnf update --releasever=9.6 --exclude=\*release\*
.. selected:: os-version=8
.. code-block:: bash
sudo dnf update --releasever=8.10 --exclude=\*release\*
.. selected:: os=sles
:heading: Update your system
:heading-level: 3
After registering your system, update SLES to the latest available packages.
This is particularly important for newer hardware on older versions of SLES.
Run the following command to update your system:
.. code-block:: bash
sudo zypper update
.. selected:: i=pip
.. selected:: os=ubuntu
.. selected:: os-version=24
:heading: Install Python
:heading-level: 3
Install a supported Python version. For example, to install Python
3.12, run the following command:
.. code-block:: bash
sudo apt install python3.12 python3.12-venv
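The pip installation method is typically run inside a Python virtual
environment. As a minimal sketch (the ``.venv`` directory name is only an
example), you can create and activate one now:

.. code-block:: bash

   # Create a virtual environment in .venv and activate it
   python3.12 -m venv .venv
   source .venv/bin/activate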
.. selected:: os-version=22
:heading: Install Python
:heading-level: 3
Install a supported Python version. For example, to install Python
3.11, run the following command:
.. code-block:: bash
sudo apt install python3.11 python3.11-venv
.. selected:: os=rhel
.. selected:: os-version=10.1 os-version=10.0
:heading: Install Python
:heading-level: 3
Install a supported Python version. For example, to install Python
3.12, run the following command:
.. code-block:: bash
sudo dnf install python3.12 python3.12-pip
.. selected:: os-version=9.7 os-version=9.6 os-version=8
:heading: Install Python
:heading-level: 3
Install a supported Python version. For example, to install Python
3.11, run the following command:
.. code-block:: bash
sudo dnf install python3.11 python3.11-pip
.. selected:: os=sles
:heading: Install Python
:heading-level: 3
Install a supported Python version. For example, to install Python 3.11,
run the following command:
.. code-block:: bash
sudo zypper install -y python311 python311-pip
.. selected:: os=windows
:heading: Install Python
:heading-level: 3
Install a supported Python version: 3.11, 3.12, or 3.13.
.. selected:: os=rhel
.. selected:: os-version=10.0 os-version=8
:heading: Install additional development packages
:heading-level: 3
.. code-block:: bash
sudo dnf install libatomic
.. selected:: os=sles
.. selected:: os-version=15
:heading: Install additional development packages
:heading-level: 3
.. code-block:: bash
sudo zypper install libatomic1
.. selected:: os=ubuntu os=rhel os=sles
:heading: Configure permissions for GPU access
:heading-level: 3
There are two primary methods of configuring GPU access for ROCm: group
membership and udev rules. Each method has its own advantages; the choice
depends on your specific requirements and system management preferences.
.. tab-set::
.. tab-item:: Group membership
By default, GPU access is controlled by membership in the ``video`` and
``render`` Linux system groups. The ``video`` group traditionally handles
video device access, while the ``render`` group manages GPU rendering
through DRM render nodes.
.. code-block:: bash
# Add the current user to the render and video groups
sudo usermod -a -G render,video $LOGNAME
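Group membership changes take effect at your next login. After logging out
and back in, you can optionally verify the change (a quick check; not a
required step):

.. code-block:: bash

   # Confirm that render and video appear in the output
   groups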
.. tab-item:: udev rules
udev rules offer a flexible, system-wide approach to managing device
permissions: they eliminate the need for per-user group management while
still allowing granular control over GPU access. To install the rules and
grant GPU access to all users, run the following command:
.. code-block:: bash
sudo tee /etc/udev/rules.d/70-amdgpu.rules << EOF
KERNEL=="kfd", GROUP="render", MODE="0666"
SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="render", MODE="0666"
EOF
sudo udevadm control --reload-rules
sudo udevadm trigger
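Optionally, to confirm that the rules applied, inspect the permissions on
the GPU device nodes (device paths can vary between systems):

.. code-block:: bash

   # Both nodes should now report mode 0666 (read/write for all users)
   ls -l /dev/kfd /dev/dri/renderD*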
.. note::
To apply all settings, reboot your system.
View File
@@ -0,0 +1,73 @@
.. include:: /compatibility/includes/selector.rst
.. selected:: fam=instinct fam=radeon-pro fam=radeon
.. selector:: Ubuntu version
:key: os-version
:show-when: os=ubuntu
.. selector-option:: 24.04.3
:value: 24
.. selector-option:: 22.04.5
:value: 22
.. selected:: fam=ryzen
.. selector:: Ubuntu version
:key: os-version
:show-when: os=ubuntu
.. selector-option:: 24.04.3
:value: 24
:width: 12
.. selector:: RHEL version
:key: os-version
:show-when: os=rhel
.. selector-option:: 10.1
:value: 10.1
:width: 3
.. selector-option:: 10.0
:value: 10.0
:width: 3
.. selector-option:: 9.7
:value: 9.7
:width: 2
.. selector-option:: 9.6
:value: 9.6
:width: 2
.. selector-option:: 8.10
:value: 8
:width: 2
.. selector:: SLES version
:key: os-version
:show-when: os=sles
.. selector-option:: 15.7
:value: 15
:width: 12
.. selector:: Windows version
:key: os-version
:show-when: os=windows
.. selector-option:: 11 25H2
:value: 11-25h2
:width: 12
.. selector:: Installation method
:key: i
.. selector-option:: pip
:value: pip
.. selector-option:: Tarball
:value: tar
Some files were not shown because too many files have changed in this diff.