Compare commits

...

64 Commits

Author SHA1 Message Date
David Dixon
bfe3983d90 correct pip args 2025-07-31 08:29:08 -06:00
David Dixon
98f51f2bcf add python deps for hipblaslt 2025-07-30 17:15:13 -06:00
David Dixon
a38ab1c212 Need both target options while transitioning between build systems 2025-07-30 16:11:37 -06:00
David Dixon
54c86f3a92 add deps install back 2025-07-30 13:34:09 -06:00
Daniel Su
014442b28d Change to GPU_TARGETS 2025-07-30 10:32:52 -04:00
Daniel Su
f3f1526fee Add blas and lapack to dnf map 2025-07-30 10:30:00 -04:00
David Dixon
d74d0543e8 Drop lapack install script 2025-07-29 18:47:25 -06:00
Pratik Basyal
f632f2879f ROCm Software Stack image for 6.4.0 updated (#5112) 2025-07-28 14:51:19 -04:00
yugang-amd
cc5bc5a882 Add SGLang inference benchmark doc w/ initial support for DeepSeek-R1-Distill-Qwen-32B (#4870) 2025-07-25 12:42:40 -04:00
Daniel Su
2c9c3d0ba1 [Ex CI] switch hipBLAS/SPARSE pipeline IDs to monorepo (#5098) 2025-07-24 16:53:29 -04:00
Peter Park
14249f24d8 Use madengine instead of tools/run_models.py in docs (#5095) 2025-07-24 15:38:12 -04:00
Daniel Su
0e8045cca7 [Ex CI] enable hipBLAS monorepo (#5090) 2025-07-24 12:37:34 -04:00
Daniel Su
541fe92947 [Ex CI] update to 6.4.2 (#5087) 2025-07-23 14:10:40 -04:00
Daniel Su
628d5f8a19 [Ex CI] create Docker images for nightly builds (#5005) 2025-07-23 12:16:11 -04:00
Peter Park
984a91f008 Add DeepSeek Janus Pro 7B to PyTorch inference benchmark doc (#5071)
---------

Co-authored-by: yugang-amd <yugang.wang@amd.com>
2025-07-22 16:26:06 -04:00
amd-hsivasun
ae2cc6ab38 [EX CI] ROCR-Runtime: migrate from rocm-smi to amd-smi (#5088)
* Update ROCR-Runtime.yml

Migrate from rocmsmi to amdsmi

* Update ROCR-Runtime.yml

Removed libhwloc.so.5 install

* Update ROCR-Runtime.yml

Link to hwloc.so.5

* Update ROCR-Runtime.yml

Added link in the rocrtst step

* Update ROCR-Runtime.yml
2025-07-22 14:17:53 -04:00
Peter Park
15ee605d18 Fix branches for install docs in _toc.yml.in (#5083) 2025-07-22 11:03:40 -04:00
anisha-amd
ae54add299 Sphinx warning for ROCm fixed (#5077) (#5082)
* Sphinx warning for DGL fixed

* Update dgl-compatibility.rst

removed benchmark line and updated link

---------

Co-authored-by: Pratik Basyal <prbasyal@amd.com>
2025-07-22 10:51:15 -04:00
Peter Park
2269e9d25d Remove broken link to deprecated AMDGPU installer documentation (#5078)
* remove link to deprecated AMDGPU installation method

* add deep learning frameworks
2025-07-21 19:36:20 -04:00
alexxu-amd
1b0b9f5a67 update xml for 6.4.2 (#5075) 2025-07-21 17:22:12 -04:00
Pratik Basyal
49548ada2e Date updated for GA (#5073) 2025-07-21 16:52:44 -04:00
alexxu-amd
aa5ddfb483 Merge pull request #5072 from ROCm/sync-develop-from-internal
Sync develop from internal
2025-07-21 15:50:27 -04:00
Pratik Basyal
34bffcb8ac Internal Link reverted to public for 6.4.2 GA (#477)
* Link reveted for GA

* Reduntant footnote removed
2025-07-21 15:35:52 -04:00
alexxu-amd
efc302ca83 Merge pull request #476 from ROCm/sync-develop-from-external
Sync develop from external
2025-07-21 15:09:42 -04:00
alexxu-amd
33c2a9fa89 Merge branch 'develop' into sync-develop-from-external 2025-07-21 15:03:20 -04:00
Alex Xu
aa6f40e2e0 Merge remote-tracking branch 'external/develop' into sync-develop-from-external 2025-07-21 14:55:59 -04:00
Pratik Basyal
977c74fe71 Deep learning framework doc highlight update to RN 6.4.2 (#474)
* HIP 7.0 upcoming changes blog link updated

* Documentation highlight for deep learning framework added

* Note loading fixed

* Note removed

* Link fixed
2025-07-21 14:17:10 -04:00
alexxu-amd
2bbcfc8f92 Update versions.md (#475) 2025-07-21 12:44:58 -04:00
Peter Park
5bcf3b0847 Update Megatron-LM training benchmark doc for v25.6 release (#5064) 2025-07-18 15:57:25 -04:00
Peter Park
7e7e15a201 Fix path to data file in vllm-0.9.1-20250702.rst (#5066) 2025-07-18 14:16:05 -04:00
Pratik Basyal
50718c9dc0 SLES 15 SP 7 note removed from Compatibility matrix (#472)
* HIP 7.0 upcoming changes blog link updated

* SLES 15 SP 7 note removed from compatibility matrix
2025-07-18 10:37:34 -04:00
Peter Park
b437a625b3 Update vLLM inference benchmark doc for 0715 release (#5058) 2025-07-17 15:00:02 -04:00
Daniel Su
09460f7332 [Ex CI] re-enable rocm-examples rocfft_callback (#5062) 2025-07-17 13:59:34 -04:00
spolifroni-amd
0d3b19b3cc added dgl and megatron to csv (#5057) 2025-07-16 15:28:08 -04:00
spolifroni-amd
703e253db5 minor link and comp matrix fixes (#5056) 2025-07-16 14:32:47 -04:00
Daniel Su
ec9b9cad17 [Ex CI] disable checkout in roc/hipSPARSE test jobs (#5046)
* [Ex CI] disable checkout in roc/hipSPARSE test jobs

* rocsparse 2 hour timeout
2025-07-16 10:46:13 -04:00
Daniel Su
20ff132b9b [Ex CI] migrate rocSPARSE, hipSPARSELt pipeline IDs (#5045) 2025-07-16 10:46:07 -04:00
Jan Stephan
16f707d6c4 Merge pull request #5001 from j-stephan/fix-doc-warnings
Fix doc warnings
2025-07-16 07:10:54 -04:00
Jeffrey Novotny
b431415ade Merge Verl, DGL, Megatron changes. (#5047)
* Verl compatibility

* verl compatibility

* add Supported features

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* updated and edited verl compat doc

* added links to verl

* add future release for sglang and megatron inference eng.

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* fix lint

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* fixed a typo and a table

* Spolifroni amd/add to compat matrix (#430)

* added verl to compatibility matrix

* small change

* fixed an error in csv

* edited the verl compat based on leo's recommendations

* updated compat matrix (#435)

* Added a hardcoded link to the verl install

This is a link to an RTD build and MUST be removed before publishing.

* Update verl-compatibility.rst

* Added a hardcoded link to the verl install

This link is to an RTD build and it WILL break at publishing. It MUST be changed before publishing.

* Added version support note (#448)

* small fixes

* Update verl-compatibility.rst

* Update verl-compatibility.rst

---------

Signed-off-by: Vicky Tsang <vtsang@amd.com>
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
(cherry picked from commit f9bd22626b)

* Stanford Megatron-LM Compatibility

* Create stanford-megatron-lm-compatibility.rst

* toc and wordlist

* Update deep-learning-rocm.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* fixes and adding to main compat matrix

* formatting fix

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
(cherry picked from commit f4f096b44e)

* Framework: DGL Compatability

* Introducing new file for DGL Compatability

* Update dgl-compatibility.rst

* Update .wordlist.txt

* Update .wordlist.txt

* Update deep-learning-rocm.rst

* compatibility fixes

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* additions to use-cases and system support

* wording and fixes

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* remove table heading

* Update compatibility-matrix-historical-6.0.csv

---------

Co-authored-by: anisha-amd <anisha.sankar@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
(cherry picked from commit 2a7554c0b9)

* Manually resolve merge conflict

* Further merge conflict adjustments

---------

Signed-off-by: Vicky Tsang <vtsang@amd.com>
Co-authored-by: vickytsang <vtsang@amd.com>
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Mukhil M S <167260682+mukh1l@users.noreply.github.com>
2025-07-15 18:57:31 -04:00
vickytsang
f9bd22626b Verl compatibility
* verl compatibility

* add Supported features

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* updated and edited verl compat doc

* added links to verl

* add future release for sglang and megatron inference eng.

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* fix lint

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* fixed a typo and a table

* Spolifroni amd/add to compat matrix (#430)

* added verl to compatibility matrix

* small change

* fixed an error in csv

* edited the verl compat based on leo's recommendations

* updated compat matrix (#435)

* Added a hardcoded link to the verl install

This is a link to an RTD build and MUST be removed before publishing.

* Update verl-compatibility.rst

* Added a hardcoded link to the verl install

This link is to an RTD build and it WILL break at publishing. It MUST be changed before publishing.

* Added version support note (#448)

* small fixes

* Update verl-compatibility.rst

* Update verl-compatibility.rst

---------

Signed-off-by: Vicky Tsang <vtsang@amd.com>
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
2025-07-15 16:39:31 -04:00
anisha-amd
f4f096b44e Stanford Megatron-LM Compatibility
* Create stanford-megatron-lm-compatibility.rst

* toc and wordlist

* Update deep-learning-rocm.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* fixes and adding to main compat matrix

* formatting fix

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2025-07-15 16:23:50 -04:00
Mukhil M S
2a7554c0b9 Framework: DGL Compatability
* Introducing new file for DGL Compatability

* Update dgl-compatibility.rst

* Update .wordlist.txt

* Update .wordlist.txt

* Update deep-learning-rocm.rst

* compatibility fixes

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* additions to use-cases and system support

* wording and fixes

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* remove table heading

* Update compatibility-matrix-historical-6.0.csv

---------

Co-authored-by: anisha-amd <anisha.sankar@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2025-07-15 16:17:58 -04:00
Daniel Su
505698ed3f [Ex CI] enable roc/hipSPARSE monorepo (#5042) 2025-07-15 13:49:57 -04:00
Peter Park
548d31f990 fix broken image in megatron-lm-v24.12-dev.rst (#5043) 2025-07-15 10:57:12 -04:00
Pratik Basyal
b48abafd9b Radeon RX 7700 XT added to 6.4.2 RN and compatibility matrix (#467)
* HIP 7.0 upcoming changes blog link updated

* Radeon RX 7700 XT added to release notes, compatibility matrix

* Changelog synced

* Footnote update
2025-07-14 16:43:50 -04:00
Daniel Su
32f79a966b [Ex CI] fix MIOpen CK script again (#5034) 2025-07-14 14:18:24 -04:00
randyh62
fcea6ded85 Update RELEASE.md (#462)
Add latest HIP content for 6.4.2
2025-07-11 08:56:20 -07:00
Pratik Basyal
f2f7e36503 6.4.2 Release notes updates post RC4 [Batch1] (#463)
* HIP 7.0 upcoming changes blog link updated

* Changes to AMDGPU and rocPRIM update

* formating change

* Changelog update
2025-07-11 08:36:37 -04:00
Pratik Basyal
e2e67f57e7 HIP 7.0 upcoming changes blog link updated (#461) 2025-07-10 09:53:34 -04:00
Pratik Basyal
be2bc2142b ROCm for HPC table update for 6.4.2 (#5015) (#5016) (#459)
* ROCm for HPC table update for 6.4.0 (#5015) (#5016)

* 6.4.0 updates synced

* Minor change

* Link update
2025-07-09 14:56:18 -04:00
Pratik Basyal
08b39b61b1 Post RC3 6.4.2 Release notes updates [Batch 1] (#447)
* RC3 manifest.xml added

* Warpsize reference added

* SLES15 SP7 support added

* Consolidated changelog synced

* Footnote order updated

* Footnote order

* Footnote reordered

* Leo's feedback incorporated

* SLES 15 SP7 support added to runfile

* Wording update for SLES 15 support in Runfile installer

* AMD SMI CPER known issues added

* Known issues removed post RC4

* Resolved issues added

* RC4 manifest added

* Compute Profiler highlight updated

* Changelog.md synced
2025-07-09 08:33:44 -04:00
alexxu-amd
dc0d89ff4f Merge pull request #451 from ROCm/sync-develop-from-external
Sync develop from external
2025-06-24 16:41:37 -04:00
Alex Xu
9812d8f745 Merge remote-tracking branch 'external/develop' into sync-develop-from-external 2025-06-24 15:45:03 -04:00
Pratik Basyal
4342239006 Quick fix (#446) 2025-06-23 11:48:05 -04:00
Pratik Basyal
151b9bb6d4 Review feedback to release notes post RC3 (#444)
* Review feedback to release notes

* AI tutorials addition listed in RN

* Resolved issue added

* Indentation fix

* OS support changes

* Minor fix

* SLES mention removed
2025-06-23 11:25:40 -04:00
randyh62
76549f97b9 Update RELEASE.md (#445)
Merge changes from updated Changelog
2025-06-23 11:02:50 -04:00
Pratik Basyal
535ca32590 Updates to 6.4.2 release notes post RC2 review (#439)
* Updates to release notes

* AMDGPU refrence brokenlink fix

* Jeff's feedback incorporated

* Add AMD SMI changelog for 6.4.2

* add amd smi changelog

* Apply suggestions from code review

Co-authored-by: Pratik Basyal <prbasyal@amd.com>

---------

Co-authored-by: Pratik Basyal <prbasyal@amd.com>

* AMDGPU installer highlight

* AMDGPU installation removal highlight

* AMD SMI changelog synced with consolidated changelog

* AMDSMI v25.5.1 updated in compatibility table

* fix AMD SMI changelog formatting

* Quick update

* leo's feedback incorporated

* Update on ROCm Offline Installer

* AMDGOU installer link updated

* rocBLAS changes added

* Consolidated changelog synced

* Version update post RC3

* AMD SMI upcoming changes added

* Minor change

---------

Co-authored-by: Peter Park <peter.park@amd.com>
2025-06-19 13:59:01 -04:00
yugang-amd
aa29e156a8 Remove xref to amdgpu-install (#442) 2025-06-18 14:49:35 -04:00
Istvan Kiss
666996ee2d Merge pull request #434 from ROCm/pytorch_6.4.1_comp
Docs: Pytorch compatibility page update
2025-06-18 13:21:04 +02:00
Adel Johar
51cb6461b5 Docs: Pytorch compatibility page update 2025-06-18 11:12:47 +02:00
Pratik Basyal
86efb8c0c7 6.4.2 post RC2 release notes review feedback incorporated (#437)
* Compatibility table update

* Quick updates

* HIP 7.0 change link reintroduced
2025-06-12 15:57:09 -04:00
Pratik Basyal
5d594feeac Initial changes to 6.4.2 release notes, compatibility matrix, and changelog. (#427)
* Initial 6.4.2 changes to release notes

* 6.4.2 initial changes applied

* edited entry for rocPRIM

* Add HIP content for 6.4.2

* Release notes for 6.4.2 updated

* Conf.py updated

* Review and fixed issue updated

* RCCL changelog update

* Compatibility matrix updated

* Pointed to internal linux docs

* Histrorical changelog added

* Quick change on component table

* Changelog for Systems profiler updated

* Consolidated changelog synced

* ROCm validation suite changelog added

* Leo's feedback incoporated

* Highlights added

* Manifest added for comparision

* Manifest 642 RC1 added

* RC2 manifest added

* KMD/UMD footnote updated

* Minor changes

* Consolidated changelog synced

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
2025-06-12 13:40:02 -04:00
Adel Johar
4a05e26a0e Merge pull request #433 from ROCm/jax_6.4.1_comp
Docs: Overhaul JAX compatibility page for 6.4.1
2025-06-12 16:23:41 +02:00
Adel Johar
c699aaf915 Docs: Overhaul JAX compatibility page 2025-06-12 14:35:30 +02:00
73 changed files with 4756 additions and 2058 deletions

View File

@@ -28,8 +28,8 @@ parameters:
- name: rocmTestDependencies
type: object
default:
- amdsmi
- llvm-project
- rocm_smi_lib
- rocprofiler-register
- name: jobMatrix
@@ -111,14 +111,6 @@ jobs:
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- task: Bash@3
displayName: Install libhwloc5
inputs:
targetType: 'inline'
script: |
wget http://ftp.us.debian.org/debian/pool/main/h/hwloc/libhwloc5_1.11.12-3_amd64.deb
wget http://ftp.us.debian.org/debian/pool/main/h/hwloc/libhwloc-dev_1.11.12-3_amd64.deb
sudo apt install -y --allow-downgrades ./libhwloc5_1.11.12-3_amd64.deb ./libhwloc-dev_1.11.12-3_amd64.deb
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
@@ -161,6 +153,10 @@ jobs:
targetType: 'inline'
workingDirectory: $(Build.SourcesDirectory)/rocrtst/suites/test_common
script: |
echo $(Build.SourcesDirectory)/rocrtst/thirdparty/lib | sudo tee -a /etc/ld.so.conf.d/rocm-ci.conf
sudo cat /etc/ld.so.conf.d/rocm-ci.conf
sudo ldconfig -v
ldconfig -p
if [ -e /opt/rh/gcc-toolset-14/enable ]; then
source /opt/rh/gcc-toolset-14/enable
fi

View File

@@ -107,6 +107,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
gpuTarget: ${{ job.target }}
# if this artifact name is changed, please also update $ARTIFACT_URL inside miopen-get-ck-build.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
gpuTarget: ${{ job.target }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: hipBLAS
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -69,10 +88,30 @@ parameters:
target: gfx942
- gfx90a:
target: gfx90a
# MIOpen depends on both rocRAND and hipBLAS
# for a unified build, hipBLAS will be the one to call MIOpen
# - name: downstreamComponentMatrix
# type: object
# default:
# - MIOpen:
# name: MIOpen
# sparseCheckoutDir: projects/miopen
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - hipBLAS_build
# unifiedBuild:
# downstreamAggregateNames: hipBLAS+rocRAND
# buildDependsOn:
# - hipBLAS_build
# - rocRAND_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: hipBLAS_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_ubuntu2204_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -88,6 +127,7 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aocl.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
@@ -95,6 +135,8 @@ jobs:
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
extraBuildFlags: >-
@@ -109,9 +151,12 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
@@ -121,46 +166,67 @@ jobs:
installAOCL: true
gpuTarget: ${{ job.target }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: hipBLAS_test_${{ job.target }}
dependsOn: hipBLAS_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: hipBLAS
testExecutable: $(Agent.BuildDirectory)/rocm/bin/hipblas-test
testParameters: '--yaml hipblas_smoke.yaml --gtest_output=xml:./test_output.xml --gtest_color=yes'
testDir: '$(Agent.BuildDirectory)/rocm/bin'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
testExecutable: $(Agent.BuildDirectory)/rocm/bin/hipblas-test
testParameters: '--yaml hipblas_smoke.yaml --gtest_output=xml:./test_output.xml --gtest_color=yes'
testDir: '$(Agent.BuildDirectory)/rocm/bin'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
environment: test
gpuTarget: ${{ job.target }}
# - ${{ if parameters.triggerDownstreamJobs }}:
# - ${{ each component in parameters.downstreamComponentMatrix }}:
# - ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
# - template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
# parameters:
# checkoutRepo: ${{ parameters.checkoutRepo }}
# sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
# triggerDownstreamJobs: true
# unifiedBuild: ${{ parameters.unifiedBuild }}
# ${{ if parameters.unifiedBuild }}:
# buildDependsOn: ${{ component.unifiedBuild.buildDependsOn }}
# downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ component.unifiedBuild.downstreamAggregateNames }}
# ${{ else }}:
# buildDependsOn: ${{ component.buildDependsOn }}
# downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}

View File

@@ -36,8 +36,10 @@ parameters:
- gfortran
- git
- libdrm-dev
- liblapack-dev
- libmsgpack-dev
- libnuma-dev
- libopenblas-dev
- ninja-build
- python3-pip
- python3-venv
@@ -46,6 +48,12 @@ parameters:
default:
- joblib
- "packaging>=22.0"
- pyyaml
- msgpack
- simplejson
- ujson
- orjson
- yappi
- --upgrade
- name: rocmDependencies
type: object
@@ -195,6 +203,7 @@ jobs:
-DCMAKE_CXX_COMPILER_LAUNCHER=ccache
-DCMAKE_C_COMPILER_LAUNCHER=ccache
-DAMDGPU_TARGETS=${{ job.target }}
-DGPU_TARGETS=${{ job.target }}
-DBUILD_CLIENTS_TESTS=ON
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: hipSPARSE
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -14,13 +33,11 @@ parameters:
type: object
default:
- cmake
- ninja-build
- libboost-program-options-dev
- googletest
- libfftw3-dev
- git
- gfortran
- libgtest-dev
- git
- libboost-program-options-dev
- libfftw3-dev
- ninja-build
- python3-pip
- name: rocmDependencies
type: object
@@ -49,19 +66,31 @@ parameters:
type: object
default:
buildJobs:
- gfx942:
target: gfx942
- gfx90a:
target: gfx90a
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
testJobs:
- gfx942:
target: gfx942
- gfx90a:
target: gfx90a
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- name: downstreamComponentMatrix
type: object
default:
- hipSPARSELt:
name: hipSPARSELt
sparseCheckoutDir: projects/hipsparselt
skipUnifiedBuild: 'false'
buildDependsOn:
- hipSPARSE_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: hipSPARSE_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -73,42 +102,57 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
parameters:
dependencyList:
- gtest
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
extraBuildFlags: >-
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/bin/amdclang
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/rocm/share/rocm/cmake/
-DBUILD_CLIENTS_TESTS=ON
-DBUILD_CLIENTS_SAMPLES=OFF
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
artifactName: hipSPARSE
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: hipSPARSE
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
publish: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-prepare-package.yml
parameters:
sourceDir: $(Build.SourcesDirectory)/build/clients
sourceDir: $(Agent.BuildDirectory)/s/build/clients
contentsString: matrices/**
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
artifactName: testMatrices
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
# - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
# parameters:
@@ -116,44 +160,65 @@ jobs:
# environment: test
# gpuTarget: ${{ job.target }}
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: hipSPARSE_test_${{ job.target }}
dependsOn: hipSPARSE_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: hipSPARSE
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './hipsparse-test'
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './hipsparse-test'
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}

View File

@@ -75,19 +75,17 @@ parameters:
type: object
default:
buildJobs:
- gfx942:
target: gfx942
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
testJobs:
- gfx942:
target: gfx942
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
- job: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_ubuntu2204_${{ job.target }}
- ${{ build }}_${{ job.os }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -109,6 +107,7 @@ jobs:
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-latest.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
@@ -120,6 +119,7 @@ jobs:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
@@ -141,6 +141,7 @@ jobs:
workingDirectory: $(Pipeline.Workspace)/deps
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
extraBuildFlags: >-
-DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
@@ -156,37 +157,41 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
gpuTarget: ${{ job.target }}
extraCopyDirectories:
- deps
extraPaths: /home/user/workspace/rocm/llvm/bin:/home/user/workspace/rocm/bin
extraEnvVars:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- TENSILE_ROCM_ASSEMBLER_PATH:::/home/user/workspace/rocm/llvm/bin/clang
- CMAKE_CXX_COMPILER:::/home/user/workspace/rocm/llvm/bin/hipcc
- TENSILE_ROCM_OFFLOAD_BUNDLER_PATH:::/home/user/workspace/rocm/llvm/bin/clang-offload-bundler
installLatestCMake: true
- ${{ if eq(job.os, 'ubuntu2204') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
pipModules: ${{ parameters.pipModules }}
gpuTarget: ${{ job.target }}
extraCopyDirectories:
- deps
extraPaths: /home/user/workspace/rocm/llvm/bin:/home/user/workspace/rocm/bin
extraEnvVars:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- TENSILE_ROCM_ASSEMBLER_PATH:::/home/user/workspace/rocm/llvm/bin/clang
- CMAKE_CXX_COMPILER:::/home/user/workspace/rocm/llvm/bin/hipcc
- TENSILE_ROCM_OFFLOAD_BUNDLER_PATH:::/home/user/workspace/rocm/llvm/bin/clang-offload-bundler
installLatestCMake: true
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_ubuntu2204_${{ job.target }}
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
timeoutInMinutes: 120
dependsOn: ${{ parameters.componentName }}_build_ubuntu2204_${{ job.target }}
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
@@ -200,21 +205,26 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './hipsparselt-test'
testParameters: '--gtest_output=xml:./test_output.xml --gtest_color=yes --gtest_filter=*pre_checkin*'

View File

@@ -96,27 +96,25 @@ parameters:
- name: downstreamComponentMatrix
type: object
default:
# technically hipSPARSELt is a downstream component of hipSPARSE
# since hipSPARSE is not yet enabled, we will trigger it from rocBLAS in the interim
- hipSPARSELt:
name: hipSPARSELt
sparseCheckoutDir: projects/hipsparselt
- rocSPARSE:
name: rocSPARSE
sparseCheckoutDir: projects/rocsparse
skipUnifiedBuild: 'false'
buildDependsOn:
- rocBLAS_build
# rocSOLVER depends on both rocBLAS and rocPRIM
# for a unified build, rocBLAS will be the one to call rocSOLVER
# - rocSOLVER:
# name: rocSOLVER
# sparseCheckoutDir: projects/rocsolver
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - rocBLAS_build
# unifiedBuild:
# downstreamAggregateNames: rocBLAS+rocPRIM
# buildDependsOn:
# - rocBLAS_build
# - rocPRIM_build
- rocSOLVER:
name: rocSOLVER
sparseCheckoutDir: projects/rocsolver
skipUnifiedBuild: 'false'
buildDependsOn:
- rocBLAS_build
unifiedBuild:
downstreamAggregateNames: rocBLAS+rocPRIM
buildDependsOn:
- rocBLAS_build
- rocPRIM_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:

View File

@@ -91,12 +91,12 @@ parameters:
- rocPRIM_build
# rocSOLVER depends on both rocBLAS and rocPRIM
# for a unified build, rocBLAS will be the one to call rocSOLVER
# - rocSOLVER:
# name: rocSOLVER
# sparseCheckoutDir: projects/rocsolver
# skipUnifiedBuild: 'true'
# buildDependsOn:
# - rocPRIM_build
- rocSOLVER:
name: rocSOLVER
sparseCheckoutDir: projects/rocsolver
skipUnifiedBuild: 'true'
buildDependsOn:
- rocPRIM_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:

View File

@@ -83,6 +83,28 @@ parameters:
testJobs:
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- name: downstreamComponentMatrix
type: object
default:
- hipBLAS:
name: hipBLAS
sparseCheckoutDir: projects/hipblas
skipUnifiedBuild: 'false'
buildDependsOn:
- rocSOLVER_build
# hipSOLVER depends on both rocSOLVER and rocSPARSE
# for a unified build, rocSOLVER will be the one to call hipSOLVER
# - hipSOLVER:
# name: hipSOLVER
# sparseCheckoutDir: projects/hipsolver
# skipUnifiedBuild: 'false'
# buildDependsOn:
# - rocSOLVER_build
# unifiedBuild:
# downstreamAggregateNames: rocSOLVER+rocSPARSE
# buildDependsOn:
# - rocSOLVER_build
# - rocSPARSE_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
@@ -228,3 +250,19 @@ jobs:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}
${{ if parameters.unifiedBuild }}:
buildDependsOn: ${{ component.unifiedBuild.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ component.unifiedBuild.downstreamAggregateNames }}
${{ else }}:
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}

View File

@@ -1,10 +1,29 @@
parameters:
- name: componentName
type: string
default: rocSPARSE
- name: checkoutRepo
type: string
default: 'self'
- name: checkoutRef
type: string
default: ''
# monorepo related parameters
- name: sparseCheckoutDir
type: string
default: ''
- name: triggerDownstreamJobs
type: boolean
default: false
- name: downstreamAggregateNames
type: string
default: ''
- name: buildDependsOn
type: object
default: null
- name: unifiedBuild
type: boolean
default: false
# set to true if doing full build of ROCm stack
# and dependencies are pulled from same pipeline
- name: aggregatePipeline
@@ -13,27 +32,25 @@ parameters:
- name: aptPackages
type: object
default:
- python3-pip
- cmake
- ninja-build
- libboost-program-options-dev
- googletest
- libfftw3-dev
- git
- gfortran
- libgtest-dev
- git
- libboost-program-options-dev
- libdrm-dev
- libfftw3-dev
- ninja-build
- python3-pip
- name: rocmDependencies
type: object
default:
- rocm-cmake
- llvm-project
- ROCR-Runtime
- clr
- llvm-project
- rocBLAS
- rocm-cmake
- rocminfo
- rocPRIM
- rocprofiler-register
- ROCR-Runtime
- roctracer
- name: rocmTestDependencies
type: object
@@ -52,19 +69,39 @@ parameters:
type: object
default:
buildJobs:
- gfx942:
target: gfx942
- gfx90a:
target: gfx90a
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- { os: ubuntu2204, packageManager: apt, target: gfx1201 }
- { os: ubuntu2204, packageManager: apt, target: gfx1100 }
- { os: ubuntu2204, packageManager: apt, target: gfx1030 }
testJobs:
- gfx942:
target: gfx942
- gfx90a:
target: gfx90a
- { os: ubuntu2204, packageManager: apt, target: gfx942 }
- { os: ubuntu2204, packageManager: apt, target: gfx90a }
- name: downstreamComponentMatrix
type: object
default:
- hipSPARSE:
name: hipSPARSE
sparseCheckoutDir: projects/hipsparse
skipUnifiedBuild: 'false'
buildDependsOn:
- rocSPARSE_build
# hipSOLVER depends on both rocSOLVER and rocSPARSE
# for a unified build, rocSOLVER will be the one to call hipSOLVER
# - hipSOLVER:
# name: hipSOLVER
# sparseCheckoutDir: projects/hipsolver
# skipUnifiedBuild: 'true'
# buildDependsOn:
# - rocSPARSE_build
jobs:
- ${{ each job in parameters.jobMatrix.buildJobs }}:
- job: rocSPARSE_build_${{ job.target }}
- job: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
${{ if parameters.buildDependsOn }}:
dependsOn:
- ${{ each build in parameters.buildDependsOn }}:
- ${{ build }}_${{ job.os }}_${{ job.target }}
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -77,22 +114,32 @@ jobs:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
parameters:
dependencyList:
- gtest
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
aggregatePipeline: ${{ parameters.aggregatePipeline }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/build-cmake.yml
parameters:
os: ${{ job.os }}
extraBuildFlags: >-
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/bin/hipcc
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/bin/hipcc
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
-DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/bin/amdclang++
-DCMAKE_C_COMPILER=$(Agent.BuildDirectory)/rocm/bin/amdclang
-DROCM_PATH=$(Agent.BuildDirectory)/rocm
-DCMAKE_BUILD_TYPE=Release
-DAMDGPU_TARGETS=${{ job.target }}
@@ -103,68 +150,94 @@ jobs:
-GNinja
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/manifest.yml
parameters:
artifactName: rocSPARSE
componentName: ${{ parameters.componentName }}
sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
artifactName: rocSPARSE
componentName: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
publish: false
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-prepare-package.yml
parameters:
sourceDir: $(Build.SourcesDirectory)/build/clients
sourceDir: $(Agent.BuildDirectory)/s/build/clients
contentsString: matrices/**
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-upload.yml
parameters:
componentName: ${{ parameters.componentName }}
artifactName: testMatrices
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
gpuTarget: ${{ job.target }}
extraEnvVars:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- ${{ if eq(job.os, 'ubuntu2204') }}:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
gpuTarget: ${{ job.target }}
extraEnvVars:
- HIP_ROCCLR_HOME:::/home/user/workspace/rocm
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: rocSPARSE_test_${{ job.target }}
timeoutInMinutes: 90
dependsOn: rocSPARSE_build_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), variables['Build.DefinitionName'])),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: rocSPARSE
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './rocsparse-test'
testParameters: '--gtest_filter="*quick*" --gtest_output=xml:./test_output.xml --gtest_color=yes'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if eq(parameters.unifiedBuild, False) }}:
- ${{ each job in parameters.jobMatrix.testJobs }}:
- job: ${{ parameters.componentName }}_test_${{ job.os }}_${{ job.target }}
timeoutInMinutes: 120
dependsOn: ${{ parameters.componentName }}_build_${{ job.os }}_${{ job.target }}
condition:
and(succeeded(),
eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
eq(${{ parameters.aggregatePipeline }}, False)
)
variables:
- group: common
- template: /.azuredevops/variables-global.yml
pool: ${{ job.target }}_test_pool
workspace:
clean: all
steps:
- checkout: none
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
packageManager: ${{ job.packageManager }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
parameters:
preTargetFilter: ${{ parameters.componentName }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-aqlprofile.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
parameters:
checkoutRef: ${{ parameters.checkoutRef }}
dependencyList: ${{ parameters.rocmTestDependencies }}
gpuTarget: ${{ job.target }}
os: ${{ job.os }}
${{ if parameters.triggerDownstreamJobs }}:
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
parameters:
componentName: ${{ parameters.componentName }}
os: ${{ job.os }}
testDir: '$(Agent.BuildDirectory)/rocm/bin'
testExecutable: './rocsparse-test'
testParameters: '--gtest_filter="*quick*" --gtest_output=xml:./test_output.xml --gtest_color=yes'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}
environment: test
gpuTarget: ${{ job.target }}
- ${{ if parameters.triggerDownstreamJobs }}:
- ${{ each component in parameters.downstreamComponentMatrix }}:
- ${{ if not(and(parameters.unifiedBuild, eq(component.skipUnifiedBuild, 'true'))) }}:
- template: /.azuredevops/components/${{ component.name }}.yml@pipelines_repo
parameters:
checkoutRepo: ${{ parameters.checkoutRepo }}
sparseCheckoutDir: ${{ component.sparseCheckoutDir }}
buildDependsOn: ${{ component.buildDependsOn }}
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}+${{ parameters.componentName }}
triggerDownstreamJobs: true
unifiedBuild: ${{ parameters.unifiedBuild }}

View File

@@ -184,7 +184,7 @@ jobs:
parameters:
componentName: rocm-examples
testDir: $(Build.SourcesDirectory)/build
testParameters: '--output-on-failure --force-new-ctest-process --output-junit test_output.xml --exclude-regex "rocfft_callback"'
testParameters: '--output-on-failure --force-new-ctest-process --output-junit test_output.xml'
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
parameters:
aptPackages: ${{ parameters.aptPackages }}

View File

@@ -3,21 +3,21 @@ parameters:
- name: jobList
type: object
default:
- { os: ubuntu2204, target: gfx942, source: staging }
- { os: ubuntu2204, target: gfx90a, source: staging }
- { os: ubuntu2204, target: gfx1201, source: staging }
- { os: ubuntu2204, target: gfx1100, source: staging }
- { os: ubuntu2204, target: gfx1030, source: staging }
- { os: ubuntu2404, target: gfx942, source: staging }
- { os: ubuntu2404, target: gfx90a, source: staging }
- { os: ubuntu2404, target: gfx1201, source: staging }
- { os: ubuntu2404, target: gfx1100, source: staging }
- { os: ubuntu2404, target: gfx1030, source: staging }
- { os: almalinux8, target: gfx942, source: staging }
- { os: almalinux8, target: gfx90a, source: staging }
- { os: almalinux8, target: gfx1201, source: staging }
- { os: almalinux8, target: gfx1100, source: staging }
- { os: almalinux8, target: gfx1030, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx942, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx90a, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx1201, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx1100, source: staging }
- { os: ubuntu2204, packageManager: apt, target: gfx1030, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx942, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx90a, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx1201, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx1100, source: staging }
- { os: ubuntu2404, packageManager: apt, target: gfx1030, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx942, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx90a, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx1201, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx1100, source: staging }
- { os: almalinux8, packageManager: dnf, target: gfx1030, source: staging }
- name: rocmDependencies
type: object
default:
@@ -92,7 +92,8 @@ schedules:
jobs:
- ${{ each job in parameters.jobList }}:
- job: rocm_nightly_${{ job.os }}_${{ job.target }}_${{ job.source }}
- job: nightly_${{ job.os }}_${{ job.target }}_${{ job.source }}
timeoutInMinutes: 90
variables:
- group: common
- template: /.azuredevops/variables-global.yml
@@ -131,7 +132,7 @@ jobs:
includeRootFolder: false
archiveType: tar
tarCompression: gz
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.DefinitionName)_$(Build.BuildNumber)_ubuntu2204_${{ job.target }}.tar.gz
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.DefinitionName)_$(Build.BuildNumber)_${{ job.os }}_${{ job.target }}.tar.gz
- script: du -sh $(Build.ArtifactStagingDirectory)
displayName: Compressed ROCm size
- task: PublishPipelineArtifact@1
@@ -144,5 +145,95 @@ jobs:
inputs:
workingDirectory: $(Pipeline.Workspace)
targetType: inline
script: echo "$(Build.DefinitionName)_$(Build.BuildNumber)_ubuntu2204_${{ job.target }}.tar.gz" >> pipelineArtifacts.txt
script: echo "$(Build.DefinitionName)_$(Build.BuildNumber)_${{ job.os }}_${{ job.target }}.tar.gz" >> pipelineArtifacts.txt
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/artifact-links.yml
- ${{ if eq(job.packageManager, 'apt') }}:
- task: Bash@3
displayName: Create Dockerfile
inputs:
workingDirectory: $(Agent.BuildDirectory)
targetType: inline
script: |
cat <<'EOF' > Dockerfile
${{ iif(eq(job.os, 'ubuntu2204'), 'FROM ubuntu:22.04', '') }}
${{ iif(eq(job.os, 'ubuntu2404'), 'FROM ubuntu:24.04', '') }}
WORKDIR /root
RUN mkdir rocm
RUN apt update \
&& apt upgrade -y \
&& apt install -y cmake curl git gcc g++ gpg lsb-release lsof ninja-build pkg-config python3 python3-pip wget zip libdrm-dev libelf-dev libgtest-dev libhsakmt-dev libhwloc-dev libnuma-dev libstdc++-12-dev libtbb-dev jq \
&& apt clean all
RUN PACKAGE_NAME=$(curl -s https://repo.radeon.com/rocm/apt/latest/pool/main/h/hsa-amd-aqlprofile/ | grep -oP "href=\"\K[^\"]*$(lsb_release -rs)[^\"]*\.deb") \
&& wget -nv --retry-connrefused https://repo.radeon.com/rocm/apt/latest/pool/main/h/hsa-amd-aqlprofile/$PACKAGE_NAME \
&& mkdir hsa-amd-aqlprofile \
&& dpkg-deb -R $PACKAGE_NAME hsa-amd-aqlprofile \
&& cp -R hsa-amd-aqlprofile/opt/rocm-*/* rocm
RUN ARTIFACT_URL="https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/builds/$(Build.BuildId)/artifacts?artifactName=nightly${{ job.os }}${{ job.target }}${{ job.source }}&api-version=7.1" \
&& DOWNLOAD_URL=$(curl -s $ARTIFACT_URL | jq ".resource.downloadUrl" | tr -d '"') \
&& wget -nv --retry-connrefused $DOWNLOAD_URL -O nightly.zip \
&& unzip nightly.zip \
&& tar -xf nightly${{ job.os }}${{ job.target }}${{ job.source }}/rocm-nightly*${{ job.os }}*${{ job.target }}*.tar.gz -C rocm
RUN echo /root/rocm/lib | tee /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/llvm/lib | tee -a /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/lib64 | tee -a /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/llvm/lib64 | tee -a /etc/ld.so.conf.d/rocm-ci.conf
RUN ldconfig -v
ENV PATH="$PATH:/root/rocm/bin"
ENTRYPOINT ["/bin/bash"]
EOF
cat Dockerfile
- ${{ elseif eq(job.packageManager, 'dnf') }}:
- task: Bash@3
displayName: Create Dockerfile
inputs:
workingDirectory: $(Agent.BuildDirectory)
targetType: inline
script: |
cat <<'EOF' > Dockerfile
${{ iif(eq(job.os, 'almalinux8'), 'FROM almalinux:8', '') }}
WORKDIR /root
RUN mkdir rocm
RUN dnf install -y cmake curl git gcc gcc-c++ gnupg2 redhat-lsb-core lsof pkgconf python3 python3-pip wget zip libdrm-devel elfutils-libelf-devel numactl-devel libstdc++-devel tbb-devel jq \
&& dnf clean all
RUN PACKAGE_NAME=$(curl -s https://repo.radeon.com/rocm/rhel8/$(REPO_RADEON_VERSION)/main/ | grep -oP "hsa-amd-aqlprofile-[^\"]+\.rpm" | head -n1) \
&& wget -nv --retry-connrefused https://repo.radeon.com/rocm/rhel8/$(REPO_RADEON_VERSION)/main/$PACKAGE_NAME \
&& mkdir hsa-amd-aqlprofile \
&& dnf -y install rpm-build cpio \
&& rpm2cpio $PACKAGE_NAME | (cd hsa-amd-aqlprofile && cpio -idmv) \
&& cp -R hsa-amd-aqlprofile/opt/rocm-*/* rocm
RUN ARTIFACT_URL="https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/builds/$(Build.BuildId)/artifacts?artifactName=nightly${{ job.os }}${{ job.target }}${{ job.source }}&api-version=7.1" \
&& DOWNLOAD_URL=$(curl -s $ARTIFACT_URL | jq ".resource.downloadUrl" | tr -d '"') \
&& wget -nv --retry-connrefused $DOWNLOAD_URL -O nightly.zip \
&& UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE unzip nightly.zip \
&& tar -xf nightly${{ job.os }}${{ job.target }}${{ job.source }}/rocm-nightly*${{ job.os }}*${{ job.target }}*.tar.gz -C rocm
RUN echo /root/rocm/lib | tee /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/llvm/lib | tee -a /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/lib64 | tee -a /etc/ld.so.conf.d/rocm-ci.conf
RUN echo /root/rocm/llvm/lib64 | tee -a /etc/ld.so.conf.d/rocm-ci.conf
RUN ldconfig -v
ENV PATH="$PATH:/root/rocm/bin"
ENTRYPOINT ["/bin/bash"]
EOF
cat Dockerfile
- task: Docker@2
displayName: Build and upload Docker image
inputs:
containerRegistry: ContainerService3
repository: 'nightly-${{ job.os }}-${{ job.target }}-${{ job.source }}'
Dockerfile: '$(Agent.BuildDirectory)/Dockerfile'
buildContext: '$(Agent.BuildDirectory)'
- task: Bash@3
displayName: '!! Docker Run Command !!'
inputs:
targetType: inline
script: echo "docker run -it --network=host --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined rocmexternalcicd.azurecr.io/nightly-${{ job.os }}-${{ job.target }}-${{ job.source }}:$(Build.BuildId)" | tr '[:upper:]' '[:lower:]'

View File

@@ -46,5 +46,6 @@ steps:
displayName: '${{ parameters.artifactName }} Publish'
retryCountOnTaskFailure: 3
inputs:
# if this artifact name is changed, please also update $ARTIFACT_URL inside miopen-get-ck-build.yml
artifactName: $(Agent.JobName)_$(System.JobAttempt)
targetPath: '$(Build.ArtifactStagingDirectory)'

View File

@@ -54,11 +54,13 @@ parameters:
libfftw3-dev: fftw-devel
libfmt-dev: fmt-devel
libgmp-dev: gmp-devel
liblapack-dev: lapack-devel
liblzma-dev: xz-devel
libmpfr-dev: mpfr-devel
libmsgpack-dev: msgpack-devel
libncurses5-dev: ncurses-devel
libnuma-dev: numactl-devel
libopenblas-dev: openblas-devel
libopenmpi-dev: openmpi-devel
libpci-dev: libpciaccess-devel
libssl-dev: openssl-devel

View File

@@ -40,10 +40,10 @@ steps:
AZURE_URL="$AZ_API/build/builds/$CK_BUILD_ID/artifacts?api-version=7.1"
ARTIFACT_URL=$(curl -s $AZURE_URL | \
jq --arg os "ubuntu2204" --arg gfx "${{ parameters.gpuTarget }}" '
jq --arg gfx "${{ parameters.gpuTarget }}" '
.value
| map(select(.name | test($os) and test($gfx)))
| max_by(.name | capture("drop_(?<dropNumber>\\d+)").dropNumber | tonumber)
| map(select(.name | test($gfx)))
| max_by(.name | capture("_(?<dropNumber>\\d+)").dropNumber | tonumber)
| .resource.downloadUrl
' | \
tr -d '"')
@@ -59,7 +59,7 @@ steps:
jq --arg os "ubuntu2204" --arg gfx "${{ parameters.gpuTarget }}" '
.value
| map(select(.name | test($os) and test($gfx)))
| max_by(.name | capture("drop_(?<dropNumber>\\d+)").dropNumber | tonumber)
| max_by(.name | capture("_(?<dropNumber>\\d+)").dropNumber | tonumber)
| .resource.downloadUrl
' | \
tr -d '"')

View File

@@ -32,13 +32,13 @@ variables:
- name: GFX90A_TEST_POOL
value: gfx90a_test_pool
- name: LATEST_RELEASE_VERSION
value: 6.4.1
value: 6.4.2
- name: REPO_RADEON_VERSION
value: 6.4.1
value: 6.4.2
- name: NEXT_RELEASE_VERSION
value: 7.0.0
- name: LATEST_RELEASE_TAG
value: rocm-6.4.1
value: rocm-6.4.2
- name: DOCKER_SKIP_GFX
value: gfx90a
- name: AMDMIGRAPHX_PIPELINE_ID
@@ -68,7 +68,7 @@ variables:
- name: HIPBLAS_COMMON_PIPELINE_ID
value: 300
- name: HIPBLAS_PIPELINE_ID
value: 87
value: 317
- name: HIPBLASLT_PIPELINE_ID
value: 301
- name: HIPCUB_PIPELINE_ID
@@ -84,9 +84,9 @@ variables:
- name: HIPSOLVER_PIPELINE_ID
value: 84
- name: HIPSPARSE_PIPELINE_ID
value: 83
value: 315
- name: HIPSPARSELT_PIPELINE_ID
value: 104
value: 309
- name: HIPTENSOR_PIPELINE_ID
value: 105
- name: LLVM_PROJECT_PIPELINE_ID
@@ -154,7 +154,7 @@ variables:
- name: ROCSOLVER_PIPELINE_ID
value: 81
- name: ROCSPARSE_PIPELINE_ID
value: 98
value: 314
- name: ROCTHRUST_PIPELINE_ID
value: 276
- name: ROCTRACER_PIPELINE_ID

View File

@@ -6,6 +6,7 @@ ACS
AccVGPR
AccVGPRs
ALU
AllReduce
AMD
AMDGPU
AMDGPUs
@@ -13,6 +14,7 @@ AMDMIGraphX
AMI
AOCC
AOMP
AOT
AOTriton
APBDIS
APIC
@@ -32,6 +34,7 @@ Andrej
Arb
Autocast
BARs
BatchNorm
BLAS
BMC
BabelStream
@@ -79,10 +82,13 @@ ConnectX
CuPy
da
Dashboarding
Dataloading
DBRX
DDR
DF
DGEMM
DGL
DGLGraph
dGPU
dGPUs
DIMM
@@ -100,6 +106,7 @@ DataFrame
DataLoader
DataParallel
Debian
decompositions
DeepSeek
DeepSpeed
Dependabot
@@ -125,10 +132,12 @@ FX
Filesystem
FindDb
Flang
FlashAttention
FluxBenchmark
Fortran
Fuyu
GALB
GAT
GCC
GCD
GCDs
@@ -156,6 +165,8 @@ GPT
GPU
GPU's
GPUs
Graphbolt
GraphSage
GRBM
GenAI
GenZ
@@ -168,6 +179,7 @@ HIPCC
HIPExtension
HIPIFY
HIPification
hipification
HIPify
HPC
HPCG
@@ -182,6 +194,7 @@ Higgs
Hyperparameters
Huggingface
ICD
ICT
ICV
IDE
IDEs
@@ -216,6 +229,7 @@ KV
KVM
Karpathy's
KiB
Kineto
Keras
Khronos
LAPACK
@@ -228,6 +242,7 @@ LM
LSAN
LSan
LTS
LSTMs
LanguageCrossEntropy
LoRA
MEM
@@ -264,6 +279,7 @@ Miniconda
MirroredStrategy
Mixtral
MosaicML
Mpops
Multicore
Multithreaded
MyEnvironment
@@ -277,6 +293,7 @@ NIC
NICs
NLI
NLP
NN
NPKit
NPS
NSP
@@ -313,6 +330,7 @@ OpenMPI
OpenSSL
OpenVX
OpenXLA
Optim
Oversubscription
PagedAttention
Pallas
@@ -351,6 +369,7 @@ RDC's
RDMA
RDNA
README
Recomputation
RHEL
RMW
RNN
@@ -383,11 +402,13 @@ Ryzen
SALU
SBIOS
SCA
ScaledGEMM
SDK
SDMA
SDPA
SDRAM
SENDMSG
SGLang
SGPR
SGPRs
SHA
@@ -423,6 +444,8 @@ TCI
TCIU
TCP
TCR
TensorRT
TensorFloat
TF
TFLOPS
TP
@@ -509,6 +532,7 @@ allocator
allocators
amdgpu
api
aten
atmi
atomics
autogenerated
@@ -679,6 +703,7 @@ installable
interop
interprocedural
intra
intrinsics
invariants
invocating
ipo
@@ -697,11 +722,13 @@ linearized
linter
linux
llvm
lm
localscratch
logits
lossy
macOS
matchers
megatron
microarchitecture
migraphx
migratable
@@ -773,6 +800,7 @@ quantile
quantizer
quasirandom
queueing
qwen
radeon
rccl
rdc
@@ -781,6 +809,7 @@ reStructuredText
redirections
refactorization
reformats
reinforcememt
repo
repos
representativeness
@@ -788,6 +817,7 @@ req
resampling
rescaling
reusability
RLHF
roadmap
roc
rocAL
@@ -825,6 +855,7 @@ roctracer
rst
runtime
runtimes
ResNet
sL
scalability
scalable
@@ -833,6 +864,7 @@ seealso
sendmsg
seqs
serializers
sglang
shader
sharding
sigmoid
@@ -840,6 +872,7 @@ sm
smi
softmax
spack
spmm
src
stochastically
strided
@@ -848,6 +881,7 @@ subdirectory
subexpression
subfolder
subfolders
submatrix
submodule
submodules
subnet
@@ -872,6 +906,7 @@ torchvision
tqdm
tracebacks
txt
TopK
uarch
uncached
uncacheable
@@ -899,6 +934,7 @@ vectorize
vectorized
vectorizer
vectorizes
verl
virtualize
virtualized
vjxb

View File

@@ -4,6 +4,162 @@ This page is a historical overview of changes made to ROCm components. This
consolidated changelog documents key modifications and improvements across
different versions of the ROCm software stack and its components.
## ROCm 6.4.2
See the [ROCm 6.4.2 release notes](https://rocm.docs.amd.com/en/docs-6.4.2/about/release-notes.html)
for a complete overview of this release.
### **AMD SMI** (25.5.1)
#### Added
- Compute Unit Occupancy information per process.
- Support for getting the GPU Board voltage.
- New firmware PLDM_BUNDLE. `amd-smi firmware` can now show the PLDM Bundle on supported systems.
- `amd-smi ras --afid --cper-file <file_path>` to decode CPER records.
#### Changed
- Padded `asic_serial` in `amdsmi_get_asic_info` with 0s.
- Renamed field `COMPUTE_PARTITION` to `ACCELERATOR_PARTITION` in CLI call `amd-smi --partition`.
#### Resolved issues
- Corrected VRAM memory calculation in `amdsmi_get_gpu_process_list`. Previously, the VRAM memory usage reported by `amdsmi_get_gpu_process_list` was inaccurate and was calculated using KB instead of KiB.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
### **HIP** (6.4.2)
#### Added
* HIP API implementation for `hipEventRecordWithFlags`, records an event in the specified stream with flags.
* Support for the pointer attribute `HIP_POINTER_ATTRIBUTE_CONTEXT`.
* Support for the flags `hipEventWaitDefault` and `hipEventWaitExternal`.
#### Optimized
* Improved implementation in `hipEventSynchronize`, HIP runtime now makes internal callbacks as non-blocking operations to improve performance.
#### Resolved issues
* Issue of dependency on `libgcc-s1` during rocm-dev install on Debian Buster. HIP runtime removed this Debian package dependency, and uses `libgcc1` instead for this distros.
* Building issue for `COMGR` dynamic load on Fedora and other Distros. HIP runtime now doesn't link against `libamd_comgr.so`.
* Failure in the API `hipStreamDestroy`, when stream type is `hipStreamLegacy`. The API now returns error code `hipErrorInvalidResourceHandle` on this condition.
* Kernel launch errors, such as `shared object initialization failed`, `invalid device function` or `kernel execution failure`. HIP runtime now loads `COMGR` properly considering the file with its name and mapped image.
* Memory access fault in some applications. HIP runtime fixed offset accumulation in memory address.
* The memory leak in virtual memory management (VMM). HIP runtime now uses the size of handle for allocated memory range instead of actual size for physical memory, which fixed the issue of address clash with VMM.
* Large memory allocation issue. HIP runtime now checks GPU video RAM and system RAM properly and sets size limits during memory allocation either on the host or the GPU device.
* Support of `hipDeviceMallocContiguous` flags in `hipExtMallocWithFlags()`. It now enables `HSA_AMD_MEMORY_POOL_CONTIGUOUS_FLAG` in the memory pool allocation on GPU device.
* Radom memory segmentation fault in handling `GraphExec` object release and `hipDeviceSyncronization`. HIP runtime now uses internal device synchronize function in `__hipUnregisterFatBinary`.
### **hipBLASLt** (0.12.1)
#### Added
* Support for gfx1151 on Linux, complementing the previous support in the HIP SDK for Windows.
### **RCCL** (2.22.3)
#### Added
* Added support for the LL128 protocol on gfx942.
### **rocBLAS** (4.4.1)
#### Resolved issues
* rocBLAS might have failed to produce correct results for cherk/zherk on gfx90a/gfx942 with problem sizes k > 500 due to the imaginary portion on the C matrix diagonal not being zeros. rocBLAS now zeros the imaginary portion.
### **ROCm Compute Profiler** (3.1.1)
#### Added
* 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
* Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
* Data type selection option ``--roofline-data-type / -R`` for roofline profiling. The default data type is FP32.
#### Changed
* Changed dependency from `rocm-smi` to `amd-smi`.
#### Resolved issues
* Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.
### **ROCm Systems Profiler** (1.0.2)
#### Optimized
* Improved readability of the OpenMP target offload traces by showing on a single Perfetto track.
#### Resolved issues
* Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
### **ROCm Validation Suite** (1.1.0)
#### Added
* NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.
### **rocPRIM** (3.4.1)
#### Upcoming changes
* Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
* Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
* `rocprim::device_warp_size()`. This has been replaced by `rocprim::arch::wavefront::min_size()` and `rocprim::arch::wavefront::max_size()` for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use `rocprim::arch::wavefront::size()`.
* `rocprim::warp_size()`
* `ROCPRIM_WAVEFRONT_SIZE`
* The default scan accumulator types for device-level scan algorithms will be changed in an upcoming release, resulting in a breaking change. Previously, the default accumulator type was set to the input type for the inclusive scans and to the initial value type for the exclusive scans. This could lead to unexpected overflow if the input or initial type was smaller than the output type when the accumulator type wasn't explicitly set using the `AccType` template parameter. The new default accumulator types will be set to the type that results when the input or initial value type is applied to the scan operator.
The following is the complete list of affected functions and how their default accumulator types are changing:
* `rocprim::inclusive_scan`
* current default: `class AccType = typename std::iterator_traits<InputIterator>::value_type>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>`
* `rocprim::deterministic_inclusive_scan`
* current default: `class AccType = typename std::iterator_traits<InputIterator>::value_type>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>`
* `rocprim::exclusive_scan`
* current default: `class AccType = detail::input_type_t<InitValueType>>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>`
* `rocprim::deterministic_exclusive_scan`
* current default: `class AccType = detail::input_type_t<InitValueType>>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>`
* `rocprim::load_cs` and `rocprim::store_cs` are deprecated and will be removed in an upcoming release. Alternatively, you can use `rocprim::load_nontemporal` and `rocprim::store_nontemporal` to load and store values in specific conditions (like bypassing the cache) for `rocprim::thread_load` and `rocprim::thread_store`.
### **rocSHMEM** (2.0.1)
#### Resolved issues
* Incorrect output for `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes`.
* Multi-team errors by providing team specific buffers in `rocshmem_ctx_wg_team_sync`.
* Missing implementation of `rocshmem_g` for IPC conduit.
### **rocSOLVER** (3.28.2)
#### Added
* Hybrid computation support for existing routines, such as STERF.
* SVD for general matrices based on Cuppen's Divide and Conquer algorithm:
- GESDD (with batched and strided\_batched versions)
#### Optimized
* Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
* Improved the performance of STEDC and divide and conquer Eigensolvers.
* Improved the performance of SYTRD, the initial step of the Eigensolvers that start with the tridiagonalization of the input matrix.
## ROCm 6.4.1
See the [ROCm 6.4.1 release notes](https://rocm.docs.amd.com/en/docs-6.4.1/about/release-notes.html)
@@ -24,7 +180,7 @@ for a complete overview of this release.
#### Optimized
* Improved load times for CLI commands when the GPU has multiple parititons.
* Improved load times for CLI commands when the GPU has multiple partitions.
#### Resolved issues
@@ -34,9 +190,8 @@ for a complete overview of this release.
* When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
> [!NOTE]
> See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
### **HIP** (6.4.1)
@@ -117,9 +272,8 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
- Fixed partition enumeration. It now refers to the correct DRM Render and Card paths.
```{note}
See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
> [!NOTE]
> See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
### **ROCm Systems Profiler** (1.0.1)
@@ -257,9 +411,8 @@ Some workaround options are as follows:
- The `pasid` field in struct `amdsmi_process_info_t` will be deprecated in a future ROCm release.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
> [!NOTE]
> See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
### **AMDMIGraphX** (2.12.0)
@@ -867,9 +1020,8 @@ The following lists the backward incompatible changes planned for upcoming major
- Fixed `rsmi_dev_target_graphics_version_get`, `rocm-smi --showhw`, and `rocm-smi --showprod` not displaying graphics version correctly for Instinct MI200 series, MI100 series, and RDNA3-based GPUs.
```{note}
See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
> [!NOTE]
> See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
### **ROCm Systems Profiler** (1.0.0)
@@ -1295,9 +1447,8 @@ for a complete overview of this release.
* Fixed `amd-smi monitor`'s reporting of encode and decode information. `VCLOCK` and `DCLOCK` are
now associated with both `ENC_UTIL` and `DEC_UTIL`.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANGELOG.md) for more details and examples.
```
> [!NOTE]
> See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANGELOG.md) for more details and examples.
### **HIP** (6.3.1)
@@ -1501,9 +1652,8 @@ for a complete overview of this release.
- The new partition command can display GPU information, including memory and accelerator partition information.
- The command will be at full functionality once additional partition information from `amdsmi_get_gpu_accelerator_partition_profile()` has been implemented.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANGELOG.md) for more details and examples.
```
> [!NOTE]
> See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANGELOG.md) for more details and examples.
### **HIP** (6.3.0)
@@ -1637,18 +1787,17 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANG
* Support for `fp8` data types
### **hipRAND** (2.11.0[*](#id22))
### **hipRAND** (2.11.0)
> [!NOTE]
> In ROCm 6.3.0, the hipRAND package version is incorrectly set to `2.11.0`.
> In ROCm 6.2.4, the hipRAND package version was `2.11.1`.
> The hipRAND version number will be corrected in a future ROCm release.
#### Changed
* Updated the default value for the `-a` argument from `rmake.py` to `gfx906:xnack-,gfx1030,gfx1100,gfx1101,gfx1102`.
#### Known issues
* In ROCm 6.3.0, the hipRAND package version is incorrectly set to `2.11.0`. In ROCm
6.2.4, the hipRAND package version was `2.11.1`. The hipRAND version number will be corrected in a
future ROCm release.
#### Resolved issues
* Fixed an issue in `rmake.py` where the list storing the CMake options would contain individual characters instead of a full string of options.
@@ -1849,7 +1998,7 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANG
#### Known issues
* See [MIVisionX memory access fault in Canny edge detection](#mivisionx-memory-access-fault-in-canny-edge-detection).
* See [MIVisionX memory access fault in Canny edge detection](https://github.com/ROCm/ROCm/issues/4086).
* Package installation requires the manual installation of OpenCV.
* Installation on CentOS/RedHat/SLES requires the manual installation of the `FFMPEG Dev` package.
* Hardware decode requires installation with `--usecase=graphics` in addition to `--usecase=rocm`.
@@ -2040,9 +2189,9 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANG
#### Known issues
- See [ROCm Compute Profiler post-upgrade](#rocm-compute-profiler-post-upgrade).
- See [ROCm Compute Profiler post-upgrade](https://github.com/ROCm/ROCm/issues/4082).
- See [ROCm Compute Profiler CTest failure in CI](#rocm-compute-profiler-ctest-failure-in-ci).
- See [ROCm Compute Profiler CTest failure in CI](https://github.com/ROCm/ROCm/issues/4085).
### **ROCm Data Center Tool** (0.3.0)
@@ -2055,7 +2204,7 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANG
#### Known issues
- See [ROCm Data Center Tool incorrect RHEL9 package version](#rocm-data-center-tool-incorrect-rhel9-package-version).
- See [ROCm Data Center Tool incorrect RHEL9 package version](https://github.com/ROCm/ROCm/issues/4089).
### **ROCm SMI** (7.4.0)
@@ -2093,9 +2242,8 @@ memory partition modes upon an invalid argument return from memory partition mod
- C++ tests for `memorypartition_read_write` are to be re-enabled in a future ROCm release.
```{note}
See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/6.3.x/CHANGELOG.md) for more details and examples.
```
> [!NOTE]
> See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/6.3.x/CHANGELOG.md) for more details and examples.
### **ROCm Systems Profiler** (0.1.0)
@@ -2109,7 +2257,7 @@ See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/6.3.
#### Known issues
- See [ROCm Systems Profiler post-upgrade](#rocm-systems-profiler-post-upgrade).
- See [ROCm Systems Profiler post-upgrade](https://github.com/ROCm/ROCm/issues/4083).
### **ROCm Validation Suite** (1.1.0)
@@ -2123,7 +2271,7 @@ See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/6.3.
#### Known issues
- See [ROCm Validation Suite needs specified configuration file](#rocm-validation-suite-needs-specified-configuration-file).
- See [ROCm Validation Suite needs specified configuration file](https://github.com/ROCm/ROCm/issues/4090).
### **rocPRIM** (3.3.0)
@@ -2866,10 +3014,8 @@ for a complete overview of this release.
See [issue #3500](https://github.com/ROCm/ROCm/issues/3500) on GitHub.
```{note}
See the [detailed AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/docs/6.2.0/CHANGELOG.md)
on GitHub for more information.
```
> [!NOTE]
> See the [detailed AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/docs/6.2.0/CHANGELOG.md) on GitHub for more information.
### **Composable Kernel** (1.1.0)
@@ -3468,9 +3614,8 @@ The compiler may incorrectly compile a program that uses the
the function is undefined along some path to the function. For most functions,
uninitialized inputs cause undefined behavior.
```{note}
The ``-Wall`` compilation flag prompts the compiler to generate a warning if a variable is uninitialized along some path.
```
> [!NOTE]
> The ``-Wall`` compilation flag prompts the compiler to generate a warning if a variable is uninitialized along some path.
As a workaround, initialize the parameters to ``__shfl``. For example:
@@ -3791,10 +3936,8 @@ See [issue #3498](https://github.com/ROCm/ROCm/issues/3498) on GitHub.
- Fixed Partition ID CLI output.
```{note}
See the [detailed ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/docs/6.2.0/CHANGELOG.md)
on GitHub for more information.
```
> [!NOTE]
> See the [detailed ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/docs/6.2.0/CHANGELOG.md) on GitHub for more information.
### **ROCm Validation Suite** (1.0.0)
@@ -4164,9 +4307,8 @@ for a complete overview of this release.
* Fixed the `amdsmitstReadWrite.TestPowerCapReadWrite` test for RDNA3, RDNA2, and MI100 devices.
* Fixed an issue with the `amdsmi_get_gpu_memory_reserved_pages` and `amdsmi_get_gpu_bad_page_info` Python interface calls.
```{note}
See the AMD SMI [detailed changelog](https://github.com/ROCm/amdsmi/blob/rocm-6.1.x/CHANGELOG.md) with code samples for more information.
```
> [!NOTE]
> See the AMD SMI [detailed changelog](https://github.com/ROCm/amdsmi/blob/rocm-6.1.x/CHANGELOG.md) with code samples for more information.
### **RCCL** (2.18.6)
@@ -4246,9 +4388,8 @@ for a complete overview of this release.
- `amd-smi bad-pages` can result in a `ValueError: Null pointer access` error when using some PMU firmware versions.
```{note}
See the [detailed changelog](https://github.com/ROCm/amdsmi/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.
```
> [!NOTE]
> See the [detailed changelog](https://github.com/ROCm/amdsmi/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.
### **hipBLASLt** (0.7.0)
@@ -4317,9 +4458,8 @@ See the [detailed changelog](https://github.com/ROCm/amdsmi/blob/docs/6.1.1/CHAN
- ROCm SMI reports GPU utilization incorrectly for RDNA3 GPUs in some situations. See the issue on [GitHub](https://github.com/ROCm/ROCm/issues/3112).
```{note}
See the [detailed ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.
```
> [!NOTE]
> See the [detailed ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/docs/6.1.1/CHANGELOG.md) with code samples for more information.
## ROCm 6.1.0
@@ -5036,16 +5176,16 @@ on GitHub for a complete overview of this release.
### **rocSPARSE** (2.5.4)
##### Added
#### Added
- Added more mixed precisions for SpMV, (matrix: float, vectors: double, calculation: double) and (matrix: rocsparse_float_complex, vectors: rocsparse_double_complex, calculation: rocsparse_double_complex)
- Added support for gfx940, gfx941 and gfx942
##### Optimized
#### Optimized
- Fixed a bug in csrsm and bsrsm
##### Known issues
#### Known issues
In csritlu0, the algorithm rocsparse_itilu0_alg_sync_split_fusion has some accuracy issues to investigate with XNACK enabled. The fallback is rocsparse_itilu0_alg_sync_split.
@@ -5131,7 +5271,7 @@ on GitHub for a complete overview of this release.
### **HIP** (5.6.0)
##### Added
#### Added
- Added hipRTC support for amd_hip_fp16
- Added hipStreamGetDevice implementation to get the device associated with the stream
@@ -5140,7 +5280,7 @@ on GitHub for a complete overview of this release.
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
- hipArray3DGetDescriptor to get 3D array descriptor
##### Changed
#### Changed
- hipMallocAsync to return success for zero size allocation to match hipMalloc
- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
@@ -5445,15 +5585,15 @@ $ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
##### Added
#### Added
- 'end_time' need to be disabled in roctx_trace.txt
##### Optimized
#### Optimized
- Improved Test Suite
##### Resolved issues
#### Resolved issues
- rocprof in ROcm/5.4.0 gpu selector broken.
- rocprof in ROCm/5.4.1 fails to generate kernel info.

View File

@@ -10,7 +10,7 @@
<!-- markdownlint-disable reference-links-images -->
<!-- markdownlint-disable no-missing-space-atx -->
<!-- spellcheck-disable -->
# ROCm 6.4.1 release notes
# ROCm 6.4.2 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
@@ -24,63 +24,81 @@ The release notes provide a summary of notable changes since the previous ROCm r
- [ROCm known issues](#rocm-known-issues)
- [ROCm resolved issues](#rocm-resolved-issues)
- [ROCm upcoming changes](#rocm-upcoming-changes)
```{note}
If youre using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html)
If youre using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html)
documentation to verify compatibility and system requirements.
```
## Release highlights
The following are notable new features and improvements in ROCm 6.4.1. For changes to individual components, see
The following are notable new features and improvements in ROCm 6.4.2. For changes to individual components, see
[Detailed component changes](#detailed-component-changes).
### Addition of DPX partition mode under NPS2 memory mode
AMD Instinct MI300X now supports DPX partition mode under NPS2 memory mode. For more partitioning information, see the [Deep dive into the MI300 compute and memory partition modes](https://rocm.blogs.amd.com/software-tools-optimization/compute-memory-modes/README.html) blog and [AMD Instinct MI300X system optimization](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#change-gpu-partition-modes).
### ROCm Compute Profiler enhancements
### Introducing the ROCm Data Science toolkit
[ROCm Compute Profiler](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/index.html) includes the following changes:
The ROCm Data Science toolkit (or ROCm-DS) is an open-source software collection for high-performance data science applications built on the core ROCm platform. You can leverage ROCm-DS to accelerate both new and existing data science workloads, allowing you to execute intensive applications with larger datasets at lightning speed. ROCm-DS is in an early access state. Running production workloads is not recommended. For more information, see [AMD ROCm-DS Documentation](https://rocm.docs.amd.com/projects/rocm-ds/en/latest/index.html).
* The ``--roofline-data-type`` option now supports FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 data types. This is dependent on the GPU architecture. For more information, see [Roofline options](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.4.2/how-to/profile/mode.html#roofline-options).
* ROCm Compute Profiler now uses [AMD SMI](https://rocm.docs.amd.com/projects/amdsmi/en/latest/index.html) instead of [ROCm SMI](https://rocm.docs.amd.com/projects/rocm_smi_lib/en/latest/index.html). The AMD System Management Interface Library (AMD SMI) is a successor to ROCm SMI. It is a unified system management interface tool that provides a user-space interface for applications to monitor and control GPU applications and gives users the ability to query information about drivers and GPUs on the system. For more information, see [https://github.com/ROCm/amdsmi](https://github.com/ROCm/amdsmi) and the [AMD SMI documentation](https://rocm.docs.amd.com/projects/amdsmi/en/latest/index.html).
* ROCm Compute Profiler has added 8-bit floating point (FP8) metrics support for AMD Instinct MI300 series accelerators. For more information, see [System Speed-of-Light](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.4.2/conceptual/system-speed-of-light.html).
### rocSOLVER enhancements
rocSOLVER has improved the performance of eigensolvers and singular value decomposition (SVD). For more information, see [rocSOLVER documentation](https://rocm.docs.amd.com/projects/rocSOLVER/en/docs-6.4.2/index.html).
### ROCm Offline Installer Creator updates
The ROCm Offline Installer Creator 6.4.1 now allows you to use the SPACEBAR or ENTER keys for menu item selection in the GUI. It also adds support for Debian 12 and fixes an issue for “full” mode RHEL offline installer creation, where GDM packages were uninstalled during offline installation. See [ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/rocm-offline-installer.html) for more information.
The ROCm Offline Installer Creator 6.4.2 includes the following features and improvements:
* Added support for Oracle Linux 8.10 and 9.6, and SLES 15 SP7.
* Additional package options for the Offline Installer Creator, including `amd-smi`, `rocdecode`, `rocjpeg`, and `rdc`.
* ROCm meta packages are now used for selecting ROCm components and use cases.
* Improved separation of kernel/driver and ROCm prerequisite packages to reduce the size of ROCm-only or driver-only offline installers.
In addition, the option to build an offline installer based on ROCm version 5.7.3 has been removed. To build an offline installer for ROCm 5.7.3, use the Offline Installer Creator from version 6.4.1 or earlier. See [ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.2/install/rocm-offline-installer.html) for more information.
### ROCm Runfile Installer updates
The ROCm Runfile Installer 6.4.1 adds the following improvements:
- Relaxed version checks for installation on different distributions. Provided the dependencies are not installed by the Runfile Installer, you can target installation for a different path from the host system running the installer. For example, the installer can run on a system using Ubuntu 22.04 and install to a partition/system that is using Ubuntu 24.04.
- Performance improvements for detecting a previous ROCm install.
- Removal of the extra `opt` directory created for the target during the ROCm installation. For example, installing to `target=/home/amd` now installs ROCm to `/home/amd/rocm-6.4.1` and not `/home/amd/opt/rocm-6.4.1`. For installs using `target=/`, the installation will continue to use `/opt/`.
- The Runfile Installer can be used to uninstall any Runfile-based installation of the driver.
- In the CLI interface, the `postrocm` argument can now be run separately from the `rocm` argument. In cases where `postrocm` was missed from the initial ROCm install, `postrocm` can now be run on the same target folder. For example, if you installed ROCm 6.4.1 using `install.run target=/myrocm rocm`, you can run the post-installation separately using the command `install.run target=/myrocm/rocm-6.4.1 postrocm`.
For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/rocm-runfile-installer.html).
The ROCm Runfile Installer 6.4.2 adds support for Oracle Linux 8.10 and 9.6 (using the RHEL 8 or 9 .run files), Debian 12 (using the Ubuntu 22.04 .run file), and SLES 15 SP7. It also fixes permission settings issues during ROCm and AMDGPU driver installation. For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.2/install/rocm-runfile-installer.html).
### ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
* [Tutorials for AI developers](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/) have been expanded with five new tutorials. These tutorials are Jupyter notebook-based, easy-to-follow documents. They are ideal for AI developers who want to learn about specific topics, including inference, fine-tuning, and training. For more information about the changes, see [Changelog for the AI Developer Hub](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/changelog.html).
* The [Training a model with LLM Foundry](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry.html) performance testing guide has been added. This guide describes how to use the preconfigured [ROCm/pytorch-training](https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5) training environment and [https://github.com/ROCm/MAD](https://github.com/ROCm/MAD) to test the training performance of the LLM Foundry framework on AMD Instinct MI325X and MI300X accelerators using the [MPT-30B](https://huggingface.co/mosaicml/mpt-30b) model.
* The [Training a model with PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) performance testing guide has been updated to feature the latest [ROCm/pytorch-training](https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5) Docker image (a preconfigured training environment with ROCm and PyTorch). Support for [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) has been added.
* The [Training a model with JAX MaxText](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.html) performance testing guide has been updated to feature the latest [ROCm/jax-training](https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.5/images/sha256-4e0516358a227cae8f552fb866ec07e2edcf244756f02e7b40212abfbab5217b) Docker image (a preconfigured training environment with ROCm, JAX, and [MaxText](https://github.com/AI-Hypercomputer/maxtext)). Support for [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) has been added.
* The [vLLM inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/vllm-benchmark.html?model=pyt_vllm_qwq-32b) guide has been updated to feature the latest [ROCm/vLLM](https://hub.docker.com/layers/rocm/vllm/latest/images/sha256-5c8b4436dd0464119d9df2b44c745fadf81512f18ffb2f4b5dc235c71ebe26b4) Docker image (a preconfigured environment for inference with ROCm and [vLLM](https://docs.vllm.ai/en/latest/)). Support for the [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) model has been added.
* The [PyTorch inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/pytorch-inference-benchmark.html?model=pyt_clip_inference) guide has been added, featuring the [ROCm/PyTorch](https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-ab1d350b818b90123cfda31363019d11c0d41a8f12a19e3cb2cb40cf0261137d) Docker image (a preconfigured inference environment with ROCm and PyTorch) with initial support for the [CLIP](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K) and [Chai-1](https://huggingface.co/chaidiscovery/chai-1) models.
* [Tutorials for AI developers](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/) have been expanded with the following four new tutorials:
* Inference tutorial: [AI agent with MCPs using vLLM and PydanticAI](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/inference/build_airbnb_agent_mcp.html)
* GPU development and optimization tutorials:
* [Kernel development and optimization with Triton](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/gpu_dev_optimize/triton_kernel_dev.html)
* [Profiling Llama-4 inference with vLLM](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/gpu_dev_optimize/llama4_profiling_vllm.html)
* [FP8 quantization with AMD Quark for vLLM](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/notebooks/gpu_dev_optimize/fp8_quantization_quark_vllm.html)
For more information about the changes, see [Changelog for the AI Developer Hub](https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/changelog.html).
* ROCm provides a comprehensive ecosystem for deep learning development. For more details, see [Deep learning frameworks for ROCm](https://rocm.docs.amd.com/en/docs-6.4.2/how-to/deep-learning-rocm.html). As of July 2025, AMD ROCm provides support for the following additional deep learning frameworks:
* Deep Graph Library is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning if a deep graph model is a component in an end-to-end application, the rest of the logic is implemented using PyTorch. It is currently supported on ROCm 6.4.0. For more information, see [DGL compatibility](https://rocm.docs.amd.com/en/docs-6.4.2/compatibility/ml-compatibility/dgl-compatibility.html).
* Stanford Megatron-LM is a large-scale language model training framework. Its designed to train massive transformer-based language models efficiently by model and data parallelism. It is currently supported on ROCm 6.3.0. For more information, see [Stanford Megatron-LM compatibility](https://rocm.docs.amd.com/en/docs-6.4.2/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.html).
* Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs). verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support. It is currently supported on ROCm 6.2.0. For more information, see [verl compatibility](https://rocm.docs.amd.com/en/docs-6.4.2/compatibility/ml-compatibility/verl-compatibility.html).
* Documentation for the new [ROCprof Compute Viewer](https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/docs-6.4.2/) was added in May 2025. This tool is used to visualize and analyze GPU thread trace data collected using [rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/index.html). Note that [ROCprof Compute Viewer](https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/docs-6.4.2/) is in an early access state. Running production workloads is not recommended.
* The AMDGPU installer documentation has been removed to encourage the use of the package manager for ROCm installation. While the package manager is the recommended method, you can still install ROCm using the AMDGPU installer by following the [legacy process](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/install/install-methods/amdgpu-installer-index.html). Ensure to update the command with the intended ROCm version before running it. For more information, see [Installation via native package manager](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.2/install/install-methods/package-manager-index.html).
## Operating system and hardware support changes
ROCm 6.4.1 introduces support for the RDNA4 architecture-based [Radeon AI PRO
R9700](https://www.amd.com/en/products/graphics/workstations/radeon-ai-pro/ai-9000-series/amd-radeon-ai-pro-r9700.html),
[Radeon RX 9070](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070.html),
[Radeon RX 9070 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html),
Radeon RX 9070 GRE, and
[Radeon RX 9060 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9060xt.html) GPUs
for compute workloads. It also adds support for RDNA3 architecture-based [Radeon PRO W7700](https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7700.html) and [Radeon RX 7800 XT](https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7800-xt.html) GPUs. These GPUs are supported on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.
ROCm 6.4.2 adds support for SLES 15 SP7. For more information, see [SLES installation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.2/install/install-methods/package-manager/package-manager-sles.html).
ROCm 6.4.2 marks the end of support (EoS) for RHEL 9.5.
ROCm 6.4.2 adds support for RDNA3 architecture-based [Radeon RX 7700 XT](https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7700-xt.html) GPU. This GPU is supported on Ubuntu 24.04.2 and RHEL 9.6.
For details, see the full list of [Supported GPUs
(Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus).
(Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.2/reference/system-requirements.html#supported-gpus).
See the [Compatibility
matrix](../../docs/compatibility/compatibility-matrix.rst)
@@ -88,8 +106,8 @@ for more information about operating system and hardware compatibility.
## ROCm components
The following table lists the versions of ROCm components for ROCm 6.4.1, including any version
changes from 6.4.0 to 6.4.1. Click the component's updated version to go to a list of its changes.
The following table lists the versions of ROCm components for ROCm 6.4.2, including any version
changes from 6.4.1 to 6.4.2. Click the component's updated version to go to a list of its changes.
Click {fab}`github` to go to the component's source code on GitHub.
<div class="pst-scrollable-table-container">
@@ -111,47 +129,47 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="9">Libraries</th>
<th rowspan="9">Machine learning and computer vision</th>
<td><a href="https://rocm.docs.amd.com/projects/composable_kernel/en/docs-6.4.1/index.html">Composable Kernel</a></td>
<td><a href="https://rocm.docs.amd.com/projects/composable_kernel/en/docs-6.4.2/index.html">Composable Kernel</a></td>
<td>1.1.0</td>
<td><a href="https://github.com/ROCm/composable_kernel"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/AMDMIGraphX/en/docs-6.4.1/index.html">MIGraphX</a></td>
<td><a href="https://rocm.docs.amd.com/projects/AMDMIGraphX/en/docs-6.4.2/index.html">MIGraphX</a></td>
<td>2.12.0</td>
<td><a href="https://github.com/ROCm/AMDMIGraphX"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIOpen/en/docs-6.4.1/index.html">MIOpen</a></td>
<td><a href="https://rocm.docs.amd.com/projects/MIOpen/en/docs-6.4.2/index.html">MIOpen</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/MIOpen"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIVisionX/en/docs-6.4.1/index.html">MIVisionX</a></td>
<td><a href="https://rocm.docs.amd.com/projects/MIVisionX/en/docs-6.4.2/index.html">MIVisionX</a></td>
<td>3.2.0</td>
<td><a href="https://github.com/ROCm/MIVisionX"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocAL/en/docs-6.4.1/index.html">rocAL</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocAL/en/docs-6.4.2/index.html">rocAL</a></td>
<td>2.2.0</td>
<td><a href="https://github.com/ROCm/rocAL"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocDecode/en/docs-6.4.1/index.html">rocDecode</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocDecode/en/docs-6.4.2/index.html">rocDecode</a></td>
<td>0.10.0</td>
<td><a href="https://github.com/ROCm/rocDecode"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocJPEG/en/docs-6.4.1/index.html">rocJPEG</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocJPEG/en/docs-6.4.2/index.html">rocJPEG</a></td>
<td>0.8.0</td>
<td><a href="https://github.com/ROCm/rocJPEG"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocPyDecode/en/docs-6.4.1/index.html">rocPyDecode</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocPyDecode/en/docs-6.4.2/index.html">rocPyDecode</a></td>
<td>0.3.1</td>
<td><a href="https://github.com/ROCm/rocPyDecode"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rpp/en/docs-6.4.1/index.html">RPP</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rpp/en/docs-6.4.2/index.html">RPP</a></td>
<td>1.9.10</td>
<td><a href="https://github.com/ROCm/rpp"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
@@ -160,13 +178,13 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="2"></th>
<th rowspan="2">Communication</th>
<td><a href="https://rocm.docs.amd.com/projects/rccl/en/docs-6.4.1/index.html">RCCL</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rccl/en/docs-6.4.2/index.html">RCCL</a></td>
<td>2.22.3&nbsp;&Rightarrow;&nbsp;<a href="#rccl-2-22-3">2.22.3</td>
<td><a href="https://github.com/ROCm/rccl"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSHMEM/en/docs-6.4.1/index.html">rocSHMEM</a></td>
<td>2.0.0</td>
<td><a href="https://rocm.docs.amd.com/projects/rocSHMEM/en/docs-6.4.2/index.html">rocSHMEM</a></td>
<td>2.0.0&nbsp;&Rightarrow;&nbsp;<a href="#rocshmem-2-0-1">2.0.1</td>
<td><a href="https://github.com/ROCm/rocSHMEM"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
@@ -174,82 +192,82 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="16"></th>
<th rowspan="16">Math</th>
<td><a href="https://rocm.docs.amd.com/projects/hipBLAS/en/docs-6.4.1/index.html">hipBLAS</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipBLAS/en/docs-6.4.2/index.html">hipBLAS</a></td>
<td>2.4.0</td>
<td><a href="https://github.com/ROCm/hipBLAS"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.4.1/index.html">hipBLASLt</a></td>
<td>0.12.0&nbsp;&Rightarrow;&nbsp;<a href="#hipblaslt-0-12-1">0.12.1</td>
<td><a href="https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.4.2/index.html">hipBLASLt</a></td>
<td>0.12.1&nbsp;&Rightarrow;&nbsp;<a href="#hipblaslt-0-12-1">0.12.1</td>
<td><a href="https://github.com/ROCm/hipBLASLt"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipFFT/en/docs-6.4.1/index.html">hipFFT</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipFFT/en/docs-6.4.2/index.html">hipFFT</a></td>
<td>1.0.18</td>
<td><a href="https://github.com/ROCm/hipFFT"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipfort/en/docs-6.4.1/index.html">hipfort</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipfort/en/docs-6.4.2/index.html">hipfort</a></td>
<td>0.6.0</td>
<td><a href="https://github.com/ROCm/hipfort"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipRAND/en/docs-6.4.1/index.html">hipRAND</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipRAND/en/docs-6.4.2/index.html">hipRAND</a></td>
<td>2.12.0</td>
<td><a href="https://github.com/ROCm/hipRAND"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSOLVER/en/docs-6.4.1/index.html">hipSOLVER</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipSOLVER/en/docs-6.4.2/index.html">hipSOLVER</a></td>
<td>2.4.0</td>
<td><a href="https://github.com/ROCm/hipSOLVER"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSE/en/docs-6.4.1/index.html">hipSPARSE</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSE/en/docs-6.4.2/index.html">hipSPARSE</a></td>
<td>3.2.0</td>
<td><a href="https://github.com/ROCm/hipSPARSE"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSELt/en/docs-6.4.1/index.html">hipSPARSELt</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipSPARSELt/en/docs-6.4.2/index.html">hipSPARSELt</a></td>
<td>0.2.3</td>
<td><a href="https://github.com/ROCm/hipSPARSELt"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocALUTION/en/docs-6.4.1/index.html">rocALUTION</a></td>
<td>3.2.2&nbsp;&Rightarrow;&nbsp;<a href="#rocalution-3-2-3">3.2.3</td></td>
<td><a href="https://rocm.docs.amd.com/projects/rocALUTION/en/docs-6.4.2/index.html">rocALUTION</a></td>
<td>3.2.3</td>
<td><a href="https://github.com/ROCm/rocALUTION"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.4.1/index.html">rocBLAS</a></td>
<td>4.4.0</td>
<td><a href="https://rocm.docs.amd.com/projects/rocBLAS/en/docs-6.4.2/index.html">rocBLAS</a></td>
<td>4.4.0&nbsp;&Rightarrow;&nbsp;<a href="#rocblas-4-4-1">4.4.1</td></td>
<td><a href="https://github.com/ROCm/rocBLAS"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocFFT/en/docs-6.4.1/index.html">rocFFT</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocFFT/en/docs-6.4.2/index.html">rocFFT</a></td>
<td>1.0.32</td>
<td><a href="https://github.com/ROCm/rocFFT"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocRAND/en/docs-6.4.1/index.html">rocRAND</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocRAND/en/docs-6.4.2/index.html">rocRAND</a></td>
<td>3.3.0</td>
<td><a href="https://github.com/ROCm/rocRAND"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSOLVER/en/docs-6.4.1/index.html">rocSOLVER</a></td>
<td>3.28.0</td>
<td><a href="https://rocm.docs.amd.com/projects/rocSOLVER/en/docs-6.4.2/index.html">rocSOLVER</a></td>
<td>3.28.0&nbsp;&Rightarrow;&nbsp;<a href="#rocsolver-3-28-2">3.28.2</td>
<td><a href="https://github.com/ROCm/rocSOLVER"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocSPARSE/en/docs-6.4.1/index.html">rocSPARSE</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocSPARSE/en/docs-6.4.2/index.html">rocSPARSE</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/rocSPARSE"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocWMMA/en/docs-6.4.1/index.html">rocWMMA</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocWMMA/en/docs-6.4.2/index.html">rocWMMA</a></td>
<td>1.7.0</td>
<td><a href="https://github.com/ROCm/rocWMMA"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/Tensile/en/docs-6.4.1/src/index.html">Tensile</a></td>
<td><a href="https://rocm.docs.amd.com/projects/Tensile/en/docs-6.4.2/src/index.html">Tensile</a></td>
<td>4.43.0</td>
<td><a href="https://github.com/ROCm/Tensile"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
@@ -258,22 +276,22 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="4"></th>
<th rowspan="4">Primitives</th>
<td><a href="https://rocm.docs.amd.com/projects/hipCUB/en/docs-6.4.1/index.html">hipCUB</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipCUB/en/docs-6.4.2/index.html">hipCUB</a></td>
<td>3.4.0</td>
<td><a href="https://github.com/ROCm/hipCUB"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipTensor/en/docs-6.4.1/index.html">hipTensor</a></td>
<td><a href="https://rocm.docs.amd.com/projects/hipTensor/en/docs-6.4.2/index.html">hipTensor</a></td>
<td>1.5.0</td>
<td><a href="https://github.com/ROCm/hipTensor"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocPRIM/en/docs-6.4.1/index.html">rocPRIM</a></td>
<td>3.4.0</td>
<td><a href="https://rocm.docs.amd.com/projects/rocPRIM/en/docs-6.4.2/index.html">rocPRIM</a></td>
<td>3.4.0&nbsp;&Rightarrow;&nbsp;<a href="#rocprim-3-4-1">3.4.1</td>
<td><a href="https://github.com/ROCm/rocPRIM"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocThrust/en/docs-6.4.1/index.html">rocThrust</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocThrust/en/docs-6.4.2/index.html">rocThrust</a></td>
<td>3.3.0</td>
<td><a href="https://github.com/ROCm/rocThrust"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
@@ -282,28 +300,28 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="7">Tools</th>
<th rowspan="7">System management</th>
<td><a href="https://rocm.docs.amd.com/projects/amdsmi/en/docs-6.4.1/index.html">AMD SMI</a></td>
<td>25.3.0&nbsp;&Rightarrow;&nbsp;<a href="#amd-smi-25-4-2">25.4.2</a></td>
<td><a href="https://rocm.docs.amd.com/projects/amdsmi/en/docs-6.4.2/index.html">AMD SMI</a></td>
<td>25.4.2&nbsp;&Rightarrow;&nbsp;<a href="#amd-smi-25-5-1">25.5.1</a></td>
<td><a href="https://github.com/ROCm/amdsmi"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rdc/en/docs-6.4.1/index.html">ROCm Data Center Tool</a></td>
<td>0.3.0&nbsp;&Rightarrow;&nbsp;<a href="#rocm-data-center-tool-0-3-0">0.3.0</td>
<td><a href="https://rocm.docs.amd.com/projects/rdc/en/docs-6.4.2/index.html">ROCm Data Center Tool</a></td>
<td>0.3.0</td>
<td><a href="https://github.com/ROCm/rdc"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocminfo/en/docs-6.4.1/index.html">rocminfo</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocminfo/en/docs-6.4.2/index.html">rocminfo</a></td>
<td>1.0.0</td>
<td><a href="https://github.com/ROCm/rocminfo"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocm_smi_lib/en/docs-6.4.1/index.html">ROCm SMI</a></td>
<td>7.5.0&nbsp;&Rightarrow;&nbsp;<a href="#rocm-smi-7-5-0">7.5.0</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocm_smi_lib/en/docs-6.4.2/index.html">ROCm SMI</a></td>
<td>7.5.0</td>
<td><a href="https://github.com/ROCm/rocm_smi_lib"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/docs-6.4.1/index.html">ROCmValidationSuite</a></td>
<td>1.1.0</td>
<td><a href="https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/docs-6.4.2/index.html">ROCm Validation Suite</a></td>
<td>1.1.0&nbsp;&Rightarrow;&nbsp;<a href="#rocm-validation-suite-1-1-0">1.1.0</td>
<td><a href="https://github.com/ROCm/ROCmValidationSuite"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
@@ -311,38 +329,38 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="6"></th>
<th rowspan="6">Performance</th>
<td><a href="https://rocm.docs.amd.com/projects/rocm_bandwidth_test/en/docs-6.4.1/index.html">ROCm Bandwidth
<td><a href="https://rocm.docs.amd.com/projects/rocm_bandwidth_test/en/docs-6.4.2/index.html">ROCm Bandwidth
Test</a></td>
<td>1.4.0</td>
<td><a href="https://github.com/ROCm/rocm_bandwidth_test/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.4.1/index.html">ROCm Compute Profiler</a></td>
<td>3.1.0</td>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.4.2/index.html">ROCm Compute Profiler</a></td>
<td>3.1.0&nbsp;&Rightarrow;&nbsp;<a href="#rocm-compute-profiler-3-1-1">3.1.1</td>
<td><a href="https://github.com/ROCm/rocprofiler-compute"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-systems/en/docs-6.4.1/index.html">ROCm Systems Profiler</a></td>
<td>1.0.0&nbsp;&Rightarrow;&nbsp;<a href="#rocm-systems-profiler-1-0-1">1.0.1</td>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-systems/en/docs-6.4.2/index.html">ROCm Systems Profiler</a></td>
<td>1.0.1&nbsp;&Rightarrow;&nbsp;<a href="#rocm-systems-profiler-1-0-2">1.0.2</td>
<td><a href="https://github.com/ROCm/rocprofiler-systems"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.4.1/index.html">ROCProfiler</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.4.2/index.html">ROCProfiler</a></td>
<td>2.0.0</td>
<td><a href="https://github.com/ROCm/ROCProfiler/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/docs-6.4.1/index.html">ROCprofiler-SDK</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/docs-6.4.2/index.html">ROCprofiler-SDK</a></td>
<td>0.6.0</td>
<td><a href="https://github.com/ROCm/rocprofiler-sdk/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr >
<td><a href="https://rocm.docs.amd.com/projects/roctracer/en/docs-6.4.1/index.html">ROCTracer</a></td>
<td><a href="https://rocm.docs.amd.com/projects/roctracer/en/docs-6.4.2/index.html">ROCTracer</a></td>
<td>4.1.0</td>
<td><a href="https://github.com/ROCm/ROCTracer/"><i
class="fab fa-github fa-lg"></i></a></td>
@@ -352,32 +370,32 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="5"></th>
<th rowspan="5">Development</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.4.1/index.html">HIPIFY</a></td>
<td><a href="https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.4.2/index.html">HIPIFY</a></td>
<td>19.0.0</td>
<td><a href="https://github.com/ROCm/HIPIFY/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCdbgapi/en/docs-6.4.1/index.html">ROCdbgapi</a></td>
<td><a href="https://rocm.docs.amd.com/projects/ROCdbgapi/en/docs-6.4.2/index.html">ROCdbgapi</a></td>
<td>0.77.2</td>
<td><a href="https://github.com/ROCm/ROCdbgapi/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCmCMakeBuildTools/en/docs-6.4.1/index.html">ROCm CMake</a></td>
<td><a href="https://rocm.docs.amd.com/projects/ROCmCMakeBuildTools/en/docs-6.4.2/index.html">ROCm CMake</a></td>
<td>0.14.0</td>
<td><a href="https://github.com/ROCm/rocm-cmake/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCgdb/en/docs-6.4.1/index.html">ROCm Debugger (ROCgdb)</a>
<td><a href="https://rocm.docs.amd.com/projects/ROCgdb/en/docs-6.4.2/index.html">ROCm Debugger (ROCgdb)</a>
</td>
<td>15.2</td>
<td><a href="https://github.com/ROCm/ROCgdb/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocr_debug_agent/en/docs-6.4.1/index.html">ROCr Debug Agent</a>
<td><a href="https://rocm.docs.amd.com/projects/rocr_debug_agent/en/docs-6.4.2/index.html">ROCr Debug Agent</a>
</td>
<td>2.0.4</td>
<td><a href="https://github.com/ROCm/rocr_debug_agent/"><i
@@ -387,13 +405,13 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tbody class="rocm-components-compilers tbody-reverse-zebra">
<tr>
<th rowspan="2" colspan="2">Compilers</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPCC/en/docs-6.4.1/index.html">HIPCC</a></td>
<td><a href="https://rocm.docs.amd.com/projects/HIPCC/en/docs-6.4.2/index.html">HIPCC</a></td>
<td>1.1.1</td>
<td><a href="https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.1/index.html">llvm-project</a></td>
<td><a href="https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.2/index.html">llvm-project</a></td>
<td>19.0.0</td>
<td><a href="https://github.com/ROCm/llvm-project/"><i
class="fab fa-github fa-lg"></i></a></td>
@@ -402,13 +420,13 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tbody class="rocm-components-runtimes tbody-reverse-zebra">
<tr>
<th rowspan="2" colspan="2">Runtimes</th>
<td><a href="https://rocm.docs.amd.com/projects/HIP/en/docs-6.4.1/index.html">HIP</a></td>
<td>6.4.0&nbsp;&Rightarrow;&nbsp;<a href="#hip-6-4-1">6.4.1</td>
<td><a href="https://rocm.docs.amd.com/projects/HIP/en/docs-6.4.2/index.html">HIP</a></td>
<td>6.4.1&nbsp;&Rightarrow;&nbsp;<a href="#hip-6-4-2">6.4.2</td>
<td><a href="https://github.com/ROCm/HIP/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/ROCR-Runtime/en/docs-6.4.1/index.html">ROCr Runtime</a></td>
<td>1.15.0&nbsp;&Rightarrow;&nbsp;<a href="#rocr-runtime-1-15-0">1.15.0</td>
<td><a href="https://rocm.docs.amd.com/projects/ROCR-Runtime/en/docs-6.4.2/index.html">ROCr Runtime</a></td>
<td>1.15.0</td>
<td><a href="https://github.com/ROCm/ROCR-Runtime/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
@@ -423,172 +441,187 @@ The following sections describe key changes to ROCm components.
For a historical overview of ROCm component updates, see the {doc}`ROCm consolidated changelog </release/changelog>`.
```
### **AMD SMI** (25.4.2)
### **AMD SMI** (25.5.1)
#### Added
* Dumping CPER entries from RAS tool `amdsmi_get_gpu_cper_entries()` to Python and C APIs.
- Dumping CPER entries consist of `amdsmi_cper_hdr_t`.
- Dumping CPER entries is also enabled in the CLI interface through `sudo amd-smi ras --cper`.
* `amdsmi_get_gpu_busy_percent` to the C API.
- Compute Unit Occupancy information per process.
- Support for getting the GPU Board voltage.
- New firmware PLDM_BUNDLE. `amd-smi firmware` can now show the PLDM Bundle on supported systems.
- `amd-smi ras --afid --cper-file <file_path>` to decode CPER records.
#### Changed
* Modified VRAM display for amd-smi monitor -v.
- Padded `asic_serial` in `amdsmi_get_asic_info` with 0s.
#### Optimized
* Improved load times for CLI commands when the GPU has multiple parititons.
- Renamed field `COMPUTE_PARTITION` to `ACCELERATOR_PARTITION` in CLI call `amd-smi --partition`.
#### Resolved issues
* Fixed partition enumeration in `amd-smi list -e`, `amdsmi_get_gpu_enumeration_info()`, `amdsmi_enumeration_info_t`, `drm_card`, and `drm_render` fields.
#### Known issues
* When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.
- Corrected VRAM memory calculation in `amdsmi_get_gpu_process_list`. Previously, the VRAM memory usage reported by `amdsmi_get_gpu_process_list` was inaccurate and was calculated using KB instead of KiB.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
### **HIP** (6.4.1)
### **HIP** (6.4.2)
#### Added
* New log mask enumeration `LOG_COMGR` enables logging precise code object information.
#### Changed
* HIP runtime uses device bitcode before SPIRV.
* The implementation of preventing `hipLaunchKernel` latency degradation with number of idle streams is reverted/disabled by default.
* HIP API implementation for `hipEventRecordWithFlags`, records an event in the specified stream with flags.
* Support for the pointer attribute `HIP_POINTER_ATTRIBUTE_CONTEXT`.
* Support for the flags `hipEventWaitDefault` and `hipEventWaitExternal`.
#### Optimized
* Improved kernel logging includes de-mangling shader names.
* Refined implementation in HIP APIs `hipEventRecords` and `hipStreamWaitEvent` for performance improvement.
* Improved implementation in `hipEventSynchronize`, HIP runtime now makes internal callbacks as non-blocking operations to improve performance.
#### Resolved issues
* Stale state during the graph capture. The return error was fixed, HIP runtime now always uses the latest dependent nodes during `hipEventRecord` capture.
* Segmentation fault during kernel execution. HIP runtime now allows maximum stack size as per ISA on the GPU device.
* Issue of dependency on `libgcc-s1` during rocm-dev install on Debian Buster. HIP runtime removed this Debian package dependency, and uses `libgcc1` instead for this distros.
* Building issue for `COMGR` dynamic load on Fedora and other Distros. HIP runtime now doesn't link against `libamd_comgr.so`.
* Failure in the API `hipStreamDestroy`, when stream type is `hipStreamLegacy`. The API now returns error code `hipErrorInvalidResourceHandle` on this condition.
* Kernel launch errors, such as `shared object initialization failed`, `invalid device function` or `kernel execution failure`. HIP runtime now loads `COMGR` properly considering the file with its name and mapped image.
* Memory access fault in some applications. HIP runtime fixed offset accumulation in memory address.
* The memory leak in virtual memory management (VMM). HIP runtime now uses the size of handle for allocated memory range instead of actual size for physical memory, which fixed the issue of address clash with VMM.
* Large memory allocation issue. HIP runtime now checks GPU video RAM and system RAM properly and sets size limits during memory allocation either on the host or the GPU device.
* Support of `hipDeviceMallocContiguous` flags in `hipExtMallocWithFlags()`. It now enables `HSA_AMD_MEMORY_POOL_CONTIGUOUS_FLAG` in the memory pool allocation on GPU device.
* Radom memory segmentation fault in handling `GraphExec` object release and `hipDeviceSyncronization`. HIP runtime now uses internal device synchronize function in `__hipUnregisterFatBinary`.
### **hipBLASLt** (0.12.1)
#### Resolved issues
#### Added
* Fixed an accuracy issue for some solutions using an `FP32` or `TF32` data type with a TT transpose.
* Support for gfx1151 on Linux, complementing the previous support in the HIP SDK for Windows.
### **RCCL** (2.22.3)
#### Changed
#### Added
* MSCCL++ is now disabled by default. To enable it, set `RCCL_MSCCLPP_ENABLE=1`.
* Added support for the LL128 protocol on gfx942.
### **rocBLAS** (4.4.1)
#### Resolved issues
* Fixed an issue where early termination, in rare circumstances, could cause the application to stop responding by adding synchronization before destroying a proxy thread.
* Fixed the accuracy issue for the MSCCLPP `allreduce7` kernel in graph mode.
* rocBLAS might have failed to produce correct results for cherk/zherk on gfx90a/gfx942 with problem sizes k > 500 due to the imaginary portion on the C matrix diagonal not being zeros. rocBLAS now zeros the imaginary portion.
#### Known issues
* When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault. The recommended workaround is to disable MSCCL with `export RCCL_MSCCL_ENABLE=0`.
This issue will be fixed in a future ROCm release.
* Within the RCCL-UnitTests test suite, failures occur in tests ending with the
`.ManagedMem` and `.ManagedMemGraph` suffixes. These failures only affect the
test results and do not affect the RCCL component itself. This issue will be
resolved in a future ROCm release.
### **rocALUTION** (3.2.3)
### **ROCm Compute Profiler** (3.1.1)
#### Added
* The `-a` option has been added to the `rmake.py` build script. This option allows you to select specific architectures when building on Microsoft Windows.
#### Resolved issues
* Fixed an issue where the `HIP_PATH` environment variable was being ignored when compiling on Microsoft Windows.
### **ROCm Data Center Tool** (0.3.0)
#### Added
- Support for GPU partitions.
- `RDC_FI_GPU_BUSY_PERCENT` metric.
* 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
* Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
* Data type selection option ``--roofline-data-type / -R`` for roofline profiling. The default data type is FP32.
#### Changed
- Updated `rdc_field` to align with `rdc_bootstrap` for current metrics.
* Changed dependency from `rocm-smi` to `amd-smi`.
#### Resolved issues
- Fixed [ROCProfiler](https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.4.0/index.html) eval metrics and memory leaks.
* Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.
### **ROCm SMI** (7.5.0)
### **ROCm Systems Profiler** (1.0.2)
#### Optimized
* Improved readability of the OpenMP target offload traces by showing on a single Perfetto track.
#### Resolved issues
- Fixed partition enumeration. It now refers to the correct DRM Render and Card paths.
* Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
```{note}
See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
```
### **ROCm Validation Suite** (1.1.0)
### **ROCm Systems Profiler** (1.0.1)
#### Added
#### Added
* NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.
* How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
### **rocPRIM** (3.4.1)
#### Upcoming changes
* Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
* Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
* `rocprim::device_warp_size()`. This has been replaced by `rocprim::arch::wavefront::min_size()` and `rocprim::arch::wavefront::max_size()` for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use `rocprim::arch::wavefront::size()`.
* `rocprim::warp_size()`
* `ROCPRIM_WAVEFRONT_SIZE`
* The default scan accumulator types for device-level scan algorithms will be changed in an upcoming release, resulting in a breaking change. Previously, the default accumulator type was set to the input type for the inclusive scans and to the initial value type for the exclusive scans. This could lead to unexpected overflow if the input or initial type was smaller than the output type when the accumulator type wasn't explicitly set using the `AccType` template parameter. The new default accumulator types will be set to the type that results when the input or initial value type is applied to the scan operator.
The following is the complete list of affected functions and how their default accumulator types are changing:
* `rocprim::inclusive_scan`
* current default: `class AccType = typename std::iterator_traits<InputIterator>::value_type>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>`
* `rocprim::deterministic_inclusive_scan`
* current default: `class AccType = typename std::iterator_traits<InputIterator>::value_type>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>`
* `rocprim::exclusive_scan`
* current default: `class AccType = detail::input_type_t<InitValueType>>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>`
* `rocprim::deterministic_exclusive_scan`
* current default: `class AccType = detail::input_type_t<InitValueType>>`
* future default: `class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>`
* `rocprim::load_cs` and `rocprim::store_cs` are deprecated and will be removed in an upcoming release. Alternatively, you can use `rocprim::load_nontemporal` and `rocprim::store_nontemporal` to load and store values in specific conditions (like bypassing the cache) for `rocprim::thread_load` and `rocprim::thread_store`.
### **rocSHMEM** (2.0.1)
#### Resolved issues
* Fixed a build issue with Dyninst on GCC 13.
* Incorrect output for `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes`.
* Multi-team errors by providing team specific buffers in `rocshmem_ctx_wg_team_sync`.
* Missing implementation of `rocshmem_g` for IPC conduit.
### **ROCr Runtime** (1.15.0)
### **rocSOLVER** (3.28.2)
#### Resolved issues
#### Added
* Fixed a rare occurrence issue on AMD Instinct MI25, MI50, and MI100 GPUs, where the `SDMA` copies might start before the dependent Kernel finishes and could cause memory corruption.
* Hybrid computation support for existing routines, such as STERF.
* SVD for general matrices based on Cuppen's Divide and Conquer algorithm:
- GESDD (with batched and strided\_batched versions)
#### Optimized
* Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
* Improved the performance of STEDC and divide and conquer Eigensolvers.
* Improved the performance of SYTRD, the initial step of the Eigensolvers that start with the tridiagonalization of the input matrix.
## ROCm known issues
ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
issues related to individual components, review the [Detailed component changes](#detailed-component-changes).
### Radeon AI PRO R9700 hangs when running Stable Diffusion 2.1 at batch sizes above four
## ROCm resolved issues
Radeon AI PRO R9700 GPUs might hang when running [Stable Diffusion
2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) with batch sizes
greater than four. As a workaround, limit batch sizes to four or fewer. This issue
will be addressed in a future ROCm release. See [issue #4770](https://github.com/ROCm/ROCm/issues/4770) on GitHub.
### RCCL MSCCL initialization failure
When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault. The recommended workaround is to disable MSCCL with `export RCCL_MSCCL_ENABLE=0`.
This issue will be fixed in a future ROCm release. See [issue #4769](https://github.com/ROCm/ROCm/issues/4769) on GitHub.
The following are previously known issues resolved in this release. For resolved issues related to
individual components, review the [Detailed component changes](#detailed-component-changes).
### AMD SMI CLI: CPER entries not dumped continuously when using follow flag
* When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.
See [issue #4768](https://github.com/ROCm/ROCm/issues/4768) on GitHub.
An issue where CPER entries were not streamed continuously as intended when using the `--follow` flag with `amd-smi ras --cper` has been resolved. See [GitHub issue #4768](https://github.com/ROCm/ROCm/issues/4768).
### ROCm SMI uninstallation issue on RHEL and SLES
### Instinct MI300X reports incorrect raw GPU timestamps
`rocm-smi-lib` does not get uninstalled and remains orphaned on RHEL and SLES systems when:
An issue where the command processor firmware reported incorrect raw GPU timestamps on MI300X accelerators has been resolved. See [GitHub issue #4079](https://github.com/ROCm/ROCm/issues/4079).
* [Uninstalling ROCm using the AMDGPU installer](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html#uninstalling-rocm) with `amdgpu-install --uninstall`
### MIOpen generates incorrect results for particular input with FP32 data type
* [Uninstalling via package manager](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/install-methods/package-manager/package-manager-rhel.html#uninstall-rocm-packages)
with `dnf remove rocm-core` on RHEL or `zypper remove rocm-core` on SLES.
As a workaround, manually remove the `rocm-smi-lib` package using `sudo dnf remove rocm-smi-lib` or `sudo zypper remove rocm-smi-lib`.
See [issue #4767](https://github.com/ROCm/ROCm/issues/4767) on GitHub.
An issue where MIOpen generated incorrect results on the `conv2dbackward` function for a particular input with 32-bit floating point (FP32) data types has been resolved. The issue was only specific to FP32 data types with 2 * 2 kernel size and dilation 2 * 1. See [GitHub issue #4606](https://github.com/ROCm/ROCm/issues/4606).
## ROCm upcoming changes
The following changes to the ROCm software stack are anticipated for future releases.
### AMD SMI migration to AMDGPU driver repository
In a future release, [AMD SMI](https://github.com/ROCm/amdsmi) will be relocated from the ROCm organization repository to a new AMDTools repository to better align with its system-level functionality. `amd-smi-lib` will no longer be included in the `rocm-developer-tools` meta-package included with your standard ROCm installation. Instead, it will be packaged with the AMDGPU driver installation.
### ROCm SMI deprecation
[ROCm SMI](https://github.com/ROCm/rocm_smi_lib) will be phased out in an
@@ -616,10 +649,10 @@ and will be disabled in a future release.
* The `__AMDGCN_WAVEFRONT_SIZE__` macro and `__AMDGCN_WAVEFRONT_SIZE` alias will be removed in an upcoming release.
It is recommended to remove any use of this macro. For more information, see
[AMDGPU support](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.0/LLVM/clang/html/AMDGPUSupport.html).
[AMDGPU support](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.2/LLVM/clang/html/AMDGPUSupport.html).
* `warpSize` will only be available as a non-`constexpr` variable. Where required,
the wavefront size should be queried via the `warpSize` variable in device code,
or via `hipGetDeviceProperties` in host code. Neither of these will result in a compile-time constant.
or via `hipGetDeviceProperties` in host code. Neither of these will result in a compile-time constant. For more information, see [warpSize](https://rocm.docs.amd.com/projects/HIP/en/docs-6.4.2/how-to/hip_cpp_language_extensions.html#warpsize).
* For cases where compile-time evaluation of the wavefront size cannot be avoided,
uses of `__AMDGCN_WAVEFRONT_SIZE`, `__AMDGCN_WAVEFRONT_SIZE__`, or `warpSize`
can be replaced with a user-defined macro or `constexpr` variable with the wavefront

View File

@@ -1,7 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<default revision="refs/tags/rocm-6.4.1"
<default revision="refs/tags/rocm-6.4.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />

View File

@@ -1,126 +1,129 @@
ROCm Version,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
:ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,,,,,,,,,,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
,"Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_",Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,,,
,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,,,,,,,,,,,
,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,,,,,,,,,,,,
,.. _architecture-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA
,RDNA4,,,,,,,,,,,,,,,
,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1200 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1101 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942 [#mi300_624-past-60]_,gfx942 [#mi300_622-past-60]_,gfx942 [#mi300_621-past-60]_,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1
,,,,,,,,,,,,,,,,
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
Thrust,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
CUB,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,,,,,,,,,,,
KMD & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
,,,,,,,,,,,,,,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.12.0,2.12.0,2.11.0,2.11.0,2.11.0,2.11.0,2.10.0,2.10.0,2.10.0,2.10.0,2.9.0,2.9.0,2.9.0,2.9.0,2.8.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0
:doc:`rocAL <rocal:index>`,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0,2.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`rocDecode <rocdecode:index>`,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,N/A,N/A
:doc:`rocJPEG <rocjpeg:index>`,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`rocPyDecode <rocpydecode:index>`,0.3.1,0.3.1,0.2.0,0.2.0,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`RPP <rpp:index>`,1.9.10,1.9.10,1.9.1,1.9.1,1.9.1,1.9.1,1.8.0,1.8.0,1.8.0,1.8.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0
,,,,,,,,,,,,,,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.21.5,2.21.5,2.21.5,2.21.5,2.20.5,2.20.5,2.20.5,2.20.5,2.18.6,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
:doc:`rocSHMEM <rocshmem:index>`,2.0.0,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
,,,,,,,,,,,,,,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.12.1,0.12.0,0.10.0,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.7.0,0.7.0,0.7.0,0.7.0,0.6.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.18,1.0.18,1.0.17,1.0.17,1.0.17,1.0.17,1.0.16,1.0.15,1.0.15,1.0.14,1.0.14,1.0.14,1.0.14,1.0.14,1.0.13,1.0.13
:doc:`hipfort <hipfort:index>`,0.6.0,0.6.0,0.5.1,0.5.1,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.12.0,2.12.0,2.11.1,2.11.1,2.11.1,2.11.0,2.11.1,2.11.0,2.11.0,2.11.0,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.1,2.1.1,2.1.1,2.1.0,2.0.0,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.1.1,3.1.1,3.1.1,3.1.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.3,0.2.3,0.2.2,0.2.2,0.2.2,0.2.2,0.2.1,0.2.1,0.2.1,0.2.1,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.2.3,3.2.2,3.2.1,3.2.1,3.2.1,3.2.1,3.2.1,3.2.0,3.2.0,3.2.0,3.1.1,3.1.1,3.1.1,3.1.1,3.0.3,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.4.0,4.4.0,4.3.0,4.3.0,4.3.0,4.3.0,4.2.4,4.2.1,4.2.1,4.2.0,4.1.2,4.1.2,4.1.0,4.1.0,4.0.0,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.32,1.0.32,1.0.31,1.0.31,1.0.31,1.0.31,1.0.30,1.0.29,1.0.29,1.0.28,1.0.27,1.0.27,1.0.27,1.0.26,1.0.25,1.0.23
:doc:`rocRAND <rocrand:index>`,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.1,3.1.0,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.28.0,3.28.0,3.27.0,3.27.0,3.27.0,3.27.0,3.26.2,3.26.0,3.26.0,3.26.0,3.25.0,3.25.0,3.25.0,3.25.0,3.24.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.0.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.7.0,1.7.0,1.6.0,1.6.0,1.6.0,1.6.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0
:doc:`Tensile <tensile:src/index>`,4.43.0,4.43.0,4.42.0,4.42.0,4.42.0,4.42.0,4.41.0,4.41.0,4.41.0,4.41.0,4.40.0,4.40.0,4.40.0,4.40.0,4.39.0,4.39.0
,,,,,,,,,,,,,,,,
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`hipCUB <hipcub:index>`,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0,1.3.0,1.3.0,1.2.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.2,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.1.1,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
,,,,,,,,,,,,,,,,
SUPPORT LIBS,,,,,,,,,,,,,,,,
`hipother <https://github.com/ROCm/hipother>`_,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
,,,,,,,,,,,,,,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`AMD SMI <amdsmi:index>`,25.4.2,25.3.0,24.7.1,24.7.1,24.7.1,24.7.1,24.6.3,24.6.3,24.6.3,24.6.2,24.5.1,24.5.1,24.5.1,24.4.1,23.4.2,23.4.2
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.5.0,7.5.0,7.4.0,7.4.0,7.4.0,7.4.0,7.3.0,7.3.0,7.3.0,7.3.0,7.2.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60105,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.6.0,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
,,,,,,,,,,,,,,,,
DEVELOPMENT TOOLS,,,,,,,,,,,,,,,,
:doc:`HIPIFY <hipify:index>`,19.0.0,19.0.0,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.13.0,0.13.0,0.13.0,0.13.0,0.12.0,0.12.0,0.12.0,0.12.0,0.11.0,0.11.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.77.2,0.77.2,0.77.0,0.77.0,0.77.0,0.77.0,0.76.0,0.76.0,0.76.0,0.76.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,14.2.0,14.2.0,14.2.0,14.2.0,14.1.0,14.1.0,14.1.0,14.1.0,13.2.0,13.2.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.3.0,0.3.0,0.3.0,0.3.0,N/A,N/A
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.4,2.0.4,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3
,,,,,,,,,,,,,,,,
COMPILERS,.. _compilers-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
`Flang <https://github.com/ROCm/flang>`_,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`llvm-project <llvm-project:index>`,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
,,,,,,,,,,,,,,,,
RUNTIMES,.. _runtime-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
:doc:`HIP <hip:index>`,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0
:doc:`ROCr Runtime <rocr-runtime:index>`,1.15.0,1.15.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.13.0,1.13.0,1.13.0,1.13.0,1.13.0,1.12.0,1.12.0
ROCm Version,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
:ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
,,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
,"SLES 15 SP7, SP6",SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
,,,,,,,,,,,,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
,"Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_",Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,,,
,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,,,,,,,,,,,
,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,Azure Linux 3.0 [#mi300x-past-60]_,,,,,,,,,,,,
,.. _architecture-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA,CDNA
,RDNA4,RDNA4,,,,,,,,,,,,,,,
,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS-past-60]_,gfx1201 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1200 [#RDNA-OS-past-60]_,gfx1200 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1101 [#RDNA-OS-past-60]_ [#7700XT-OS-past-60]_,gfx1101 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942 [#mi300_624-past-60]_,gfx942 [#mi300_622-past-60]_,gfx942 [#mi300_621-past-60]_,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a
,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
,,,,,,,,,,,,,,,,,
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`,N/A,N/A,N/A,N/A,N/A,85f95ae,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`,2.4.0,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`verl <../compatibility/ml-compatibility/verl-compatibility>` [#verl_compat]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.3.0.post0,N/A,N/A,N/A,N/A,N/A,N/A
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.2,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
`UCC <https://github.com/ROCm/ucc>`_,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.3.0,>=1.2.0,>=1.2.0
`UCX <https://github.com/ROCm/ucx>`_,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.15.0,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1,>=1.14.1
,,,,,,,,,,,,,,,,,
THIRD PARTY ALGORITHM,.. _thirdpartyalgorithm-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
Thrust,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
CUB,2.5.0,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,,,,,,,,,,,,
KMD & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
,,,,,,,,,,,,,,,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
:doc:`MIGraphX <amdmigraphx:index>`,2.12.0,2.12.0,2.12.0,2.11.0,2.11.0,2.11.0,2.11.0,2.10.0,2.10.0,2.10.0,2.10.0,2.9.0,2.9.0,2.9.0,2.9.0,2.8.0,2.8.0
:doc:`MIOpen <miopen:index>`,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`MIVisionX <mivisionx:index>`,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0,2.5.0
:doc:`rocAL <rocal:index>`,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0,2.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`rocDecode <rocdecode:index>`,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,N/A,N/A
:doc:`rocJPEG <rocjpeg:index>`,0.8.0,0.8.0,0.8.0,0.6.0,0.6.0,0.6.0,0.6.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`rocPyDecode <rocpydecode:index>`,0.3.1,0.3.1,0.3.1,0.2.0,0.2.0,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`RPP <rpp:index>`,1.9.10,1.9.10,1.9.10,1.9.1,1.9.1,1.9.1,1.9.1,1.8.0,1.8.0,1.8.0,1.8.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0
,,,,,,,,,,,,,,,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.22.3,2.21.5,2.21.5,2.21.5,2.21.5,2.20.5,2.20.5,2.20.5,2.20.5,2.18.6,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
:doc:`rocSHMEM <rocshmem:index>`,2.0.1,2.0.0,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
,,,,,,,,,,,,,,,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.4.0,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.0,2.0.0
:doc:`hipBLASLt <hipblaslt:index>`,0.12.1,0.12.1,0.12.0,0.10.0,0.10.0,0.10.0,0.10.0,0.8.0,0.8.0,0.8.0,0.8.0,0.7.0,0.7.0,0.7.0,0.7.0,0.6.0,0.6.0
:doc:`hipFFT <hipfft:index>`,1.0.18,1.0.18,1.0.18,1.0.17,1.0.17,1.0.17,1.0.17,1.0.16,1.0.15,1.0.15,1.0.14,1.0.14,1.0.14,1.0.14,1.0.14,1.0.13,1.0.13
:doc:`hipfort <hipfort:index>`,0.6.0,0.6.0,0.6.0,0.5.1,0.5.1,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0
:doc:`hipRAND <hiprand:index>`,2.12.0,2.12.0,2.12.0,2.11.1,2.11.1,2.11.1,2.11.0,2.11.1,2.11.0,2.11.0,2.11.0,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16,2.10.16
:doc:`hipSOLVER <hipsolver:index>`,2.4.0,2.4.0,2.4.0,2.3.0,2.3.0,2.3.0,2.3.0,2.2.0,2.2.0,2.2.0,2.2.0,2.1.1,2.1.1,2.1.1,2.1.0,2.0.0,2.0.0
:doc:`hipSPARSE <hipsparse:index>`,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.1.1,3.1.1,3.1.1,3.1.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.3,0.2.3,0.2.3,0.2.2,0.2.2,0.2.2,0.2.2,0.2.1,0.2.1,0.2.1,0.2.1,0.2.0,0.2.0,0.1.0,0.1.0,0.1.0,0.1.0
:doc:`rocALUTION <rocalution:index>`,3.2.3,3.2.3,3.2.2,3.2.1,3.2.1,3.2.1,3.2.1,3.2.1,3.2.0,3.2.0,3.2.0,3.1.1,3.1.1,3.1.1,3.1.1,3.0.3,3.0.3
:doc:`rocBLAS <rocblas:index>`,4.4.1,4.4.0,4.4.0,4.3.0,4.3.0,4.3.0,4.3.0,4.2.4,4.2.1,4.2.1,4.2.0,4.1.2,4.1.2,4.1.0,4.1.0,4.0.0,4.0.0
:doc:`rocFFT <rocfft:index>`,1.0.32,1.0.32,1.0.32,1.0.31,1.0.31,1.0.31,1.0.31,1.0.30,1.0.29,1.0.29,1.0.28,1.0.27,1.0.27,1.0.27,1.0.26,1.0.25,1.0.23
:doc:`rocRAND <rocrand:index>`,3.3.0,3.3.0,3.3.0,3.2.0,3.2.0,3.2.0,3.2.0,3.1.1,3.1.0,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,2.10.17
:doc:`rocSOLVER <rocsolver:index>`,3.28.2,3.28.0,3.28.0,3.27.0,3.27.0,3.27.0,3.27.0,3.26.2,3.26.0,3.26.0,3.26.0,3.25.0,3.25.0,3.25.0,3.25.0,3.24.0,3.24.0
:doc:`rocSPARSE <rocsparse:index>`,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.2,3.1.2,3.1.2,3.1.2,3.0.2,3.0.2
:doc:`rocWMMA <rocwmma:index>`,1.7.0,1.7.0,1.7.0,1.6.0,1.6.0,1.6.0,1.6.0,1.5.0,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0
:doc:`Tensile <tensile:src/index>`,4.43.0,4.43.0,4.43.0,4.42.0,4.42.0,4.42.0,4.42.0,4.41.0,4.41.0,4.41.0,4.41.0,4.40.0,4.40.0,4.40.0,4.40.0,4.39.0,4.39.0
,,,,,,,,,,,,,,,,,
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`hipCUB <hipcub:index>`,3.4.0,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.1,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`hipTensor <hiptensor:index>`,1.5.0,1.5.0,1.5.0,1.4.0,1.4.0,1.4.0,1.4.0,1.3.0,1.3.0,1.3.0,1.3.0,1.2.0,1.2.0,1.2.0,1.2.0,1.1.0,1.1.0
:doc:`rocPRIM <rocprim:index>`,3.4.1,3.4.0,3.4.0,3.3.0,3.3.0,3.3.0,3.3.0,3.2.2,3.2.0,3.2.0,3.2.0,3.1.0,3.1.0,3.1.0,3.1.0,3.0.0,3.0.0
:doc:`rocThrust <rocthrust:index>`,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.1.1,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
,,,,,,,,,,,,,,,,,
SUPPORT LIBS,,,,,,,,,,,,,,,,,
`hipother <https://github.com/ROCm/hipother>`_,6.4.43483,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
,,,,,,,,,,,,,,,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`AMD SMI <amdsmi:index>`,25.5.1,25.4.2,25.3.0,24.7.1,24.7.1,24.7.1,24.7.1,24.6.3,24.6.3,24.6.3,24.6.2,24.5.1,24.5.1,24.5.1,24.4.1,23.4.2,23.4.2
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.5.0,7.5.0,7.5.0,7.4.0,7.4.0,7.4.0,7.4.0,7.3.0,7.3.0,7.3.0,7.3.0,7.2.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60105,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.1,3.1.0,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.2,1.0.1,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60402,2.0.60401,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.6.0,0.6.0,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCTracer <roctracer:index>`,4.1.60402,4.1.60401,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
,,,,,,,,,,,,,,,,,
DEVELOPMENT TOOLS,,,,,,,,,,,,,,,,,
:doc:`HIPIFY <hipify:index>`,19.0.0,19.0.0,19.0.0,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`ROCm CMake <rocmcmakebuildtools:index>`,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.14.0,0.13.0,0.13.0,0.13.0,0.13.0,0.12.0,0.12.0,0.12.0,0.12.0,0.11.0,0.11.0
:doc:`ROCdbgapi <rocdbgapi:index>`,0.77.2,0.77.2,0.77.2,0.77.0,0.77.0,0.77.0,0.77.0,0.76.0,0.76.0,0.76.0,0.76.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0,0.71.0
:doc:`ROCm Debugger (ROCgdb) <rocgdb:index>`,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,15.2.0,14.2.0,14.2.0,14.2.0,14.2.0,14.1.0,14.1.0,14.1.0,14.1.0,13.2.0,13.2.0
`rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.4.0,0.3.0,0.3.0,0.3.0,0.3.0,N/A,N/A
:doc:`ROCr Debug Agent <rocr_debug_agent:index>`,2.0.4,2.0.4,2.0.4,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3,2.0.3
,,,,,,,,,,,,,,,,,
COMPILERS,.. _compilers-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0,0.5.0
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.1.1,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
`Flang <https://github.com/ROCm/flang>`_,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
:doc:`llvm-project <llvm-project:index>`,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,19.0.0.25224,19.0.0.25184,19.0.0.25133,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24491,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
,,,,,,,,,,,,,,,,,
RUNTIMES,.. _runtime-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
:doc:`HIP <hip:index>`,6.4.43484,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0,2.0.0
:doc:`ROCr Runtime <rocr-runtime:index>`,1.15.0,1.15.0,1.15.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.14.0,1.13.0,1.13.0,1.13.0,1.13.0,1.13.0,1.12.0,1.12.0
1 ROCm Version 6.4.2 6.4.1 6.4.0 6.3.3 6.3.2 6.3.1 6.3.0 6.2.4 6.2.2 6.2.1 6.2.0 6.1.5 6.1.2 6.1.1 6.1.0 6.0.2 6.0.0
2 :ref:`Operating systems & kernels <OS-kernel-versions>` Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.2 Ubuntu 24.04.1, 24.04 Ubuntu 24.04.1, 24.04 Ubuntu 24.04.1, 24.04 Ubuntu 24.04
3 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4 Ubuntu 22.04.5, 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3 Ubuntu 22.04.4, 22.04.3, 22.04.2 Ubuntu 22.04.4, 22.04.3, 22.04.2
4 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5 Ubuntu 20.04.6, 20.04.5
5 RHEL 9.6, 9.4 RHEL 9.6, 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.5, 9.4 RHEL 9.4, 9.3 RHEL 9.4, 9.3 RHEL 9.4, 9.3 RHEL 9.4, 9.3 RHEL 9.4, 9.3, 9.2 RHEL 9.4, 9.3, 9.2 RHEL 9.4, 9.3, 9.2 RHEL 9.4, 9.3, 9.2 RHEL 9.3, 9.2 RHEL 9.3, 9.2
6 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10 RHEL 8.10, 8.9 RHEL 8.10, 8.9 RHEL 8.10, 8.9 RHEL 8.10, 8.9 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8 RHEL 8.9, 8.8
7 SLES 15 SP7, SP6 SLES 15 SP6 SLES 15 SP6 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP6, SP5 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4 SLES 15 SP5, SP4
8 CentOS 7.9 CentOS 7.9 CentOS 7.9 CentOS 7.9 CentOS 7.9
9 Oracle Linux 9, 8 [#mi300x-past-60]_ Oracle Linux 9, 8 [#mi300x-past-60]_ Oracle Linux 9, 8 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.10 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_ Oracle Linux 8.9 [#mi300x-past-60]_
10 Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_ Debian 12 [#single-node-past-60]_
11 Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_ Azure Linux 3.0 [#mi300x-past-60]_
12 .. _architecture-support-compatibility-matrix-past-60: .. _architecture-support-compatibility-matrix-past-60:
13 :doc:`Architecture <rocm-install-on-linux:reference/system-requirements>` CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3 CDNA3
14 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2 CDNA2
15 CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA CDNA
16 RDNA4 RDNA4
17 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3 RDNA3
18 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2 RDNA2
19 .. _gpu-support-compatibility-matrix-past-60: .. _gpu-support-compatibility-matrix-past-60:
20 :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>` gfx1201 [#RDNA-OS-past-60]_ gfx1201 [#RDNA-OS-past-60]_
21 gfx1200 [#RDNA-OS-past-60]_ gfx1200 [#RDNA-OS-past-60]_
22 gfx1101 [#RDNA-OS-past-60]_ [#7700XT-OS-past-60]_ gfx1101 [#RDNA-OS-past-60]_
23 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100 gfx1100
24 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030 gfx1030
25 gfx942 gfx942 gfx942 gfx942 gfx942 gfx942 gfx942 gfx942 [#mi300_624-past-60]_ gfx942 [#mi300_622-past-60]_ gfx942 [#mi300_621-past-60]_ gfx942 [#mi300_620-past-60]_ gfx942 [#mi300_612-past-60]_ gfx942 [#mi300_612-past-60]_ gfx942 [#mi300_611-past-60]_ gfx942 [#mi300_610-past-60]_ gfx942 [#mi300_602-past-60]_ gfx942 [#mi300_600-past-60]_
26 gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a
27 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908 gfx908
28
29 FRAMEWORK SUPPORT .. _framework-support-compatibility-matrix-past-60: .. _framework-support-compatibility-matrix-past-60:
30 :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>` 2.6, 2.5, 2.4, 2.3 2.6, 2.5, 2.4, 2.3 2.6, 2.5, 2.4, 2.3 2.4, 2.3, 2.2, 1.13 2.4, 2.3, 2.2, 1.13 2.4, 2.3, 2.2, 1.13 2.4, 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.3, 2.2, 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13 2.1, 2.0, 1.13
31 :doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>` 2.18.1, 2.17.1, 2.16.2 2.18.1, 2.17.1, 2.16.2 2.18.1, 2.17.1, 2.16.2 2.17.0, 2.16.2, 2.15.1 2.17.0, 2.16.2, 2.15.1 2.17.0, 2.16.2, 2.15.1 2.17.0, 2.16.2, 2.15.1 2.16.1, 2.15.1, 2.14.1 2.16.1, 2.15.1, 2.14.1 2.16.1, 2.15.1, 2.14.1 2.16.1, 2.15.1, 2.14.1 2.15.0, 2.14.0, 2.13.1 2.15.0, 2.14.0, 2.13.1 2.15.0, 2.14.0, 2.13.1 2.15.0, 2.14.0, 2.13.1 2.14.0, 2.13.1, 2.12.1 2.14.0, 2.13.1, 2.12.1
32 :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>` 0.4.35 0.4.35 0.4.35 0.4.31 0.4.31 0.4.31 0.4.31 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26 0.4.26
33 `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_ :doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>` N/A 1.2 N/A 1.2 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 85f95ae 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.17.3 N/A 1.14.1 N/A
34 :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` 2.4.0 2.4.0 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
35 :doc:`verl <../compatibility/ml-compatibility/verl-compatibility>` [#verl_compat]_ N/A N/A N/A N/A N/A N/A N/A N/A N/A 0.3.0.post0 N/A N/A N/A N/A N/A N/A
36 THIRD PARTY COMMS `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_ 1.2 .. _thirdpartycomms-support-compatibility-matrix-past-60: 1.2 1.2 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.17.3 1.14.1 1.14.1
37 `UCC <https://github.com/ROCm/ucc>`_ >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.2.0 >=1.2.0
38 `UCX <https://github.com/ROCm/ucx>`_ >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.15.0 >=1.14.1 >=1.14.1 >=1.14.1 >=1.14.1 >=1.14.1 >=1.14.1
39 THIRD PARTY COMMS .. _thirdpartycomms-support-compatibility-matrix-past-60:
40 THIRD PARTY ALGORITHM `UCC <https://github.com/ROCm/ucc>`_ >=1.3.0 .. _thirdpartyalgorithm-support-compatibility-matrix-past-60: >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.3.0 >=1.2.0 >=1.2.0
41 Thrust `UCX <https://github.com/ROCm/ucx>`_ >=1.15.0 2.5.0 >=1.15.0 2.5.0 >=1.15.0 2.3.2 >=1.15.0 2.3.2 >=1.15.0 2.3.2 >=1.15.0 2.3.2 >=1.15.0 2.2.0 >=1.15.0 2.2.0 >=1.15.0 2.2.0 >=1.15.0 2.2.0 >=1.15.0 2.1.0 >=1.14.1 2.1.0 >=1.14.1 2.1.0 >=1.14.1 2.1.0 >=1.14.1 2.0.1 >=1.14.1 2.0.1 >=1.14.1
42 CUB 2.5.0 2.5.0 2.3.2 2.3.2 2.3.2 2.3.2 2.2.0 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.1 2.0.1
43 THIRD PARTY ALGORITHM .. _thirdpartyalgorithm-support-compatibility-matrix-past-60:
44 KMD & USER SPACE [#kfd_support-past-60]_ Thrust 2.5.0 .. _kfd-userspace-support-compatibility-matrix-past-60: 2.5.0 2.5.0 2.3.2 2.3.2 2.3.2 2.3.2 2.2.0 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.1 2.0.1
45 :doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>` CUB 2.5.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 2.5.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 2.5.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 2.3.2 6.4.x, 6.3.x, 6.2.x, 6.1.x 2.3.2 6.4.x, 6.3.x, 6.2.x, 6.1.x 2.3.2 6.4.x, 6.3.x, 6.2.x, 6.1.x 2.3.2 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 2.2.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 2.2.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 2.2.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 2.2.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 2.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 2.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 2.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 2.1.0 6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x 2.0.1 6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x 2.0.1
46
47 ML & COMPUTER VISION KMD & USER SPACE [#kfd_support-past-60]_ .. _kfd-userspace-support-compatibility-matrix-past-60: .. _mllibs-support-compatibility-matrix-past-60:
48 :doc:`Composable Kernel <composable_kernel:index>` :doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>` 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 1.1.0 6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x 1.1.0 6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x 1.1.0 6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x
49 :doc:`MIGraphX <amdmigraphx:index>` 2.12.0 2.12.0 2.11.0 2.11.0 2.11.0 2.11.0 2.10.0 2.10.0 2.10.0 2.10.0 2.9.0 2.9.0 2.9.0 2.9.0 2.8.0 2.8.0
50 :doc:`MIOpen <miopen:index>` ML & COMPUTER VISION .. _mllibs-support-compatibility-matrix-past-60: 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.0 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0
51 :doc:`MIVisionX <mivisionx:index>` :doc:`Composable Kernel <composable_kernel:index>` 1.1.0 3.2.0 1.1.0 3.2.0 1.1.0 3.1.0 1.1.0 3.1.0 1.1.0 3.1.0 1.1.0 3.1.0 1.1.0 3.0.0 1.1.0 3.0.0 1.1.0 3.0.0 1.1.0 3.0.0 1.1.0 2.5.0 1.1.0 2.5.0 1.1.0 2.5.0 1.1.0 2.5.0 1.1.0 2.5.0 1.1.0 2.5.0 1.1.0
52 :doc:`rocAL <rocal:index>` :doc:`MIGraphX <amdmigraphx:index>` 2.12.0 2.2.0 2.12.0 2.2.0 2.12.0 2.1.0 2.11.0 2.1.0 2.11.0 2.1.0 2.11.0 2.1.0 2.11.0 2.0.0 2.10.0 2.0.0 2.10.0 2.0.0 2.10.0 1.0.0 2.10.0 1.0.0 2.9.0 1.0.0 2.9.0 1.0.0 2.9.0 1.0.0 2.9.0 1.0.0 2.8.0 1.0.0 2.8.0
53 :doc:`rocDecode <rocdecode:index>` :doc:`MIOpen <miopen:index>` 3.4.0 0.10.0 3.4.0 0.10.0 3.4.0 0.8.0 3.3.0 0.8.0 3.3.0 0.8.0 3.3.0 0.8.0 3.3.0 0.6.0 3.2.0 0.6.0 3.2.0 0.6.0 3.2.0 0.6.0 3.2.0 0.6.0 3.1.0 0.6.0 3.1.0 0.5.0 3.1.0 0.5.0 3.1.0 N/A 3.0.0 N/A 3.0.0
54 :doc:`rocJPEG <rocjpeg:index>` :doc:`MIVisionX <mivisionx:index>` 3.2.0 0.8.0 3.2.0 0.8.0 3.2.0 0.6.0 3.1.0 0.6.0 3.1.0 0.6.0 3.1.0 0.6.0 3.1.0 N/A 3.0.0 N/A 3.0.0 N/A 3.0.0 N/A 3.0.0 N/A 2.5.0 N/A 2.5.0 N/A 2.5.0 N/A 2.5.0 N/A 2.5.0 N/A 2.5.0
55 :doc:`rocPyDecode <rocpydecode:index>` :doc:`rocAL <rocal:index>` 2.2.0 0.3.1 2.2.0 0.3.1 2.2.0 0.2.0 2.1.0 0.2.0 2.1.0 0.2.0 2.1.0 0.2.0 2.1.0 0.1.0 2.0.0 0.1.0 2.0.0 0.1.0 2.0.0 0.1.0 1.0.0 N/A 1.0.0 N/A 1.0.0 N/A 1.0.0 N/A 1.0.0 N/A 1.0.0 N/A 1.0.0
56 :doc:`RPP <rpp:index>` :doc:`rocDecode <rocdecode:index>` 0.10.0 1.9.10 0.10.0 1.9.10 0.10.0 1.9.1 0.8.0 1.9.1 0.8.0 1.9.1 0.8.0 1.9.1 0.8.0 1.8.0 0.6.0 1.8.0 0.6.0 1.8.0 0.6.0 1.8.0 0.6.0 1.5.0 0.6.0 1.5.0 0.6.0 1.5.0 0.5.0 1.5.0 0.5.0 1.4.0 N/A 1.4.0 N/A
57 :doc:`rocJPEG <rocjpeg:index>` 0.8.0 0.8.0 0.8.0 0.6.0 0.6.0 0.6.0 0.6.0 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
58 COMMUNICATION :doc:`rocPyDecode <rocpydecode:index>` 0.3.1 .. _commlibs-support-compatibility-matrix-past-60: 0.3.1 0.3.1 0.2.0 0.2.0 0.2.0 0.2.0 0.1.0 0.1.0 0.1.0 0.1.0 N/A N/A N/A N/A N/A N/A
59 :doc:`RCCL <rccl:index>` :doc:`RPP <rpp:index>` 1.9.10 2.22.3 1.9.10 2.22.3 1.9.10 2.21.5 1.9.1 2.21.5 1.9.1 2.21.5 1.9.1 2.21.5 1.9.1 2.20.5 1.8.0 2.20.5 1.8.0 2.20.5 1.8.0 2.20.5 1.8.0 2.18.6 1.5.0 2.18.6 1.5.0 2.18.6 1.5.0 2.18.6 1.5.0 2.18.3 1.4.0 2.18.3 1.4.0
60 :doc:`rocSHMEM <rocshmem:index>` 2.0.0 2.0.0 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
61 COMMUNICATION .. _commlibs-support-compatibility-matrix-past-60:
62 MATH LIBS :doc:`RCCL <rccl:index>` 2.22.3 .. _mathlibs-support-compatibility-matrix-past-60: 2.22.3 2.22.3 2.21.5 2.21.5 2.21.5 2.21.5 2.20.5 2.20.5 2.20.5 2.20.5 2.18.6 2.18.6 2.18.6 2.18.6 2.18.3 2.18.3
63 `half <https://github.com/ROCm/half>`_ :doc:`rocSHMEM <rocshmem:index>` 2.0.1 1.12.0 2.0.0 1.12.0 2.0.0 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A 1.12.0 N/A
64 :doc:`hipBLAS <hipblas:index>` 2.4.0 2.4.0 2.3.0 2.3.0 2.3.0 2.3.0 2.2.0 2.2.0 2.2.0 2.2.0 2.1.0 2.1.0 2.1.0 2.1.0 2.0.0 2.0.0
65 :doc:`hipBLASLt <hipblaslt:index>` MATH LIBS .. _mathlibs-support-compatibility-matrix-past-60: 0.12.1 0.12.0 0.10.0 0.10.0 0.10.0 0.10.0 0.8.0 0.8.0 0.8.0 0.8.0 0.7.0 0.7.0 0.7.0 0.7.0 0.6.0 0.6.0
66 :doc:`hipFFT <hipfft:index>` `half <https://github.com/ROCm/half>`_ 1.12.0 1.0.18 1.12.0 1.0.18 1.12.0 1.0.17 1.12.0 1.0.17 1.12.0 1.0.17 1.12.0 1.0.17 1.12.0 1.0.16 1.12.0 1.0.15 1.12.0 1.0.15 1.12.0 1.0.14 1.12.0 1.0.14 1.12.0 1.0.14 1.12.0 1.0.14 1.12.0 1.0.14 1.12.0 1.0.13 1.12.0 1.0.13 1.12.0
67 :doc:`hipfort <hipfort:index>` :doc:`hipBLAS <hipblas:index>` 2.4.0 0.6.0 2.4.0 0.6.0 2.4.0 0.5.1 2.3.0 0.5.1 2.3.0 0.5.0 2.3.0 0.5.0 2.3.0 0.4.0 2.2.0 0.4.0 2.2.0 0.4.0 2.2.0 0.4.0 2.2.0 0.4.0 2.1.0 0.4.0 2.1.0 0.4.0 2.1.0 0.4.0 2.1.0 0.4.0 2.0.0 0.4.0 2.0.0
68 :doc:`hipRAND <hiprand:index>` :doc:`hipBLASLt <hipblaslt:index>` 0.12.1 2.12.0 0.12.1 2.12.0 0.12.0 2.11.1 0.10.0 2.11.1 0.10.0 2.11.1 0.10.0 2.11.0 0.10.0 2.11.1 0.8.0 2.11.0 0.8.0 2.11.0 0.8.0 2.11.0 0.8.0 2.10.16 0.7.0 2.10.16 0.7.0 2.10.16 0.7.0 2.10.16 0.7.0 2.10.16 0.6.0 2.10.16 0.6.0
69 :doc:`hipSOLVER <hipsolver:index>` :doc:`hipFFT <hipfft:index>` 1.0.18 2.4.0 1.0.18 2.4.0 1.0.18 2.3.0 1.0.17 2.3.0 1.0.17 2.3.0 1.0.17 2.3.0 1.0.17 2.2.0 1.0.16 2.2.0 1.0.15 2.2.0 1.0.15 2.2.0 1.0.14 2.1.1 1.0.14 2.1.1 1.0.14 2.1.1 1.0.14 2.1.0 1.0.14 2.0.0 1.0.13 2.0.0 1.0.13
70 :doc:`hipSPARSE <hipsparse:index>` :doc:`hipfort <hipfort:index>` 0.6.0 3.2.0 0.6.0 3.2.0 0.6.0 3.1.2 0.5.1 3.1.2 0.5.1 3.1.2 0.5.0 3.1.2 0.5.0 3.1.1 0.4.0 3.1.1 0.4.0 3.1.1 0.4.0 3.1.1 0.4.0 3.0.1 0.4.0 3.0.1 0.4.0 3.0.1 0.4.0 3.0.1 0.4.0 3.0.0 0.4.0 3.0.0 0.4.0
71 :doc:`hipSPARSELt <hipsparselt:index>` :doc:`hipRAND <hiprand:index>` 2.12.0 0.2.3 2.12.0 0.2.3 2.12.0 0.2.2 2.11.1 0.2.2 2.11.1 0.2.2 2.11.1 0.2.2 2.11.0 0.2.1 2.11.1 0.2.1 2.11.0 0.2.1 2.11.0 0.2.1 2.11.0 0.2.0 2.10.16 0.2.0 2.10.16 0.1.0 2.10.16 0.1.0 2.10.16 0.1.0 2.10.16 0.1.0 2.10.16
72 :doc:`rocALUTION <rocalution:index>` :doc:`hipSOLVER <hipsolver:index>` 2.4.0 3.2.3 2.4.0 3.2.2 2.4.0 3.2.1 2.3.0 3.2.1 2.3.0 3.2.1 2.3.0 3.2.1 2.3.0 3.2.1 2.2.0 3.2.0 2.2.0 3.2.0 2.2.0 3.2.0 2.2.0 3.1.1 2.1.1 3.1.1 2.1.1 3.1.1 2.1.1 3.1.1 2.1.0 3.0.3 2.0.0 3.0.3 2.0.0
73 :doc:`rocBLAS <rocblas:index>` :doc:`hipSPARSE <hipsparse:index>` 3.2.0 4.4.0 3.2.0 4.4.0 3.2.0 4.3.0 3.1.2 4.3.0 3.1.2 4.3.0 3.1.2 4.3.0 3.1.2 4.2.4 3.1.1 4.2.1 3.1.1 4.2.1 3.1.1 4.2.0 3.1.1 4.1.2 3.0.1 4.1.2 3.0.1 4.1.0 3.0.1 4.1.0 3.0.1 4.0.0 3.0.0 4.0.0 3.0.0
74 :doc:`rocFFT <rocfft:index>` :doc:`hipSPARSELt <hipsparselt:index>` 0.2.3 1.0.32 0.2.3 1.0.32 0.2.3 1.0.31 0.2.2 1.0.31 0.2.2 1.0.31 0.2.2 1.0.31 0.2.2 1.0.30 0.2.1 1.0.29 0.2.1 1.0.29 0.2.1 1.0.28 0.2.1 1.0.27 0.2.0 1.0.27 0.2.0 1.0.27 0.1.0 1.0.26 0.1.0 1.0.25 0.1.0 1.0.23 0.1.0
75 :doc:`rocRAND <rocrand:index>` :doc:`rocALUTION <rocalution:index>` 3.2.3 3.3.0 3.2.3 3.3.0 3.2.2 3.2.0 3.2.1 3.2.0 3.2.1 3.2.0 3.2.1 3.2.0 3.2.1 3.1.1 3.2.1 3.1.0 3.2.0 3.1.0 3.2.0 3.1.0 3.2.0 3.0.1 3.1.1 3.0.1 3.1.1 3.0.1 3.1.1 3.0.1 3.1.1 3.0.0 3.0.3 2.10.17 3.0.3
76 :doc:`rocSOLVER <rocsolver:index>` :doc:`rocBLAS <rocblas:index>` 4.4.1 3.28.0 4.4.0 3.28.0 4.4.0 3.27.0 4.3.0 3.27.0 4.3.0 3.27.0 4.3.0 3.27.0 4.3.0 3.26.2 4.2.4 3.26.0 4.2.1 3.26.0 4.2.1 3.26.0 4.2.0 3.25.0 4.1.2 3.25.0 4.1.2 3.25.0 4.1.0 3.25.0 4.1.0 3.24.0 4.0.0 3.24.0 4.0.0
77 :doc:`rocSPARSE <rocsparse:index>` :doc:`rocFFT <rocfft:index>` 1.0.32 3.4.0 1.0.32 3.4.0 1.0.32 3.3.0 1.0.31 3.3.0 1.0.31 3.3.0 1.0.31 3.3.0 1.0.31 3.2.1 1.0.30 3.2.0 1.0.29 3.2.0 1.0.29 3.2.0 1.0.28 3.1.2 1.0.27 3.1.2 1.0.27 3.1.2 1.0.27 3.1.2 1.0.26 3.0.2 1.0.25 3.0.2 1.0.23
78 :doc:`rocWMMA <rocwmma:index>` :doc:`rocRAND <rocrand:index>` 3.3.0 1.7.0 3.3.0 1.7.0 3.3.0 1.6.0 3.2.0 1.6.0 3.2.0 1.6.0 3.2.0 1.6.0 3.2.0 1.5.0 3.1.1 1.5.0 3.1.0 1.5.0 3.1.0 1.5.0 3.1.0 1.4.0 3.0.1 1.4.0 3.0.1 1.4.0 3.0.1 1.4.0 3.0.1 1.3.0 3.0.0 1.3.0 2.10.17
79 :doc:`Tensile <tensile:src/index>` :doc:`rocSOLVER <rocsolver:index>` 3.28.2 4.43.0 3.28.0 4.43.0 3.28.0 4.42.0 3.27.0 4.42.0 3.27.0 4.42.0 3.27.0 4.42.0 3.27.0 4.41.0 3.26.2 4.41.0 3.26.0 4.41.0 3.26.0 4.41.0 3.26.0 4.40.0 3.25.0 4.40.0 3.25.0 4.40.0 3.25.0 4.40.0 3.25.0 4.39.0 3.24.0 4.39.0 3.24.0
80 :doc:`rocSPARSE <rocsparse:index>` 3.4.0 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.1 3.2.0 3.2.0 3.2.0 3.1.2 3.1.2 3.1.2 3.1.2 3.0.2 3.0.2
81 PRIMITIVES :doc:`rocWMMA <rocwmma:index>` 1.7.0 .. _primitivelibs-support-compatibility-matrix-past-60: 1.7.0 1.7.0 1.6.0 1.6.0 1.6.0 1.6.0 1.5.0 1.5.0 1.5.0 1.5.0 1.4.0 1.4.0 1.4.0 1.4.0 1.3.0 1.3.0
82 :doc:`hipCUB <hipcub:index>` :doc:`Tensile <tensile:src/index>` 4.43.0 3.4.0 4.43.0 3.4.0 4.43.0 3.3.0 4.42.0 3.3.0 4.42.0 3.3.0 4.42.0 3.3.0 4.42.0 3.2.1 4.41.0 3.2.0 4.41.0 3.2.0 4.41.0 3.2.0 4.41.0 3.1.0 4.40.0 3.1.0 4.40.0 3.1.0 4.40.0 3.1.0 4.40.0 3.0.0 4.39.0 3.0.0 4.39.0
83 :doc:`hipTensor <hiptensor:index>` 1.5.0 1.5.0 1.4.0 1.4.0 1.4.0 1.4.0 1.3.0 1.3.0 1.3.0 1.3.0 1.2.0 1.2.0 1.2.0 1.2.0 1.1.0 1.1.0
84 :doc:`rocPRIM <rocprim:index>` PRIMITIVES .. _primitivelibs-support-compatibility-matrix-past-60: 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.2 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0
85 :doc:`rocThrust <rocthrust:index>` :doc:`hipCUB <hipcub:index>` 3.4.0 3.3.0 3.4.0 3.3.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.1.1 3.2.1 3.1.0 3.2.0 3.1.0 3.2.0 3.0.1 3.2.0 3.0.1 3.1.0 3.0.1 3.1.0 3.0.1 3.1.0 3.0.1 3.1.0 3.0.0 3.0.0
86 :doc:`hipTensor <hiptensor:index>` 1.5.0 1.5.0 1.5.0 1.4.0 1.4.0 1.4.0 1.4.0 1.3.0 1.3.0 1.3.0 1.3.0 1.2.0 1.2.0 1.2.0 1.2.0 1.1.0 1.1.0
87 SUPPORT LIBS :doc:`rocPRIM <rocprim:index>` 3.4.1 3.4.0 3.4.0 3.3.0 3.3.0 3.3.0 3.3.0 3.2.2 3.2.0 3.2.0 3.2.0 3.1.0 3.1.0 3.1.0 3.1.0 3.0.0 3.0.0
88 `hipother <https://github.com/ROCm/hipother>`_ :doc:`rocThrust <rocthrust:index>` 3.3.0 6.4.43483 3.3.0 6.4.43482 3.3.0 6.3.42134 3.3.0 6.3.42134 3.3.0 6.3.42133 3.3.0 6.3.42131 3.3.0 6.2.41134 3.1.1 6.2.41134 3.1.0 6.2.41134 3.1.0 6.2.41133 3.0.1 6.1.40093 3.0.1 6.1.40093 3.0.1 6.1.40092 3.0.1 6.1.40091 3.0.1 6.1.32831 3.0.0 6.1.32830 3.0.0
89 `rocm-core <https://github.com/ROCm/rocm-core>`_ 6.4.1 6.4.0 6.3.3 6.3.2 6.3.1 6.3.0 6.2.4 6.2.2 6.2.1 6.2.0 6.1.5 6.1.2 6.1.1 6.1.0 6.0.2 6.0.0
90 `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_ SUPPORT LIBS N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ N/A [#ROCT-rocr-past-60]_ 20240607.5.7 20240607.5.7 20240607.4.05 20240607.1.4246 20240125.5.08 20240125.5.08 20240125.5.08 20240125.3.30 20231016.2.245 20231016.2.245
91 `hipother <https://github.com/ROCm/hipother>`_ 6.4.43483 6.4.43483 6.4.43482 6.3.42134 6.3.42134 6.3.42133 6.3.42131 6.2.41134 6.2.41134 6.2.41134 6.2.41133 6.1.40093 6.1.40093 6.1.40092 6.1.40091 6.1.32831 6.1.32830
92 SYSTEM MGMT TOOLS `rocm-core <https://github.com/ROCm/rocm-core>`_ 6.4.2 .. _tools-support-compatibility-matrix-past-60: 6.4.1 6.4.0 6.3.3 6.3.2 6.3.1 6.3.0 6.2.4 6.2.2 6.2.1 6.2.0 6.1.5 6.1.2 6.1.1 6.1.0 6.0.2 6.0.0
93 :doc:`AMD SMI <amdsmi:index>` `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_ N/A [#ROCT-rocr-past-60]_ 25.4.2 N/A [#ROCT-rocr-past-60]_ 25.3.0 N/A [#ROCT-rocr-past-60]_ 24.7.1 N/A [#ROCT-rocr-past-60]_ 24.7.1 N/A [#ROCT-rocr-past-60]_ 24.7.1 N/A [#ROCT-rocr-past-60]_ 24.7.1 N/A [#ROCT-rocr-past-60]_ 24.6.3 20240607.5.7 24.6.3 20240607.5.7 24.6.3 20240607.4.05 24.6.2 20240607.1.4246 24.5.1 20240125.5.08 24.5.1 20240125.5.08 24.5.1 20240125.5.08 24.4.1 20240125.3.30 23.4.2 20231016.2.245 23.4.2 20231016.2.245
94 :doc:`ROCm Data Center Tool <rdc:index>` 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0 0.3.0
95 :doc:`rocminfo <rocminfo:index>` SYSTEM MGMT TOOLS .. _tools-support-compatibility-matrix-past-60: 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0
96 :doc:`ROCm SMI <rocm_smi_lib:index>` :doc:`AMD SMI <amdsmi:index>` 25.5.1 7.5.0 25.4.2 7.5.0 25.3.0 7.4.0 24.7.1 7.4.0 24.7.1 7.4.0 24.7.1 7.4.0 24.7.1 7.3.0 24.6.3 7.3.0 24.6.3 7.3.0 24.6.3 7.3.0 24.6.2 7.2.0 24.5.1 7.2.0 24.5.1 7.0.0 24.5.1 7.0.0 24.4.1 6.0.2 23.4.2 6.0.0 23.4.2
97 :doc:`ROCm Validation Suite <rocmvalidationsuite:index>` :doc:`ROCm Data Center Tool <rdc:index>` 0.3.0 1.1.0 0.3.0 1.1.0 0.3.0 1.1.0 0.3.0 1.1.0 0.3.0 1.1.0 0.3.0 1.1.0 0.3.0 1.0.60204 0.3.0 1.0.60202 0.3.0 1.0.60201 0.3.0 1.0.60200 0.3.0 1.0.60105 0.3.0 1.0.60102 0.3.0 1.0.60101 0.3.0 1.0.60100 0.3.0 1.0.60002 0.3.0 1.0.60000 0.3.0
98 :doc:`rocminfo <rocminfo:index>` 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0
99 PERFORMANCE TOOLS :doc:`ROCm SMI <rocm_smi_lib:index>` 7.5.0 7.5.0 7.5.0 7.4.0 7.4.0 7.4.0 7.4.0 7.3.0 7.3.0 7.3.0 7.3.0 7.2.0 7.2.0 7.0.0 7.0.0 6.0.2 6.0.0
100 :doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>` :doc:`ROCm Validation Suite <rocmvalidationsuite:index>` 1.1.0 1.4.0 1.1.0 1.4.0 1.1.0 1.4.0 1.1.0 1.4.0 1.1.0 1.4.0 1.1.0 1.4.0 1.1.0 1.4.0 1.0.60204 1.4.0 1.0.60202 1.4.0 1.0.60201 1.4.0 1.0.60200 1.4.0 1.0.60105 1.4.0 1.0.60102 1.4.0 1.0.60101 1.4.0 1.0.60100 1.4.0 1.0.60002 1.4.0 1.0.60000
101 :doc:`ROCm Compute Profiler <rocprofiler-compute:index>` 3.1.0 3.1.0 3.0.0 3.0.0 3.0.0 3.0.0 2.0.1 2.0.1 2.0.1 2.0.1 N/A N/A N/A N/A N/A N/A
102 :doc:`ROCm Systems Profiler <rocprofiler-systems:index>` PERFORMANCE TOOLS 1.0.1 1.0.0 0.1.2 0.1.1 0.1.0 0.1.0 1.11.2 1.11.2 1.11.2 1.11.2 N/A N/A N/A N/A N/A N/A
103 :doc:`ROCProfiler <rocprofiler:index>` :doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>` 1.4.0 2.0.60401 1.4.0 2.0.60400 1.4.0 2.0.60303 1.4.0 2.0.60302 1.4.0 2.0.60301 1.4.0 2.0.60300 1.4.0 2.0.60204 1.4.0 2.0.60202 1.4.0 2.0.60201 1.4.0 2.0.60200 1.4.0 2.0.60105 1.4.0 2.0.60102 1.4.0 2.0.60101 1.4.0 2.0.60100 1.4.0 2.0.60002 1.4.0 2.0.60000 1.4.0
104 :doc:`ROCprofiler-SDK <rocprofiler-sdk:index>` :doc:`ROCm Compute Profiler <rocprofiler-compute:index>` 3.1.1 0.6.0 3.1.0 0.6.0 3.1.0 0.5.0 3.0.0 0.5.0 3.0.0 0.5.0 3.0.0 0.5.0 3.0.0 0.4.0 2.0.1 0.4.0 2.0.1 0.4.0 2.0.1 0.4.0 2.0.1 N/A N/A N/A N/A N/A N/A
105 :doc:`ROCTracer <roctracer:index>` :doc:`ROCm Systems Profiler <rocprofiler-systems:index>` 1.0.2 4.1.60401 1.0.1 4.1.60400 1.0.0 4.1.60303 0.1.2 4.1.60302 0.1.1 4.1.60301 0.1.0 4.1.60300 0.1.0 4.1.60204 1.11.2 4.1.60202 1.11.2 4.1.60201 1.11.2 4.1.60200 1.11.2 4.1.60105 N/A 4.1.60102 N/A 4.1.60101 N/A 4.1.60100 N/A 4.1.60002 N/A 4.1.60000 N/A
106 :doc:`ROCProfiler <rocprofiler:index>` 2.0.60402 2.0.60401 2.0.60400 2.0.60303 2.0.60302 2.0.60301 2.0.60300 2.0.60204 2.0.60202 2.0.60201 2.0.60200 2.0.60105 2.0.60102 2.0.60101 2.0.60100 2.0.60002 2.0.60000
107 DEVELOPMENT TOOLS :doc:`ROCprofiler-SDK <rocprofiler-sdk:index>` 0.6.0 0.6.0 0.6.0 0.5.0 0.5.0 0.5.0 0.5.0 0.4.0 0.4.0 0.4.0 0.4.0 N/A N/A N/A N/A N/A N/A
108 :doc:`HIPIFY <hipify:index>` :doc:`ROCTracer <roctracer:index>` 4.1.60402 19.0.0 4.1.60401 19.0.0 4.1.60400 18.0.0.25012 4.1.60303 18.0.0.25012 4.1.60302 18.0.0.24491 4.1.60301 18.0.0.24455 4.1.60300 18.0.0.24392 4.1.60204 18.0.0.24355 4.1.60202 18.0.0.24355 4.1.60201 18.0.0.24232 4.1.60200 17.0.0.24193 4.1.60105 17.0.0.24193 4.1.60102 17.0.0.24154 4.1.60101 17.0.0.24103 4.1.60100 17.0.0.24012 4.1.60002 17.0.0.23483 4.1.60000
109 :doc:`ROCm CMake <rocmcmakebuildtools:index>` 0.14.0 0.14.0 0.14.0 0.14.0 0.14.0 0.14.0 0.13.0 0.13.0 0.13.0 0.13.0 0.12.0 0.12.0 0.12.0 0.12.0 0.11.0 0.11.0
110 :doc:`ROCdbgapi <rocdbgapi:index>` DEVELOPMENT TOOLS 0.77.2 0.77.2 0.77.0 0.77.0 0.77.0 0.77.0 0.76.0 0.76.0 0.76.0 0.76.0 0.71.0 0.71.0 0.71.0 0.71.0 0.71.0 0.71.0
111 :doc:`ROCm Debugger (ROCgdb) <rocgdb:index>` :doc:`HIPIFY <hipify:index>` 19.0.0 15.2.0 19.0.0 15.2.0 19.0.0 15.2.0 18.0.0.25012 15.2.0 18.0.0.25012 15.2.0 18.0.0.24491 15.2.0 18.0.0.24455 14.2.0 18.0.0.24392 14.2.0 18.0.0.24355 14.2.0 18.0.0.24355 14.2.0 18.0.0.24232 14.1.0 17.0.0.24193 14.1.0 17.0.0.24193 14.1.0 17.0.0.24154 14.1.0 17.0.0.24103 13.2.0 17.0.0.24012 13.2.0 17.0.0.23483
112 `rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_ :doc:`ROCm CMake <rocmcmakebuildtools:index>` 0.14.0 0.4.0 0.14.0 0.4.0 0.14.0 0.4.0 0.14.0 0.4.0 0.14.0 0.4.0 0.14.0 0.4.0 0.14.0 0.4.0 0.13.0 0.4.0 0.13.0 0.4.0 0.13.0 0.4.0 0.13.0 0.3.0 0.12.0 0.3.0 0.12.0 0.3.0 0.12.0 0.3.0 0.12.0 N/A 0.11.0 N/A 0.11.0
113 :doc:`ROCr Debug Agent <rocr_debug_agent:index>` :doc:`ROCdbgapi <rocdbgapi:index>` 0.77.2 2.0.4 0.77.2 2.0.4 0.77.2 2.0.3 0.77.0 2.0.3 0.77.0 2.0.3 0.77.0 2.0.3 0.77.0 2.0.3 0.76.0 2.0.3 0.76.0 2.0.3 0.76.0 2.0.3 0.76.0 2.0.3 0.71.0 2.0.3 0.71.0 2.0.3 0.71.0 2.0.3 0.71.0 2.0.3 0.71.0 2.0.3 0.71.0
114 :doc:`ROCm Debugger (ROCgdb) <rocgdb:index>` 15.2.0 15.2.0 15.2.0 15.2.0 15.2.0 15.2.0 15.2.0 14.2.0 14.2.0 14.2.0 14.2.0 14.1.0 14.1.0 14.1.0 14.1.0 13.2.0 13.2.0
115 COMPILERS `rocprofiler-register <https://github.com/ROCm/rocprofiler-register>`_ 0.4.0 .. _compilers-support-compatibility-matrix-past-60: 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.4.0 0.3.0 0.3.0 0.3.0 0.3.0 N/A N/A
116 `clang-ocl <https://github.com/ROCm/clang-ocl>`_ :doc:`ROCr Debug Agent <rocr_debug_agent:index>` 2.0.4 N/A 2.0.4 N/A 2.0.4 N/A 2.0.3 N/A 2.0.3 N/A 2.0.3 N/A 2.0.3 N/A 2.0.3 N/A 2.0.3 N/A 2.0.3 N/A 2.0.3 0.5.0 2.0.3 0.5.0 2.0.3 0.5.0 2.0.3 0.5.0 2.0.3 0.5.0 2.0.3 0.5.0 2.0.3
117 :doc:`hipCC <hipcc:index>` 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.1.1 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0 1.0.0
118 `Flang <https://github.com/ROCm/flang>`_ COMPILERS .. _compilers-support-compatibility-matrix-past-60: 19.0.0.25184 19.0.0.25133 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24455 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
119 :doc:`llvm-project <llvm-project:index>` `clang-ocl <https://github.com/ROCm/clang-ocl>`_ N/A 19.0.0.25184 N/A 19.0.0.25133 N/A 18.0.0.25012 N/A 18.0.0.25012 N/A 18.0.0.24491 N/A 18.0.0.24491 N/A 18.0.0.24392 N/A 18.0.0.24355 N/A 18.0.0.24355 N/A 18.0.0.24232 N/A 17.0.0.24193 0.5.0 17.0.0.24193 0.5.0 17.0.0.24154 0.5.0 17.0.0.24103 0.5.0 17.0.0.24012 0.5.0 17.0.0.23483 0.5.0
120 `OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_ :doc:`hipCC <hipcc:index>` 1.1.1 19.0.0.25184 1.1.1 19.0.0.25133 1.1.1 18.0.0.25012 1.1.1 18.0.0.25012 1.1.1 18.0.0.24491 1.1.1 18.0.0.24491 1.1.1 18.0.0.24392 1.1.1 18.0.0.24355 1.1.1 18.0.0.24355 1.1.1 18.0.0.24232 1.1.1 17.0.0.24193 1.0.0 17.0.0.24193 1.0.0 17.0.0.24154 1.0.0 17.0.0.24103 1.0.0 17.0.0.24012 1.0.0 17.0.0.23483 1.0.0
121 `Flang <https://github.com/ROCm/flang>`_ 19.0.0.25224 19.0.0.25184 19.0.0.25133 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24455 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
122 RUNTIMES :doc:`llvm-project <llvm-project:index>` 19.0.0.25224 .. _runtime-support-compatibility-matrix-past-60: 19.0.0.25184 19.0.0.25133 18.0.0.25012 18.0.0.25012 18.0.0.24491 18.0.0.24491 18.0.0.24392 18.0.0.24355 18.0.0.24355 18.0.0.24232 17.0.0.24193 17.0.0.24193 17.0.0.24154 17.0.0.24103 17.0.0.24012 17.0.0.23483
123 :doc:`AMD CLR <hip:understand/amd_clr>` `OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_ 19.0.0.25224 6.4.43483 19.0.0.25184 6.4.43482 19.0.0.25133 6.3.42134 18.0.0.25012 6.3.42134 18.0.0.25012 6.3.42133 18.0.0.24491 6.3.42131 18.0.0.24491 6.2.41134 18.0.0.24392 6.2.41134 18.0.0.24355 6.2.41134 18.0.0.24355 6.2.41133 18.0.0.24232 6.1.40093 17.0.0.24193 6.1.40093 17.0.0.24193 6.1.40092 17.0.0.24154 6.1.40091 17.0.0.24103 6.1.32831 17.0.0.24012 6.1.32830 17.0.0.23483
124 :doc:`HIP <hip:index>` 6.4.43483 6.4.43482 6.3.42134 6.3.42134 6.3.42133 6.3.42131 6.2.41134 6.2.41134 6.2.41134 6.2.41133 6.1.40093 6.1.40093 6.1.40092 6.1.40091 6.1.32831 6.1.32830
125 `OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_ RUNTIMES .. _runtime-support-compatibility-matrix-past-60: 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0
126 :doc:`ROCr Runtime <rocr-runtime:index>` :doc:`AMD CLR <hip:understand/amd_clr>` 6.4.43484 1.15.0 6.4.43483 1.15.0 6.4.43482 1.14.0 6.3.42134 1.14.0 6.3.42134 1.14.0 6.3.42133 1.14.0 6.3.42131 1.14.0 6.2.41134 1.14.0 6.2.41134 1.14.0 6.2.41134 1.13.0 6.2.41133 1.13.0 6.1.40093 1.13.0 6.1.40093 1.13.0 6.1.40092 1.13.0 6.1.40091 1.12.0 6.1.32831 1.12.0 6.1.32830
127 :doc:`HIP <hip:index>` 6.4.43484 6.4.43483 6.4.43482 6.3.42134 6.3.42134 6.3.42133 6.3.42131 6.2.41134 6.2.41134 6.2.41134 6.2.41133 6.1.40093 6.1.40093 6.1.40092 6.1.40091 6.1.32831 6.1.32830
128 `OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_ 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0 2.0.0
129 :doc:`ROCr Runtime <rocr-runtime:index>` 1.15.0 1.15.0 1.15.0 1.14.0 1.14.0 1.14.0 1.14.0 1.14.0 1.14.0 1.14.0 1.13.0 1.13.0 1.13.0 1.13.0 1.13.0 1.12.0 1.12.0

View File

@@ -23,14 +23,14 @@ compatibility and system requirements.
.. container:: format-big-table
.. csv-table::
:header: "ROCm Version", "6.4.1", "6.4.0", "6.3.0"
:header: "ROCm Version", "6.4.2", "6.4.1", "6.3.0"
:stub-columns: 1
:ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2
,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5
,"RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4"
,"RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4"
,RHEL 8.10,RHEL 8.10,RHEL 8.10
,SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5"
,"SLES 15 SP7, SP6",SLES 15 SP6,"SLES 15 SP6, SP5"
,"Oracle Linux 9, 8 [#mi300x]_","Oracle Linux 9, 8 [#mi300x]_",Oracle Linux 8.10 [#mi300x]_
,Debian 12 [#single-node]_,Debian 12 [#single-node]_,
,Azure Linux 3.0 [#mi300x]_,Azure Linux 3.0 [#mi300x]_,
@@ -38,13 +38,13 @@ compatibility and system requirements.
:doc:`Architecture <rocm-install-on-linux:reference/system-requirements>`,CDNA3,CDNA3,CDNA3
,CDNA2,CDNA2,CDNA2
,CDNA,CDNA,CDNA
,RDNA4,,
,RDNA4,RDNA4,
,RDNA3,RDNA3,RDNA3
,RDNA2,RDNA2,RDNA2
,.. _gpu-support-compatibility-matrix:,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS]_,,
,gfx1200 [#RDNA-OS]_,,
,gfx1101 [#RDNA-OS]_,,
:doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS]_,gfx1201 [#RDNA-OS]_,
,gfx1200 [#RDNA-OS]_,gfx1200 [#RDNA-OS]_,
,gfx1101 [#RDNA-OS]_ [#7700XT-OS]_,gfx1101 [#RDNA-OS]_,
,gfx1100,gfx1100,gfx1100
,gfx1030,gfx1030,gfx1030
,gfx942,gfx942,gfx942
@@ -54,7 +54,9 @@ compatibility and system requirements.
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.31
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.31
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`,N/A,N/A,85f95ae
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`,2.4.0,2.4.0,N/A
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.17.3
,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
@@ -81,23 +83,23 @@ compatibility and system requirements.
,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
:doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.21.5
:doc:`rocSHMEM <rocshmem:index>`,2.0.0,2.0.0,N/A
:doc:`rocSHMEM <rocshmem:index>`,2.0.1,2.0.0,N/A
,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
:doc:`hipBLAS <hipblas:index>`,2.4.0,2.4.0,2.3.0
:doc:`hipBLASLt <hipblaslt:index>`,0.12.1,0.12.0,0.10.0
:doc:`hipBLASLt <hipblaslt:index>`,0.12.1,0.12.1,0.10.0
:doc:`hipFFT <hipfft:index>`,1.0.18,1.0.18,1.0.17
:doc:`hipfort <hipfort:index>`,0.6.0,0.6.0,0.5.0
:doc:`hipRAND <hiprand:index>`,2.12.0,2.12.0,2.11.0
:doc:`hipSOLVER <hipsolver:index>`,2.4.0,2.4.0,2.3.0
:doc:`hipSPARSE <hipsparse:index>`,3.2.0,3.2.0,3.1.2
:doc:`hipSPARSELt <hipsparselt:index>`,0.2.3,0.2.3,0.2.2
:doc:`rocALUTION <rocalution:index>`,3.2.3,3.2.2,3.2.1
:doc:`rocBLAS <rocblas:index>`,4.4.0,4.4.0,4.3.0
:doc:`rocALUTION <rocalution:index>`,3.2.3,3.2.3,3.2.1
:doc:`rocBLAS <rocblas:index>`,4.4.1,4.4.0,4.3.0
:doc:`rocFFT <rocfft:index>`,1.0.32,1.0.32,1.0.31
:doc:`rocRAND <rocrand:index>`,3.3.0,3.3.0,3.2.0
:doc:`rocSOLVER <rocsolver:index>`,3.28.0,3.28.0,3.27.0
:doc:`rocSOLVER <rocsolver:index>`,3.28.2,3.28.0,3.27.0
:doc:`rocSPARSE <rocsparse:index>`,3.4.0,3.4.0,3.3.0
:doc:`rocWMMA <rocwmma:index>`,1.7.0,1.7.0,1.6.0
:doc:`Tensile <tensile:src/index>`,4.43.0,4.43.0,4.42.0
@@ -105,16 +107,16 @@ compatibility and system requirements.
PRIMITIVES,.. _primitivelibs-support-compatibility-matrix:,,
:doc:`hipCUB <hipcub:index>`,3.4.0,3.4.0,3.3.0
:doc:`hipTensor <hiptensor:index>`,1.5.0,1.5.0,1.4.0
:doc:`rocPRIM <rocprim:index>`,3.4.0,3.4.0,3.3.0
:doc:`rocPRIM <rocprim:index>`,3.4.1,3.4.0,3.3.0
:doc:`rocThrust <rocthrust:index>`,3.3.0,3.3.0,3.3.0
,,,
SUPPORT LIBS,,,
`hipother <https://github.com/ROCm/hipother>`_,6.4.43483,6.4.43482,6.3.42131
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.1,6.4.0,6.3.0
`hipother <https://github.com/ROCm/hipother>`_,6.4.43483,6.4.43483,6.3.42131
`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.2,6.4.1,6.3.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_
,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix:,,
:doc:`AMD SMI <amdsmi:index>`,25.4.2,25.3.0,24.7.1
:doc:`AMD SMI <amdsmi:index>`,25.5.1,25.4.2,24.7.1
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.5.0,7.5.0,7.4.0
@@ -122,11 +124,11 @@ compatibility and system requirements.
,,,
PERFORMANCE TOOLS,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.0,3.1.0,3.0.0
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.1,1.0.0,0.1.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60401,2.0.60400,2.0.60300
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.1,3.1.0,3.0.0
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.2,1.0.1,0.1.0
:doc:`ROCProfiler <rocprofiler:index>`,2.0.60402,2.0.60401,2.0.60300
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.6.0,0.6.0,0.5.0
:doc:`ROCTracer <roctracer:index>`,4.1.60401,4.1.60400,4.1.60300
:doc:`ROCTracer <roctracer:index>`,4.1.60402,4.1.60401,4.1.60300
,,,
DEVELOPMENT TOOLS,,,
:doc:`HIPIFY <hipify:index>`,19.0.0,19.0.0,18.0.0.24455
@@ -139,25 +141,25 @@ compatibility and system requirements.
COMPILERS,.. _compilers-support-compatibility-matrix:,,
`clang-ocl <https://github.com/ROCm/clang-ocl>`_,N/A,N/A,N/A
:doc:`hipCC <hipcc:index>`,1.1.1,1.1.1,1.1.1
`Flang <https://github.com/ROCm/flang>`_,19.0.0.25184,19.0.0.25133,18.0.0.24455
:doc:`llvm-project <llvm-project:index>`,19.0.0.25184,19.0.0.25133,18.0.0.24491
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,19.0.0.25184,19.0.0.25133,18.0.0.24491
`Flang <https://github.com/ROCm/flang>`_,19.0.0.25224,19.0.0.25184,18.0.0.24455
:doc:`llvm-project <llvm-project:index>`,19.0.0.25224,19.0.0.25184,18.0.0.24491
`OpenMP <https://github.com/ROCm/llvm-project/tree/amd-staging/openmp>`_,19.0.0.25224,19.0.0.25184,18.0.0.24491
,,,
RUNTIMES,.. _runtime-support-compatibility-matrix:,,
:doc:`AMD CLR <hip:understand/amd_clr>`,6.4.43483,6.4.43482,6.3.42131
:doc:`HIP <hip:index>`,6.4.43483,6.4.43482,6.3.42131
:doc:`AMD CLR <hip:understand/amd_clr>`,6.4.43484,6.4.43483,6.3.42131
:doc:`HIP <hip:index>`,6.4.43484,6.4.43483,6.3.42131
`OpenCL Runtime <https://github.com/ROCm/clr/tree/develop/opencl>`_,2.0.0,2.0.0,2.0.0
:doc:`ROCr Runtime <rocr-runtime:index>`,1.15.0,1.15.0,1.14.0
.. rubric:: Footnotes
.. [#mi300x] Oracle Linux and Azure Linux are supported only on AMD Instinct MI300X.
.. [#single-node] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#mi300_620] **For ROCm 6.2.0** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#single-node] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#RDNA-OS] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
.. [#7700XT-OS] Radeon RX 7700 XT (gfx1101) is supported only on Ubuntu 24.04.2 and RHEL 9.6.
.. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
.. [#RDNA-OS] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.
.. _OS-kernel-versions:
@@ -176,14 +178,15 @@ Use this lookup table to confirm which operating system and kernel versions are
`Ubuntu <https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle>`_, 22.04.5, "5.15 GA, 6.8 HWE", 2.35
,,
`Red Hat Enterprise Linux (RHEL 9) <https://access.redhat.com/articles/3078#RHEL9>`_, 9.6, 5.14+, 2.34
, 9.5, 5.14+, 2.34
,9.5, 5.14+, 2.34
,9.4, 5.14+, 2.34
,9.3, 5.14+, 2.34
,,
`Red Hat Enterprise Linux (RHEL 8) <https://access.redhat.com/articles/3078#RHEL8>`_, 8.10, 4.18.0+, 2.28
,8.9, 4.18.0, 2.28
,,
`SUSE Linux Enterprise Server (SLES) <https://www.suse.com/support/kb/doc/?id=000019587#SLE15SP4>`_, 15 SP6, "6.5.0+, 6.4.0", 2.38
`SUSE Linux Enterprise Server (SLES) <https://www.suse.com/support/kb/doc/?id=000019587#SLE15SP4>`_, 15 SP7, 6.11.0+, 2.38
,15 SP6, "6.5.0+, 6.4.0", 2.38
,15 SP5, 5.14.21, 2.31
,,
`Oracle Linux <https://blogs.oracle.com/scoter/post/oracle-linux-and-unbreakable-enterprise-kernel-uek-releases>`_, 9, 5.15.0 (UEK), 2.35
@@ -225,7 +228,9 @@ Expand for full historical view of:
.. rubric:: Footnotes
.. [#mi300x-past-60] Oracle Linux and Azure Linux are supported only on AMD Instinct MI300X.
.. [#single-node-past-60] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#single-node-past-60] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#RDNA-OS-past-60] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
.. [#7700XT-OS-past-60] Radeon RX 7700 XT (gfx1101) is supported only on Ubuntu 24.04.2 and RHEL 9.6.
.. [#mi300_624-past-60] **For ROCm 6.2.4** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_622-past-60] **For ROCm 6.2.2** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#mi300_621-past-60] **For ROCm 6.2.1** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
@@ -235,6 +240,7 @@ Expand for full historical view of:
.. [#mi300_610-past-60] **For ROCm 6.1.0** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_602-past-60] **For ROCm 6.0.2** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#mi300_600-past-60] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#verl_compat] verl is only supported on ROCm 6.2.0.
.. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
.. [#RDNA-OS-past-60] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.

View File

@@ -0,0 +1,255 @@
:orphan:
.. meta::
:description: Deep Graph Library (DGL) compatibility
:keywords: GPU, DGL compatibility
.. version-set:: rocm_version latest
********************************************************************************
DGL compatibility
********************************************************************************
Deep Graph Library `(DGL) <https://www.dgl.ai/>`_ is an easy-to-use, high-performance and scalable
Python package for deep learning on graphs. DGL is framework agnostic, meaning
if a deep graph model is a component in an end-to-end application, the rest of
the logic is implemented using PyTorch.
* ROCm support for DGL is hosted in the `https://github.com/ROCm/dgl <https://github.com/ROCm/dgl>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/dmlc/dgl <https://github.com/dmlc/dgl>`_ upstream repository.
* Use the prebuilt :ref:`Docker images <dgl-docker-compat>` with DGL, PyTorch, and ROCm preinstalled.
* See the :doc:`ROCm DGL installation guide <rocm-install-on-linux:install/3rd-party/dgl-install>`
to install and get started.
Supported devices
================================================================================
- **Officially Supported**: TF32 with AMD Instinct MI300X (through hipblaslt)
- **Partially Supported**: TF32 with AMD Instinct MI250X
.. _dgl-recommendations:
Use cases and recommendations
================================================================================
DGL can be used for Graph Learning, and building popular graph models like
GAT, GCN and GraphSage. Using these we can support a variety of use-cases such as:
- Recommender systems
- Network Optimization and Analysis
- 1D (Temporal) and 2D (Image) Classification
- Drug Discovery
Multiple use cases of DGL have been tested and verified.
However, a recommended example follows a drug discovery pipeline using the ``SE3Transformer``.
Refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`_,
where you can search for DGL examples and best practices to optimize your training workflows on AMD GPUs.
Coverage includes:
- Single-GPU training/inference
- Multi-GPU training
.. _dgl-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `DGL images <https://hub.docker.com/r/rocm/dgl>`_
with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated
inventories were tested on `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_.
Click the |docker-icon| to view the image on Docker Hub.
.. list-table:: DGL Docker image components
:header-rows: 1
:class: docker-image-compatibility
* - Docker
- DGL
- PyTorch
- Ubuntu
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-8ce2c3bcfaa137ab94a75f9e2ea711894748980f57417739138402a542dd5564"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-cf1683283b8eeda867b690229c8091c5bbf1edb9f52e8fb3da437c49a612ebe4"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-4834f178c3614e2d09e89e32041db8984c456d45dfd20286e377ca8635686554"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-88740a2c8ab4084b42b10c3c6ba984cab33dd3a044f479c6d7618e2b2cb05e69"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
Key ROCm libraries for DGL
================================================================================
DGL on ROCm depends on specific libraries that affect its features and performance.
Using the DGL Docker container or building it with the provided docker file or a ROCm base image is recommended.
If you prefer to build it yourself, ensure the following dependencies are installed:
.. list-table::
:header-rows: 1
* - ROCm library
- Version
- Purpose
* - `Composable Kernel <https://github.com/ROCm/composable_kernel>`_
- :version-ref:`"Composable Kernel" rocm_version`
- Enables faster execution of core operations like matrix multiplication
(GEMM), convolutions and transformations.
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
- :version-ref:`hipBLASLt rocm_version`
- hipBLASLt is an extension of the hipBLAS library, providing additional
features like epilogues fused into the matrix multiplication kernel or
use of integer tensor cores.
* - `hipCUB <https://github.com/ROCm/hipCUB>`_
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
* - `hipFFT <https://github.com/ROCm/hipFFT>`_
- :version-ref:`hipFFT rocm_version`
- Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
* - `hipRAND <https://github.com/ROCm/hipRAND>`_
- :version-ref:`hipRAND rocm_version`
- Provides fast random number generation for GPUs.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated solvers for linear systems, eigenvalues, and
singular value decompositions (SVD).
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
- :version-ref:`hipSPARSE rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
- :version-ref:`hipSPARSELt rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipTensor <https://github.com/ROCm/hipTensor>`_
- :version-ref:`hipTensor rocm_version`
- Optimizes for high-performance tensor operations, such as contractions.
* - `MIOpen <https://github.com/ROCm/MIOpen>`_
- :version-ref:`MIOpen rocm_version`
- Optimizes deep learning primitives such as convolutions, pooling,
normalization, and activation functions.
* - `MIGraphX <https://github.com/ROCm/AMDMIGraphX>`_
- :version-ref:`MIGraphX rocm_version`
- Adds graph-level optimizations, ONNX models and mixed precision support
and enable Ahead-of-Time (AOT) Compilation.
* - `MIVisionX <https://github.com/ROCm/MIVisionX>`_
- :version-ref:`MIVisionX rocm_version`
- Optimizes acceleration for computer vision and AI workloads like
preprocessing, augmentation, and inferencing.
* - `rocAL <https://github.com/ROCm/rocAL>`_
- :version-ref:`rocAL rocm_version`
- Accelerates the data pipeline by offloading intensive preprocessing and
augmentation tasks. rocAL is part of MIVisionX.
* - `RCCL <https://github.com/ROCm/rccl>`_
- :version-ref:`RCCL rocm_version`
- Optimizes for multi-GPU communication for operations like AllReduce and
Broadcast.
* - `rocDecode <https://github.com/ROCm/rocDecode>`_
- :version-ref:`rocDecode rocm_version`
- Provides hardware-accelerated data decoding capabilities, particularly
for image, video, and other dataset formats.
* - `rocJPEG <https://github.com/ROCm/rocJPEG>`_
- :version-ref:`rocJPEG rocm_version`
- Provides hardware-accelerated JPEG image decoding and encoding.
* - `RPP <https://github.com/ROCm/RPP>`_
- :version-ref:`RPP rocm_version`
- Speeds up data augmentation, transformation, and other preprocessing steps.
* - `rocThrust <https://github.com/ROCm/rocThrust>`_
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
* - `rocWMMA <https://github.com/ROCm/rocWMMA>`_
- :version-ref:`rocWMMA rocm_version`
- Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix
multiplication (GEMM) and accumulation operations with mixed precision
support.
Supported features
================================================================================
Many functions and methods available in DGL Upstream are also supported in DGL ROCm.
Instead of listing them all, support is grouped into the following categories to provide a general overview.
* DGL Base
* DGL Backend
* DGL Data
* DGL Dataloading
* DGL DGLGraph
* DGL Function
* DGL Ops
* DGL Sampling
* DGL Transforms
* DGL Utils
* DGL Distributed
* DGL Geometry
* DGL Mpops
* DGL NN
* DGL Optim
* DGL Sparse
Unsupported features
================================================================================
* Graphbolt
* Partial TF32 Support (MI250x only)
* Kineto/ ROCTracer integration
Unsupported functions
================================================================================
* ``more_nnz``
* ``format``
* ``multiprocess_sparse_adam_state_dict``
* ``record_stream_ndarray``
* ``half_spmm``
* ``segment_mm``
* ``gather_mm_idx_b``
* ``pgexplainer``
* ``sample_labors_prob``
* ``sample_labors_noprob``

View File

@@ -53,7 +53,7 @@ Use cases and recommendations
* The `nanoGPT in JAX <https://rocm.blogs.amd.com/artificial-intelligence/nanoGPT-JAX/README.html>`_
blog explores the implementation and training of a Generative Pre-trained
Transformer (GPT) model in JAX, inspired by Andrej Karpathys JAX-based
nanoGPT. Comparing how essential GPT components—such as self-attention
nanoGPT. Comparing how essential GPT components—such as self-attention
mechanisms and optimizers—are realized in JAX and JAX, also highlights
JAXs unique features.
@@ -160,12 +160,14 @@ associated inventories are tested for `ROCm 6.3.2 <https://repo.radeon.com/rocm/
- Ubuntu 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
.. _key_rocm_libraries:
Key ROCm libraries for JAX
================================================================================
JAX functionality on ROCm is determined by its underlying library
dependencies. These ROCm components affect the capabilities, performance, and
feature set available to developers.
The following ROCm libraries represent potential targets that could be utilized
by JAX on ROCm for various computational tasks. The actual libraries used will
depend on the specific implementation and operations performed.
.. list-table::
:header-rows: 1
@@ -173,347 +175,140 @@ feature set available to developers.
* - ROCm library
- Version
- Purpose
- Used in
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
- Matrix multiplication in ``jax.numpy.matmul``, ``jax.lax.dot`` and
``jax.lax.dot_general``, operations like ``jax.numpy.dot``, which
involve vector and matrix computations and batch matrix multiplications
``jax.numpy.einsum`` with matrix-multiplication patterns algebra
operations.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
- :version-ref:`hipBLASLt rocm_version`
- hipBLASLt is an extension of hipBLAS, providing additional
features like epilogues fused into the matrix multiplication kernel or
use of integer tensor cores.
- Matrix multiplication in ``jax.numpy.matmul`` or ``jax.lax.dot``, and
the XLA (Accelerated Linear Algebra) use hipBLASLt for optimized matrix
operations, mixed-precision support, and hardware-specific
optimizations.
* - `hipCUB <https://github.com/ROCm/hipCUB>`_
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
- Reduction functions (``jax.numpy.sum``, ``jax.numpy.mean``,
``jax.numpy.prod``, ``jax.numpy.max`` and ``jax.numpy.min``), prefix sum
(``jax.numpy.cumsum``, ``jax.numpy.cumprod``) and sorting
(``jax.numpy.sort``, ``jax.numpy.argsort``).
* - `hipFFT <https://github.com/ROCm/hipFFT>`_
- :version-ref:`hipFFT rocm_version`
- Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
- Used in functions like ``jax.numpy.fft``.
* - `hipRAND <https://github.com/ROCm/hipRAND>`_
- :version-ref:`hipRAND rocm_version`
- Provides fast random number generation for GPUs.
- The ``jax.random.uniform``, ``jax.random.normal``,
``jax.random.randint`` and ``jax.random.split``.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated solvers for linear systems, eigenvalues, and
singular value decompositions (SVD).
- Solving linear systems (``jax.numpy.linalg.solve``), matrix
factorizations, SVD (``jax.numpy.linalg.svd``) and eigenvalue problems
(``jax.numpy.linalg.eig``).
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
- :version-ref:`hipSPARSE rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
- Sparse matrix multiplication (``jax.numpy.matmul``), sparse
matrix-vector and matrix-matrix products
(``jax.experimental.sparse.dot``), sparse linear system solvers and
sparse data handling.
* - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
- :version-ref:`hipSPARSELt rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
- Sparse matrix multiplication (``jax.numpy.matmul``), sparse
matrix-vector and matrix-matrix products
(``jax.experimental.sparse.dot``) and sparse linear system solvers.
* - `MIOpen <https://github.com/ROCm/MIOpen>`_
- :version-ref:`MIOpen rocm_version`
- Optimized for deep learning primitives such as convolutions, pooling,
normalization, and activation functions.
- Speeds up convolutional neural networks (CNNs), recurrent neural
networks (RNNs), and other layers. Used in operations like
``jax.nn.conv``, ``jax.nn.relu``, and ``jax.nn.batch_norm``.
* - `RCCL <https://github.com/ROCm/rccl>`_
- :version-ref:`RCCL rocm_version`
- Optimized for multi-GPU communication for operations like all-reduce,
broadcast, and scatter.
- Distribute computations across multiple GPU with ``pmap`` and
``jax.distributed``. XLA automatically uses rccl when executing
operations across multiple GPUs on AMD hardware.
* - `rocThrust <https://github.com/ROCm/rocThrust>`_
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
- Reduction operations like ``jax.numpy.sum``, ``jax.pmap`` for
distributed training, which involves parallel reductions or
operations like ``jax.numpy.cumsum`` can use rocThrust.
Supported features
.. note::
This table shows ROCm libraries that could potentially be utilized by JAX. Not
all libraries may be used in every configuration, and the actual library usage
will depend on the specific operations and implementation details.
Supported data types and modules
===============================================================================
The following table maps the public JAX API modules to their supported
ROCm and JAX versions.
The following tables lists the supported public JAX API data types and modules.
Supported data types
--------------------------------------------------------------------------------
ROCm supports all the JAX data types of `jax.dtypes <https://docs.jax.dev/en/latest/jax.dtypes.html>`_
module, `jax.numpy.dtype <https://docs.jax.dev/en/latest/_autosummary/jax.numpy.dtype.html>`_
and `default_dtype <https://docs.jax.dev/en/latest/default_dtypes.html>`_ .
The ROCm supported data types in JAX are collected in the following table.
.. list-table::
:header-rows: 1
* - Module
- Description
- As of JAX
- As of ROCm
* - ``jax.numpy``
- Implements the NumPy API, using the primitives in ``jax.lax``.
- 0.1.56
- 5.0.0
* - ``jax.scipy``
- Provides GPU-accelerated and differentiable implementations of many
functions from the SciPy library, leveraging JAX's transformations
(e.g., ``grad``, ``jit``, ``vmap``).
- 0.1.56
- 5.0.0
* - ``jax.lax``
- A library of primitives operations that underpins libraries such as
``jax.numpy.`` Transformation rules, such as Jacobian-vector product
(JVP) and batching rules, are typically defined as transformations on
``jax.lax`` primitives.
- 0.1.57
- 5.0.0
* - ``jax.random``
- Provides a number of routines for deterministic generation of sequences
of pseudorandom numbers.
- 0.1.58
- 5.0.0
* - ``jax.sharding``
- Allows to define partitioning and distributing arrays across multiple
devices.
- 0.3.20
- 5.1.0
* - ``jax.distributed``
- Enables the scaling of computations across multiple devices on a single
machine or across multiple machines.
- 0.1.74
- 5.0.0
* - ``jax.image``
- Contains image manipulation functions like resize, scale and translation.
- 0.1.57
- 5.0.0
* - ``jax.nn``
- Contains common functions for neural network libraries.
- 0.1.56
- 5.0.0
* - ``jax.ops``
- Computes the minimum, maximum, sum or product within segments of an
array.
- 0.1.57
- 5.0.0
* - ``jax.stages``
- Contains interfaces to stages of the compiled execution process.
- 0.3.4
- 5.0.0
* - ``jax.extend``
- Provides modules for access to JAX internal machinery module. The
``jax.extend`` module defines a library view of some of JAXs internal
components.
- 0.4.15
- 5.5.0
* - ``jax.example_libraries``
- Serves as a collection of example code and libraries that demonstrate
various capabilities of JAX.
- 0.1.74
- 5.0.0
* - ``jax.experimental``
- Namespace for experimental features and APIs that are in development or
are not yet fully stable for production use.
- 0.1.56
- 5.0.0
* - ``jax.lib``
- Set of internal tools and types for bridging between JAXs Python
frontend and its XLA backend.
- 0.4.6
- 5.3.0
* - ``jax_triton``
- Library that integrates the Triton deep learning compiler with JAX.
- jax_triton 0.2.0
- 6.2.4
jax.scipy module
-------------------------------------------------------------------------------
A SciPy-like API for scientific computing.
.. list-table::
:header-rows: 1
* - Module
- As of JAX
- As of ROCm
* - ``jax.scipy.cluster``
- 0.3.11
- 5.1.0
* - ``jax.scipy.fft``
- 0.1.71
- 5.0.0
* - ``jax.scipy.integrate``
- 0.4.15
- 5.5.0
* - ``jax.scipy.interpolate``
- 0.1.76
- 5.0.0
* - ``jax.scipy.linalg``
- 0.1.56
- 5.0.0
* - ``jax.scipy.ndimage``
- 0.1.56
- 5.0.0
* - ``jax.scipy.optimize``
- 0.1.57
- 5.0.0
* - ``jax.scipy.signal``
- 0.1.56
- 5.0.0
* - ``jax.scipy.spatial.transform``
- 0.4.12
- 5.4.0
* - ``jax.scipy.sparse.linalg``
- 0.1.56
- 5.0.0
* - ``jax.scipy.special``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats``
- 0.1.56
- 5.0.0
jax.scipy.stats module
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
* - Module
- As of JAX
- As of ROCm
* - ``jax.scipy.stats.bernouli``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.beta``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.betabinom``
- 0.1.61
- 5.0.0
* - ``jax.scipy.stats.binom``
- 0.4.14
- 5.4.0
* - ``jax.scipy.stats.cauchy``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.chi2``
- 0.1.61
- 5.0.0
* - ``jax.scipy.stats.dirichlet``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.expon``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.gamma``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.gennorm``
- 0.3.15
- 5.2.0
* - ``jax.scipy.stats.geom``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.laplace``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.logistic``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.multinomial``
- 0.3.18
- 5.1.0
* - ``jax.scipy.stats.multivariate_normal``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.nbinom``
- 0.1.72
- 5.0.0
* - ``jax.scipy.stats.norm``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.pareto``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.poisson``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.t``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.truncnorm``
- 0.4.0
- 5.3.0
* - ``jax.scipy.stats.uniform``
- 0.1.56
- 5.0.0
* - ``jax.scipy.stats.vonmises``
- 0.4.2
- 5.3.0
* - ``jax.scipy.stats.wrapcauchy``
- 0.4.20
- 5.6.0
jax.extend module
-------------------------------------------------------------------------------
Modules for JAX extensions.
.. list-table::
:header-rows: 1
* - Module
- As of JAX
- As of ROCm
* - ``jax.extend.ffi``
- 0.4.30
- 6.0.0
* - ``jax.extend.linear_util``
- 0.4.17
- 5.6.0
* - ``jax.extend.mlir``
- 0.4.26
- 5.6.0
* - ``jax.extend.random``
- 0.4.15
- 5.5.0
Unsupported JAX features
===============================================================================
The following GPU-accelerated JAX features are not supported by ROCm for
the listed supported JAX versions.
.. list-table::
:header-rows: 1
* - Feature
* - Data type
- Description
* - Mixed Precision with TF32
- Mixed precision with TF32 is used for matrix multiplications,
convolutions, and other linear algebra operations, particularly in
deep learning workloads like CNNs and transformers.
* - ``bfloat16``
- 16-bit bfloat (brain floating point).
* - XLA int4 support
- 4-bit integer (int4) precision in the XLA compiler.
* - ``bool``
- Boolean.
* - MOSAIC (GPU)
- Mosaic is a library of kernel-building abstractions for JAX's Pallas system
* - ``complex128``
- 128-bit complex.
* - ``complex64``
- 64-bit complex.
* - ``float16``
- 16-bit (half precision) floating-point.
* - ``float32``
- 32-bit (single precision) floating-point.
* - ``float64``
- 64-bit (double precision) floating-point.
* - ``half``
- 16-bit (half precision) floating-point.
* - ``int16``
- Signed 16-bit integer.
* - ``int32``
- Signed 32-bit integer.
* - ``int64``
- Signed 64-bit integer.
* - ``int8``
- Signed 8-bit integer.
* - ``uint16``
- Unsigned 16-bit (word) integer.
* - ``uint32``
- Unsigned 32-bit (dword) integer.
* - ``uint64``
- Unsigned 64-bit (qword) integer.
* - ``uint8``
- Unsigned 8-bit (byte) integer.
.. note::
JAX data type support is effected by the :ref:`key_rocm_libraries` and it's
collected on :doc:`ROCm data types and precision support <rocm:reference/precision-support>`
page.
Supported modules
--------------------------------------------------------------------------------
For a complete and up-to-date list of JAX public modules (for example, ``jax.numpy``,
``jax.scipy``, ``jax.lax``), their descriptions, and usage, please refer directly to the
`official JAX API documentation <https://jax.readthedocs.io/en/latest/jax.html>`_.
.. note::
Since version 0.1.56, JAX has full support for ROCm, and the
:ref:`Known issues and important notes <jax_comp_known_issues>` section
contains details about limitations specific to the ROCm backend. The list of
JAX API modules is maintained by the JAX project and is subject to change.
Refer to the official Jax documentation for the most up-to-date information.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,100 @@
:orphan:
.. meta::
:description: Stanford Megatron-LM compatibility
:keywords: Stanford, Megatron-LM, compatibility
.. version-set:: rocm_version latest
********************************************************************************
Stanford Megatron-LM compatibility
********************************************************************************
Stanford Megatron-LM is a large-scale language model training framework developed by NVIDIA `https://github.com/NVIDIA/Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_. It is
designed to train massive transformer-based language models efficiently by model and data parallelism.
* ROCm support for Stanford Megatron-LM is hosted in the official `https://github.com/ROCm/Stanford-Megatron-LM <https://github.com/ROCm/Stanford-Megatron-LM>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`_ upstream repository.
* Use the prebuilt :ref:`Docker image <megatron-lm-docker-compat>` with ROCm, PyTorch, and Megatron-LM preinstalled.
* See the :doc:`ROCm Stanford Megatron-LM installation guide <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>` to install and get started.
.. note::
Stanford Megatron-LM is supported on ROCm 6.3.0.
Supported Devices
================================================================================
- **Officially Supported**: AMD Instinct MI300X
- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210X
Supported models and features
================================================================================
This section details models & features that are supported by the ROCm version on Stanford Megatron-LM.
Models:
* Bert
* GPT
* T5
* ICT
Features:
* Distributed Pre-training
* Activation Checkpointing and Recomputation
* Distributed Optimizer
* Mixture-of-Experts
.. _megatron-lm-recommendations:
Use cases and recommendations
================================================================================
See the `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`_ post
to leverage the ROCm platform for pre-training by using the Stanford Megatron-LM framework of pre-processing datasets on AMD GPUs.
Coverage includes:
* Single-GPU pre-training
* Multi-GPU pre-training
.. _megatron-lm-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/megatron-lm>`_
with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest Megatron-LM version from the official Docker Hub.
The Docker images have been validated for `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_.
Click |docker-icon| to view the image on Docker Hub.
.. list-table::
:header-rows: 1
:class: docker-image-compatibility
* - Docker image
- Stanford Megatron-LM
- PyTorch
- Ubuntu
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i></a>
- `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
- `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_

View File

@@ -10,16 +10,16 @@
TensorFlow compatibility
*******************************************************************************
`TensorFlow <https://www.tensorflow.org/>`_ is an open-source library for
`TensorFlow <https://www.tensorflow.org/>`__ is an open-source library for
solving machine learning, deep learning, and AI problems. It can solve many
problems across different sectors and industries but primarily focuses on
neural network training and inference. It is one of the most popular and
in-demand frameworks and is very active in open-source contribution and
development.
The `official TensorFlow repository <http://github.com/tensorflow/tensorflow>`_
The `official TensorFlow repository <http://github.com/tensorflow/tensorflow>`__
includes full ROCm support. AMD maintains a TensorFlow `ROCm repository
<http://github.com/rocm/tensorflow-upstream>`_ in order to quickly add bug
<http://github.com/rocm/tensorflow-upstream>`__ in order to quickly add bug
fixes, updates, and support for the latest ROCM versions.
- ROCm TensorFlow release:
@@ -27,16 +27,16 @@ fixes, updates, and support for the latest ROCM versions.
- Offers :ref:`Docker images <tensorflow-docker-compat>` with
ROCm and TensorFlow pre-installed.
- ROCm TensorFlow repository: `<https://github.com/ROCm/tensorflow-upstream>`_
- ROCm TensorFlow repository: `<https://github.com/ROCm/tensorflow-upstream>`__
- See the :doc:`ROCm TensorFlow installation guide <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
to get started.
- Official TensorFlow release:
- Official TensorFlow repository: `<https://github.com/tensorflow/tensorflow>`_
- Official TensorFlow repository: `<https://github.com/tensorflow/tensorflow>`__
- See the `TensorFlow API versions <https://www.tensorflow.org/versions>`_ list.
- See the `TensorFlow API versions <https://www.tensorflow.org/versions>`__ list.
.. note::
@@ -54,9 +54,9 @@ Docker image compatibility
<i class="fab fa-docker"></i>
AMD validates and publishes ready-made `TensorFlow images
<https://hub.docker.com/r/rocm/tensorflow>`_ with ROCm backends on
<https://hub.docker.com/r/rocm/tensorflow>`__ with ROCm backends on
Docker Hub. The following Docker image tags and associated inventories are
validated for `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`_. Click
validated for `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__. Click
the |docker-icon| icon to view the image on Docker Hub.
.. list-table:: TensorFlow Docker image components
@@ -76,8 +76,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
@@ -86,8 +86,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
@@ -96,8 +96,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- dev
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
@@ -106,8 +106,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- runtime
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
* - .. raw:: html
@@ -116,8 +116,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
@@ -126,8 +126,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-3124/>`_
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_
- `Python 3.12.10 <https://www.python.org/downloads/release/python-3124/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
@@ -136,8 +136,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
@@ -146,8 +146,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- runtime
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
* - .. raw:: html
@@ -156,8 +156,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- dev
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
* - .. raw:: html
@@ -166,8 +166,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 24.04
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
- `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
* - .. raw:: html
@@ -176,8 +176,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.16.2 <https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.16-dev/images/sha256-36c4fa047c86e2470ac473ec1429aea6d4b8934b90ffeb34d1afab40e7e5b377>`__
- dev
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
* - .. raw:: html
@@ -186,8 +186,8 @@ the |docker-icon| icon to view the image on Docker Hub.
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- runtime
- 22.04
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
- `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
Critical ROCm libraries for TensorFlow
@@ -207,43 +207,43 @@ are available in ROCm :version:`rocm_version`.
- Version
- Purpose
- Used in
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`__
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
- Accelerates operations like ``tf.matmul``, ``tf.linalg.matmul``, and
other matrix multiplications commonly used in neural network layers.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`__
- :version-ref:`hipBLASLt rocm_version`
- Extends hipBLAS with additional optimizations like fused kernels and
integer tensor cores.
- Optimizes matrix multiplications and linear algebra operations used in
layers like dense, convolutional, and RNNs in TensorFlow.
* - `hipCUB <https://github.com/ROCm/hipCUB>`_
* - `hipCUB <https://github.com/ROCm/hipCUB>`__
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
- Supports operations like ``tf.reduce_sum``, ``tf.cumsum``, ``tf.sort``
and other tensor operations in TensorFlow, especially those involving
scanning, sorting, and filtering.
* - `hipFFT <https://github.com/ROCm/hipFFT>`_
* - `hipFFT <https://github.com/ROCm/hipFFT>`__
- :version-ref:`hipFFT rocm_version`
- Accelerates Fast Fourier Transforms (FFT) for signal processing tasks.
- Used for operations like signal processing, image filtering, and
certain types of neural networks requiring FFT-based transformations.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`__
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated direct linear solvers for dense and sparse
systems.
- Optimizes linear algebra functions such as solving systems of linear
equations, often used in optimization and training tasks.
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`__
- :version-ref:`hipSPARSE rocm_version`
- Optimizes sparse matrix operations for efficient computations on sparse
data.
- Accelerates sparse matrix operations in models with sparse weight
matrices or activations, commonly used in neural networks.
* - `MIOpen <https://github.com/ROCm/MIOpen>`_
* - `MIOpen <https://github.com/ROCm/MIOpen>`__
- :version-ref:`MIOpen rocm_version`
- Provides optimized deep learning primitives such as convolutions,
pooling,
@@ -251,13 +251,13 @@ are available in ROCm :version:`rocm_version`.
- Speeds up convolutional neural networks (CNNs) and other layers. Used
in TensorFlow for layers like ``tf.nn.conv2d``, ``tf.nn.relu``, and
``tf.nn.lstm_cell``.
* - `RCCL <https://github.com/ROCm/rccl>`_
* - `RCCL <https://github.com/ROCm/rccl>`__
- :version-ref:`RCCL rocm_version`
- Optimizes for multi-GPU communication for operations like AllReduce and
Broadcast.
- Distributed data parallel training (``tf.distribute.MirroredStrategy``).
Handles communication in multi-GPU setups.
* - `rocThrust <https://github.com/ROCm/rocThrust>`_
* - `rocThrust <https://github.com/ROCm/rocThrust>`__
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
@@ -278,7 +278,7 @@ The data type of a tensor is specified using the ``dtype`` attribute or
argument, and TensorFlow supports a wide range of data types for different use
cases.
The basic, single data types of `tf.dtypes <https://www.tensorflow.org/api_docs/python/tf/dtypes>`_
The basic, single data types of `tf.dtypes <https://www.tensorflow.org/api_docs/python/tf/dtypes>`__
are as follows:
.. list-table::
@@ -550,7 +550,7 @@ Use cases and recommendations
===============================================================================
* The `Training a Neural Collaborative Filtering (NCF) Recommender on an AMD
GPU <https://rocm.blogs.amd.com/artificial-intelligence/ncf/README.html>`_
GPU <https://rocm.blogs.amd.com/artificial-intelligence/ncf/README.html>`__
blog post discusses training an NCF recommender system using TensorFlow. It
explains how NCF improves traditional collaborative filtering methods by
leveraging neural networks to model non-linear user-item interactions. The
@@ -559,7 +559,7 @@ Use cases and recommendations
purchasing) and how it addresses challenges like the lack of negative values.
* The `Creating a PyTorch/TensorFlow code environment on AMD GPUs
<https://rocm.blogs.amd.com/software-tools-optimization/pytorch-tensorflow-env/README.html>`_
<https://rocm.blogs.amd.com/software-tools-optimization/pytorch-tensorflow-env/README.html>`__
blog post provides instructions for creating a machine learning environment
for PyTorch and TensorFlow on AMD GPUs using ROCm. It covers steps like
installing the libraries, cloning code repositories, installing dependencies,
@@ -568,4 +568,4 @@ Use cases and recommendations
for a better experience on AMD GPUs. This guide aims to help data scientists
and ML practitioners adapt their code for AMD GPUs.
For more use cases and recommendations, see the `ROCm Tensorflow blog posts <https://rocm.blogs.amd.com/blog/tag/tensorflow.html>`_.
For more use cases and recommendations, see the `ROCm Tensorflow blog posts <https://rocm.blogs.amd.com/blog/tag/tensorflow.html>`__.

View File

@@ -0,0 +1,85 @@
:orphan:
.. meta::
:description: verl compatibility
:keywords: GPU, verl compatibility
.. version-set:: rocm_version latest
*******************************************************************************
verl compatibility
*******************************************************************************
Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs).
verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support.
* See the `verl documentation <https://verl.readthedocs.io/en/latest/>`_ for more information about verl.
* The official verl GitHub repository is `https://github.com/volcengine/verl <https://github.com/volcengine/verl>`_.
* Use the AMD-validated :ref:`Docker images <verl-docker-compat>` with ROCm and verl preinstalled.
* See the :doc:`ROCm verl installation guide <rocm-install-on-linux:install/3rd-party/verl-install>` to get started.
.. note::
verl is supported on ROCm 6.2.0.
.. _verl-recommendations:
Use cases and recommendations
================================================================================
The benefits of verl in large-scale reinforcement leaning from human feedback (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`_ blog.
.. _verl-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes ready-made `ROCm verl Docker images <https://hub.docker.com/r/rocm/verl>`_
with ROCm backends on Docker Hub. The following Docker image tags and associated inventories represent the latest verl version from the official Docker Hub. The Docker images have been validated for `ROCm 6.2.0 <https://repo.radeon.com/rocm/apt/6.2/>`_.
.. list-table::
:header-rows: 1
* - Docker image
- verl
- Linux
- Pytorch
- Python
- vllm
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/verl/verl-0.3.0.post0_rocm6.2_vllm0.6.3/images/sha256-cbe423803fd7850448b22444176bee06f4dcf22cd3c94c27732752d3a39b04b2"><i class="fab fa-docker fa-lg"></i> rocm/verl</a>
- `0.3.0post0 <https://github.com/volcengine/verl/releases/tag/v0.3.0.post0>`_
- Ubuntu 20.04
- `2.5.0 <https://download.pytorch.org/whl/cu118/torch-2.5.0%2Bcu118-cp39-cp39-linux_x86_64.whl#sha256=1ee24b267418c37b297529ede875b961e382c1c365482f4142af2398b92ed127>`_
- `3.9.19 <https://www.python.org/downloads/release/python-3919/>`_
- `0.6.4 <https://github.com/vllm-project/vllm/releases/tag/v0.6.4>`_
Supported features
===============================================================================
The following table shows verl and ROCm support for GPU-accelerated modules.
.. list-table::
:header-rows: 1
* - Module
- Description
- verl version
- ROCm version
* - ``FSDP``
- Training engine
- 0.3.0.post0
- 6.2
* - ``vllm``
- Inference engine
- 0.3.0.post0
- 6.2

View File

@@ -12,6 +12,54 @@ from pathlib import Path
shutil.copy2("../RELEASE.md", "./about/release-notes.md")
shutil.copy2("../CHANGELOG.md", "./release/changelog.md")
# Mark the consolidated changelog as orphan to prevent Sphinx from warning about missing toctree entries
with open("./release/changelog.md", "r+") as file:
content = file.read()
file.seek(0)
file.write(":orphan:\n" + content)
# Replace GitHub-style [!ADMONITION]s with Sphinx-compatible ```{admonition} blocks
with open("./release/changelog.md", "r") as file:
lines = file.readlines()
modified_lines = []
in_admonition_section = False
# Map for matching the specific admonition type to its corresponding Sphinx markdown syntax
admonition_types = {
'> [!NOTE]': '```{note}',
'> [!TIP]': '```{tip}',
'> [!IMPORTANT]': '```{important}',
'> [!WARNING]': '```{warning}',
'> [!CAUTION]': '```{caution}'
}
for line in lines:
if any(line.startswith(k) for k in admonition_types):
for key in admonition_types:
if(line.startswith(key)):
modified_lines.append(admonition_types[key] + '\n')
break
in_admonition_section = True
elif in_admonition_section:
if line.strip() == '':
# If we encounter an empty line, close the admonition section
modified_lines.append('```\n\n') # Close the admonition block
in_admonition_section = False
else:
modified_lines.append(line.lstrip('> '))
else:
modified_lines.append(line)
# In case the file ended while still in a admonition section, close it
if in_admonition_section:
modified_lines.append('```')
file.close()
with open("./release/changelog.md", 'w') as file:
file.writelines(modified_lines)
os.system("mkdir -p ../_readthedocs/html/downloads")
os.system("cp compatibility/compatibility-matrix-historical-6.0.csv ../_readthedocs/html/downloads/compatibility-matrix-historical-6.0.csv")
@@ -34,15 +82,15 @@ project = "ROCm Documentation"
project_path = os.path.abspath(".").replace("\\", "/")
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = "6.4.1"
release = "6.4.1"
version = "6.4.2"
release = "6.4.2"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
all_article_info_author = ""
# pages with specific settings
article_pages = [
{"file": "about/release-notes", "os": ["linux"], "date": "2025-05-07"},
{"file": "about/release-notes", "os": ["linux"], "date": "2025-07-21"},
{"file": "release/changelog", "os": ["linux"],},
{"file": "compatibility/compatibility-matrix", "os": ["linux"]},
{"file": "compatibility/ml-compatibility/pytorch-compatibility", "os": ["linux"]},
@@ -57,10 +105,22 @@ article_pages = [
{"file": "how-to/rocm-for-ai/training/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/train-a-model", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/prerequisite-system-validation", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/megatron-lm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/pytorch-training", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/scale-model-training", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/megatron-lm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v24.12-dev", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.3", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.5", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/pytorch-training", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.3", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-v25.5", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/jax-maxtext-v25.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/fine-tuning/overview", "os": ["linux"]},
@@ -72,7 +132,16 @@ article_pages = [
{"file": "how-to/rocm-for-ai/inference/hugging-face-models", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/llm-inference-frameworks", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/vllm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-history", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.4.3", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.6.4", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.6.6", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.7.3-20250325", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.3-20250415", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.5-20250513", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.5-20250521", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.9.0.1-20250605", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.9.0.1-20250702", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/deploy-your-model", "os": ["linux"]},

View File

@@ -0,0 +1,163 @@
vllm_benchmark:
unified_docker:
latest:
# TODO: update me
pull_tag: rocm/vllm:rocm6.4.1_vllm_0.9.1_20250702
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.1_20250702/images/sha256-45068a2079cb8df554ed777141bf0c67d6627c470a897256e60c9f262677faab
rocm_version: 6.4.1
vllm_version: 0.9.1 (0.9.2.dev206+gb335519f2.rocm641)
pytorch_version: 2.7.0+gitf717b2a
hipblaslt_version: 0.15
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.1 8B
mad_tag: pyt_vllm_llama-3.1-8b
model_repo: meta-llama/Llama-3.1-8B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-8B
precision: float16
- model: Llama 3.1 70B
mad_tag: pyt_vllm_llama-3.1-70b
model_repo: meta-llama/Llama-3.1-70B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
precision: float16
- model: Llama 3.1 405B
mad_tag: pyt_vllm_llama-3.1-405b
model_repo: meta-llama/Llama-3.1-405B-Instruct
url: https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct
precision: float16
- model: Llama 2 7B
mad_tag: pyt_vllm_llama-2-7b
model_repo: meta-llama/Llama-2-7b-chat-hf
url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
precision: float16
- model: Llama 2 70B
mad_tag: pyt_vllm_llama-2-70b
model_repo: meta-llama/Llama-2-70b-chat-hf
url: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
precision: float16
- model: Llama 3.1 8B FP8
mad_tag: pyt_vllm_llama-3.1-8b_fp8
model_repo: amd/Llama-3.1-8B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-8B-Instruct-FP8-KV
precision: float8
- model: Llama 3.1 70B FP8
mad_tag: pyt_vllm_llama-3.1-70b_fp8
model_repo: amd/Llama-3.1-70B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-70B-Instruct-FP8-KV
precision: float8
- model: Llama 3.1 405B FP8
mad_tag: pyt_vllm_llama-3.1-405b_fp8
model_repo: amd/Llama-3.1-405B-Instruct-FP8-KV
url: https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV
precision: float8
- group: Mistral AI
tag: mistral
models:
- model: Mixtral MoE 8x7B
mad_tag: pyt_vllm_mixtral-8x7b
model_repo: mistralai/Mixtral-8x7B-Instruct-v0.1
url: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
precision: float16
- model: Mixtral MoE 8x22B
mad_tag: pyt_vllm_mixtral-8x22b
model_repo: mistralai/Mixtral-8x22B-Instruct-v0.1
url: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
precision: float16
- model: Mistral 7B
mad_tag: pyt_vllm_mistral-7b
model_repo: mistralai/Mistral-7B-Instruct-v0.3
url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
precision: float16
- model: Mixtral MoE 8x7B FP8
mad_tag: pyt_vllm_mixtral-8x7b_fp8
model_repo: amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
url: https://huggingface.co/amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
precision: float8
- model: Mixtral MoE 8x22B FP8
mad_tag: pyt_vllm_mixtral-8x22b_fp8
model_repo: amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
url: https://huggingface.co/amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
precision: float8
- model: Mistral 7B FP8
mad_tag: pyt_vllm_mistral-7b_fp8
model_repo: amd/Mistral-7B-v0.1-FP8-KV
url: https://huggingface.co/amd/Mistral-7B-v0.1-FP8-KV
precision: float8
- group: Qwen
tag: qwen
models:
- model: Qwen2 7B
mad_tag: pyt_vllm_qwen2-7b
model_repo: Qwen/Qwen2-7B-Instruct
url: https://huggingface.co/Qwen/Qwen2-7B-Instruct
precision: float16
- model: Qwen2 72B
mad_tag: pyt_vllm_qwen2-72b
model_repo: Qwen/Qwen2-72B-Instruct
url: https://huggingface.co/Qwen/Qwen2-72B-Instruct
precision: float16
- model: QwQ-32B
mad_tag: pyt_vllm_qwq-32b
model_repo: Qwen/QwQ-32B
url: https://huggingface.co/Qwen/QwQ-32B
precision: float16
tunableop: true
- group: Databricks DBRX
tag: dbrx
models:
- model: DBRX Instruct
mad_tag: pyt_vllm_dbrx-instruct
model_repo: databricks/dbrx-instruct
url: https://huggingface.co/databricks/dbrx-instruct
precision: float16
- model: DBRX Instruct FP8
mad_tag: pyt_vllm_dbrx_fp8
model_repo: amd/dbrx-instruct-FP8-KV
url: https://huggingface.co/amd/dbrx-instruct-FP8-KV
precision: float8
- group: Google Gemma
tag: gemma
models:
- model: Gemma 2 27B
mad_tag: pyt_vllm_gemma-2-27b
model_repo: google/gemma-2-27b
url: https://huggingface.co/google/gemma-2-27b
precision: float16
- group: Cohere
tag: cohere
models:
- model: C4AI Command R+ 08-2024
mad_tag: pyt_vllm_c4ai-command-r-plus-08-2024
model_repo: CohereForAI/c4ai-command-r-plus-08-2024
url: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
precision: float16
- model: C4AI Command R+ 08-2024 FP8
mad_tag: pyt_vllm_command-r-plus_fp8
model_repo: amd/c4ai-command-r-plus-FP8-KV
url: https://huggingface.co/amd/c4ai-command-r-plus-FP8-KV
precision: float8
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek MoE 16B
mad_tag: pyt_vllm_deepseek-moe-16b-chat
model_repo: deepseek-ai/deepseek-moe-16b-chat
url: https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat
precision: float16
- group: Microsoft Phi
tag: phi
models:
- model: Phi-4
mad_tag: pyt_vllm_phi-4
model_repo: microsoft/phi-4
url: https://huggingface.co/microsoft/phi-4
- group: TII Falcon
tag: falcon
models:
- model: Falcon 180B
mad_tag: pyt_vllm_falcon-180b
model_repo: tiiuae/falcon-180B
url: https://huggingface.co/tiiuae/falcon-180B
precision: float16

View File

@@ -1,6 +1,6 @@
pytorch_inference_benchmark:
unified_docker:
latest: &rocm-pytorch-docker-latest
latest:
pull_tag: rocm/pytorch:latest
docker_hub_url:
rocm_version:
@@ -39,3 +39,11 @@ pytorch_inference_benchmark:
model_repo: Wan-AI/Wan2.1-T2V-14B
url: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B
precision: bfloat16
- group: Janus-Pro
tag: janus-pro
models:
- model: Janus Pro 7B
mad_tag: pyt_janus_pro_inference
model_repo: deepseek-ai/Janus-Pro-7B
url: https://huggingface.co/deepseek-ai/Janus-Pro-7B
precision: bfloat16

View File

@@ -0,0 +1,17 @@
sglang_benchmark:
unified_docker:
latest:
pull_tag: lmsysorg/sglang:v0.4.5-rocm630
docker_hub_url: https://hub.docker.com/layers/lmsysorg/sglang/v0.4.5-rocm630/images/sha256-63d2cb760a237125daf6612464cfe2f395c0784e21e8b0ea37d551cd10d3c951
rocm_version: 6.3.0
sglang_version: 0.4.5 (0.4.5-rocm)
pytorch_version: 2.6.0a0+git8d4926e
model_groups:
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-R1-Distill-Qwen-32B
mad_tag: pyt_sglang_deepseek-r1-distill-qwen-32b
model_repo: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
url: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
precision: bfloat16

View File

@@ -2,10 +2,10 @@ vllm_benchmark:
unified_docker:
latest:
# TODO: update me
pull_tag: rocm/vllm:rocm6.4.1_vllm_0.9.1_20250702
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.1_20250702/images/sha256-45068a2079cb8df554ed777141bf0c67d6627c470a897256e60c9f262677faab
pull_tag: rocm/vllm:rocm6.4.1_vllm_0.9.1_20250715
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.1_20250715/images/sha256-4a429705fa95a58f6d20aceab43b1b76fa769d57f32d5d28bd3f4e030e2a78ea
rocm_version: 6.4.1
vllm_version: 0.9.1 (0.9.2.dev206+gb335519f2.rocm641)
vllm_version: 0.9.1 (0.9.2.dev364+gb432b7a28.rocm641)
pytorch_version: 2.7.0+gitf717b2a
hipblaslt_version: 0.15
model_groups:

View File

@@ -1,29 +1,60 @@
megatron-lm_benchmark:
model_groups:
- group: Meta Llama
tag: llama
models:
dockers:
- pull_tag: rocm/megatron-lm:v25.6_py312
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.6_py312/images/sha256-482ff906532285bceabdf2bda629bd32cb6174d2d07f4243a736378001b28df0
components:
ROCm: 6.4.1
PyTorch: 2.8.0a0+git7d205b2
Python: 3.12
Transformer Engine: 2.1.0.dev0+8c4a512
hipBLASLt: 393e413
Triton: 3.3.0
RCCL: 2.23.4.7a84c5d
doc_name: Ubuntu 24.04 + Python 3.12
- pull_tag: rocm/megatron-lm:v25.6_py310
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.6_py310/images/sha256-9627bd9378684fe26cb1a10c7dd817868f553b33402e49b058355b0f095568d6
components:
ROCm: 6.4.1
PyTorch: 2.8.0a0+git7d205b2
Python: "3.10"
Transformer Engine: 2.1.0.dev0+8c4a512
hipBLASLt: 393e413
Triton: 3.3.0
RCCL: 2.23.4.7a84c5d
doc_name: Ubuntu 22.04 + Python 3.10
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.3 70B
mad_tag: pyt_megatron_lm_train_llama-3.3-70b
- model: Llama 3.1 8B
mad_tag: pyt_megatron_lm_train_llama-3.1-8b
- model: Llama 3.1 70B
mad_tag: pyt_megatron_lm_train_llama-3.1-70b
- model: Llama 3.1 70B (proxy)
mad_tag: pyt_megatron_lm_train_llama-3.1-70b-proxy
- model: Llama 2 7B
mad_tag: pyt_megatron_lm_train_llama-2-7b
- model: Llama 2 70B
mad_tag: pyt_megatron_lm_train_llama-2-70b
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-V3
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-V3 (proxy)
mad_tag: pyt_megatron_lm_train_deepseek-v3-proxy
- model: DeepSeek-V2-Lite
mad_tag: pyt_megatron_lm_train_deepseek-v2-lite-16b
- group: Mistral AI
tag: mistral
models:
- group: Mistral AI
tag: mistral
models:
- model: Mixtral 8x7B
mad_tag: pyt_megatron_lm_train_mixtral-8x7b
- model: Mixtral 8x22B
- model: Mixtral 8x22B (proxy)
mad_tag: pyt_megatron_lm_train_mixtral-8x22b-proxy
- group: Qwen
tag: qwen
models:
- model: Qwen 2.5 7B
mad_tag: pyt_megatron_lm_train_qwen2.5-7b
- model: Qwen 2.5 72B
mad_tag: pyt_megatron_lm_train_qwen2.5-72b

View File

@@ -0,0 +1,29 @@
megatron-lm_benchmark:
model_groups:
- group: Meta Llama
tag: llama
models:
- model: Llama 3.3 70B
mad_tag: pyt_megatron_lm_train_llama-3.3-70b
- model: Llama 3.1 8B
mad_tag: pyt_megatron_lm_train_llama-3.1-8b
- model: Llama 3.1 70B
mad_tag: pyt_megatron_lm_train_llama-3.1-70b
- model: Llama 2 7B
mad_tag: pyt_megatron_lm_train_llama-2-7b
- model: Llama 2 70B
mad_tag: pyt_megatron_lm_train_llama-2-70b
- group: DeepSeek
tag: deepseek
models:
- model: DeepSeek-V3
mad_tag: pyt_megatron_lm_train_deepseek-v3-proxy
- model: DeepSeek-V2-Lite
mad_tag: pyt_megatron_lm_train_deepseek-v2-lite-16b
- group: Mistral AI
tag: mistral
models:
- model: Mixtral 8x7B
mad_tag: pyt_megatron_lm_train_mixtral-8x7b
- model: Mixtral 8x22B
mad_tag: pyt_megatron_lm_train_mixtral-8x22b-proxy

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.2 MiB

After

Width:  |  Height:  |  Size: 1.1 MiB

View File

@@ -17,6 +17,9 @@ features for these ROCm-enabled deep learning frameworks.
* :doc:`PyTorch compatibility <../compatibility/ml-compatibility/pytorch-compatibility>`
* :doc:`TensorFlow compatibility <../compatibility/ml-compatibility/tensorflow-compatibility>`
* :doc:`JAX compatibility <../compatibility/ml-compatibility/jax-compatibility>`
* :doc:`verl compatibility <../compatibility/ml-compatibility/verl-compatibility>`
* :doc:`Stanford Megatron-LM compatibility <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`
* :doc:`DGL compatibility <../compatibility/ml-compatibility/dgl-compatibility>`
This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
@@ -29,6 +32,9 @@ See the installation instructions to get started.
* :doc:`PyTorch for ROCm <rocm-install-on-linux:install/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
* :doc:`verl for ROCm <rocm-install-on-linux:install/3rd-party/verl-install>`
* :doc:`Stanford Megatron-LM for ROCm <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>`
* :doc:`DGL for ROCm <rocm-install-on-linux:install/3rd-party/dgl-install>`
.. note::

View File

@@ -0,0 +1,25 @@
:orphan:
****************************************************
SGLang inference performance testing version history
****************************************************
This table lists previous versions of the ROCm SGLang inference performance
testing environment. For detailed information about available models for
benchmarking, see the version-specific documentation.
.. list-table::
:header-rows: 1
* - Docker image tag
- Components
- Resources
* - ``lmsysorg/sglang:v0.4.5-rocm630``
-
* ROCm 6.3.0
* SGLang 0.4.5
* PyTorch 2.6.0
-
* :doc:`Documentation <../sglang>`
* `Docker Hub <https://hub.docker.com/layers/lmsysorg/sglang/v0.4.5-rocm630/images/sha256-63d2cb760a237125daf6612464cfe2f395c0784e21e8b0ea37d551cd10d3c951>`__

View File

@@ -80,11 +80,11 @@ MI300X accelerator with the prebuilt vLLM Docker image.
Once setup is complete, you can choose between two options to reproduce the
benchmark results:
- :ref:`MAD-integrated benchmarking <vllm-benchmark-mad>`
- :ref:`MAD-integrated benchmarking <vllm-benchmark-mad-v043>`
- :ref:`Standalone benchmarking <vllm-benchmark-standalone>`
- :ref:`Standalone benchmarking <vllm-benchmark-standalone-v043>`
.. _vllm-benchmark-mad:
.. _vllm-benchmark-mad-v043:
MAD-integrated benchmarking
===========================
@@ -112,7 +112,7 @@ model are collected in the following path: ``~/MAD/reports_float16/``
Although the following eight models are pre-configured to collect latency and
throughput performance data, users can also change the benchmarking parameters.
Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone>` section.
Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone-v043>` section.
Available models
----------------
@@ -136,7 +136,7 @@ Available models
* ``pyt_vllm_jais-30b``
.. _vllm-benchmark-standalone:
.. _vllm-benchmark-standalone-v043:
Standalone benchmarking
=======================
@@ -167,14 +167,14 @@ Command
^^^^^^^^^^^^^^^^^^^^^^^^^
To start the benchmark, use the following command with the appropriate options.
See :ref:`Options <vllm-benchmark-standalone-options>` for the list of
See :ref:`Options <vllm-benchmark-standalone-options-v043>` for the list of
options and their descriptions.
.. code-block:: shell
./vllm_benchmark_report.sh -s $test_option -m $model_repo -g $num_gpu -d $datatype
See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.
See the :ref:`examples <vllm-benchmark-run-benchmark-v043>` for more information.
.. note::
@@ -193,7 +193,7 @@ See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
.. _vllm-benchmark-standalone-options:
.. _vllm-benchmark-standalone-options-v043:
Options
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -265,13 +265,13 @@ Options
- ``float16``
- Data type
.. _vllm-benchmark-run-benchmark:
.. _vllm-benchmark-run-benchmark-v043:
Running the benchmark on the MI300X accelerator
-----------------------------------------------
Here are some examples of running the benchmark with various options.
See :ref:`Options <vllm-benchmark-standalone-options>` for the list of
See :ref:`Options <vllm-benchmark-standalone-options-v043>` for the list of
options and their descriptions.
Latency benchmark example

View File

@@ -103,11 +103,11 @@ MI300X accelerator with the prebuilt vLLM Docker image.
Once setup is complete, you can choose between two options to reproduce the
benchmark results:
- :ref:`MAD-integrated benchmarking <vllm-benchmark-mad>`
- :ref:`MAD-integrated benchmarking <vllm-benchmark-mad-v064>`
- :ref:`Standalone benchmarking <vllm-benchmark-standalone>`
- :ref:`Standalone benchmarking <vllm-benchmark-standalone-v064>`
.. _vllm-benchmark-mad:
.. _vllm-benchmark-mad-v064:
MAD-integrated benchmarking
===========================
@@ -135,7 +135,7 @@ model are collected in the following path: ``~/MAD/reports_float16/``.
Although the following models are preconfigured to collect latency and
throughput performance data, you can also change the benchmarking parameters.
Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone>` section.
Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone-v064>` section.
Available models
----------------
@@ -177,7 +177,7 @@ Available models
* ``pyt_vllm_mixtral-8x22b_fp8``
.. _vllm-benchmark-standalone:
.. _vllm-benchmark-standalone-v064:
Standalone benchmarking
=======================
@@ -203,14 +203,14 @@ Command
-------
To start the benchmark, use the following command with the appropriate options.
See :ref:`Options <vllm-benchmark-standalone-options>` for the list of
See :ref:`Options <vllm-benchmark-standalone-v064-options>` for the list of
options and their descriptions.
.. code-block:: shell
./vllm_benchmark_report.sh -s $test_option -m $model_repo -g $num_gpu -d $datatype
See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.
See the :ref:`examples <vllm-benchmark-run-benchmark-v064>` for more information.
.. note::
@@ -229,7 +229,7 @@ See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
.. _vllm-benchmark-standalone-options:
.. _vllm-benchmark-standalone-v064-options:
Options
-------
@@ -330,13 +330,13 @@ Options
- ``float16`` or ``float8``
- Data type
.. _vllm-benchmark-run-benchmark:
.. _vllm-benchmark-run-benchmark-v064:
Running the benchmark on the MI300X accelerator
-----------------------------------------------
Here are some examples of running the benchmark with various options.
See :ref:`Options <vllm-benchmark-standalone-options>` for the list of
See :ref:`Options <vllm-benchmark-standalone-v064-options>` for the list of
options and their descriptions.
Example 1: latency benchmark

View File

@@ -31,8 +31,8 @@ accelerator and includes the following components:
With this Docker image, you can quickly validate the expected inference
performance numbers for the MI300X accelerator. This topic also provides tips on
optimizing performance with popular AI models. For more information, see the lists of
:ref:`available models for MAD-integrated benchmarking <vllm-benchmark-mad-models>`
and :ref:`standalone benchmarking <vllm-benchmark-standalone-options>`.
:ref:`available models for MAD-integrated benchmarking <vllm-benchmark-mad-v066-models>`
and :ref:`standalone benchmarking <vllm-benchmark-standalone-v066-options>`.
.. _vllm-benchmark-vllm:
@@ -76,11 +76,11 @@ MI300X accelerator with the prebuilt vLLM Docker image.
Once the setup is complete, choose between two options to reproduce the
benchmark results:
- :ref:`MAD-integrated benchmarking <vllm-benchmark-mad>`
- :ref:`MAD-integrated benchmarking <vllm-benchmark-mad-v066>`
- :ref:`Standalone benchmarking <vllm-benchmark-standalone>`
- :ref:`Standalone benchmarking <vllm-benchmark-standalone-v066>`
.. _vllm-benchmark-mad:
.. _vllm-benchmark-mad-v066:
MAD-integrated benchmarking
===========================
@@ -108,9 +108,9 @@ model are collected in the following path: ``~/MAD/reports_float16/``.
Although the following models are preconfigured to collect latency and
throughput performance data, you can also change the benchmarking parameters.
Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone>` section.
Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone-v066>` section.
.. _vllm-benchmark-mad-models:
.. _vllm-benchmark-mad-v066-models:
Available models
----------------
@@ -134,10 +134,10 @@ Available models
* - `Llama 3.2 11B Vision <https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct>`_
- ``pyt_vllm_llama-3.2-11b-vision-instruct``
* - `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-7b-chat-hf>`_
* - `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-7b-chat-hf>`__
- ``pyt_vllm_llama-2-7b``
* - `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70b-chat-hf>`_
* - `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70b-chat-hf>`__
- ``pyt_vllm_llama-2-70b``
* - `Mixtral MoE 8x7B <https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1>`_
@@ -194,7 +194,7 @@ Available models
* - `C4AI Command R+ 08-2024 FP8 <https://huggingface.co/amd/c4ai-command-r-plus-FP8-KV>`_
- ``pyt_vllm_command-r-plus_fp8``
.. _vllm-benchmark-standalone:
.. _vllm-benchmark-standalone-v066:
Standalone benchmarking
=======================
@@ -220,14 +220,14 @@ Command
-------
To start the benchmark, use the following command with the appropriate options.
See :ref:`Options <vllm-benchmark-standalone-options>` for the list of
See :ref:`Options <vllm-benchmark-standalone-v066-options>` for the list of
options and their descriptions.
.. code-block:: shell
./vllm_benchmark_report.sh -s $test_option -m $model_repo -g $num_gpu -d $datatype
See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.
See the :ref:`examples <vllm-benchmark-run-benchmark-v066>` for more information.
.. note::
@@ -246,7 +246,7 @@ See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
.. _vllm-benchmark-standalone-options:
.. _vllm-benchmark-standalone-v066-options:
Options and available models
----------------------------
@@ -289,11 +289,11 @@ Options and available models
* -
- ``meta-llama/Llama-2-7b-chat-hf``
- `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-7b-chat-hf>`_
- `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-7b-chat-hf>`__
* -
- ``meta-llama/Llama-2-70b-chat-hf``
- `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-70b-chat-hf>`_
- `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70b-chat-hf>`__
* -
- ``mistralai/Mixtral-8x7B-Instruct-v0.1``
@@ -375,13 +375,13 @@ Options and available models
- ``float16`` or ``float8``
- Data type
.. _vllm-benchmark-run-benchmark:
.. _vllm-benchmark-run-benchmark-v066:
Running the benchmark on the MI300X accelerator
-----------------------------------------------
Here are some examples of running the benchmark with various options.
See :ref:`Options <vllm-benchmark-standalone-options>` for the list of
See :ref:`Options <vllm-benchmark-standalone-v066-options>` for the list of
options and their descriptions.
Example 1: latency benchmark

View File

@@ -36,10 +36,10 @@ vLLM inference performance testing
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
inference performance numbers <vllm-benchmark-performance-measurements-v073>` for
MI300X series accelerators.
.. _vllm-benchmark-available-models:
.. _vllm-benchmark-available-models-v073:
Available models
================
@@ -95,7 +95,7 @@ vLLM inference performance testing
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements:
.. _vllm-benchmark-performance-measurements-v073:
Performance measurements
========================
@@ -109,7 +109,7 @@ vLLM inference performance testing
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
only reflects the :doc:`latest version of this inference benchmarking environment <../vllm>`_.
only reflects the :doc:`latest version of this inference benchmarking environment <../vllm>`.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
Advanced features and known issues
@@ -130,7 +130,7 @@ vLLM inference performance testing
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
.. code-block:: shell
@@ -154,7 +154,7 @@ vLLM inference performance testing
Once the setup is complete, choose between two options to reproduce the
benchmark results:
.. _vllm-benchmark-mad:
.. _vllm-benchmark-mad-v073:
{% for model_group in model_groups %}
{% for model in model_group.models %}
@@ -175,7 +175,7 @@ vLLM inference performance testing
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
@@ -186,7 +186,7 @@ vLLM inference performance testing
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
Although the :ref:`available models <vllm-benchmark-available-models>` are preconfigured
Although the :ref:`available models <vllm-benchmark-available-models-v073>` are preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
@@ -264,7 +264,7 @@ vLLM inference performance testing
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with the ``{{model.precision}}`` data type.
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with the :literal:`{{model.precision}}` data type.
.. code-block::
@@ -274,7 +274,7 @@ vLLM inference performance testing
* Throughput benchmark
Use this command to throughput the latency of the {{model.model}} model on eight GPUs with the ``{{model.precision}}`` data type.
Use this command to throughput the latency of the {{model.model}} model on eight GPUs with the :literal:`{{model.precision}}` data type.
.. code-block:: shell

View File

@@ -31,10 +31,10 @@ vLLM inference performance testing
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
inference performance numbers <vllm-benchmark-performance-measurements-v083>` for
MI300X series accelerators.
.. _vllm-benchmark-available-models:
.. _vllm-benchmark-available-models-v083:
Supported models
================
@@ -90,7 +90,7 @@ vLLM inference performance testing
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements:
.. _vllm-benchmark-performance-measurements-v083:
Performance measurements
========================
@@ -104,7 +104,7 @@ vLLM inference performance testing
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
only reflects the :doc:`latest version of this inference benchmarking environment <../vllm>`_.
only reflects the :doc:`latest version of this inference benchmarking environment <../vllm>`.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
Advanced features and known issues
@@ -172,7 +172,7 @@ vLLM inference performance testing
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
@@ -183,7 +183,7 @@ vLLM inference performance testing
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
Although the :ref:`available models <vllm-benchmark-available-models>` are preconfigured
Although the :ref:`available models <vllm-benchmark-available-models-v083>` are preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
@@ -280,7 +280,7 @@ vLLM inference performance testing
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block::
@@ -290,7 +290,7 @@ vLLM inference performance testing
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block:: shell

View File

@@ -36,10 +36,10 @@ vLLM inference performance testing
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
inference performance numbers <vllm-benchmark-performance-measurements-v085-20250513>` for
MI300X series accelerators.
.. _vllm-benchmark-available-models:
.. _vllm-benchmark-available-models-v085-20250513:
Supported models
================
@@ -99,7 +99,7 @@ vLLM inference performance testing
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements:
.. _vllm-benchmark-performance-measurements-v085-20250513:
Performance measurements
========================
@@ -113,7 +113,7 @@ vLLM inference performance testing
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
only reflects the :doc:`latest version of this inference benchmarking environment <../vllm>`_.
only reflects the :doc:`latest version of this inference benchmarking environment <../vllm>`.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
Advanced features and known issues
@@ -181,7 +181,7 @@ vLLM inference performance testing
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
@@ -192,7 +192,7 @@ vLLM inference performance testing
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
Although the :ref:`available models <vllm-benchmark-available-models>` are preconfigured
Although the :ref:`available models <vllm-benchmark-available-models-v085-20250513>` are preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
@@ -289,7 +289,7 @@ vLLM inference performance testing
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block::
@@ -299,7 +299,7 @@ vLLM inference performance testing
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block:: shell

View File

@@ -36,10 +36,10 @@ vLLM inference performance testing
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
inference performance numbers <vllm-benchmark-performance-measurements-v085-20250521>` for
MI300X series accelerators.
.. _vllm-benchmark-available-models:
.. _vllm-benchmark-available-models-v085-20250521:
Supported models
================
@@ -99,7 +99,7 @@ vLLM inference performance testing
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements:
.. _vllm-benchmark-performance-measurements-v085-20250521:
Performance measurements
========================
@@ -181,7 +181,7 @@ vLLM inference performance testing
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
@@ -192,7 +192,7 @@ vLLM inference performance testing
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
Although the :ref:`available models <vllm-benchmark-available-models>` are preconfigured
Although the :ref:`available models <vllm-benchmark-available-models-v085-20250521>` are preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
@@ -289,7 +289,7 @@ vLLM inference performance testing
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block::
@@ -299,7 +299,7 @@ vLLM inference performance testing
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block:: shell

View File

@@ -36,10 +36,10 @@ vLLM inference performance testing
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
inference performance numbers <vllm-benchmark-performance-measurements-v0901-20250605>` for
MI300X series accelerators.
.. _vllm-benchmark-available-models:
.. _vllm-benchmark-available-models-v0901-20250605:
Supported models
================
@@ -99,7 +99,7 @@ vLLM inference performance testing
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements:
.. _vllm-benchmark-performance-measurements-v0901-20250605:
Performance measurements
========================
@@ -180,7 +180,7 @@ vLLM inference performance testing
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
@@ -191,7 +191,7 @@ vLLM inference performance testing
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
Although the :ref:`available models <vllm-benchmark-available-models>` are preconfigured
Although the :ref:`available models <vllm-benchmark-available-models-v0901-20250605>` are preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
@@ -288,7 +288,7 @@ vLLM inference performance testing
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block::
@@ -298,7 +298,7 @@ vLLM inference performance testing
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block:: shell

View File

@@ -0,0 +1,353 @@
:orphan:
.. meta::
:description: Learn how to validate LLM inference performance on MI300X accelerators using AMD MAD and the
ROCm vLLM Docker image.
:keywords: model, MAD, automation, dashboarding, validate
**********************************
vLLM inference performance testing
**********************************
.. caution::
This documentation does not reflect the latest version of ROCm vLLM
inference performance documentation. See :doc:`../vllm` for the latest version.
.. _vllm-benchmark-unified-docker:
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/vllm_0.9.1_20250702-benchmark-models.yaml
{% set unified_docker = data.vllm_benchmark.unified_docker.latest %}
{% set model_groups = data.vllm_benchmark.model_groups %}
The `ROCm vLLM Docker <{{ unified_docker.docker_hub_url }}>`_ image offers
a prebuilt, optimized environment for validating large language model (LLM)
inference performance on AMD Instinct™ MI300X series accelerators. This ROCm vLLM
Docker image integrates vLLM and PyTorch tailored specifically for MI300X series
accelerators and includes the following components:
* `ROCm {{ unified_docker.rocm_version }} <https://github.com/ROCm/ROCm>`_
* `vLLM {{ unified_docker.vllm_version }} <https://docs.vllm.ai/en/latest>`_
* `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/ROCm/pytorch.git>`_
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements-20250702>` for
MI300X series accelerators.
.. _vllm-benchmark-available-models-20250702:
Supported models
================
The following models are supported for inference performance benchmarking
with vLLM and ROCm. Some instructions, commands, and recommendations in this
documentation might vary by model -- select one to get started.
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row">
<div class="col-2 me-2 model-param-head">Model group</div>
<div class="row col-10">
{% for model_group in model_groups %}
<div class="col-3 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
<div class="row mt-1">
<div class="col-2 me-2 model-param-head">Model</div>
<div class="row col-10">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if models|length % 3 == 0 %}
<div class="col-4 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% else %}
<div class="col-6 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
.. _vllm-benchmark-vllm:
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{model.mad_tag}}
.. note::
See the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_ to learn more about your selected model.
Some models require access authorization prior to use via an external license agreement through a third party.
{% endfor %}
{% endfor %}
.. note::
vLLM is a toolkit and library for LLM inference and serving. AMD implements
high-performance custom kernels and modules in vLLM to enhance performance.
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements-20250702:
Performance measurements
========================
To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
page provides reference throughput and latency measurements for inferencing popular AI models.
.. important::
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
only reflects the latest version of this inference benchmarking environment.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
Advanced features and known issues
==================================
For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
see the developer's guide at `<https://github.com/ROCm/vllm/tree/5486e7bc8523be0324ccd68f221959445b56cc2a/docs/dev-docker>`__.
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
.. code-block:: shell
# disable automatic NUMA balancing
sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
Pull the Docker image
=====================
Download the `ROCm vLLM Docker image <{{ unified_docker.docker_hub_url }}>`_.
Use the following command to pull the Docker image from Docker Hub.
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
Benchmarking
============
Once the setup is complete, choose between two options to reproduce the
benchmark results:
.. _vllm-benchmark-mad:
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{model.mad_tag}}
.. tab-set::
.. tab-item:: MAD-integrated benchmarking
Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
python3 tools/run_models.py --tags {{model.mad_tag}} --keep-model-dir --live-output --timeout 28800
MAD launches a Docker container with the name
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
Although the :ref:`available models <vllm-benchmark-available-models-20250702>` are preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
{% if model.tunableop %}
.. note::
For improved performance, consider enabling :ref:`PyTorch TunableOp <mi300x-tunableop>`.
TunableOp automatically explores different implementations and configurations of certain PyTorch
operators to find the fastest one for your hardware.
By default, ``{{model.mad_tag}}`` runs with TunableOp disabled
(see
`<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To
enable it, edit the default run behavior in the ``models.json``
configuration before running inference -- update the model's run
``args`` by changing ``--tunableop off`` to ``--tunableop on``.
Enabling TunableOp triggers a two-pass run -- a warm-up followed by the performance-collection run.
{% endif %}
.. tab-item:: Standalone benchmarking
Run the vLLM benchmark tool independently by starting the
`Docker container <{{ unified_docker.docker_hub_url }}>`_
as shown in the following snippet.
.. code-block::
docker pull {{ unified_docker.pull_tag }}
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name test {{ unified_docker.pull_tag }}
In the Docker container, clone the ROCm MAD repository and navigate to the
benchmark scripts directory at ``~/MAD/scripts/vllm``.
.. code-block::
git clone https://github.com/ROCm/MAD
cd MAD/scripts/vllm
To start the benchmark, use the following command with the appropriate options.
.. code-block::
./vllm_benchmark_report.sh -s $test_option -m {{model.model_repo}} -g $num_gpu -d {{model.precision}}
.. list-table::
:header-rows: 1
:align: center
* - Name
- Options
- Description
* - ``$test_option``
- latency
- Measure decoding token latency
* -
- throughput
- Measure token generation throughput
* -
- all
- Measure both throughput and latency
* - ``$num_gpu``
- 1 or 8
- Number of GPUs
* - ``$datatype``
- ``float16`` or ``float8``
- Data type
.. note::
The input sequence length, output sequence length, and tensor parallel (TP) are
already configured. You don't need to specify them with this script.
.. note::
If you encounter the following error, pass your access-authorized Hugging
Face token to the gated models.
.. code-block::
OSError: You are trying to access a gated repo.
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
Here are some examples of running the benchmark with various options.
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with :literal`{{model.precision}}` precision.
.. code-block::
./vllm_benchmark_report.sh -s latency -m {{model.model_repo}} -g 8 -d {{model.precision}}
Find the latency report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_latency_report.csv``.
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block:: shell
./vllm_benchmark_report.sh -s throughput -m {{model.model_repo}} -g 8 -d {{model.precision}}
Find the throughput report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_throughput_report.csv``.
.. raw:: html
<style>
mjx-container[jax="CHTML"][display="true"] {
text-align: left;
margin: 0;
}
</style>
.. note::
Throughput is calculated as:
- .. math:: throughput\_tot = requests \times (\mathsf{\text{input lengths}} + \mathsf{\text{output lengths}}) / elapsed\_time
- .. math:: throughput\_gen = requests \times \mathsf{\text{output lengths}} / elapsed\_time
{% endfor %}
{% endfor %}
Further reading
===============
- To learn more about the options for latency and throughput benchmark scripts,
see `<https://github.com/ROCm/vllm/tree/main/benchmarks>`_.
- To learn more about system settings and management practices to configure your system for
MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_
- For application performance optimization strategies for HPC and AI workloads,
including inference with vLLM, see :doc:`/how-to/rocm-for-ai/inference-optimization/workload`.
- To learn how to run community models from Hugging Face on AMD GPUs, see
:doc:`Running models from Hugging Face </how-to/rocm-for-ai/inference/hugging-face-models>`.
- To learn how to fine-tune LLMs and optimize inference, see
:doc:`Fine-tuning LLMs and inference optimization </how-to/rocm-for-ai/fine-tuning/fine-tuning-and-inference>`.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================
See :doc:`vllm-history` to find documentation for previous releases
of the ``ROCm/vllm`` Docker image.

View File

@@ -7,76 +7,103 @@ vLLM inference performance testing version history
This table lists previous versions of the ROCm vLLM inference Docker image for
inference performance testing. For detailed information about available models
for benchmarking, see the version-specific documentation. You can find tagged
previous releases of the ``ROCm/vllm`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/vllm/tags>`_.
previous releases of the ``ROCm/vllm`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/vllm/tags>`__.
.. list-table::
:header-rows: 1
:stub-columns: 1
* - ROCm version
- vLLM version
- PyTorch version
* - Docker image tag
- Components
- Resources
* - 6.4.1
- 0.9.1
- 2.7.0
* - ``rocm/vllm:rocm6.4.1_vllm_0.9.1_20250715``
(latest)
-
* ROCm 6.4.1
* vLLM 0.9.1
* PyTorch 2.7.0
-
* :doc:`Documentation <../vllm>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.1_20250702/images/sha256-45068a2079cb8df554ed777141bf0c67d6627c470a897256e60c9f262677faab>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.1_20250715/images/sha256-4a429705fa95a58f6d20aceab43b1b76fa769d57f32d5d28bd3f4e030e2a78ea>`__
* - 6.4.1
- 0.9.0.1
- 2.7.0
* - ``rocm/vllm:rocm6.4.1_vllm_0.9.1_20250702``
-
* ROCm 6.4.1
* vLLM 0.9.1
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.9.1-20250702>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.1_20250702/images/sha256-45068a2079cb8df554ed777141bf0c67d6627c470a897256e60c9f262677faab>`__
* - ``rocm/vllm:rocm6.4.1_vllm_0.9.0.1_20250605``
-
* ROCm 6.4.1
* vLLM 0.9.0.1
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.9.0.1-20250605>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.0.1_20250605/images/sha256-f48beeb3d72663a93c77211eb45273d564451447c097e060befa713d565fa36c>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.4.1_vllm_0.9.0.1_20250605/images/sha256-f48beeb3d72663a93c77211eb45273d564451447c097e060befa713d565fa36c>`__
* - 6.3.1
- 0.8.5 (0.8.6.dev)
- 2.7.0
* - ``rocm/vllm:rocm6.3.1_vllm_0.8.5_20250521``
-
* ROCm 6.3.1
* 0.8.5 vLLM (0.8.6.dev)
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.8.5-20250521>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_vllm_0.8.5_20250521/images/sha256-38410c51af7208897cd8b737c9bdfc126e9bc8952d4aa6b88c85482f03092a11>`__
* - 6.3.1
- 0.8.5
- 2.7.0
* - ``rocm/vllm:rocm6.3.1_vllm_0.8.5_20250513``
-
* ROCm 6.3.1
* vLLM 0.8.5
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.8.5-20250513>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_vllm_0.8.5_20250513/images/sha256-5c8b4436dd0464119d9df2b44c745fadf81512f18ffb2f4b5dc235c71ebe26b4>`__
* - 6.3.1
- 0.8.3
- 2.7.0
* - ``rocm/vllm:rocm6.3.1_instinct_vllm0.8.3_20250415``
-
* ROCm 6.3.1
* vLLM 0.8.3
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.8.3-20250415>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.8.3_20250415/images/sha256-ad9062dea3483d59dedb17c67f7c49f30eebd6eb37c3fac0a171fb19696cc845>`__
* - 6.3.1
- 0.7.3
- 2.7.0
* - ``rocm/vllm:rocm6.3.1_instinct_vllm0.7.3_20250325``
-
* ROCm 6.3.1
* vLLM 0.7.3
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.7.3-20250325>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.7.3_20250325/images/sha256-25245924f61750b19be6dcd8e787e46088a496c1fe17ee9b9e397f3d84d35640>`__
* - 6.3.1
- 0.6.6
- 2.7.0
* - ``rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6``
-
* ROCm 6.3.1
* vLLM 0.6.6
* PyTorch 2.7.0
-
* :doc:`Documentation <vllm-0.6.6>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9>`__
* - 6.2.1
- 0.6.4
- 2.5.0
* - ``rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4``
-
* ROCm 6.2.1
* vLLM 0.6.4
* PyTorch 2.5.0
-
* :doc:`Documentation <vllm-0.6.4>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4/images/sha256-ccbb74cc9e7adecb8f7bdab9555f7ac6fc73adb580836c2a35ca96ff471890d8>`__
* - 6.2.0
- 0.4.3
- 2.4.0
* - ``rocm/vllm:rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50``
-
* ROCm 6.2.0
* vLLM 0.4.3
* PyTorch 2.4.0
-
* :doc:`Documentation <vllm-0.4.3>`
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50/images/sha256-9e4dd4788a794c3d346d7d0ba452ae5e92d39b8dfac438b2af8efdc7f15d22c0>`__

View File

@@ -93,7 +93,7 @@ PyTorch inference performance testing
.. container:: model-doc pyt_chai1_inference
Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue/images/sha256-b736a4239ab38a9d0e448af6d4adca83b117debed00bfbe33846f99c4540f79b>`_ from Docker Hub.
Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue/images/sha256-b736a4239ab38a9d0e448af6d4adca83b117debed00bfbe33846f99c4540f79b>`__ from Docker Hub.
.. code-block:: shell
@@ -103,9 +103,9 @@ PyTorch inference performance testing
The Chai-1 benchmark uses a specifically selected Docker image using ROCm 6.2.3 and PyTorch 2.3.0 to address an accuracy issue.
.. container:: model-doc pyt_clip_inference pyt_mochi_video_inference pyt_wan2.1_inference
.. container:: model-doc pyt_clip_inference pyt_mochi_video_inference pyt_wan2.1_inference pyt_janus_pro_inference
Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-05b55983e5154f46e7441897d0908d79877370adca4d1fff4899d9539d6c4969>`_ from Docker Hub.
Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-05b55983e5154f46e7441897d0908d79877370adca4d1fff4899d9539d6c4969>`__ from Docker Hub.
.. code-block:: shell
@@ -140,22 +140,27 @@ PyTorch inference performance testing
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
python3 tools/run_models.py --tags {{model.mad_tag}} --keep-model-dir --live-output --timeout 28800
madengine run \
--tags {{model.mad_tag}} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in ``perf.csv``.
model are collected in ``perf_{{model.mad_tag}}.csv``.
{% if model.mad_tag != "pyt_janus_pro_inference" %}
.. note::
For improved performance, consider enabling TunableOp. By default,
``{{model.mad_tag}}`` runs with TunableOp disabled (see
`<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To enable
it, edit the default run behavior in the ``tools/run_models.py``-- update the model's
run ``args`` by changing ``--tunableop off`` to ``--tunableop on``.
it, include the ``--tunableop on`` argument in your run.
Enabling TunableOp triggers a two-pass run -- a warm-up followed by the performance-collection run.
Although this might increase the initial training time, it can result in a performance gain.
{% endif %}
{% endfor %}
{% endfor %}
@@ -163,8 +168,10 @@ PyTorch inference performance testing
Further reading
===============
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
MI300X accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
AMD Instinct MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For application performance optimization strategies for HPC and AI workloads,
including inference with vLLM, see :doc:`../../inference-optimization/workload`.

View File

@@ -0,0 +1,280 @@
.. meta::
:description: Learn how to validate LLM inference performance on MI300X accelerators using AMD MAD and SGLang
:keywords: model, MAD, automation, dashboarding, validate
************************************
SGLang inference performance testing
************************************
.. _sglang-benchmark-unified-docker:
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/sglang-benchmark-models.yaml
{% set unified_docker = data.sglang_benchmark.unified_docker.latest %}
`SGLang <https://docs.sglang.ai>`__ is a high-performance inference and
serving engine for large language models (LLMs) and vision models. The
ROCm-enabled `SGLang Docker image <{{ unified_docker.docker_hub_url }}>`__
bundles SGLang with PyTorch, optimized for AMD Instinct MI300X series
accelerators. It includes the following software components:
.. list-table::
:header-rows: 1
* - Software component
- Version
* - `ROCm <https://github.com/ROCm/ROCm>`__
- {{ unified_docker.rocm_version }}
* - `SGLang <https://docs.sglang.ai/index.html>`__
- {{ unified_docker.sglang_version }}
* - `PyTorch <https://github.com/pytorch/pytorch>`__
- {{ unified_docker.pytorch_version }}
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/sglang-benchmark-models.yaml
{% set unified_docker = data.sglang_benchmark.unified_docker.latest %}
{% set model_groups = data.sglang_benchmark.model_groups %}
Pull the Docker image
=====================
Download the `SGLang Docker image <{{ unified_docker.docker_hub_url }}>`__.
Use the following command to pull the Docker image from Docker Hub.
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
Benchmarking
============
Once the setup is complete, choose one of the following methods to benchmark inference performance with
`DeepSeek-R1-Distill-Qwen-32B <https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B>`__.
.. _sglang-benchmark-mad:
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{model.mad_tag}}
.. tab-set::
.. tab-item:: MAD-integrated benchmarking
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
2. Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
madengine run \
--tags {{model.mad_tag}} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
model are collected in the following path: ``~/MAD/perf_DeepSeek-R1-Distill-Qwen-32B.csv``.
Although the DeepSeek-R1-Distill-Qwen-32B is preconfigured
to collect latency and throughput performance data, you can also change the benchmarking
parameters. See the standalone benchmarking tab for more information.
.. tab-item:: Standalone benchmarking
.. rubric:: Download the Docker image and required scripts
1. Run the SGLang benchmark script independently by starting the
`Docker container <{{ unified_docker.docker_hub_url }}>`__
as shown in the following snippet.
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
docker run -it \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--shm-size 16G \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--cap-add=SYS_PTRACE \
-v $(pwd):/workspace \
--env HUGGINGFACE_HUB_CACHE=/workspace \
--name test \
{{ unified_docker.pull_tag }}
2. In the Docker container, clone the ROCm MAD repository and navigate to the
benchmark scripts directory at ``~/MAD/scripts/sglang``.
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD/scripts/sglang
3. To start the benchmark, use the following command with the appropriate options.
.. dropdown:: Benchmark options
:open:
.. list-table::
:header-rows: 1
:align: center
* - Name
- Options
- Description
* - ``$test_option``
- latency
- Measure decoding token latency
* -
- throughput
- Measure token generation throughput
* -
- all
- Measure both throughput and latency
* - ``$num_gpu``
- 8
- Number of GPUs
* - ``$datatype``
- ``bfloat16``
- Data type
* - ``$dataset``
- random
- Dataset
The input sequence length, output sequence length, and tensor parallel (TP) are
already configured. You don't need to specify them with this script.
Command:
.. code-block:: shell
./sglang_benchmark_report.sh -s $test_option -m {{model.model_repo}} -g $num_gpu -d $datatype [-a $dataset]
.. note::
If you encounter the following error, pass your access-authorized Hugging
Face token to the gated models.
.. code-block:: shell-session
OSError: You are trying to access a gated repo.
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
.. rubric:: Benchmarking examples
Here are some examples of running the benchmark with various options:
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
.. code-block:: shell
./sglang_benchmark_report.sh \
-s latency \
-m {{model.model_repo}} \
-g 8 \
-d {{model.precision}}
Find the latency report at ``./reports_{{model.precision}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_latency_report.csv``.
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
.. code-block:: shell
./sglang_benchmark_report.sh \
-s throughput \
-m {{model.model_repo}} \
-g 8 \
-d {{model.precision}} \
-a random
Find the throughput report at ``./reports_{{model.precision}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_throughput_report.csv``.
.. raw:: html
<style>
mjx-container[jax="CHTML"][display="true"] {
text-align: left;
margin: 0;
}
</style>
.. note::
Throughput is calculated as:
- .. math:: throughput\_tot = requests \times (\mathsf{\text{input lengths}} + \mathsf{\text{output lengths}}) / elapsed\_time
- .. math:: throughput\_gen = requests \times \mathsf{\text{output lengths}} / elapsed\_time
{% endfor %}
{% endfor %}
Further reading
===============
- To learn more about the options for latency and throughput benchmark scripts,
see `<https://github.com/sgl-project/sglang/tree/main/benchmark/blog_v0_2>`__.
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`__.
- For application performance optimization strategies for HPC and AI workloads,
including inference with vLLM, see :doc:`/how-to/rocm-for-ai/inference-optimization/workload`.
- To learn how to run community models from Hugging Face on AMD GPUs, see
:doc:`Running models from Hugging Face </how-to/rocm-for-ai/inference/hugging-face-models>`.
- To learn how to fine-tune LLMs and optimize inference, see
:doc:`Fine-tuning LLMs and inference optimization </how-to/rocm-for-ai/fine-tuning/fine-tuning-and-inference>`.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================
See :doc:`previous-versions/sglang-history` to find documentation for previous releases
of SGLang inference performance testing.

View File

@@ -20,23 +20,55 @@ vLLM inference performance testing
Docker image integrates vLLM and PyTorch tailored specifically for MI300X series
accelerators and includes the following components:
* `ROCm {{ unified_docker.rocm_version }} <https://github.com/ROCm/ROCm>`_
.. list-table::
:header-rows: 1
* `vLLM {{ unified_docker.vllm_version }} <https://docs.vllm.ai/en/latest>`_
* - Software component
- Version
* `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/ROCm/pytorch.git>`_
* - `ROCm <https://github.com/ROCm/ROCm>`__
- {{ unified_docker.rocm_version }}
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
* - `vLLM <https://docs.vllm.ai/en/latest>`__
- {{ unified_docker.vllm_version }}
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
MI300X series accelerators.
* - `PyTorch <https://github.com/ROCm/pytorch>`__
- {{ unified_docker.pytorch_version }}
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`__
- {{ unified_docker.hipblaslt_version }}
With this Docker image, you can quickly test the :ref:`expected
inference performance numbers <vllm-benchmark-performance-measurements>` for
MI300X series accelerators.
What's new
==========
The following is summary of notable changes since the :doc:`previous ROCm/vLLM Docker release <previous-versions/vllm-history>`.
* The ``--compilation-config-parameter`` is no longer required as its options are now enabled by default.
This parameter has been removed from the benchmarking script.
* Resolved Llama 3.1 405 B custom all-reduce issue, eliminating the need for ``--disable-custom-all-reduce``.
This parameter has been removed from the benchmarking script.
* Fixed a ``+rms_norm`` custom kernel issue.
* Added quick reduce functionality. Set ``VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=FP`` to enable; supported modes are ``FP``, ``INT8``, ``INT6``, ``INT4``.
* Implemented a workaround to potentially mitigate GPU crashes experienced with the Command R+ model, pending a driver fix.
Supported models
================
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml
{% set unified_docker = data.vllm_benchmark.unified_docker.latest %}
{% set model_groups = data.vllm_benchmark.model_groups %}
.. _vllm-benchmark-available-models:
Supported models
================
The following models are supported for inference performance benchmarking
with vLLM and ROCm. Some instructions, commands, and recommendations in this
documentation might vary by model -- select one to get started.
@@ -44,18 +76,18 @@ vLLM inference performance testing
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row">
<div class="col-2 me-2 model-param-head">Model group</div>
<div class="row col-10">
<div class="row">
<div class="col-2 me-2 model-param-head">Model group</div>
<div class="row col-10">
{% for model_group in model_groups %}
<div class="col-3 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
</div>
</div>
<div class="row mt-1">
<div class="col-2 me-2 model-param-head">Model</div>
<div class="row col-10">
<div class="row mt-1">
<div class="col-2 me-2 model-param-head">Model</div>
<div class="row col-10">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
@@ -66,8 +98,8 @@ vLLM inference performance testing
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
</div>
</div>
.. _vllm-benchmark-vllm:
@@ -85,56 +117,48 @@ vLLM inference performance testing
{% endfor %}
{% endfor %}
.. note::
.. note::
vLLM is a toolkit and library for LLM inference and serving. AMD implements
high-performance custom kernels and modules in vLLM to enhance performance.
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
vLLM is a toolkit and library for LLM inference and serving. AMD implements
high-performance custom kernels and modules in vLLM to enhance performance.
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
more information.
.. _vllm-benchmark-performance-measurements:
.. _vllm-benchmark-performance-measurements:
Performance measurements
========================
Performance measurements
========================
To evaluate performance, the
To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
page provides reference throughput and latency measurements for inferencing popular AI models.
.. important::
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
page provides reference throughput and latency measurements for inferencing popular AI models.
only reflects the latest version of this inference benchmarking environment.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
.. important::
System validation
=================
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
only reflects the latest version of this inference benchmarking environment.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
Advanced features and known issues
==================================
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
see the developer's guide at `<https://github.com/ROCm/vllm/tree/5486e7bc8523be0324ccd68f221959445b56cc2a/docs/dev-docker>`__.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
System validation
=================
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
.. code-block:: shell
# disable automatic NUMA balancing
sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
{% set unified_docker = data.vllm_benchmark.unified_docker.latest %}
{% set model_groups = data.vllm_benchmark.model_groups %}
Pull the Docker image
=====================
@@ -163,22 +187,26 @@ vLLM inference performance testing
.. tab-item:: MAD-integrated benchmarking
Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
.. code-block:: shell
.. code-block:: shell
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt
Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the ``{{model.precision}}`` data type on the host machine.
2. Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
using one GPU with the :literal:`{{model.precision}}` data type on the host machine.
.. code-block:: shell
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
python3 tools/run_models.py --tags {{model.mad_tag}} --keep-model-dir --live-output --timeout 28800
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
madengine run \
--tags {{model.mad_tag}} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
@@ -198,104 +226,136 @@ vLLM inference performance testing
By default, ``{{model.mad_tag}}`` runs with TunableOp disabled
(see
`<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To
enable it, edit the default run behavior in the ``models.json``
configuration before running inference -- update the model's run
``args`` by changing ``--tunableop off`` to ``--tunableop on``.
`<https://github.com/ROCm/MAD/blob/develop/models.json>`__).
To enable it, include the ``--tunableop on`` argument in your
run.
Enabling TunableOp triggers a two-pass run -- a warm-up followed by the performance-collection run.
Enabling TunableOp triggers a two-pass run -- a warm-up followed
by the performance-collection run.
{% endif %}
.. tab-item:: Standalone benchmarking
Run the vLLM benchmark tool independently by starting the
`Docker container <{{ unified_docker.docker_hub_url }}>`_
as shown in the following snippet.
.. rubric:: Download the Docker image and required scripts
.. code-block::
1. Run the vLLM benchmark tool independently by starting the
`Docker container <{{ unified_docker.docker_hub_url }}>`_
as shown in the following snippet.
docker pull {{ unified_docker.pull_tag }}
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name test {{ unified_docker.pull_tag }}
.. code-block:: shell
In the Docker container, clone the ROCm MAD repository and navigate to the
benchmark scripts directory at ``~/MAD/scripts/vllm``.
docker pull {{ unified_docker.pull_tag }}
docker run -it \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--shm-size 16G \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--cap-add=SYS_PTRACE \
-v $(pwd):/workspace \
--env HUGGINGFACE_HUB_CACHE=/workspace \
--name test \
{{ unified_docker.pull_tag }}
.. code-block::
2. In the Docker container, clone the ROCm MAD repository and navigate to the
benchmark scripts directory at ``~/MAD/scripts/vllm``.
git clone https://github.com/ROCm/MAD
cd MAD/scripts/vllm
.. code-block:: shell
To start the benchmark, use the following command with the appropriate options.
git clone https://github.com/ROCm/MAD
cd MAD/scripts/vllm
.. code-block::
3. To start the benchmark, use the following command with the appropriate options.
./vllm_benchmark_report.sh -s $test_option -m {{model.model_repo}} -g $num_gpu -d {{model.precision}}
.. dropdown:: Benchmark options
:open:
.. list-table::
:header-rows: 1
:align: center
.. list-table::
:header-rows: 1
:align: center
* - Name
- Options
- Description
* - Name
- Options
- Description
* - ``$test_option``
- latency
- Measure decoding token latency
* - ``$test_option``
- latency
- Measure decoding token latency
* -
- throughput
- Measure token generation throughput
* -
- throughput
- Measure token generation throughput
* -
- all
- Measure both throughput and latency
* -
- all
- Measure both throughput and latency
* - ``$num_gpu``
- 1 or 8
- Number of GPUs
* - ``$num_gpu``
- 1 or 8
- Number of GPUs
* - ``$datatype``
- ``float16`` or ``float8``
- Data type
* - ``$datatype``
- ``float16`` or ``float8``
- Data type
.. note::
The input sequence length, output sequence length, and tensor parallel (TP) are
already configured. You don't need to specify them with this script.
The input sequence length, output sequence length, and tensor parallel (TP) are
already configured. You don't need to specify them with this script.
.. note::
If you encounter the following error, pass your access-authorized Hugging
Face token to the gated models.
Command:
.. code-block::
OSError: You are trying to access a gated repo.
./vllm_benchmark_report.sh \
-s $test_option \
-m {{model.model_repo}} \
-g $num_gpu \
-d {{model.precision}}
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
.. note::
Here are some examples of running the benchmark with various options.
For best performance, it's recommend to run with ``VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1``.
If you encounter the following error, pass your access-authorized Hugging
Face token to the gated models.
.. code-block::
OSError: You are trying to access a gated repo.
# pass your HF_TOKEN
export HF_TOKEN=$your_personal_hf_token
.. rubric:: Benchmarking examples
Here are some examples of running the benchmark with various options:
* Latency benchmark
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block::
./vllm_benchmark_report.sh -s latency -m {{model.model_repo}} -g 8 -d {{model.precision}}
./vllm_benchmark_report.sh \
-s latency \
-m {{model.model_repo}} \
-g 8 \
-d {{model.precision}}
Find the latency report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_latency_report.csv``.
* Throughput benchmark
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with :literal:`{{model.precision}}` precision.
.. code-block:: shell
./vllm_benchmark_report.sh -s throughput -m {{model.model_repo}} -g 8 -d {{model.precision}}
./vllm_benchmark_report.sh \
-s throughput \
-m {{model.model_repo}} \
-g 8 \
-d {{model.precision}}
Find the throughput report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_throughput_report.csv``.
@@ -318,14 +378,51 @@ vLLM inference performance testing
{% endfor %}
{% endfor %}
Advanced usage
==============
For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
see the developer's guide at `<https://github.com/ROCm/vllm/tree/f94ec9beeca1071cc34f9d1e206d8c7f3ac76129/docs/dev-docker>`__.
Reproducing the Docker image
----------------------------
To reproduce this ROCm/vLLM Docker image release, follow these steps:
1. Clone the `vLLM repository <https://github.com/ROCm/vllm>`__.
.. code-block:: shell
git clone https://github.com/ROCm/vllm.git
2. Checkout the specific release commit.
.. code-block:: shell
cd vllm
git checkout b432b7a285aa0dcb9677380936ffa74931bb6d6f
3. Build the Docker image. Replace ``vllm-rocm`` with your desired image tag.
.. code-block:: shell
docker build -f docker/Dockerfile.rocm -t vllm-rocm .
Known issues and workarounds
============================
AITER does not support FP8 KV cache yet.
Further reading
===============
- To learn more about the options for latency and throughput benchmark scripts,
see `<https://github.com/ROCm/vllm/tree/main/benchmarks>`_.
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_
AMD Instinct MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For application performance optimization strategies for HPC and AI workloads,
including inference with vLLM, see :doc:`/how-to/rocm-for-ai/inference-optimization/workload`.

View File

@@ -24,4 +24,6 @@ training, fine-tuning, and inference. It leverages popular machine learning fram
- :doc:`PyTorch inference performance testing <benchmark-docker/pytorch-inference>`
- :doc:`SGLang inference performance testing <benchmark-docker/sglang>`
- :doc:`Deploying your model <deploy-your-model>`

View File

@@ -24,12 +24,13 @@ If youre new to ROCm, refer to the :doc:`ROCm quick start install guide for L
If youre using a Radeon GPU for graphics-accelerated applications, refer to the
`Radeon installation instructions <https://rocm.docs.amd.com/projects/radeon/en/docs-6.1.3/docs/install/native_linux/install-radeon.html>`_.
ROCm supports multiple :doc:`installation methods <rocm-install-on-linux:install/install-overview>`:
You can install ROCm on :ref:`compatible systems <rocm-install-on-linux:reference/system-requirements>` via your Linux
distribution's package manager. See the following documentation resources to get started:
* :doc:`ROCm installation overview <rocm-install-on-linux:install/install-overview>`
* :doc:`Using your Linux distribution's package manager <rocm-install-on-linux:install/install-methods/package-manager-index>`
* :doc:`Using the AMDGPU installer <rocm-install-on-linux:install/install-methods/amdgpu-installer-index>`
* :ref:`Multi-version installation <rocm-install-on-linux:installation-types>`
.. grid:: 1
@@ -59,6 +60,12 @@ images with the framework pre-installed.
* :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
* :doc:`verl for ROCm <rocm-install-on-linux:install/3rd-party/verl-install>`
* :doc:`Stanford Megatron-LM for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
* :doc:`DGL for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
Next steps
==========

View File

@@ -15,57 +15,51 @@ purpose-built to support models like Llama, DeepSeek, and Mixtral,
enabling developers to train next-generation AI models more
efficiently.
AMD provides a ready-to-use Docker image for MI300X series accelerators containing
AMD provides ready-to-use Docker images for MI300X series accelerators containing
essential components, including PyTorch, ROCm libraries, and Megatron-LM
utilities. It contains the following software components to accelerate training
workloads:
+--------------------------+--------------------------------+
| Software component | Version |
+==========================+================================+
| ROCm | 6.3.4 |
+--------------------------+--------------------------------+
| PyTorch | 2.8.0a0+gite2f9759 |
+--------------------------+--------------------------------+
| Python | 3.12 or 3.10 |
+--------------------------+--------------------------------+
| Transformer Engine | 1.13.0+bb061ade |
+--------------------------+--------------------------------+
| Flash Attention | 3.0.0 |
+--------------------------+--------------------------------+
| hipBLASLt | 0.13.0-4f18bf6 |
+--------------------------+--------------------------------+
| Triton | 3.3.0 |
+--------------------------+--------------------------------+
| RCCL | 2.22.3 |
+--------------------------+--------------------------------+
Megatron-LM provides the following key features to train large language models efficiently:
- Transformer Engine (TE)
- APEX
- GEMM tuning
- Torch.compile
- 3D parallelism: TP + SP + CP
- Distributed optimizer
- Flash Attention (FA) 3
- Fused kernels
- Pre-training
.. _amd-megatron-lm-model-support:
The following models are pre-optimized for performance on AMD Instinct MI300X series accelerators.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/megatron-lm-benchmark-models.yaml
{% set dockers = data.dockers %}
{% if dockers|length > 1 %}
.. tab-set::
{% for docker in data.dockers %}
.. tab-item:: ``{{ docker.pull_tag }}``
:sync: {{ docker.pull_tag }}
.. list-table::
:header-rows: 1
* - Software component
- Version
{% for component_name, component_version in docker.components.items() %}
* - {{ component_name }}
- {{ component_version }}
{% endfor %}
{% endfor %}
{% elif dockers|length == 1 %}
.. list-table::
:header-rows: 1
* - Software component
- Version
{% for component_name, component_version in docker.components %}
* - {{ component_name }}
- {{ component_version }}
{% endfor %}
{% endif %}
.. _amd-megatron-lm-model-support:
The following models are pre-optimized for performance on AMD Instinct MI300X series accelerators.
Supported models
================
@@ -73,8 +67,7 @@ The following models are pre-optimized for performance on AMD Instinct MI300X se
Some instructions, commands, and training recommendations in this documentation might
vary by model -- select one to get started.
{% set model_groups = data["megatron-lm_benchmark"].model_groups %}
{% set model_groups = data.model_groups %}
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
@@ -82,7 +75,7 @@ The following models are pre-optimized for performance on AMD Instinct MI300X se
<div class="col-2 me-2 model-param-head">Model</div>
<div class="row col-10">
{% for model_group in model_groups %}
<div class="col-4 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
<div class="col-3 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
@@ -115,14 +108,14 @@ Performance measurements
========================
To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`_
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`__
page provides reference throughput and latency measurements for training
popular AI models.
.. important::
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`__
only reflects the latest version of this training benchmarking environment.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
@@ -155,42 +148,77 @@ image.
Download the Docker image
-------------------------
1. Use the following command to pull the Docker image from Docker Hub.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/megatron-lm-benchmark-models.yaml
.. tab-set::
{% set dockers = data.dockers %}
1. Use the following command to pull the Docker image from Docker Hub.
.. tab-item:: Ubuntu 24.04 + Python 3.12
:sync: py312
{% if dockers|length > 1 %}
.. tab-set::
.. code-block:: shell
{% for docker in data.dockers %}
.. tab-item:: {{ docker.doc_name }}
:sync: {{ docker.pull_tag }}
docker pull rocm/megatron-lm:v25.5_py312
.. code-block:: shell
.. tab-item:: Ubuntu 22.04 + Python 3.10
:sync: py310
docker pull {{ docker.pull_tag }}
.. code-block:: shell
{% endfor %}
{% elif dockers|length == 1 %}
{% set docker = dockers[0] %}
.. code-block:: shell
docker pull rocm/megatron-lm:v25.5_py310
docker pull {{ docker.pull_tag }}
2. Launch the Docker container.
{% endif %}
2. Launch the Docker container.
.. tab-set::
{% if dockers|length > 1 %}
.. tab-set::
.. tab-item:: Ubuntu 24.04 + Python 3.12
:sync: py312
{% for docker in data.dockers %}
.. tab-item:: {{ docker.doc_name }}
:sync: {{ docker.pull_tag }}
.. code-block:: shell
.. code-block:: shell
docker run -it --device /dev/dri --device /dev/kfd --device /dev/infiniband --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 128G --name megatron_training_env rocm/megatron-lm:v25.5_py312
docker run -it \
--device /dev/dri \
--device /dev/kfd \
--device /dev/infiniband \
--network host --ipc host \
--group-add video \
--cap-add SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
-v $HOME:$HOME \
-v $HOME/.ssh:/root/.ssh \
--shm-size 128G \
--name megatron_training_env \
{{ docker.pull_tag }}
{% endfor %}
{% elif dockers|length == 1 %}
{% set docker = dockers[0] %}
.. code-block:: shell
.. tab-item:: Ubuntu 22.04 + Python 3.10
:sync: py310
docker run -it \
--device /dev/dri \
--device /dev/kfd \
--device /dev/infiniband \
--network host --ipc host \
--group-add video \
--cap-add SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
-v $HOME:$HOME \
-v $HOME/.ssh:/root/.ssh \
--shm-size 128G \
--name megatron_training_env \
{{ docker.pull_tag }}
.. code-block:: shell
docker run -it --device /dev/dri --device /dev/kfd --device /dev/infiniband --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 128G --name megatron_training_env rocm/megatron-lm:v25.5_py310
{% endif %}
3. Use these commands if you exit the ``megatron_training_env`` container and need to return to it.
@@ -348,6 +376,22 @@ If the tokenizer is not found, it'll be downloaded if publicly available.
TOKENIZER_MODEL=tokenizer/tokenizer.model
.. container:: model-doc pyt_megatron_lm_train_qwen2.5-7b
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="Qwen/Qwen2.5-7B"
.. container:: model-doc pyt_megatron_lm_train_qwen2.5-72b
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="Qwen/Qwen2.5-72B"
Dataset options
---------------
@@ -373,7 +417,7 @@ You can use either mock data or real data for training.
Download the dataset
^^^^^^^^^^^^^^^^^^^^
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b pyt_megatron_lm_train_llama-3.1-8b pyt_megatron_lm_train_llama-3.1-70b pyt_megatron_lm_train_llama-2-7b pyt_megatron_lm_train_llama-2-70b
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b pyt_megatron_lm_train_llama-3.1-8b pyt_megatron_lm_train_llama-3.1-70b pyt_megatron_lm_train_llama-2-7b pyt_megatron_lm_train_llama-2-70b pyt_megatron_lm_train_llama-3.1-70b-proxy
For Llama models, use the `prepare_dataset.sh
<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/llama>`_ script
@@ -412,8 +456,8 @@ Download the dataset
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/SlimPajama.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-valid.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.idx
cd ..
bash tools/run_make_pretraining_dataset_megatron.sh deepseek-datasets/SlimPajama.json DeepSeekV3Tokenizer text deepseek-datasets deepseek-ai/DeepSeek-V3
To train on this data, update the ``DATA_DIR`` variable to point to the location of your dataset.
@@ -437,8 +481,8 @@ Download the dataset
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/SlimPajama.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-valid.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.idx
cd ..
bash tools/run_make_pretraining_dataset_megatron.sh deepseek-datasets/SlimPajama.json DeepSeekV3Tokenizer text deepseek-datasets deepseek-ai/DeepSeek-V3
To train on this data, update the ``DATA_DIR`` variable to point to the location of your dataset.
@@ -448,8 +492,6 @@ Download the dataset
DATA_DIR="<path-to>/deepseek-datasets" # Change to where your dataset is stored
Ensure that the files are accessible inside the Docker container.
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x7b pyt_megatron_lm_train_mixtral-8x22b-proxy
If you don't already have the dataset, download the Mixtral dataset using the following
@@ -472,6 +514,27 @@ Download the dataset
Ensure that the files are accessible inside the Docker container.
.. container:: model-doc pyt_megatron_lm_train_qwen2.5-7b pyt_megatron_lm_train_qwen2.5-72b
If you don't already have the dataset, download the Mixtral dataset using the following
commands:
.. code-block:: shell
mkdir -p temp/qwen-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/qwen-datasets/wudao_qwenbpe_text_document.idx
To train on this data, update the ``DATA_DIR`` variable to point to the location of your dataset.
.. code-block:: bash
MOCK_DATA=0 # Train on real data
DATA_DIR="<path-to>/qwen-datasets" # Change to where your dataset is stored
Ensure that the files are accessible inside the Docker container.
Multi-node configuration
------------------------
@@ -512,27 +575,17 @@ also be passed as command line arguments. Refer to the following example configu
# Specify which RDMA interfaces to use for communication
export NCCL_IB_HCA=rdma0,rdma1,rdma2,rdma3,rdma4,rdma5,rdma6,rdma7
Getting started
===============
The prebuilt Megatron-LM with ROCm training environment allows users to quickly validate
system performance, conduct training benchmarks, and achieve superior
performance for models like Llama, DeepSeek, and Mixtral. This container should not be
expected to provide generalized performance across all training workloads. You
can expect the container to perform in the model configurations described in
the following section, but other configurations are not validated by AMD.
.. _amd-megatron-lm-run-training:
Run training
------------
============
Use the following example commands to set up the environment, configure
:ref:`key options <amd-megatron-lm-benchmark-test-vars>`, and run training on
MI300X series accelerators with the AMD Megatron-LM environment.
Single node training
^^^^^^^^^^^^^^^^^^^^
--------------------
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b
@@ -541,7 +594,20 @@ Single node training
.. code-block:: shell
TEE_OUTPUT=1 RECOMPUTE=1 SEQ_LENGTH=8192 MBS=2 BS=16 TE_FP8=0 TP=1 PP=1 FSDP=1 MODEL_SIZE=70 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
TOKENIZER_MODEL=meta-llama/Llama-3.3-70B-Instruct \
CKPT_FORMAT=torch_dist \
TEE_OUTPUT=1 \
RECOMPUTE=1 \
SEQ_LENGTH=8192 \
MBS=2 \
BS=16 \
TE_FP8=0 \
TP=1 \
PP=1 \
FSDP=1 \
MODEL_SIZE=70 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama3.sh
.. note::
@@ -550,8 +616,6 @@ Single node training
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
Currently, FSDP is only compatible with BF16 precision.
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-8b
To run training on a single node for Llama 3.1 8B FP8, navigate to the Megatron-LM folder and use the
@@ -559,13 +623,29 @@ Single node training
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=128 TP=1 TE_FP8=1 SEQ_LENGTH=8192 MODEL_SIZE=8 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
TEE_OUTPUT=1 \
MBS=2 \
BS=128 \
TP=1 \
TE_FP8=1 \
SEQ_LENGTH=8192 \
MODEL_SIZE=8 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama3.sh
For Llama 3.1 8B BF16, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=128 TP=1 TE_FP8=0 SEQ_LENGTH=8192 MODEL_SIZE=8 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
TEE_OUTPUT=1 \
MBS=2 \
BS=128 \
TP=1 \
TE_FP8=0 \
SEQ_LENGTH=8192 \
MODEL_SIZE=8 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama3.sh
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-70b
@@ -574,7 +654,18 @@ Single node training
.. code-block:: shell
TEE_OUTPUT=1 MBS=3 BS=24 TP=1 TE_FP8=0 FSDP=1 RECOMPUTE=1 SEQ_LENGTH=8192 MODEL_SIZE=70 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
CKPT_FORMAT=torch_dist \
TEE_OUTPUT=1 \
MBS=3 \
BS=24 \
TP=1 \
TE_FP8=0 \
FSDP=1 \
RECOMPUTE=1 \
SEQ_LENGTH=8192 \
MODEL_SIZE=70 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama3.sh
.. note::
@@ -583,7 +674,36 @@ Single node training
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
Currently, FSDP is only compatible with BF16 precision.
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-70b-proxy
To run the training on a single node for Llama 3.1 70B with proxy, use the following command.
.. code-block:: shell
CKPT_FORMAT=torch_dist \
TEE_OUTPUT=1 \
RECOMPUTE=1 \
MBS=3 \
BS=24 \
TP=1 \
TE_FP8=1 \
SEQ_LENGTH=8192 \
MODEL_SIZE=70 \
FSDP=1 \
TOTAL_ITERS=10 \
NUM_LAYERS=40 \
bash examples/llama/train_llama3.sh
.. note::
Use two or more nodes to run the *full* Llama 70B model with FP8 precision.
.. note::
It is suggested to use ``TP=1`` when FSDP is enabled for higher
throughput. FSDP-v2 is not supported with pipeline parallelism, expert
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
.. container:: model-doc pyt_megatron_lm_train_llama-2-7b
@@ -592,13 +712,29 @@ Single node training
.. code-block:: shell
TEE_OUTPUT=1 MBS=4 BS=256 TP=1 TE_FP8=1 SEQ_LENGTH=4096 MODEL_SIZE=7 TOTAL_ITERS=50 bash examples/llama/train_llama2.sh
TEE_OUTPUT=1 \
MBS=4 \
BS=256 \
TP=1 \
TE_FP8=1 \
SEQ_LENGTH=4096 \
MODEL_SIZE=7 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama2.sh
For Llama 2 7B BF16, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 MBS=4 BS=256 TP=1 TE_FP8=0 SEQ_LENGTH=4096 MODEL_SIZE=7 TOTAL_ITERS=50 bash examples/llama/train_llama2.sh
TEE_OUTPUT=1 \
MBS=4 \
BS=256 \
TP=1 \
TE_FP8=0 \
SEQ_LENGTH=4096 \
MODEL_SIZE=7 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama2.sh
.. container:: model-doc pyt_megatron_lm_train_llama-2-70b
@@ -607,7 +743,18 @@ Single node training
.. code-block:: shell
TEE_OUTPUT=1 MBS=7 BS=56 TP=1 TE_FP8=0 FSDP=1 RECOMPUTE=1 SEQ_LENGTH=4096 MODEL_SIZE=70 TOTAL_ITERS=50 bash examples/llama/train_llama2.sh
CKPT_FORMAT=torch_dist \
TEE_OUTPUT=1 \
MBS=7 \
BS=56 \
TP=1 \
TE_FP8=0 \
FSDP=1 \
RECOMPUTE=1 \
SEQ_LENGTH=4096 \
MODEL_SIZE=70 \
TOTAL_ITERS=50 \
bash examples/llama/train_llama2.sh
.. note::
@@ -616,8 +763,6 @@ Single node training
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
Currently, FSDP is only compatible with BF16 precision.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v3-proxy
To run training on a single node for DeepSeek-V3 (MoE with expert parallel) with 3-layer proxy,
@@ -625,7 +770,8 @@ Single node training
.. code-block:: shell
FORCE_BANLANCE=true \
export NVTE_FUSED_ATTN_CK=0
FORCE_BALANCE=true \
RUN_ENV=cluster \
MODEL_SIZE=671B \
TRAIN_ITERS=50 \
@@ -647,7 +793,15 @@ Single node training
.. code-block:: shell
GEMM_TUNING=1 PR=bf16 MBS=4 AC=none SEQ_LEN=4096 PAD_LEN=4096 TRAIN_ITERS=50 bash examples/deepseek_v2/train_deepseekv2.sh
export NVTE_FUSED_ATTN_CK=0
GEMM_TUNING=1 \
PR=bf16 \
MBS=4 \
AC=none \
SEQ_LEN=4096 \
PAD_LEN=4096 \
TRAIN_ITERS=50 \
bash examples/deepseek_v2/train_deepseekv2.sh
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x7b
@@ -656,7 +810,24 @@ Single node training
.. code-block:: shell
RECOMPUTE_NUM_LAYERS=0 TEE_OUTPUT=1 MBS=1 GBS=16 TP_SIZE=1 PP_SIZE=1 AC=none PR=bf16 EP_SIZE=8 ETP_SIZE=1 SEQLEN=4096 FORCE_BALANCE=true MOCK_DATA=1 RUN_ENV=cluster MODEL_SIZE=8x7B TRAIN_ITERS=50 bash examples/mixtral/train_mixtral_moe.sh
TOKENIZER_MODEL=<path/to/tokenizer/model>
RECOMPUTE_NUM_LAYERS=0 \
TEE_OUTPUT=1 \
MBS=1 \
GBS=16 \
TP_SIZE=1 \
PP_SIZE=1 \
AC=none \
PR=bf16 \
EP_SIZE=8 \
ETP_SIZE=1 \
SEQLEN=4096 \
FORCE_BALANCE=true \
MOCK_DATA=1 \
RUN_ENV=cluster \
MODEL_SIZE=8x7B \
TRAIN_ITERS=50 \
bash examples/mixtral/train_mixtral_moe.sh
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x22b-proxy
@@ -665,10 +836,85 @@ Single node training
.. code-block:: shell
RECOMPUTE_NUM_LAYERS=4 TEE_OUTPUT=1 MBS=1 GBS=16 TP_SIZE=1 PP_SIZE=1 AC=full NUM_LAYERS=4 PR=bf16 EP_SIZE=8 ETP_SIZE=1 SEQLEN=8192 FORCE_BALANCE=true MOCK_DATA=1 RUN_ENV=cluster MODEL_SIZE=8x22B TRAIN_ITERS=50 bash examples/mixtral/train_mixtral_moe.sh
TOKENIZER_MODEL=<path/to/tokenizer/model>
RECOMPUTE_NUM_LAYERS=4 \
TEE_OUTPUT=1 \
MBS=1 \
GBS=16 \
TP_SIZE=1 \
PP_SIZE=1 \
AC=full \
NUM_LAYERS=4 \
PR=bf16 \
EP_SIZE=8 \
ETP_SIZE=1 \
SEQLEN=8192 \
FORCE_BALANCE=true \
MOCK_DATA=1 \
RUN_ENV=cluster \
MODEL_SIZE=8x22B \
TRAIN_ITERS=50 \
bash examples/mixtral/train_mixtral_moe.sh
Multi-node training
^^^^^^^^^^^^^^^^^^^
.. container:: model-doc pyt_megatron_lm_train_qwen2.5-7b
To run training on a single node for Qwen 2.5 7B BF16, use the following
command.
.. code-block:: shell
bash examples/qwen/train_qwen2.sh TP=1 \
CP=1 \
PP=1 \
MBS=10 \
BS=640 \
TE_FP8=0 \
MODEL_SIZE=7 \
SEQ_LENGTH=2048 \
TOTAL_ITERS=50 \
MOCK_DATA=1 \
TOKENIZER_MODEL=Qwen/Qwen2.5-7B
For FP8, use the following command.
.. code-block:: shell
bash examples/qwen/train_qwen2.sh \
TP=1 \
CP=1 \
PP=1 \
MBS=10 \
BS=640 \
TE_FP8=1 \
MODEL_SIZE=7 \
SEQ_LENGTH=2048 \
TOTAL_ITERS=50 \
MOCK_DATA=1 \
TOKENIZER_MODEL=Qwen/Qwen2.5-7B
.. container:: model-doc pyt_megatron_lm_train_qwen2.5-72b
To run the training on a single node for Qwen 2.5 72B BF16, use the following command.
.. code-block:: shell
bash examples/qwen/train_qwen2.sh \
FSDP=1 \
CP=1 \
PP=1 \
MBS=3 \
BS=24 \
TE_FP8=0 \
MODEL_SIZE=72 \
SEQ_LENGTH=2048 \
TOTAL_ITERS=50 \
MOCK_DATA=1 \
TOKENIZER_MODEL=Qwen/Qwen2.5-72B \
RECOMPUTE_ACTIVATIONS=full \
CKPT_FORMAT=torch_dist
Multi-node training examples
----------------------------
To run training on multiple nodes, launch the Docker container on each node.
For example, for Llama 3 using a two node setup (``NODE0`` as the master node),
@@ -678,13 +924,33 @@ use these commands.
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=256 TP=1 TE_FP8=1 SEQ_LENGTH=8192 MODEL_SIZE=8 MASTER_ADDR=IP_NODE0 NNODES=2 NODE_RANK=0 bash examples/llama/train_llama3.sh
TEE_OUTPUT=1 \
MBS=2 \
BS=256 \
TP=1 \
TE_FP8=1 \
SEQ_LENGTH=8192 \
MODEL_SIZE=8 \
MASTER_ADDR=IP_NODE0 \
NNODES=2 \
NODE_RANK=0 \
bash examples/llama/train_llama3.sh
* On the worker node ``NODE1``:
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=256 TP=1 TE_FP8=1 SEQ_LENGTH=8192 MODEL_SIZE=8 MASTER_ADDR=IP_NODE0 NNODES=2 NODE_RANK=1 bash examples/llama/train_llama3.sh
TEE_OUTPUT=1 \
MBS=2 \
BS=256 \
TP=1 \
TE_FP8=1 \
SEQ_LENGTH=8192 \
MODEL_SIZE=8 \
MASTER_ADDR=IP_NODE0 \
NNODES=2 \
NODE_RANK=1 \
bash examples/llama/train_llama3.sh
Or, for DeepSeek-V3, an example script ``train_deepseek_v3_slurm.sh`` is
provided in

View File

@@ -73,7 +73,11 @@ document are not validated.
.. code-block:: shell
python3 tools/run_models.py --tags pyt_mpt30b_training --keep-model-dir --live-output --clean-docker-cache
madengine run \
--tags pyt_mpt30b_training \
--keep-model-dir \
--live-output \
--clean-docker-cache
.. tip::
@@ -90,7 +94,7 @@ document are not validated.
For improved performance (training throughput), consider enabling TunableOp.
By default, ``pyt_mpt30b_training`` runs with TunableOp disabled. To enable it,
run ``tools/run_models.py`` with the ``--tunableop on`` argument or edit the
run ``madengine run`` with the ``--tunableop on`` argument or edit the
``models.json`` configuration before running training.
Although this might increase the initial training time, it can result in a performance gain.
@@ -172,4 +176,13 @@ Key performance metrics include:
Overall training loss. A decreasing trend indicates the model is learning effectively.
Further reading
===============
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
AMD Instinct MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.

View File

@@ -12,23 +12,23 @@ previous releases of the ``ROCm/jax-training`` Docker image on `Docker Hub <http
.. list-table::
:header-rows: 1
:stub-columns: 1
* - Image version
- ROCm version
- JAX version
- Components
- Resources
* - 25.5
- 6.3.4
- 0.4.35
* - 25.5 (latest)
-
* ROCm 6.3.4
* JAX 0.4.35
-
* :doc:`Documentation <../jax-maxtext>`
* `Docker Hub <https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.5/images/sha256-4e0516358a227cae8f552fb866ec07e2edcf244756f02e7b40212abfbab5217b>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.5/images/sha256-4e0516358a227cae8f552fb866ec07e2edcf244756f02e7b40212abfbab5217b>`__
* - 25.4
- 6.3.0
- 0.4.31
-
* ROCm 6.3.0
* JAX 0.4.31
-
* :doc:`Documentation <jax-maxtext-v25.4>`
* `Docker Hub <https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.4/images/sha256-fb3eb71cd74298a7b3044b7130cf84113f14d518ff05a2cd625c11ea5f6a7b01>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.4/images/sha256-fb3eb71cd74298a7b3044b7130cf84113f14d518ff05a2cd625c11ea5f6a7b01>`__

View File

@@ -96,14 +96,14 @@ This Docker image is optimized for specific model configurations outlined
as follows. Performance can vary for other training workloads, as AMD
doesnt validate configurations and run conditions outside those described.
.. _amd-maxtext-multi-node-setup:
.. _amd-maxtext-multi-node-setup-v254:
Multi-node setup
----------------
For multi-node environments, ensure you have all the necessary packages for
your network device, such as, RDMA. If you're not using a multi-node setup
with RDMA, skip ahead to :ref:`amd-maxtext-download-docker`.
with RDMA, skip ahead to :ref:`amd-maxtext-download-docker-v254`.
1. Install the following packages to build and install the RDMA driver.
@@ -168,7 +168,7 @@ with RDMA, skip ahead to :ref:`amd-maxtext-download-docker`.
e. RDMA interface
Ensure the :ref:`required packages <amd-maxtext-multi-node-setup>` are installed on all nodes.
Ensure the :ref:`required packages <amd-maxtext-multi-node-setup-v254>` are installed on all nodes.
Then, set the RDMA interfaces to use for communication.
.. code-block:: bash
@@ -178,7 +178,7 @@ with RDMA, skip ahead to :ref:`amd-maxtext-download-docker`.
# If using Mellanox NIC
export NCCL_IB_HCA=mlx5_0,mlx5_1,mlx5_2,mlx5_3,mlx5_4,mlx5_5,mlx5_8,mlx5_9
.. _amd-maxtext-download-docker:
.. _amd-maxtext-download-docker-v254:
Download the Docker image
-------------------------
@@ -195,7 +195,7 @@ Download the Docker image
docker run -it --device /dev/dri --device /dev/kfd --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME/.ssh:/root/.ssh --shm-size 128G --name maxtext_training rocm/jax-training:maxtext-v25.4
.. _amd-maxtext-get-started:
.. _amd-maxtext-get-started-v254:
Getting started
===============

View File

@@ -7,41 +7,53 @@ Megatron-LM training performance testing version history
This table lists previous versions of the ROCm Megatron-LM training Docker image for
inference performance testing. For detailed information about available models
for benchmarking, see the version-specific documentation. You can find tagged
previous releases of the ``ROCm/megatron-lm`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/megatron-lm/tags>`_.
previous releases of the ``ROCm/megatron-lm`` Docker image on `Docker Hub <https://hub.docker.com/r/rocm/megatron-lm/tags>`__.
.. list-table::
:header-rows: 1
:stub-columns: 1
* - Image version
- ROCm version
- PyTorch version
- Components
- Resources
* - v25.5
- 6.3.4
- 2.8.0a0+gite2f9759
* - v25.6 (latest)
-
* ROCm 6.4.1
* PyTorch 2.8.0a0+git7d205b2
-
* :doc:`Documentation <../megatron-lm>`
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/v25.5_py312/images/sha256-4506f18ba188d24189c6b1f95130b425f52c528a543bb3f420351824edceadc2>`_
* `Docker Hub (py312) <https://hub.docker.com/layers/rocm/megatron-lm/v25.6_py312/images/sha256-482ff906532285bceabdf2bda629bd32cb6174d2d07f4243a736378001b28df0>`__
* `Docker Hub (py310) <https://hub.docker.com/layers/rocm/megatron-lm/v25.6_py310/images/sha256-9627bd9378684fe26cb1a10c7dd817868f553b33402e49b058355b0f095568d6>`__
* - v25.5
-
* ROCm 6.3.4
* PyTorch 2.8.0a0+gite2f9759
-
* :doc:`Documentation <megatron-lm-v25.5>`
* `Docker Hub (py312) <https://hub.docker.com/layers/rocm/megatron-lm/v25.5_py312/images/sha256-4506f18ba188d24189c6b1f95130b425f52c528a543bb3f420351824edceadc2>`__
* `Docker Hub (py310) <https://hub.docker.com/layers/rocm/megatron-lm/v25.5_py310/images/sha256-743fbf1ceff7a44c4452f938d783a7abf143737d1c15b2b95f6f8a62e0fd048b>`__
* - v25.4
- 6.3.0
- 2.7.0a0+git637433
-
* ROCm 6.3.0
* PyTorch 2.7.0a0+git637433
-
* :doc:`Documentation <megatron-lm-v25.4>`
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/v25.4/images/sha256-941aa5387918ea91c376c13083aa1e6c9cab40bb1875abbbb73bbb65d8736b3f>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/v25.4/images/sha256-941aa5387918ea91c376c13083aa1e6c9cab40bb1875abbbb73bbb65d8736b3f>`__
* - v25.3
- 6.3.0
- 2.7.0a0+git637433
-
* ROCm 6.3.0
* PyTorch 2.7.0a0+git637433
-
* :doc:`Documentation <megatron-lm-v25.3>`
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/v25.3/images/sha256-1e6ed9bdc3f4ca397300d5a9907e084ab5e8ad1519815ee1f868faf2af1e04e2>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/v25.3/images/sha256-1e6ed9bdc3f4ca397300d5a9907e084ab5e8ad1519815ee1f868faf2af1e04e2>`__
* - v24.12-dev
- 6.1.0
- 2.4.0
-
* ROCm 6.1.0
* PyTorch 2.4.0
-
* :doc:`Documentation <megatron-lm-v24.12-dev>`
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/24.12-dev/images/sha256-5818c50334ce3d69deeeb8f589d83ec29003817da34158ebc9e2d112b929bf2e>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/megatron-lm/24.12-dev/images/sha256-5818c50334ce3d69deeeb8f589d83ec29003817da34158ebc9e2d112b929bf2e>`__

View File

@@ -203,10 +203,10 @@ Use the following script to run the RCCL test for four MI300X GPU nodes. Modify
-x NCCL_DEBUG=version \
$HOME/rccl-tests/build/all_reduce_perf -b 8 -e 8g -f 2 -g 1
.. image:: ../../data/how-to/rocm-for-ai/rccl-tests-4-mi300x-gpu-nodes.png
.. image:: /data/how-to/rocm-for-ai/rccl-tests-4-mi300x-gpu-nodes.png
:width: 800
.. _mi300x-amd-megatron-lm-training:
.. _mi300x-amd-megatron-lm-training-v2412:
Start training on MI300X accelerators
=====================================
@@ -218,7 +218,7 @@ Use the following instructions to set up the environment, configure the script t
reproduce the benchmark results on the MI300X accelerators with the AMD Megatron-LM Docker
image.
.. _amd-megatron-lm-requirements:
.. _amd-megatron-lm-requirements-v2412:
Download the Docker image and required packages
-----------------------------------------------
@@ -275,7 +275,7 @@ In this case, the automatically generated output files are named ``my-gpt2_text_
.. image:: /data/how-to/rocm-for-ai/prep-training-datasets-my-gpt2-text-document.png
:width: 800
.. _amd-megatron-lm-environment-setup:
.. _amd-megatron-lm-environment-setup-v2412:
Environment setup
-----------------
@@ -375,19 +375,19 @@ Run benchmark tests
NODE_RANK="${NODE_RANK:-0}"
* Use this command to run a performance benchmark test of any of the Llama 2 models that this Docker image supports (see :ref:`variables <amd-megatron-lm-benchmark-test-vars>`).
* Use this command to run a performance benchmark test of any of the Llama 2 models that this Docker image supports (see :ref:`variables <amd-megatron-lm-benchmark-test-vars-v2412>`).
.. code-block:: shell
{variables} bash examples/llama/train_llama2.sh
* Use this command to run a performance benchmark test of any of the Llama 3 and Llama 3.1 models that this Docker image supports (see :ref:`variables <amd-megatron-lm-benchmark-test-vars>`).
* Use this command to run a performance benchmark test of any of the Llama 3 and Llama 3.1 models that this Docker image supports (see :ref:`variables <amd-megatron-lm-benchmark-test-vars-v2412>`).
.. code-block:: shell
{variables} bash examples/llama/train_llama3.sh
.. _amd-megatron-lm-benchmark-test-vars:
.. _amd-megatron-lm-benchmark-test-vars-v2412:
The benchmark tests support the same set of variables:
@@ -466,7 +466,7 @@ Benchmarking examples
TEE_OUTPUT=1 MBS=5 BS=120 TP=8 TE_FP8=0 NO_TORCH_COMPILE=1
SEQ_LENGTH=4096 bash examples/llama/train_llama2.sh
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup>`.
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup-v2412>`.
See the sample output:
@@ -495,7 +495,7 @@ Benchmarking examples
TEE_OUTPUT=1 MBS=4 BS=64 TP=8 TE_FP8=0 NO_TORCH_COMPILE=1
SEQ_LENGTH=4096 bash examples/llama/train_llama2.sh
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup>`.
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup-v2412>`.
Sample output for 2-node training:

View File

@@ -114,7 +114,7 @@ the output is ``1``, run the following command to disable NUMA auto-balancing.
See :ref:`System validation and optimization <rocm-for-ai-system-optimization>`
for more information.
.. _mi300x-amd-megatron-lm-training:
.. _mi300x-amd-megatron-lm-training-v253:
Environment setup
=================
@@ -126,7 +126,7 @@ Use the following instructions to set up the environment, configure the script t
reproduce the benchmark results on the MI300X accelerators with the AMD Megatron-LM Docker
image.
.. _amd-megatron-lm-requirements:
.. _amd-megatron-lm-requirements-v253:
Download the Docker image
-------------------------
@@ -152,7 +152,7 @@ Download the Docker image
The Docker container includes a pre-installed, verified version of Megatron-LM from the `release branch <https://github.com/ROCm/Megatron-LM/tree/megatron_release_v25.3>`_.
.. _amd-megatron-lm-environment-setup:
.. _amd-megatron-lm-environment-setup-v253:
Configuration scripts
---------------------
@@ -396,7 +396,7 @@ accelerators with the AMD Megatron-LM Docker image.
Key options
-----------
.. _amd-megatron-lm-benchmark-test-vars:
.. _amd-megatron-lm-benchmark-test-vars-v253:
The benchmark tests support the following sets of variables:
@@ -486,7 +486,7 @@ Benchmarking examples
TEE_OUTPUT=1 MBS=5 BS=120 TP=8 TE_FP8=0 NO_TORCH_COMPILE=1
SEQ_LENGTH=4096 bash examples/llama/train_llama2.sh
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup>`.
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup-v253>`.
See the sample output:
@@ -515,7 +515,7 @@ Benchmarking examples
TEE_OUTPUT=1 MBS=4 BS=64 TP=8 TE_FP8=0 NO_TORCH_COMPILE=1
SEQ_LENGTH=4096 bash examples/llama/train_llama2.sh
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup>`.
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup-v253>`.
Sample output for 2-node training:

View File

@@ -90,21 +90,21 @@ The following models are pre-optimized for performance on AMD Instinct MI300X se
Some models, such as Llama, require an external license agreement through
a third party (for example, Meta).
.. _amd-megatron-lm-performance-measurements:
.. _amd-megatron-lm-performance-measurements-v254:
Performance measurements
========================
To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`_
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`__
page provides reference throughput and latency measurements for training
popular AI models.
.. important::
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
only reflects the :doc:`latest version of this training benchmarking environment <../megatron-lm>`_.
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`__
only reflects the :doc:`latest version of this training benchmarking environment <../megatron-lm>`.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
System validation
@@ -115,7 +115,7 @@ auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
and optimization steps <train-a-model-system-validation>` to set up your system
before starting training.
.. _mi300x-amd-megatron-lm-training:
.. _mi300x-amd-megatron-lm-training-v254:
Environment setup
=================
@@ -127,7 +127,7 @@ Use the following instructions to set up the environment, configure the script t
reproduce the benchmark results on MI300X series accelerators with the AMD Megatron-LM Docker
image.
.. _amd-megatron-lm-requirements:
.. _amd-megatron-lm-requirements-v254:
Download the Docker image
-------------------------
@@ -154,7 +154,7 @@ Download the Docker image
The Docker container includes a pre-installed, verified version of the ROCm Megatron-LM development branch `<https://github.com/ROCm/Megatron-LM/tree/rocm_dev>`__
(commit `fd6f01 <https://github.com/ROCm/Megatron-LM/tree/fd6f0d11d7f9480ace32f22eb7e4dab5314fa350>`_).
.. _amd-megatron-lm-environment-setup:
.. _amd-megatron-lm-environment-setup-v254:
Configuration scripts
---------------------
@@ -468,7 +468,7 @@ accelerators with the AMD Megatron-LM Docker image.
Key options
-----------
.. _amd-megatron-lm-benchmark-test-vars:
.. _amd-megatron-lm-benchmark-test-vars-v254:
The benchmark tests support the following sets of variables:
@@ -568,7 +568,7 @@ Benchmarking examples
TEE_OUTPUT=1 MBS=5 BS=120 TP=8 TE_FP8=0 NO_TORCH_COMPILE=1
SEQ_LENGTH=4096 bash examples/llama/train_llama2.sh
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup>`.
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup-v254>`.
See the sample output:
@@ -597,7 +597,7 @@ Benchmarking examples
TEE_OUTPUT=1 MBS=4 BS=64 TP=8 TE_FP8=0 NO_TORCH_COMPILE=1
SEQ_LENGTH=4096 bash examples/llama/train_llama2.sh
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup>`.
You can find the training logs at the location defined in ``$TRAIN_LOG`` in the :ref:`configuration script <amd-megatron-lm-environment-setup-v254>`.
Sample output for 2-node training:

View File

@@ -0,0 +1,775 @@
:orphan:
.. meta::
:description: How to train a model using Megatron-LM for ROCm.
:keywords: ROCm, AI, LLM, train, Megatron-LM, megatron, Llama, tutorial, docker, torch
******************************************
Training a model with Megatron-LM for ROCm
******************************************
.. caution::
This documentation does not reflect the latest version of ROCm Megatron-LM
training performance documentation. See :doc:`../megatron-lm` for the latest version.
The `Megatron-LM framework for ROCm <https://github.com/ROCm/Megatron-LM>`_ is
a specialized fork of the robust Megatron-LM, designed to enable efficient
training of large-scale language models on AMD GPUs. By leveraging AMD
Instinct™ MI300X series accelerators, Megatron-LM delivers enhanced
scalability, performance, and resource utilization for AI workloads. It is
purpose-built to support models like Llama, DeepSeek, and Mixtral,
enabling developers to train next-generation AI models more
efficiently.
AMD provides a ready-to-use Docker image for MI300X series accelerators containing
essential components, including PyTorch, ROCm libraries, and Megatron-LM
utilities. It contains the following software components to accelerate training
workloads:
+--------------------------+--------------------------------+
| Software component | Version |
+==========================+================================+
| ROCm | 6.3.4 |
+--------------------------+--------------------------------+
| PyTorch | 2.8.0a0+gite2f9759 |
+--------------------------+--------------------------------+
| Python | 3.12 or 3.10 |
+--------------------------+--------------------------------+
| Transformer Engine | 1.13.0+bb061ade |
+--------------------------+--------------------------------+
| Flash Attention | 3.0.0 |
+--------------------------+--------------------------------+
| hipBLASLt | 0.13.0-4f18bf6 |
+--------------------------+--------------------------------+
| Triton | 3.3.0 |
+--------------------------+--------------------------------+
| RCCL | 2.22.3 |
+--------------------------+--------------------------------+
Megatron-LM provides the following key features to train large language models efficiently:
- Transformer Engine (TE)
- APEX
- GEMM tuning
- Torch.compile
- 3D parallelism: TP + SP + CP
- Distributed optimizer
- Flash Attention (FA) 3
- Fused kernels
- Pre-training
.. _amd-megatron-lm-model-support-v255:
The following models are pre-optimized for performance on AMD Instinct MI300X series accelerators.
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/megatron-lm-v25.5-benchmark-models.yaml
Supported models
================
The following models are supported for training performance benchmarking with Megatron-LM and ROCm.
Some instructions, commands, and training recommendations in this documentation might
vary by model -- select one to get started.
{% set model_groups = data["megatron-lm_benchmark"].model_groups %}
.. raw:: html
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
<div class="row">
<div class="col-2 me-2 model-param-head">Model</div>
<div class="row col-10">
{% for model_group in model_groups %}
<div class="col-4 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
<div class="row mt-1">
<div class="col-2 me-2 model-param-head">Model variant</div>
<div class="row col-10">
{% for model_group in model_groups %}
{% set models = model_group.models %}
{% for model in models %}
{% if models|length % 3 == 0 %}
<div class="col-4 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% else %}
<div class="col-6 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
{% endif %}
{% endfor %}
{% endfor %}
</div>
</div>
</div>
.. note::
Some models, such as Llama, require an external license agreement through
a third party (for example, Meta).
.. _amd-megatron-lm-performance-measurements-v255:
Performance measurements
========================
To evaluate performance, the
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`__
page provides reference throughput and latency measurements for training
popular AI models.
.. important::
The performance data presented in
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`__
only reflects the latest version of this training benchmarking environment.
The listed measurements should not be interpreted as the peak performance achievable by AMD Instinct MI325X and MI300X accelerators or ROCm software.
System validation
=================
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
.. _mi300x-amd-megatron-lm-training-v255:
Environment setup
=================
Use the following instructions to set up the environment, configure the script to train models, and
reproduce the benchmark results on MI300X series accelerators with the AMD Megatron-LM Docker
image.
.. _amd-megatron-lm-requirements-v255:
Download the Docker image
-------------------------
1. Use the following command to pull the Docker image from Docker Hub.
.. tab-set::
.. tab-item:: Ubuntu 24.04 + Python 3.12
:sync: py312
.. code-block:: shell
docker pull rocm/megatron-lm:v25.5_py312
.. tab-item:: Ubuntu 22.04 + Python 3.10
:sync: py310
.. code-block:: shell
docker pull rocm/megatron-lm:v25.5_py310
2. Launch the Docker container.
.. tab-set::
.. tab-item:: Ubuntu 24.04 + Python 3.12
:sync: py312
.. code-block:: shell
docker run -it --device /dev/dri --device /dev/kfd --device /dev/infiniband --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 128G --name megatron_training_env rocm/megatron-lm:v25.5_py312
.. tab-item:: Ubuntu 22.04 + Python 3.10
:sync: py310
.. code-block:: shell
docker run -it --device /dev/dri --device /dev/kfd --device /dev/infiniband --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 128G --name megatron_training_env rocm/megatron-lm:v25.5_py310
3. Use these commands if you exit the ``megatron_training_env`` container and need to return to it.
.. code-block:: shell
docker start megatron_training_env
docker exec -it megatron_training_env bash
The Docker container includes a pre-installed, verified version of the ROCm
Megatron-LM development branch
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev>`__, including necessary
training scripts.
.. _amd-megatron-lm-environment-setup-v255:
Configuration
=============
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b pyt_megatron_lm_train_llama-3.1-8b pyt_megatron_lm_train_llama-3.1-70b
Update the ``train_llama3.sh`` configuration script in the ``examples/llama``
directory of
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/llama>`__ to configure your training run.
Options can also be passed as command line arguments as described in :ref:`Run training <amd-megatron-lm-run-training-v255>`.
.. container:: model-doc pyt_megatron_lm_train_llama-2-7b pyt_megatron_lm_train_llama-2-70b
Update the ``train_llama2.sh`` configuration script in the ``examples/llama``
directory of
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/llama>`__ to configure your training run.
Options can also be passed as command line arguments as described in :ref:`Run training <amd-megatron-lm-run-training-v255>`.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v3-proxy
Update the ``train_deepseekv3.sh`` configuration script in the ``examples/deepseek_v3``
directory of
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/deepseek_v3>`__ to configure your training run.
Options can also be passed as command line arguments as described in :ref:`Run training <amd-megatron-lm-run-training-v255>`.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v2-lite-16b
Update the ``train_deepseekv2.sh`` configuration script in the ``examples/deepseek_v2``
directory of
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/deepseek_v2>`__ to configure your training run.
Options can also be passed as command line arguments as described in :ref:`Run training <amd-megatron-lm-run-training-v255>`.
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x7b pyt_megatron_lm_train_mixtral-8x22b-proxy
Update the ``train_mixtral_moe.sh`` configuration script in the ``examples/mixtral``
directory of
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/mixtral>`__ to configure your training run.
Options can also be passed as command line arguments as described in :ref:`Run training <amd-megatron-lm-run-training-v255>`.
.. note::
See :ref:`Key options <amd-megatron-lm-benchmark-test-vars-v255>` for more information on configuration options.
Network interface
-----------------
Update the network interface in the script to match your system's network interface. To
find your network interface, run the following (outside of any Docker container):
.. code-block:: bash
ip a
Look for an active interface that has an IP address in the same subnet as
your other nodes. Then, update the following variables in the script, for
example:
.. code-block:: bash
export NCCL_SOCKET_IFNAME=ens50f0np0
export GLOO_SOCKET_IFNAME=ens50f0np0
.. _amd-megatron-lm-tokenizer-v255:
Tokenizer
---------
You can assign the path of an existing tokenizer to the ``TOKENIZER_MODEL`` as shown in the following examples.
If the tokenizer is not found, it'll be downloaded if publicly available.
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b
If you do not have Llama 3.3 tokenizer locally, you need to use your
personal Hugging Face access token ``HF_TOKEN`` to download the tokenizer.
See `Llama-3.3-70B-Instruct
<https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct>`_. After you are
authorized, use your ``HF_TOKEN`` to download the tokenizer and set the
variable ``TOKENIZER_MODEL`` to the tokenizer path.
.. code-block:: shell
export HF_TOKEN=<Your personal Hugging Face access token>
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="meta-llama/Llama-3.3-70B-Instruct"
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-8b
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="meta-llama/Llama-3.1-8B"
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-70b
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="meta-llama/Llama-3.1-70B"
.. container:: model-doc pyt_megatron_lm_train_llama-2-7b pyt_megatron_lm_train_llama-2-70b
The training script uses either the ``Llama2Tokenizer`` or ``HuggingFaceTokenizer`` by default.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v3-proxy
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="deepseek-ai/DeepSeek-V3"
.. container:: model-doc pyt_megatron_lm_train_deepseek-v2-lite-16b
The training script uses the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL="deepseek-ai/DeepSeek-V2-Lite"
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x7b pyt_megatron_lm_train_mixtral-8x22b-proxy
Download the Mixtral tokenizer.
.. code-block:: shell
mkdir tokenizer
cd tokenizer
export HF_TOKEN=<Your personal Hugging Face access token>
wget --header="Authorization: Bearer $HF_TOKEN" -O ./tokenizer.model https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/resolve/main/tokenizer.model
Use the ``HuggingFaceTokenizer``. Set ``TOKENIZER_MODEL`` to the appropriate Hugging Face model path.
.. code-block:: shell
TOKENIZER_MODEL=tokenizer/tokenizer.model
Dataset options
---------------
You can use either mock data or real data for training.
* Mock data can be useful for testing and validation. Use the ``MOCK_DATA`` variable to toggle between mock and real data. The default
value is ``1`` for enabled.
.. code-block:: bash
MOCK_DATA=1
* If you're using a real dataset, update the ``DATA_PATH`` variable to point to the location of your dataset.
.. code-block:: bash
MOCK_DATA=0
DATA_PATH="/data/bookcorpus_text_sentence" # Change to where your dataset is stored
Ensure that the files are accessible inside the Docker container.
Download the dataset
^^^^^^^^^^^^^^^^^^^^
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b pyt_megatron_lm_train_llama-3.1-8b pyt_megatron_lm_train_llama-3.1-70b pyt_megatron_lm_train_llama-2-7b pyt_megatron_lm_train_llama-2-70b
For Llama models, use the `prepare_dataset.sh
<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/llama>`_ script
to prepare your dataset.
To download the dataset, set the ``DATASET`` variable to the dataset you'd
like to use. Three datasets are supported: ``DATASET=wiki``, ``DATASET=fineweb``, and
``DATASET=bookcorpus``.
.. code-block:: shell
DATASET=wiki TOKENIZER_MODEL=NousResearch/Llama-2-7b-chat-hf bash examples/llama/prepare_dataset.sh #for wiki-en dataset
DATASET=bookcorpus TOKENIZER_MODEL=NousResearch/Llama-2-7b-chat-hf bash examples/llama/prepare_dataset.sh #for bookcorpus dataset
``TOKENIZER_MODEL`` can be any accessible Hugging Face tokenizer.
Remember to either pre-download the tokenizer or setup Hugging Face access
otherwise when needed -- see the :ref:`Tokenizer <amd-megatron-lm-tokenizer-v255>` section.
.. note::
When training set ``DATA_PATH`` to the specific file name prefix pointing to the ``.bin`` or ``.idx``
as in the following example:
.. code-block:: shell
DATA_PATH="data/bookcorpus_text_sentence" # Change to where your dataset is stored.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v3-proxy
If you don't already have the dataset, download the DeepSeek dataset using the following
commands:
.. code-block:: shell
mkdir deepseek-datasets
cd deepseek-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/SlimPajama.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-valid.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.idx
To train on this data, update the ``DATA_DIR`` variable to point to the location of your dataset.
.. code-block:: bash
MOCK_DATA=0 # Train on real data
DATA_DIR="<path-to>/deepseek-datasets" # Change to where your dataset is stored
Ensure that the files are accessible inside the Docker container.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v2-lite-16b
If you don't already have the dataset, download the DeepSeek dataset using the following
commands:
.. code-block:: shell
mkdir deepseek-datasets
cd deepseek-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/SlimPajama.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-train.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/alpaca_zh-valid.json
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/deepseek-datasets/mmap_deepseekv2_datasets_text_document.idx
To train on this data, update the ``DATA_DIR`` variable to point to the location of your dataset.
.. code-block:: bash
MOCK_DATA=0 # Train on real data
DATA_DIR="<path-to>/deepseek-datasets" # Change to where your dataset is stored
Ensure that the files are accessible inside the Docker container.
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x7b pyt_megatron_lm_train_mixtral-8x22b-proxy
If you don't already have the dataset, download the Mixtral dataset using the following
commands:
.. code-block:: shell
mkdir mixtral-datasets
cd mixtral-datasets
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/mistral-datasets/wudao_mistralbpe_content_document.bin
wget https://atp-modelzoo-wlcb-pai.oss-cn-wulanchabu.aliyuncs.com/release/models/pai-megatron-patch/mistral-datasets/wudao_mistralbpe_content_document.idx
To train on this data, update the ``DATA_DIR`` variable to point to the location of your dataset.
.. code-block:: bash
MOCK_DATA=0 # Train on real data
DATA_DIR="<path-to>/mixtral-datasets" # Change to where your dataset is stored
Ensure that the files are accessible inside the Docker container.
Multi-node configuration
------------------------
If you're running multi-node training, update the following environment variables. They can
also be passed as command line arguments. Refer to the following example configurations.
* Change ``localhost`` to the master node's hostname:
.. code-block:: shell
MASTER_ADDR="${MASTER_ADDR:-localhost}"
* Set the number of nodes you want to train on (for instance, ``2``, ``4``, ``8``):
.. code-block:: shell
NNODES="${NNODES:-1}"
* Set the rank of each node (0 for master, 1 for the first worker node, and so on):
.. code-block:: shell
NODE_RANK="${NODE_RANK:-0}"
* Set ``DATA_CACHE_PATH`` to a common directory accessible by all the nodes (for example, an
NFS directory) for multi-node runs:
.. code-block:: shell
DATA_CACHE_PATH=/root/cache # Set to a common directory for multi-node runs
* For multi-node runs, make sure the correct network drivers are installed on the nodes. If
inside a Docker container, either install the drivers inside the Docker container or pass the network
drivers from the host while creating the Docker container.
.. code-block:: shell
# Specify which RDMA interfaces to use for communication
export NCCL_IB_HCA=rdma0,rdma1,rdma2,rdma3,rdma4,rdma5,rdma6,rdma7
Getting started
===============
The prebuilt Megatron-LM with ROCm training environment allows users to quickly validate
system performance, conduct training benchmarks, and achieve superior
performance for models like Llama, DeepSeek, and Mixtral. This container should not be
expected to provide generalized performance across all training workloads. You
can expect the container to perform in the model configurations described in
the following section, but other configurations are not validated by AMD.
.. _amd-megatron-lm-run-training-v255:
Run training
------------
Use the following example commands to set up the environment, configure
:ref:`key options <amd-megatron-lm-benchmark-test-vars-v255>`, and run training on
MI300X series accelerators with the AMD Megatron-LM environment.
Single node training
^^^^^^^^^^^^^^^^^^^^
.. container:: model-doc pyt_megatron_lm_train_llama-3.3-70b
To run the training on a single node for Llama 3.3 70B BF16 with FSDP-v2 enabled, add the ``FSDP=1`` argument.
For example, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 RECOMPUTE=1 SEQ_LENGTH=8192 MBS=2 BS=16 TE_FP8=0 TP=1 PP=1 FSDP=1 MODEL_SIZE=70 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
.. note::
It is suggested to use ``TP=1`` when FSDP is enabled for higher
throughput. FSDP-v2 is not supported with pipeline parallelism, expert
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
Currently, FSDP is only compatible with BF16 precision.
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-8b
To run training on a single node for Llama 3.1 8B FP8, navigate to the Megatron-LM folder and use the
following command.
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=128 TP=1 TE_FP8=1 SEQ_LENGTH=8192 MODEL_SIZE=8 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
For Llama 3.1 8B BF16, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=128 TP=1 TE_FP8=0 SEQ_LENGTH=8192 MODEL_SIZE=8 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
.. container:: model-doc pyt_megatron_lm_train_llama-3.1-70b
To run the training on a single node for Llama 3.1 70B BF16 with FSDP-v2 enabled, add the ``FSDP=1`` argument.
For example, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 MBS=3 BS=24 TP=1 TE_FP8=0 FSDP=1 RECOMPUTE=1 SEQ_LENGTH=8192 MODEL_SIZE=70 TOTAL_ITERS=50 bash examples/llama/train_llama3.sh
.. note::
It is suggested to use ``TP=1`` when FSDP is enabled for higher
throughput. FSDP-v2 is not supported with pipeline parallelism, expert
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
Currently, FSDP is only compatible with BF16 precision.
.. container:: model-doc pyt_megatron_lm_train_llama-2-7b
To run training on a single node for Llama 2 7B FP8, navigate to the Megatron-LM folder and use the
following command.
.. code-block:: shell
TEE_OUTPUT=1 MBS=4 BS=256 TP=1 TE_FP8=1 SEQ_LENGTH=4096 MODEL_SIZE=7 TOTAL_ITERS=50 bash examples/llama/train_llama2.sh
For Llama 2 7B BF16, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 MBS=4 BS=256 TP=1 TE_FP8=0 SEQ_LENGTH=4096 MODEL_SIZE=7 TOTAL_ITERS=50 bash examples/llama/train_llama2.sh
.. container:: model-doc pyt_megatron_lm_train_llama-2-70b
To run the training on a single node for Llama 2 70B BF16 with FSDP-v2 enabled, add the ``FSDP=1`` argument.
For example, use the following command:
.. code-block:: shell
TEE_OUTPUT=1 MBS=7 BS=56 TP=1 TE_FP8=0 FSDP=1 RECOMPUTE=1 SEQ_LENGTH=4096 MODEL_SIZE=70 TOTAL_ITERS=50 bash examples/llama/train_llama2.sh
.. note::
It is suggested to use ``TP=1`` when FSDP is enabled for higher
throughput. FSDP-v2 is not supported with pipeline parallelism, expert
parallelism, MCore's distributed optimizer, gradient accumulation fusion,
or FP16.
Currently, FSDP is only compatible with BF16 precision.
.. container:: model-doc pyt_megatron_lm_train_deepseek-v3-proxy
To run training on a single node for DeepSeek-V3 (MoE with expert parallel) with 3-layer proxy,
navigate to the Megatron-LM folder and use the following command.
.. code-block:: shell
FORCE_BANLANCE=true \
RUN_ENV=cluster \
MODEL_SIZE=671B \
TRAIN_ITERS=50 \
SEQ_LEN=4096 \
NUM_LAYERS=3 \
MICRO_BATCH_SIZE=1 GLOBAL_BATCH_SIZE=32 \
PR=bf16 \
TP=1 PP=1 ETP=1 EP=8 \
GEMM_TUNING=1 \
NVTE_CK_USES_BWD_V3=1 \
USE_GROUPED_GEMM=true MOE_USE_LEGACY_GROUPED_GEMM=true \
GPT_LAYER_IN_TE=true \
bash examples/deepseek_v3/train_deepseekv3.sh
.. container:: model-doc pyt_megatron_lm_train_deepseek-v2-lite-16b
To run training on a single node for DeepSeek-V2-Lite (MoE with expert parallel),
navigate to the Megatron-LM folder and use the following command.
.. code-block:: shell
GEMM_TUNING=1 PR=bf16 MBS=4 AC=none SEQ_LEN=4096 PAD_LEN=4096 TRAIN_ITERS=50 bash examples/deepseek_v2/train_deepseekv2.sh
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x7b
To run training on a single node for Mixtral 8x7B (MoE with expert parallel),
navigate to the Megatron-LM folder and use the following command.
.. code-block:: shell
RECOMPUTE_NUM_LAYERS=0 TEE_OUTPUT=1 MBS=1 GBS=16 TP_SIZE=1 PP_SIZE=1 AC=none PR=bf16 EP_SIZE=8 ETP_SIZE=1 SEQLEN=4096 FORCE_BALANCE=true MOCK_DATA=1 RUN_ENV=cluster MODEL_SIZE=8x7B TRAIN_ITERS=50 bash examples/mixtral/train_mixtral_moe.sh
.. container:: model-doc pyt_megatron_lm_train_mixtral-8x22b-proxy
To run training on a single node for Mixtral 8x7B (MoE with expert parallel) with 4-layer proxy,
navigate to the Megatron-LM folder and use the following command.
.. code-block:: shell
RECOMPUTE_NUM_LAYERS=4 TEE_OUTPUT=1 MBS=1 GBS=16 TP_SIZE=1 PP_SIZE=1 AC=full NUM_LAYERS=4 PR=bf16 EP_SIZE=8 ETP_SIZE=1 SEQLEN=8192 FORCE_BALANCE=true MOCK_DATA=1 RUN_ENV=cluster MODEL_SIZE=8x22B TRAIN_ITERS=50 bash examples/mixtral/train_mixtral_moe.sh
Multi-node training
^^^^^^^^^^^^^^^^^^^
To run training on multiple nodes, launch the Docker container on each node.
For example, for Llama 3 using a two node setup (``NODE0`` as the master node),
use these commands.
* On the master node ``NODE0``:
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=256 TP=1 TE_FP8=1 SEQ_LENGTH=8192 MODEL_SIZE=8 MASTER_ADDR=IP_NODE0 NNODES=2 NODE_RANK=0 bash examples/llama/train_llama3.sh
* On the worker node ``NODE1``:
.. code-block:: shell
TEE_OUTPUT=1 MBS=2 BS=256 TP=1 TE_FP8=1 SEQ_LENGTH=8192 MODEL_SIZE=8 MASTER_ADDR=IP_NODE0 NNODES=2 NODE_RANK=1 bash examples/llama/train_llama3.sh
Or, for DeepSeek-V3, an example script ``train_deepseek_v3_slurm.sh`` is
provided in
`<https://github.com/ROCm/Megatron-LM/tree/rocm_dev/examples/deepseek_v3>`__ to
enable training at scale under a SLURM environment. For example, to run
training on 16 nodes, try the following command:
.. code-block:: shell
sbatch examples/deepseek_v3/train_deepseek_v3_slurm.sh
.. _amd-megatron-lm-benchmark-test-vars-v255:
Key options
-----------
The benchmark tests support the following sets of variables.
``TEE_OUTPUT``
``1`` to enable training logs or ``0`` to disable.
``TE_FP8``
``0`` for B16 or ``1`` for FP8 -- ``0`` by default.
``GEMM_TUNING``
``1`` to enable GEMM tuning, which boosts performance by using the best GEMM kernels.
``USE_FLASH_ATTN``
``1`` to enable Flash Attention.
``FSDP``
``1`` to enable PyTorch FSDP2. If FSDP is enabled, ``--use-distributed-optimizer``,
``--overlap-param-gather``, and ``--sequence-parallel`` are automatically disabled.
``ENABLE_PROFILING``
``1`` to enable PyTorch profiling for performance analysis.
``transformer-impl``
``transformer_engine`` to use the Transformer Engine (TE) or ``local`` to disable TE.
``MODEL_SIZE``
``8B`` or ``70B`` for Llama 3 and 3.1. ``7B`` or ``70B`` for Llama 2, for example.
``TOTAL_ITERS``
The total number of iterations -- ``10`` by default.
``MOCK_DATA``
``1`` to use mock data or ``0`` to use real data you provide.
``MBS``
Micro batch size.
``BS``
Global batch size.
``TP`` / ``TP_SIZE``
Tensor parallel (``1``, ``2``, ``4``, ``8``). ``TP`` is disabled when ``FSDP`` is turned on.
``EP`` / ``EP_SIZE``
Expert parallel for MoE models.
``SEQ_LENGTH``
Input sequence length.
``PR``
Precision for training. ``bf16`` for BF16 (default) or ``fp8`` for FP8 GEMMs.
``AC``
Activation checkpointing (``none``, ``sel``, or ``full``) -- ``sel`` by default.
``NUM_LAYERS``
Use reduced number of layers as a proxy model.
``RECOMPUTE_NUM_LAYERS``
Number of layers used for checkpointing recompute.
Previous versions
=================
See :doc:`megatron-lm-history` to find documentation for previous releases
of the ``ROCm/megatron-lm`` Docker image.

View File

@@ -11,37 +11,39 @@ previous releases of the ``ROCm/pytorch-training`` Docker image on `Docker Hub <
.. list-table::
:header-rows: 1
:stub-columns: 1
* - Image version
- ROCm version
- PyTorch version
- Components
- Resources
* - v25.6
- 6.3.4
- 2.8.0a0+git7d205b2
-
* ROCm 6.3.4
* PyTorch 2.8.0a0+git7d205b2
-
* :doc:`Documentation <../pytorch-training>`
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.6/images/sha256-a4cea3c493a4a03d199a3e81960ac071d79a4a7a391aa9866add3b30a7842661>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.6/images/sha256-a4cea3c493a4a03d199a3e81960ac071d79a4a7a391aa9866add3b30a7842661>`__
* - v25.5
- 6.3.4
- 2.7.0a0+git637433
-
* ROCm 6.3.4
* PyTorch 2.7.0a0+git637433
-
* :doc:`Documentation <pytorch-training-v25.5>`
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5>`__
* - v25.4
- 6.3.0
- 2.7.0a0+git637433
-
* ROCm 6.3.0
* PyTorch 2.7.0a0+git637433
-
* :doc:`Documentation <pytorch-training-v25.4>`
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.4/images/sha256-fa98a9aa69968e654466c06f05aaa12730db79b48b113c1ab4f7a5fe6920a20b>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.4/images/sha256-fa98a9aa69968e654466c06f05aaa12730db79b48b113c1ab4f7a5fe6920a20b>`__
* - v25.3
- 6.3.0
- 2.7.0a0+git637433
-
* ROCm 6.3.0
* PyTorch 2.7.0a0+git637433
-
* :doc:`Documentation <pytorch-training-v25.3>`
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.3/images/sha256-0ffdde1b590fd2787b1c7adf5686875b100980b0f314090901387c44253e709b>`_
* `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.3/images/sha256-0ffdde1b590fd2787b1c7adf5686875b100980b0f314090901387c44253e709b>`__

View File

@@ -39,7 +39,7 @@ software components to accelerate training workloads:
| Triton | 3.1 |
+--------------------------+--------------------------------+
.. _amd-pytorch-training-model-support:
.. _amd-pytorch-training-model-support-v253:
Supported models
================

View File

@@ -39,7 +39,7 @@ software components to accelerate training workloads:
| Triton | 3.1 |
+--------------------------+--------------------------------+
.. _amd-pytorch-training-model-support:
.. _amd-pytorch-training-model-support-v254:
Supported models
================
@@ -61,7 +61,7 @@ The following models are pre-optimized for performance on the AMD Instinct MI325
Some models, such as Llama 3, require an external license agreement through
a third party (for example, Meta).
.. _amd-pytorch-training-performance-measurements:
.. _amd-pytorch-training-performance-measurements-v254:
Performance measurements
========================

View File

@@ -40,7 +40,7 @@ software components to accelerate training workloads:
| Triton | 3.2.0 |
+--------------------------+--------------------------------+
.. _amd-pytorch-training-model-support:
.. _amd-pytorch-training-model-support-v255:
Supported models
================
@@ -64,7 +64,7 @@ The following models are pre-optimized for performance on the AMD Instinct MI325
Some models, such as Llama 3, require an external license agreement through
a third party (for example, Meta).
.. _amd-pytorch-training-performance-measurements:
.. _amd-pytorch-training-performance-measurements-v255:
Performance measurements
========================

View File

@@ -142,7 +142,11 @@ The following models are pre-optimized for performance on the AMD Instinct MI325
.. code-block:: shell
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
python3 tools/run_models.py --tags {{ model.mad_tag }} --keep-model-dir --live-output --timeout 28800
madengine run \
--tags {{ model.mad_tag }} \
--keep-model-dir \
--live-output \
--timeout 28800
MAD launches a Docker container with the name
``container_ci-{{ model.mad_tag }}``, for example. The latency and throughput reports of the
@@ -427,6 +431,17 @@ The following models are pre-optimized for performance on the AMD Instinct MI325
For examples of benchmarking commands, see `<https://github.com/ROCm/MAD/tree/develop/benchmark/pytorch_train#benchmarking-examples>`__.
Further reading
===============
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for
AMD Instinct MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
- For a list of other ready-made Docker images for AI with ROCm, see
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
Previous versions
=================

View File

@@ -4,6 +4,7 @@ myst:
"description": "Learn about system settings and performance tuning for RDNA2-based GPUs."
"keywords": "RDNA2, workstation, desktop, BIOS, installation, Radeon, pro, v620, w6000"
---
:orphan:
# AMD RDNA2 system optimization

View File

@@ -1,3 +1,5 @@
:orphan:
.. meta::
:description: How to configure MI300X accelerators to fully leverage their capabilities and achieve optimal performance.
:keywords: ROCm, AI, machine learning, MI300X, LLM, usage, tutorial, optimization, tuning

View File

@@ -10,6 +10,7 @@
| Version | Release date |
| ------- | ------------ |
| [6.4.2](https://rocm.docs.amd.com/en/docs-6.4.2/) | July 21, 2025 |
| [6.4.1](https://rocm.docs.amd.com/en/docs-6.4.1/) | May 21, 2025 |
| [6.4.0](https://rocm.docs.amd.com/en/docs-6.4.0/) | April 11, 2025 |
| [6.3.3](https://rocm.docs.amd.com/en/docs-6.3.3/) | February 19, 2025 |

View File

@@ -19,9 +19,9 @@ subtrees:
- caption: Install
entries:
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/
title: ROCm on Linux
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/latest/
title: HIP SDK on Windows
- url: https://rocm.docs.amd.com/projects/radeon/en/latest/index.html
title: ROCm on Radeon GPUs
@@ -82,6 +82,8 @@ subtrees:
title: vLLM inference performance testing
- file: how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference.rst
title: PyTorch inference performance testing
- file: how-to/rocm-for-ai/inference/benchmark-docker/sglang.rst
title: SGLang inference performance testing
- file: how-to/rocm-for-ai/inference/deploy-your-model.rst
title: Deploy your model

View File

@@ -1,7 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<default revision="refs/tags/rocm-6.4.1"
<default revision="refs/tags/rocm-6.4.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />

View File

@@ -0,0 +1,79 @@
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remote name="rocm-org" fetch="https://github.com/ROCm/" />
<default revision="refs/tags/rocm-6.4.2"
remote="rocm-org"
sync-c="true"
sync-j="4" />
<!--list of projects for ROCm-->
<project name="ROCm" revision="roc-6.4.x" />
<project name="ROCK-Kernel-Driver" />
<project name="ROCR-Runtime" />
<project name="amdsmi" />
<project name="rdc" />
<project name="rocm_bandwidth_test" />
<project name="rocm_smi_lib" />
<project name="rocm-core" />
<project name="rocm-examples" />
<project name="rocminfo" />
<project name="rocprofiler" />
<project name="rocprofiler-register" />
<project name="rocprofiler-sdk" />
<project name="rocprofiler-compute" />
<project name="rocprofiler-systems" />
<project name="roctracer" />
<!--HIP Projects-->
<project name="hip" />
<project name="hip-tests" />
<project name="HIPIFY" />
<project name="clr" />
<project name="hipother" />
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
<project name="half" />
<project name="llvm-project" />
<project name="spirv-llvm-translator" />
<!-- gdb projects -->
<project name="ROCdbgapi" />
<project name="ROCgdb" />
<project name="rocr_debug_agent" />
<!-- ROCm Libraries -->
<project groups="mathlibs" name="AMDMIGraphX" />
<project groups="mathlibs" name="MIOpen" />
<project groups="mathlibs" name="MIVisionX" />
<project groups="mathlibs" name="ROCmValidationSuite" />
<project groups="mathlibs" name="Tensile" />
<project groups="mathlibs" name="composable_kernel" />
<project groups="mathlibs" name="hipBLAS-common" />
<project groups="mathlibs" name="hipBLAS" />
<project groups="mathlibs" name="hipBLASLt" />
<project groups="mathlibs" name="hipCUB" />
<project groups="mathlibs" name="hipFFT" />
<project groups="mathlibs" name="hipRAND" />
<project groups="mathlibs" name="hipSOLVER" />
<project groups="mathlibs" name="hipSPARSE" />
<project groups="mathlibs" name="hipSPARSELt" />
<project groups="mathlibs" name="hipTensor" />
<project groups="mathlibs" name="hipfort" />
<project groups="mathlibs" name="rccl" />
<project groups="mathlibs" name="rocAL" />
<project groups="mathlibs" name="rocALUTION" />
<project groups="mathlibs" name="rocBLAS" />
<project groups="mathlibs" name="rocDecode" />
<project groups="mathlibs" name="rocJPEG" />
<project groups="mathlibs" name="rocPyDecode" />
<project groups="mathlibs" name="rocFFT" />
<project groups="mathlibs" name="rocPRIM" />
<project groups="mathlibs" name="rocRAND" />
<project groups="mathlibs" name="rocSHMEM" />
<project groups="mathlibs" name="rocSOLVER" />
<project groups="mathlibs" name="rocSPARSE" />
<project groups="mathlibs" name="rocThrust" />
<project groups="mathlibs" name="rocWMMA" />
<project groups="mathlibs" name="rocm-cmake" />
<project groups="mathlibs" name="rpp" />
<project groups="mathlibs" name="TransferBench" />
<!-- Projects for OpenMP-Extras -->
<project name="aomp" path="openmp-extras/aomp" />
<project name="aomp-extras" path="openmp-extras/aomp-extras" />
<project name="flang" path="openmp-extras/flang" />
</manifest>