Docs: Overhaul JAX compatibility page

Link to 6.4.1 updated from internal to public (#4913) (#4914)
Merge pull request #4911 from peterjunpark/docs/6.4.1
2026-01-09 22:58:17 -05:00 · 2025-06-12 14:35:41 +02:00 · 2025-06-10 17:19:45 -04:00 · 2025-06-10 13:18:50 -04:00 · 2025-06-10 13:07:35 -04:00 · 2025-06-09 14:54:01 -04:00
24 changed files with 1466 additions and 887 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -228,6 +228,7 @@ LM
 LSAN
 LSan
 LTS
+LSTMs
 LanguageCrossEntropy
 LoRA
 MEM
@@ -272,6 +273,7 @@ NBIO
 NBIOs
 NCCL
 NCF
+NFS
 NIC
 NICs
 NLI
@@ -500,6 +502,7 @@ ZenDNN
 accuracies
 activations
 addr
+ade
 ai
 alloc
 allocatable
@@ -515,6 +518,7 @@ avx
 awk
 backend
 backends
+bb
 benchmarked
 benchmarking
 bfloat
@@ -538,6 +542,7 @@ cd
 centos
 centric
 changelog
+checkpointing
 chiplet
 cmake
 cmd
@@ -578,6 +583,7 @@ de
 deallocation
 debuggability
 debian
+deepseek
 denoise
 denoised
 denoises
@@ -601,6 +607,7 @@ embeddings
 enablement
 encodings
 endfor
+endif
 endpgm
 enqueue
 env
@@ -673,6 +680,7 @@ installable
 interop
 interprocedural
 intra
+intrinsics
 invariants
 invocating
 ipo
@@ -702,6 +710,7 @@ migratable
 miopen
 miopengemm
 mivisionx
+mixtral
 mjx
 mkdir
 mlirmiopen
@@ -833,6 +842,7 @@ sm
 smi
 softmax
 spack
+spmm
 src
 stochastically
 strided
@@ -843,6 +853,7 @@ subfolder
 subfolders
 submodule
 submodules
+subnet
 supercomputing
 symlink
 symlinks
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,7 +6,7 @@ different versions of the ROCm software stack and its components.

 ## ROCm 6.4.1

-See the [ROCm 6.4.1 release notes](https://rocm-stg.amd.com/en/latest/about/release-notes.html)
+See the [ROCm 6.4.1 release notes](https://rocm.docs.amd.com/en/docs-6.4.1/about/release-notes.html)
 for a complete overview of this release.

 ### **AMD SMI** (25.4.2)
@@ -894,6 +894,18 @@ See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/rele

 - Fixed an issue where sampling multi-GPU Python workloads caused the system to stop responding.

+### **ROCm Validation Suite** (1.1.0)
+
+#### Added
+
+* Configuration files for MI210.
+* Support for OCP fp8 data type.
+* GPU index-based CLI execution.
+
+#### Changed
+
+* JSON logging with updated schema.
+
 ### **rocPRIM** (3.4.0)

 #### Added
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -74,14 +74,14 @@ ROCm documentation continues to be updated to provide clearer and more comprehen

 ROCm 6.4.1 introduces support for the RDNA4 architecture-based [Radeon AI PRO
 R9700](https://www.amd.com/en/products/graphics/workstations/radeon-ai-pro/ai-9000-series/amd-radeon-ai-pro-r9700.html),
-[Radeon RX 9070 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html), and
+[Radeon RX 9070](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070.html),
+[Radeon RX 9070 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9070xt.html),
+Radeon RX 9070 GRE, and
 [Radeon RX 9060 XT](https://www.amd.com/en/products/graphics/desktops/radeon/9000-series/amd-radeon-rx-9060xt.html) GPUs
-for compute workloads. Currently, these GPUs are only supported on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.5, and RHEL 9.4.
+for compute workloads. It also adds support for RDNA3 architecture-based [Radeon PRO W7700](https://www.amd.com/en/products/graphics/workstations/radeon-pro/w7700.html) and [Radeon RX 7800 XT](https://www.amd.com/en/products/graphics/desktops/radeon/7000-series/amd-radeon-rx-7800-xt.html) GPUs. These GPUs are supported on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.
 For details, see the full list of [Supported GPUs
 (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus).

-Operating system support remains unchanged in this release.
-
 See the [Compatibility
 matrix](../../docs/compatibility/compatibility-matrix.rst)
 for more information about operating system and hardware compatibility.
@@ -165,7 +165,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
                <td><a href="https://github.com/ROCm/rccl"><i class="fab fa-github fa-lg"></i></a></td>
            </tr>
            <tr>
-            <td><a href="https://github.com/ROCm/rocSHMEM">rocSHMEM</a></td>
+            <td><a href="https://rocm.docs.amd.com/projects/rocSHMEM/en/docs-6.4.1/index.html">rocSHMEM</a></td>
                <td>2.0.0</td>
                <td><a href="https://github.com/ROCm/rocSHMEM"><i class="fab fa-github fa-lg"></i></a></td>
            </tr>
@@ -654,4 +654,4 @@ There are a number of upcoming changes planned for HIP runtime API in an upcomin
 that are not backward compatible with prior releases. Most of these changes increase 
 alignment between HIP and CUDA APIs or behavior. Some of the upcoming changes are to 
 clean up header files, remove namespace collision, and have a clear separation between 
-`hipRTC` and HIP runtime. For more information refer to [HIP Upcoming changes](https://rocm.docs.amd.com/en/docs-6.4.0/about/release-notes.html#id15).
+`hipRTC` and HIP runtime. For more information, see [HIP 7.0 Is Coming: What You Need to Know to Stay Ahead](https://rocm.blogs.amd.com/ecosystems-and-partners/transition-to-hip-7.0:-guidance-on-upcoming-compatibility-changes/README.html).
--- a/docs/compatibility/compatibility-matrix-historical-6.0.csv
+++ b/docs/compatibility/compatibility-matrix-historical-6.0.csv
@@ -2,7 +2,7 @@ ROCm Version,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5,
      :ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
      ,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
      ,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
-      ,"RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
+      ,"RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
      ,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
      ,SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
      ,,,,,,,,,,,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
@@ -17,8 +17,9 @@ ROCm Version,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5,
      ,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
      ,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
      ,.. _gpu-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
-      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA4-OS-past-60]_,,,,,,,,,,,,,,,
-      ,gfx1200 [#RDNA4-OS-past-60]_,,,,,,,,,,,,,,,
+      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
+      ,gfx1200 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
+,gfx1101 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
      ,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
      ,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
      ,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942 [#mi300_624-past-60]_,gfx942 [#mi300_622-past-60]_,gfx942 [#mi300_621-past-60]_,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
@@ -41,7 +42,7 @@ ROCm Version,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5,
      CUB,2.5.0,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
 ,,,,,,,,,,,,,,,,
      KMD & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
-      KMD versions,"6.4.x, 6.3.x","6.4.x, 6.3.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
+      :doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
      ,,,,,,,,,,,,,,,,
      ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
      :doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
@@ -56,7 +57,7 @@ ROCm Version,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5,
      ,,,,,,,,,,,,,,,,
      COMMUNICATION,.. _commlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
      :doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.21.5,2.21.5,2.21.5,2.21.5,2.20.5,2.20.5,2.20.5,2.20.5,2.18.6,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
-      `rocSHMEM <https://github.com/ROCm/rocSHMEM>`_ ,2.0.0,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
+      :doc:`rocSHMEM <rocshmem:index>`,2.0.0,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
      ,,,,,,,,,,,,,,,,
      MATH LIBS,.. _mathlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,
      `half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0
--- a/docs/compatibility/compatibility-matrix.rst
+++ b/docs/compatibility/compatibility-matrix.rst
@@ -28,7 +28,7 @@ compatibility and system requirements.

      :ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2
      ,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5
-      ,"RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4"
+      ,"RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4"
      ,RHEL 8.10,RHEL 8.10,RHEL 8.10
      ,SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5"
      ,"Oracle Linux 9, 8 [#mi300x]_","Oracle Linux 9, 8 [#mi300x]_",Oracle Linux 8.10 [#mi300x]_
@@ -42,8 +42,9 @@ compatibility and system requirements.
      ,RDNA3,RDNA3,RDNA3
      ,RDNA2,RDNA2,RDNA2
      ,.. _gpu-support-compatibility-matrix:,,
-      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA4-OS]_,,
-      ,gfx1200 [#RDNA4-OS]_,,
+      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx1201 [#RDNA-OS]_,,
+      ,gfx1200 [#RDNA-OS]_,,
+      ,gfx1101 [#RDNA-OS]_,,
      ,gfx1100,gfx1100,gfx1100
      ,gfx1030,gfx1030,gfx1030
      ,gfx942,gfx942,gfx942
@@ -65,7 +66,7 @@ compatibility and system requirements.
      CUB,2.5.0,2.5.0,2.3.2
      ,,,
      KMD & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
-      KMD versions,"6.4.x, 6.3.x","6.4.x, 6.3.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
+      :doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x"
      ,,,
      ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
      :doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0
@@ -80,7 +81,7 @@ compatibility and system requirements.
      ,,,
      COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
      :doc:`RCCL <rccl:index>`,2.22.3,2.22.3,2.21.5
-      `rocSHMEM <https://github.com/ROCm/rocSHMEM>`_ ,2.0.0,2.0.0,N/A
+      :doc:`rocSHMEM <rocshmem:index>`,2.0.0,2.0.0,N/A
      ,,,
      MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
      `half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
@@ -156,7 +157,7 @@ compatibility and system requirements.
 .. [#mi300_620] **For ROCm 6.2.0** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
 .. [#kfd_support] Starting from ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart (assuming hardware support is available in both). For earlier ROCm releases, the compatibility is provided for +/- 2 releases. These are the compatibility combinations that are currently supported.
 .. [#ROCT-rocr] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
-.. [#RDNA4-OS] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), and Radeon RX 9060 XT (gfx1200) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.5, and RHEL 9.4.
+.. [#RDNA-OS] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.

 .. _OS-kernel-versions:

@@ -174,7 +175,8 @@ Use this lookup table to confirm which operating system and kernel versions are
   ,,
   `Ubuntu <https://ubuntu.com/about/release-cycle#ubuntu-kernel-release-cycle>`_, 22.04.5, "5.15 GA, 6.8 HWE", 2.35
   ,,
-   `Red Hat Enterprise Linux (RHEL 9) <https://access.redhat.com/articles/3078#RHEL9>`_, 9.5, 5.14+, 2.34
+   `Red Hat Enterprise Linux (RHEL 9) <https://access.redhat.com/articles/3078#RHEL9>`_, 9.6, 5.14+, 2.34
+   , 9.5, 5.14+, 2.34
   ,9.4, 5.14+, 2.34
   ,9.3, 5.14+, 2.34
   ,,
@@ -235,4 +237,4 @@ Expand for full historical view of:
   .. [#mi300_600-past-60] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
   .. [#kfd_support-past-60] Starting from ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart (assuming hardware support is available in both). For earlier ROCm releases, the compatibility is provided for +/- 2 releases. These are the compatibility combinations that are currently supported.
   .. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
-   .. [#RDNA4-OS-past-60] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), and Radeon RX 9060 XT (gfx1200) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.5, and RHEL 9.4.
+   .. [#RDNA-OS-past-60] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.
--- a/docs/compatibility/ml-compatibility/jax-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/jax-compatibility.rst
@@ -53,7 +53,7 @@ Use cases and recommendations
 * The `nanoGPT in JAX <https://rocm.blogs.amd.com/artificial-intelligence/nanoGPT-JAX/README.html>`_
  blog explores the implementation and training of a Generative Pre-trained
  Transformer (GPT) model in JAX, inspired by Andrej Karpathy’s JAX-based
-  nanoGPT. Comparing how essential GPT components—such as self-attention 
+  nanoGPT. Comparing how essential GPT components—such as self-attention
  mechanisms and optimizers—are realized in JAX and JAX, also highlights
  JAX’s unique features.

@@ -97,7 +97,7 @@ Docker image compatibility
 AMD validates and publishes ready-made `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
 with ROCm backends on Docker Hub. The following Docker image tags and
 associated inventories represent the latest JAX version from the official Docker Hub and are validated for
-`ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_. Click the |docker-icon|
+`ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`_. Click the |docker-icon|
 icon to view the image on Docker Hub.

 .. list-table:: JAX Docker image components
@@ -110,19 +110,19 @@ icon to view the image on Docker Hub.

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/jax/rocm6.4-jax0.4.35-py3.12/images/sha256-4069398229078f3311128b6d276c6af377c7e97d3363d020b0bf7154fae619ca"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
+           <a href="https://hub.docker.com/layers/rocm/jax/rocm6.4.1-jax0.4.35-py3.12/images/sha256-7a0745a2a2758bdf86397750bac00e9086cbf67d170cfdbb08af73f7c7d18a6a"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>

      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
      - Ubuntu 24.04
-      - `3.12.7 <https://www.python.org/downloads/release/python-3127/>`_
+      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/jax/rocm6.4-jax0.4.35-py3.10/images/sha256-a137f901f91ce6c13b424c40a6cf535248d4d20fd36d5daf5eee0570190a4a11"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
+           <a href="https://hub.docker.com/layers/rocm/jax/rocm6.4.1-jax0.4.35-py3.10/images/sha256-5f9e8d6e6e69fdc9a1a3f2ba3b1234c3f46c53b7468538c07fd18b00899da54f"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>

      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
      - Ubuntu 22.04
-      - `3.10.14 <https://www.python.org/downloads/release/python-31014/>`_
+      - `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_

 AMD publishes `Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
 with ROCm backends on Docker Hub. The following Docker image tags and
@@ -160,12 +160,14 @@ associated inventories are tested for `ROCm 6.3.2 <https://repo.radeon.com/rocm/
      - Ubuntu 22.04
      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_

+.. _key_rocm_libraries:
+
 Key ROCm libraries for JAX
 ================================================================================

-JAX functionality on ROCm is determined by its underlying library
-dependencies. These ROCm components affect the capabilities, performance, and
-feature set available to developers.
+The following ROCm libraries are potential targets that JAX on ROCm can use
+for various computational tasks. The libraries actually used depend on the
+specific implementation and the operations performed.

 .. list-table::
    :header-rows: 1
@@ -173,347 +175,140 @@ feature set available to developers.
    * - ROCm library
      - Version
      - Purpose
-      - Used in
    * - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
      - :version-ref:`hipBLAS rocm_version`
      - Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
        matrix and vector operations.
-      - Matrix multiplication in ``jax.numpy.matmul``, ``jax.lax.dot`` and
-        ``jax.lax.dot_general``, operations like ``jax.numpy.dot``, which
-        involve vector and matrix computations and batch matrix multiplications
-        ``jax.numpy.einsum`` with matrix-multiplication patterns algebra
-        operations.
    * - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
      - :version-ref:`hipBLASLt rocm_version`
      - hipBLASLt is an extension of hipBLAS, providing additional
        features like epilogues fused into the matrix multiplication kernel or
        use of integer tensor cores.
-      - Matrix multiplication in ``jax.numpy.matmul`` or ``jax.lax.dot``, and
-        the XLA (Accelerated Linear Algebra) use hipBLASLt for optimized matrix
-        operations, mixed-precision support, and hardware-specific
-        optimizations.
    * - `hipCUB <https://github.com/ROCm/hipCUB>`_
      - :version-ref:`hipCUB rocm_version`
      - Provides a C++ template library for parallel algorithms for reduction,
        scan, sort and select.
-      - Reduction functions (``jax.numpy.sum``, ``jax.numpy.mean``,
-        ``jax.numpy.prod``, ``jax.numpy.max`` and ``jax.numpy.min``), prefix sum
-        (``jax.numpy.cumsum``, ``jax.numpy.cumprod``) and sorting
-        (``jax.numpy.sort``, ``jax.numpy.argsort``).
    * - `hipFFT <https://github.com/ROCm/hipFFT>`_
      - :version-ref:`hipFFT rocm_version`
      - Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
-      - Used in functions like ``jax.numpy.fft``.
    * - `hipRAND <https://github.com/ROCm/hipRAND>`_
      - :version-ref:`hipRAND rocm_version`
      - Provides fast random number generation for GPUs.
-      - The ``jax.random.uniform``, ``jax.random.normal``,
-        ``jax.random.randint`` and ``jax.random.split``.
    * - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
      - :version-ref:`hipSOLVER rocm_version`
      - Provides GPU-accelerated solvers for linear systems, eigenvalues, and
        singular value decompositions (SVD).
-      - Solving linear systems (``jax.numpy.linalg.solve``), matrix
-        factorizations, SVD (``jax.numpy.linalg.svd``) and eigenvalue problems
-        (``jax.numpy.linalg.eig``).
    * - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
      - :version-ref:`hipSPARSE rocm_version`
      - Accelerates operations on sparse matrices, such as sparse matrix-vector
        or matrix-matrix products.
-      - Sparse matrix multiplication (``jax.numpy.matmul``), sparse
-        matrix-vector and matrix-matrix products
-        (``jax.experimental.sparse.dot``), sparse linear system solvers and
-        sparse data handling.
    * - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
      - :version-ref:`hipSPARSELt rocm_version`
      - Accelerates operations on sparse matrices, such as sparse matrix-vector
        or matrix-matrix products.
-      - Sparse matrix multiplication (``jax.numpy.matmul``), sparse
-        matrix-vector and matrix-matrix products
-        (``jax.experimental.sparse.dot``) and sparse linear system solvers.
    * - `MIOpen <https://github.com/ROCm/MIOpen>`_
      - :version-ref:`MIOpen rocm_version`
      - Optimized for deep learning primitives such as convolutions, pooling,
        normalization, and activation functions.
-      - Speeds up convolutional neural networks (CNNs), recurrent neural
-        networks (RNNs), and other layers. Used in operations like
-        ``jax.nn.conv``, ``jax.nn.relu``, and ``jax.nn.batch_norm``.
    * - `RCCL <https://github.com/ROCm/rccl>`_
      - :version-ref:`RCCL rocm_version`
      - Optimized for multi-GPU communication for operations like  all-reduce,
        broadcast, and scatter.
-      - Distribute computations across multiple GPU with ``pmap`` and
-        ``jax.distributed``. XLA automatically uses rccl when executing
-        operations across multiple GPUs on AMD hardware.
    * - `rocThrust <https://github.com/ROCm/rocThrust>`_
      - :version-ref:`rocThrust rocm_version`
      - Provides a C++ template library for parallel algorithms like sorting,
        reduction, and scanning.
-      - Reduction operations like ``jax.numpy.sum``, ``jax.pmap`` for
-        distributed training, which involves parallel reductions or
-        operations like ``jax.numpy.cumsum`` can use rocThrust.

-Supported features
+.. note::
+
+    This table shows ROCm libraries that JAX could use. Not every library is
+    used in every configuration; actual usage depends on the specific
+    operations and implementation details.
+
+Supported data types and modules
 ===============================================================================

-The following table maps the public JAX API modules to their supported
-ROCm and JAX versions.
+The following tables list the supported public JAX API data types and modules.
+
+Supported data types
+--------------------------------------------------------------------------------
+
+ROCm supports all the JAX data types of the `jax.dtypes <https://docs.jax.dev/en/latest/jax.dtypes.html>`_
+module, `jax.numpy.dtype <https://docs.jax.dev/en/latest/_autosummary/jax.numpy.dtype.html>`_,
+and `default_dtype <https://docs.jax.dev/en/latest/default_dtypes.html>`_.
+The ROCm-supported data types in JAX are listed in the following table.

 .. list-table::
    :header-rows: 1

-    * - Module
-      - Description
-      - As of JAX
-      - As of ROCm
-    * - ``jax.numpy``
-      - Implements the NumPy API, using the primitives in ``jax.lax``.
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy``
-      - Provides GPU-accelerated and differentiable implementations of many
-        functions from the SciPy library, leveraging JAX's transformations
-        (e.g., ``grad``, ``jit``, ``vmap``).
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.lax``
-      - A library of primitives operations that underpins libraries such as
-        ``jax.numpy.`` Transformation rules, such as Jacobian-vector product
-        (JVP) and batching rules, are typically defined as transformations on
-        ``jax.lax`` primitives.
-      - 0.1.57
-      - 5.0.0
-    * - ``jax.random``
-      - Provides a number of routines for deterministic generation of sequences
-        of pseudorandom numbers.
-      - 0.1.58
-      - 5.0.0
-    * - ``jax.sharding``
-      - Allows to define partitioning and distributing arrays across multiple
-        devices.
-      - 0.3.20
-      - 5.1.0
-    * - ``jax.distributed``
-      - Enables the scaling of computations across multiple devices on a single
-        machine or across multiple machines.
-      - 0.1.74
-      - 5.0.0
-    * - ``jax.image``
-      - Contains image manipulation functions like resize, scale and translation.
-      - 0.1.57
-      - 5.0.0
-    * - ``jax.nn``
-      - Contains common functions for neural network libraries.
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.ops``
-      - Computes the minimum, maximum, sum or product within segments of an
-        array.
-      - 0.1.57
-      - 5.0.0
-    * - ``jax.stages``
-      - Contains interfaces to stages of the compiled execution process.
-      - 0.3.4
-      - 5.0.0
-    * - ``jax.extend``
-      - Provides modules for access to JAX internal machinery module. The
-        ``jax.extend`` module defines a library view of some of JAX’s internal
-        components.
-      - 0.4.15
-      - 5.5.0
-    * - ``jax.example_libraries``
-      - Serves as a collection of example code and libraries that demonstrate
-        various capabilities of JAX.
-      - 0.1.74
-      - 5.0.0
-    * - ``jax.experimental``
-      - Namespace for experimental features and APIs that are in development or
-        are not yet fully stable for production use.
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.lib``
-      - Set of internal tools and types for bridging between JAX’s Python
-        frontend and its XLA backend.
-      - 0.4.6
-      - 5.3.0
-    * - ``jax_triton``
-      - Library that integrates the Triton deep learning compiler with JAX.
-      - jax_triton 0.2.0
-      - 6.2.4
-
-jax.scipy module
-------------------------------------------------------------------------------
-
-A SciPy-like API for scientific computing.
-
-.. list-table::
-    :header-rows: 1
-
-    * - Module
-      - As of JAX
-      - As of ROCm
-    * - ``jax.scipy.cluster``
-      - 0.3.11
-      - 5.1.0
-    * - ``jax.scipy.fft``
-      - 0.1.71
-      - 5.0.0
-    * - ``jax.scipy.integrate``
-      - 0.4.15
-      - 5.5.0
-    * - ``jax.scipy.interpolate``
-      - 0.1.76
-      - 5.0.0
-    * - ``jax.scipy.linalg``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.ndimage``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.optimize``
-      - 0.1.57
-      - 5.0.0
-    * - ``jax.scipy.signal``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.spatial.transform``
-      - 0.4.12
-      - 5.4.0
-    * - ``jax.scipy.sparse.linalg``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.special``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.stats``
-      - 0.1.56
-      - 5.0.0
-
-jax.scipy.stats module
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. list-table::
-   :header-rows: 1
-
-   * - Module
-     - As of JAX
-     - As of ROCm
-   * - ``jax.scipy.stats.bernouli``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.beta``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.betabinom``
-     - 0.1.61
-     - 5.0.0
-   * - ``jax.scipy.stats.binom``
-     - 0.4.14
-     - 5.4.0
-   * - ``jax.scipy.stats.cauchy``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.chi2``
-     - 0.1.61
-     - 5.0.0
-   * - ``jax.scipy.stats.dirichlet``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.expon``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.gamma``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.gennorm``
-     - 0.3.15
-     - 5.2.0
-   * - ``jax.scipy.stats.geom``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.laplace``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.logistic``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.multinomial``
-     - 0.3.18
-     - 5.1.0
-   * - ``jax.scipy.stats.multivariate_normal``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.nbinom``
-     - 0.1.72
-     - 5.0.0
-   * - ``jax.scipy.stats.norm``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.pareto``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.poisson``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.t``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.truncnorm``
-     - 0.4.0
-     - 5.3.0
-   * - ``jax.scipy.stats.uniform``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.vonmises``
-     - 0.4.2
-     - 5.3.0
-   * - ``jax.scipy.stats.wrapcauchy``
-     - 0.4.20
-     - 5.6.0
-
-jax.extend module
-------------------------------------------------------------------------------
-
-Modules for JAX extensions.
-
-.. list-table::
-    :header-rows: 1
-
-    * - Module
-      - As of JAX
-      - As of ROCm
-    * - ``jax.extend.ffi``
-      - 0.4.30
-      - 6.0.0
-    * - ``jax.extend.linear_util``
-      - 0.4.17
-      - 5.6.0
-    * - ``jax.extend.mlir``
-      - 0.4.26
-      - 5.6.0
-    * - ``jax.extend.random``
-      - 0.4.15
-      - 5.5.0
-
-Unsupported JAX features
-===============================================================================
-
-The following GPU-accelerated JAX features are not supported by ROCm for
-the listed supported JAX versions.
-
-.. list-table::
-    :header-rows: 1
-
-    * - Feature
+    * - Data type
      - Description

-    * - Mixed Precision with TF32
-      - Mixed precision with TF32 is used for matrix multiplications,
-        convolutions, and other linear algebra operations, particularly in
-        deep learning workloads like CNNs and transformers.
+    * - ``bfloat16``
+      - 16-bit bfloat (brain floating point).

-    * - XLA int4 support
-      - 4-bit integer (int4) precision in the XLA compiler.
+    * - ``bool``
+      - Boolean.

-    * - MOSAIC (GPU)
-      - Mosaic is a library of kernel-building abstractions for JAX's Pallas system
+    * - ``complex128``
+      - 128-bit complex.
+
+    * - ``complex64``
+      - 64-bit complex.
+
+    * - ``float16``
+      - 16-bit (half precision) floating-point.
+
+    * - ``float32``
+      - 32-bit (single precision) floating-point.
+
+    * - ``float64``
+      - 64-bit (double precision) floating-point.
+
+    * - ``half``
+      - 16-bit (half precision) floating-point.
+
+    * - ``int16``
+      - Signed 16-bit integer.
+
+    * - ``int32``
+      - Signed 32-bit integer.
+
+    * - ``int64``
+      - Signed 64-bit integer.
+
+    * - ``int8``
+      - Signed 8-bit integer.
+
+    * - ``uint16``
+      - Unsigned 16-bit (word) integer.
+
+    * - ``uint32``
+      - Unsigned 32-bit (dword) integer.
+
+    * - ``uint64``
+      - Unsigned 64-bit (qword) integer.
+
+    * - ``uint8``
+      - Unsigned 8-bit (byte) integer.
+
+.. note::
+
+  JAX data type support is affected by the :ref:`key_rocm_libraries` and is
+  summarized on the :doc:`ROCm data types and precision support <rocm:reference/precision-support>`
+  page.
+
+Supported modules
+--------------------------------------------------------------------------------
+
+For a complete and up-to-date list of JAX public modules (for example, ``jax.numpy``,
+``jax.scipy``, ``jax.lax``), their descriptions, and usage, please refer directly to the
+`official JAX API documentation <https://jax.readthedocs.io/en/latest/jax.html>`_.
+
+.. note::
+
+  Since version 0.1.56, JAX has full support for ROCm, and the
+  :ref:`Known issues and important notes <jax_comp_known_issues>` section
+  contains details about limitations specific to the ROCm backend. The list of
+  JAX API modules is maintained by the JAX project and is subject to change.
+  Refer to the official JAX documentation for the most up-to-date information.
--- a/docs/compatibility/ml-compatibility/pytorch-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/pytorch-compatibility.rst
@@ -95,7 +95,7 @@ Docker image compatibility

 AMD validates and publishes `PyTorch images <https://hub.docker.com/r/rocm/pytorch>`_
 with ROCm backends on Docker Hub. The following Docker image tags and associated
-inventories were tested on `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_.
+inventories were tested on `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`_.
 Click |docker-icon| to view the image on Docker Hub.

 .. list-table:: PyTorch Docker image components
@@ -116,137 +116,122 @@ Click |docker-icon| to view the image on Docker Hub.

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-ab1d350b818b90123cfda31363019d11c0d41a8f12a19e3cb2cb40cf0261137d"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-c76af9bfb1c25b0f40d4c29e8652105c57250bf018d23ff595b06bd79666fdd7"><i class="fab fa-docker fa-lg"></i></a>

      - `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
      - 24.04
-      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
+      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`_
      - `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
-      - `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
+      - `1.16.0 <https://github.com/openucx/ucx/tree/v1.16.0>`_
+      - `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.6.0/images/sha256-130536fdfceb374626a7bcb8d00b9d796ddfc3115677d51229e5b852d96b5ef4"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.6.0/images/sha256-f9d226135d51831c810dcb1251636ec61f85c65fcdda03e188c053a5d4f6585b"><i class="fab fa-docker fa-lg"></i></a>

      - `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
      - 22.04
-      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`_
      - `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
-      - `4.0.7 <https://github.com/open-mpi/ompi/tree/v4.0.7>`_
+      - `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`_
+      - `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.5.1/images/sha256-20a2e24b4738dc1f1a44a04f23827918b56c99f7e697e6fccb90e9c4fae8ca9b"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.5.1/images/sha256-3490e74d4f43dcdb3351dd334108d1ccd47e5a687c0523a2424ac1bcdd3dd6dd"><i class="fab fa-docker fa-lg"></i></a>

      - `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`_
      - 24.04
-      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
+      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`_
      - `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
-      - `4.0.7 <https://github.com/open-mpi/ompi/tree/v4.0.7>`_
+      - `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`_
+      - `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.11_pytorch_release_2.5.1/images/sha256-f09cb8ca39cc39222fb554060711f5c19130f7b4047aaf41fad4ba3ec470ca03"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.5.1/images/sha256-26c5dfffb4a54625884abca83166940f17dd27bc75f1b24f6e80fbcb7d4e9afb"><i class="fab fa-docker fa-lg"></i></a>

      - `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`_
      - 22.04
-      - `3.11.9 <https://www.python.org/downloads/release/python-3119/>`_
+      - `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`_
      - `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
-      - `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
+      - `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`_
+      - `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1/images/sha256-a91c100d1fe608dae3eb7f60a751630363d4027ac3d077d428e92945204c338e"><i class="fab fa-docker fa-lg"></i></a>
-
-      - `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`_
-      - 22.04
-      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
-      - `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`_
-      - `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`_
-      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
-      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
-      - `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
-      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
-
-    * - .. raw:: html
-
-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-66a89ce6485bb887af74bb9bd76bb613ab9834a6b1374649ea7ae379883454a4"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-f378a24561fa6efc178b6dc93fc7d82e5b93653ecd59c89d4476674d29e1284d"><i class="fab fa-docker fa-lg"></i></a>

      - `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
      - 24.04
-      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
+      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`_
      - `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
-      - `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
+      - `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`_
+      - `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-c716cf167e6e49893f11de03606ed37044153aca089e74ca615065c06877f86b"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-2308dbd0e650b7bf8d548575cbb6e2bdc021f9386384ce570da16d58ee684d22"><i class="fab fa-docker fa-lg"></i></a>

      - `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
      - 22.04
-      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`_
      - `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
-      - `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
+      - `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`_
+      - `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.3.0/images/sha256-0434cbc9b07b2c26e39480d7447f676f9057a1054dcff00e0050c25a6eddbd3c"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.3.0/images/sha256-eefd2ab019728f91f94c5e6a9463cb0ea900b3011458d18fe5d88e50c0b57d86"><i class="fab fa-docker fa-lg"></i></a>

      - `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
      - 24.04
-      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
+      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`_
      - `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
-      - `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
+      - `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`_
+      - `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-688b1c0073092615fb98778d78b16191e506097ee116a2d3d2628b264d5d367b"><i class="fab fa-docker fa-lg"></i></a>
+           <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.1_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-473643226ab0e93a04720b256ed772619878abf9c42b9f84828cefed522696fd"><i class="fab fa-docker fa-lg"></i></a>

      - `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
      - 22.04
-      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`_
      - `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`_
      - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_
      - `master <https://bitbucket.org/icl/magma/src/master/>`_
-      - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
-      - `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
+      - `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`_
+      - `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`_
      - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

 Key ROCm libraries for PyTorch
--- a/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst
@@ -56,7 +56,7 @@ Docker image compatibility
 AMD validates and publishes ready-made `TensorFlow images
 <https://hub.docker.com/r/rocm/tensorflow>`_ with ROCm backends on
 Docker Hub. The following Docker image tags and associated inventories are
-validated for `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_. Click
+validated for `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`_. Click
 the |docker-icon| icon to view the image on Docker Hub.

 .. list-table:: TensorFlow Docker image components
@@ -73,82 +73,122 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.18-dev/images/sha256-fa9cf5fa6c6079a7118727531ccd0056c6e3224a42c3d6e78a49e7781daafff4"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
      - dev
      - 24.04
-      - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
+      - `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.18-runtime/images/sha256-14addca4b92a47c806b83ebaeed593fc6672cd99f0017ed8dad759fe72ed0309"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.18-runtime/images/sha256-d14d8c4989e7c9a60f4e72461b9e349de72347c6162dcd6897e6f4f80ffbb440"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
      - runtime
      - 24.04
-      - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
+      - `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.18-dev/images/sha256-f5e151060df04ff5fb59f5604b49cd371931bbe75b06aec9fe7781397c4be0ce"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.18-dev/images/sha256-081e5bd6615a5dc17247ebd2ccc26895c3feeff086720400fa39b477e60a77c0"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
      - dev
      - 22.04
-      - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.18-runtime/images/sha256-5cd4c03fdb1036570c0d4929da60a65c4466998dc80f1dc8a5a0b173eae017fb"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.18-runtime/images/sha256-bf369637378264f4af6ddad5ca8b8611d3e372ffbea9ab7a06f1e122f0a0867b"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
      - runtime
      - 22.04
-      - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.17-dev/images/sha256-b3add80e374a2db2d1088d746e740afa89d439aca02cacba959ad298f5cd2b3f"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.17-dev/images/sha256-5a502008c50d0b6508e6027f911bdff070a7493700ae064bed74e1d22b91ed50"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
      - dev
      - 24.04
-      - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
+      - `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.17-runtime/images/sha256-3a244f026c32177eff7958ffbad390de85b438b2b48b455cc39f15d70fa1270d"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.17-runtime/images/sha256-1ee5dfffceb71ac66617ada33de3a10de0cb74199cc4b82441192e5e92fa2ddf"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
      - runtime
      - 24.04
-      - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
+      - `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
      - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.17-dev/images/sha256-e0cecdfacb59169335049983cdab6da578c209bb9f4d08aad97e184ae59171a6"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.17-dev/images/sha256-109218ad92bfae83bbd2710475f7502166e1ed54ca0b9748a9cbc3f5a1d75af1"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
      - dev
      - 22.04
-      - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.17-runtime/images/sha256-6f43de12f7eb202791b698ac51d28b72098de90034dbcd48486629b0125f7707"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.17-runtime/images/sha256-5d78bd5918d394f92263daa2990e88d695d27200dd90ed83ec64d20c7661c9c1"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
      - runtime
      - 22.04
-      - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
+      - `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
      - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.16-dev/images/sha256-b09b1ad921c09c687b7c916141051e9fcf15539a5686e5aa67c689195a522719"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+
+      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - dev
+      - 24.04
+      - `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
+      - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
+
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.12-tf2.16-runtime/images/sha256-20dbd824e85558abfe33fc9283cc547d88cde3c623fe95322743a5082f883a64"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+
+      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - runtime
+      - 24.04
+      - `Python 3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
+      - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
+
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.16-dev/images/sha256-36c4fa047c86e2470ac473ec1429aea6d4b8934b90ffeb34d1afab40e7e5b377"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+
+      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - dev
+      - 22.04
+      - `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
+      - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
+
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.1-py3.10-tf2.16-runtime/images/sha256-a94150ffb81365234ebfa34e764db5474bc6ab7d141b56495eac349778dafcf3"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
+
+      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1/tensorflow_rocm-2.16.2-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - runtime
+      - 22.04
+      - `Python 3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
+      - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
+

 Critical ROCm libraries for TensorFlow
 ===============================================================================
--- a/docs/data/how-to/rocm-for-ai/inference/previous-versions/vllm_0.8.5_20250513-benchmark-models.yaml
+++ b/docs/data/how-to/rocm-for-ai/inference/previous-versions/vllm_0.8.5_20250513-benchmark-models.yaml
@@ -0,0 +1,152 @@
+vllm_benchmark:
+  unified_docker:
+    latest:
+      pull_tag: rocm/vllm:rocm6.3.1_vllm0.8.5_20250513
+      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_vllm_0.8.5_20250513/images/sha256-5c8b4436dd0464119d9df2b44c745fadf81512f18ffb2f4b5dc235c71ebe26b4
+      rocm_version: 6.3.1
+      vllm_version: 0.8.5
+      pytorch_version: 2.7.0+gitf717b2a
+      hipblaslt_version: 0.15
+  model_groups:
+    - group: Meta Llama
+      tag: llama
+      models:
+      - model: Llama 3.1 8B
+        mad_tag: pyt_vllm_llama-3.1-8b
+        model_repo: meta-llama/Llama-3.1-8B-Instruct
+        url: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
+        precision: float16
+      - model: Llama 3.1 70B
+        mad_tag: pyt_vllm_llama-3.1-70b
+        model_repo: meta-llama/Llama-3.1-70B-Instruct
+        url: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
+        precision: float16
+      - model: Llama 3.1 405B
+        mad_tag: pyt_vllm_llama-3.1-405b
+        model_repo: meta-llama/Llama-3.1-405B-Instruct
+        url: https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct
+        precision: float16
+      - model: Llama 3.2 11B Vision
+        mad_tag: pyt_vllm_llama-3.2-11b-vision-instruct
+        model_repo: meta-llama/Llama-3.2-11B-Vision-Instruct
+        url: https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct
+        precision: float16
+      - model: Llama 2 7B
+        mad_tag: pyt_vllm_llama-2-7b
+        model_repo: meta-llama/Llama-2-7b-chat-hf
+        url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
+        precision: float16
+      - model: Llama 2 70B
+        mad_tag: pyt_vllm_llama-2-70b
+        model_repo: meta-llama/Llama-2-70b-chat-hf
+        url: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
+        precision: float16
+      - model: Llama 3.1 8B FP8
+        mad_tag: pyt_vllm_llama-3.1-8b_fp8
+        model_repo: amd/Llama-3.1-8B-Instruct-FP8-KV
+        url: https://huggingface.co/amd/Llama-3.1-8B-Instruct-FP8-KV
+        precision: float8
+      - model: Llama 3.1 70B FP8
+        mad_tag: pyt_vllm_llama-3.1-70b_fp8
+        model_repo: amd/Llama-3.1-70B-Instruct-FP8-KV
+        url: https://huggingface.co/amd/Llama-3.1-70B-Instruct-FP8-KV
+        precision: float8
+      - model: Llama 3.1 405B FP8
+        mad_tag: pyt_vllm_llama-3.1-405b_fp8
+        model_repo: amd/Llama-3.1-405B-Instruct-FP8-KV
+        url: https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV
+        precision: float8
+    - group: Mistral AI
+      tag: mistral
+      models:
+      - model: Mixtral MoE 8x7B
+        mad_tag: pyt_vllm_mixtral-8x7b
+        model_repo: mistralai/Mixtral-8x7B-Instruct-v0.1
+        url: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
+        precision: float16
+      - model: Mixtral MoE 8x22B
+        mad_tag: pyt_vllm_mixtral-8x22b
+        model_repo: mistralai/Mixtral-8x22B-Instruct-v0.1
+        url: https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1
+        precision: float16
+      - model: Mistral 7B
+        mad_tag: pyt_vllm_mistral-7b
+        model_repo: mistralai/Mistral-7B-Instruct-v0.3
+        url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
+        precision: float16
+      - model: Mixtral MoE 8x7B FP8
+        mad_tag: pyt_vllm_mixtral-8x7b_fp8
+        model_repo: amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
+        url: https://huggingface.co/amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV
+        precision: float8
+      - model: Mixtral MoE 8x22B FP8
+        mad_tag: pyt_vllm_mixtral-8x22b_fp8
+        model_repo: amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
+        url: https://huggingface.co/amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV
+        precision: float8
+      - model: Mistral 7B FP8
+        mad_tag: pyt_vllm_mistral-7b_fp8
+        model_repo: amd/Mistral-7B-v0.1-FP8-KV
+        url: https://huggingface.co/amd/Mistral-7B-v0.1-FP8-KV
+        precision: float8
+    - group: Qwen
+      tag: qwen
+      models:
+      - model: Qwen2 7B
+        mad_tag: pyt_vllm_qwen2-7b
+        model_repo: Qwen/Qwen2-7B-Instruct
+        url: https://huggingface.co/Qwen/Qwen2-7B-Instruct
+        precision: float16
+      - model: Qwen2 72B
+        mad_tag: pyt_vllm_qwen2-72b
+        model_repo: Qwen/Qwen2-72B-Instruct
+        url: https://huggingface.co/Qwen/Qwen2-72B-Instruct
+        precision: float16
+      - model: QwQ-32B
+        mad_tag: pyt_vllm_qwq-32b
+        model_repo: Qwen/QwQ-32B
+        url: https://huggingface.co/Qwen/QwQ-32B
+        precision: float16
+        tunableop: true
+    - group: Databricks DBRX
+      tag: dbrx
+      models:
+      - model: DBRX Instruct
+        mad_tag: pyt_vllm_dbrx-instruct
+        model_repo: databricks/dbrx-instruct
+        url: https://huggingface.co/databricks/dbrx-instruct
+        precision: float16
+      - model: DBRX Instruct FP8
+        mad_tag: pyt_vllm_dbrx_fp8
+        model_repo: amd/dbrx-instruct-FP8-KV
+        url: https://huggingface.co/amd/dbrx-instruct-FP8-KV
+        precision: float8
+    - group: Google Gemma
+      tag: gemma
+      models:
+      - model: Gemma 2 27B
+        mad_tag: pyt_vllm_gemma-2-27b
+        model_repo: google/gemma-2-27b
+        url: https://huggingface.co/google/gemma-2-27b
+        precision: float16
+    - group: Cohere
+      tag: cohere
+      models:
+      - model: C4AI Command R+ 08-2024
+        mad_tag: pyt_vllm_c4ai-command-r-plus-08-2024
+        model_repo: CohereForAI/c4ai-command-r-plus-08-2024
+        url: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
+        precision: float16
+      - model: C4AI Command R+ 08-2024 FP8
+        mad_tag: pyt_vllm_command-r-plus_fp8
+        model_repo: amd/c4ai-command-r-plus-FP8-KV
+        url: https://huggingface.co/amd/c4ai-command-r-plus-FP8-KV
+        precision: float8
+    - group: DeepSeek
+      tag: deepseek
+      models:
+      - model: DeepSeek MoE 16B
+        mad_tag: pyt_vllm_deepseek-moe-16b-chat
+        model_repo: deepseek-ai/deepseek-moe-16b-chat
+        url: https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat
+        precision: float16
--- a/docs/data/how-to/rocm-for-ai/inference/pytorch-inference-benchmark-models.yaml
+++ b/docs/data/how-to/rocm-for-ai/inference/pytorch-inference-benchmark-models.yaml
@@ -23,3 +23,11 @@ pytorch_inference_benchmark:
        model_repo: meta-llama/Llama-3.1-8B-Instruct
        url: https://huggingface.co/chaidiscovery/chai-1
        precision: float16
+    - group: Mochi Video
+      tag: mochi
+      models:
+      - model: Mochi 1
+        mad_tag: pyt_mochi_video_inference
+        model_repo: genmo/mochi-1-preview
+        url: https://huggingface.co/genmo/mochi-1-preview
+        precision: float16
--- a/docs/data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml
+++ b/docs/data/how-to/rocm-for-ai/inference/vllm-benchmark-models.yaml
@@ -1,14 +1,14 @@
 vllm_benchmark:
  unified_docker:
    latest:
-      pull_tag: rocm/vllm:rocm6.3.1_instinct_vllm0.8.3_20250415
-      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.8.3_20250415/images/sha256-ad9062dea3483d59dedb17c67f7c49f30eebd6eb37c3fac0a171fb19696cc845
+      pull_tag: rocm/vllm:rocm6.3.1_vllm_0.8.5_20250521
+      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_vllm_0.8.5_20250521/images/sha256-38410c51af7208897cd8b737c9bdfc126e9bc8952d4aa6b88c85482f03092a11
      rocm_version: 6.3.1
-      vllm_version: 0.8.3
-      pytorch_version: 2.7.0 (dev nightly)
-      hipblaslt_version: 0.13
+      vllm_version: 0.8.5 (0.8.6.dev315+g91a560098.rocm631)
+      pytorch_version: 2.7.0+gitf717b2a
+      hipblaslt_version: 0.15
  model_groups:
-    - group: Llama
+    - group: Meta Llama
      tag: llama
      models:
      - model: Llama 3.1 8B
@@ -56,7 +56,7 @@ vllm_benchmark:
        model_repo: amd/Llama-3.1-405B-Instruct-FP8-KV
        url: https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV
        precision: float8
-    - group: Mistral
+    - group: Mistral AI
      tag: mistral
      models:
      - model: Mixtral MoE 8x7B
@@ -108,7 +108,7 @@ vllm_benchmark:
        url: https://huggingface.co/Qwen/QwQ-32B
        precision: float16
        tunableop: true
-    - group: DBRX
+    - group: Databricks DBRX
      tag: dbrx
      models:
      - model: DBRX Instruct
@@ -121,7 +121,7 @@ vllm_benchmark:
        model_repo: amd/dbrx-instruct-FP8-KV
        url: https://huggingface.co/amd/dbrx-instruct-FP8-KV
        precision: float8
-    - group: Gemma
+    - group: Google Gemma
      tag: gemma
      models:
      - model: Gemma 2 27B
@@ -150,3 +150,18 @@ vllm_benchmark:
        model_repo: deepseek-ai/deepseek-moe-16b-chat
        url: https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat
        precision: float16
+    - group: Microsoft Phi
+      tag: phi
+      models:
+      - model: Phi-4
+        mad_tag: pyt_vllm_phi-4
+        model_repo: microsoft/phi-4
+        url: https://huggingface.co/microsoft/phi-4
+    - group: TII Falcon
+      tag: falcon
+      models:
+      - model: Falcon 180B
+        mad_tag: pyt_vllm_falcon-180b
+        model_repo: tiiuae/falcon-180B
+        url: https://huggingface.co/tiiuae/falcon-180B
+        precision: float16
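The RST templates in this change read `model.model`, `model.mad_tag`, `model.model_repo`, `model.url`, and `model.precision` from each entry under `model_groups`. A minimal sketch of a check for those keys follows. The `REQUIRED_KEYS` tuple, the `find_missing_keys` helper, and the inline sample data are hypothetical; they only mirror the shape of the YAML above and are not part of any ROCm tooling.

```python
# Hypothetical check: the key list is inferred from the Jinja templates in this
# change, not taken from any ROCm tooling.
REQUIRED_KEYS = ("model", "mad_tag", "model_repo", "url", "precision")


def find_missing_keys(model_groups):
    """Return {mad_tag or model name: [missing keys]} for incomplete entries."""
    problems = {}
    for group in model_groups:
        for entry in group.get("models", []):
            missing = [key for key in REQUIRED_KEYS if key not in entry]
            if missing:
                label = entry.get("mad_tag") or entry.get("model", "<unnamed>")
                problems[label] = missing
    return problems


# Sample data shaped like the two groups added at the end of the hunk above.
sample_groups = [
    {
        "group": "Microsoft Phi",
        "tag": "phi",
        "models": [
            {
                "model": "Phi-4",
                "mad_tag": "pyt_vllm_phi-4",
                "model_repo": "microsoft/phi-4",
                "url": "https://huggingface.co/microsoft/phi-4",
            }
        ],
    },
    {
        "group": "TII Falcon",
        "tag": "falcon",
        "models": [
            {
                "model": "Falcon 180B",
                "mad_tag": "pyt_vllm_falcon-180b",
                "model_repo": "tiiuae/falcon-180B",
                "url": "https://huggingface.co/tiiuae/falcon-180B",
                "precision": "float16",
            }
        ],
    },
]

if __name__ == "__main__":
    for label, missing in find_missing_keys(sample_groups).items():
        print(f"{label}: missing {', '.join(missing)}")
```

With the sample data, the check flags the Phi-4 entry, which has no `precision` key in the hunk above, so a template that references `model.precision` has nothing to substitute for that entry.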
--- a/docs/data/how-to/rocm-for-ai/training/megatron-lm-benchmark-models.yaml
+++ b/docs/data/how-to/rocm-for-ai/training/megatron-lm-benchmark-models.yaml
@@ -0,0 +1,29 @@
+megatron-lm_benchmark:
+  model_groups:
+    - group: Meta Llama
+      tag: llama
+      models:
+      - model: Llama 3.3 70B
+        mad_tag: pyt_megatron_lm_train_llama-3.3-70b
+      - model: Llama 3.1 8B
+        mad_tag: pyt_megatron_lm_train_llama-3.1-8b
+      - model: Llama 3.1 70B
+        mad_tag: pyt_megatron_lm_train_llama-3.1-70b
+      - model: Llama 2 7B
+        mad_tag: pyt_megatron_lm_train_llama-2-7b
+      - model: Llama 2 70B
+        mad_tag: pyt_megatron_lm_train_llama-2-70b
+    - group: DeepSeek
+      tag: deepseek
+      models:
+      - model: DeepSeek-V3
+        mad_tag: pyt_megatron_lm_train_deepseek-v3-proxy
+      - model: DeepSeek-V2-Lite
+        mad_tag: pyt_megatron_lm_train_deepseek-v2-lite-16b
+    - group: Mistral AI
+      tag: mistral
+      models:
+      - model: Mixtral 8x7B
+        mad_tag: pyt_megatron_lm_train_mixtral-8x7b
+      - model: Mixtral 8x22B
+        mad_tag: pyt_megatron_lm_train_mixtral-8x22b-proxy
--- a/docs/data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml
+++ b/docs/data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml
--- a/docs/how-to/rocm-for-ai/inference-optimization/workload.rst
+++ b/docs/how-to/rocm-for-ai/inference-optimization/workload.rst
@@ -678,7 +678,7 @@ To specify the quantization scaling config, use the
 ``--quantization-param-path`` parameter. If the parameter is not specified,
 the default scaling factor of ``1`` is used, which can lead to less accurate
 results. To generate ``kv-cache`` scaling JSON file, see `FP8 KV
-Cache <https://github.com/vllm-project/vllm/blob/main/examples/fp8/README.md>`__
+Cache <https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_kv_cache/README.md>`__
 in the vLLM GitHub repository.

 Two sample Llama scaling configuration files are in vLLM for ``llama2-70b`` and
--- a/docs/how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.5-20250513.rst
+++ b/docs/how-to/rocm-for-ai/inference/benchmark-docker/previous-versions/vllm-0.8.5-20250513.rst
@@ -0,0 +1,319 @@
+.. meta::
+   :description: Learn how to validate LLM inference performance on MI300X accelerators using AMD MAD and the
+                 ROCm vLLM Docker image.
+   :keywords: model, MAD, automation, dashboarding, validate
+
+**********************************
+vLLM inference performance testing
+**********************************
+
+.. _vllm-benchmark-unified-docker:
+
+.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/previous-versions/vllm_0.8.5_20250513-benchmark-models.yaml
+   {% set unified_docker = data.vllm_benchmark.unified_docker.latest %}
+   {% set model_groups = data.vllm_benchmark.model_groups %}
+
+   The `ROCm vLLM Docker <{{ unified_docker.docker_hub_url }}>`_ image offers
+   a prebuilt, optimized environment for validating large language model (LLM)
+   inference performance on AMD Instinct™ MI300X series accelerators. This ROCm vLLM
+   Docker image integrates vLLM and PyTorch tailored specifically for MI300X series
+   accelerators and includes the following components:
+
+   * `ROCm {{ unified_docker.rocm_version }} <https://github.com/ROCm/ROCm>`_
+
+   * `vLLM {{ unified_docker.vllm_version }} <https://docs.vllm.ai/en/latest>`_
+
+   * `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/pytorch/pytorch>`_
+
+   * `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
+
+   With this Docker image, you can quickly test the :ref:`expected
+   inference performance numbers <vllm-benchmark-performance-measurements>` for
+   MI300X series accelerators.
+
+   .. _vllm-benchmark-available-models:
+
+   Supported models
+   ================
+
+   The following models are supported for inference performance benchmarking
+   with vLLM and ROCm. Some instructions, commands, and recommendations in this
+   documentation might vary by model -- select one to get started.
+
+   .. raw:: html
+
+      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
+        <div class="row">
+          <div class="col-2 me-2 model-param-head">Model group</div>
+          <div class="row col-10">
+   {% for model_group in model_groups %}
+            <div class="col-3 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
+   {% endfor %}
+          </div>
+        </div>
+
+        <div class="row mt-1">
+          <div class="col-2 me-2 model-param-head">Model</div>
+          <div class="row col-10">
+   {% for model_group in model_groups %}
+      {% set models = model_group.models %}
+      {% for model in models %}
+         {% if models|length % 3 == 0 %}
+            <div class="col-4 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
+         {% else %}
+            <div class="col-6 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
+         {% endif %}
+      {% endfor %}
+   {% endfor %}
+          </div>
+        </div>
+      </div>
+
+   .. _vllm-benchmark-vllm:
+
+   {% for model_group in model_groups %}
+      {% for model in model_group.models %}
+
+   .. container:: model-doc {{model.mad_tag}}
+
+      .. note::
+
+         See the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_ to learn more about your selected model.
+         Some models require access authorization prior to use via an external license agreement through a third party.
+
+      {% endfor %}
+   {% endfor %}
+
+   .. note::
+
+      vLLM is a toolkit and library for LLM inference and serving. AMD implements
+      high-performance custom kernels and modules in vLLM to enhance performance.
+      See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
+      more information.
+
+   .. _vllm-benchmark-performance-measurements:
+
+   Performance measurements
+   ========================
+
+   To evaluate performance, see the
+   `Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
+   page, which provides reference throughput and latency measurements for
+   inference with popular AI models.
+
+   .. note::
+
+      The performance data presented in
+      `Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
+      should not be interpreted as the peak performance achievable by AMD
+      Instinct MI325X and MI300X accelerators or ROCm software.
+
+   Advanced features and known issues
+   ==================================
+
+   For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
+   see the developer's guide at `<https://github.com/ROCm/vllm/blob/main/docs/dev-docker/README.md>`__.
+
+   System validation
+   =================
+
+   Before running AI workloads, it's important to validate that your AMD hardware is configured
+   correctly and performing optimally.
+
+   To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
+   might hang until the periodic balancing is finalized. For more information,
+   see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
+
+   .. code-block:: shell
+      # disable automatic NUMA balancing
+      sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
+      # check if NUMA balancing is disabled (returns 0 if disabled)
+      cat /proc/sys/kernel/numa_balancing
+      0
+   To test for optimal performance, consult the recommended :ref:`System health benchmarks
+   <rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
+   system's configuration.
+
+   Pull the Docker image
+   =====================
+
+   Download the `ROCm vLLM Docker image <{{ unified_docker.docker_hub_url }}>`_.
+   Use the following command to pull the Docker image from Docker Hub.
+
+   .. code-block:: shell
+      docker pull {{ unified_docker.pull_tag }}
+   Benchmarking
+   ============
+
+   Once the setup is complete, choose between two options to reproduce the
+   benchmark results:
+
+   .. _vllm-benchmark-mad:
+
+   {% for model_group in model_groups %}
+      {% for model in model_group.models %}
+
+   .. container:: model-doc {{model.mad_tag}}
+
+      .. tab-set::
+
+         .. tab-item:: MAD-integrated benchmarking
+
+            Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
+            directory and install the required packages on the host machine.
+
+            .. code-block:: shell
+               git clone https://github.com/ROCm/MAD
+               cd MAD
+               pip install -r requirements.txt
+            Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
+            using one GPU with the ``{{model.precision}}`` data type on the host machine.
+
+            .. code-block:: shell
+               export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
+               python3 tools/run_models.py --tags {{model.mad_tag}} --keep-model-dir --live-output --timeout 28800
+            MAD launches a Docker container with the name
+            ``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
+            model are collected in the following path: ``~/MAD/reports_{{model.precision}}/``.
+
+            Although the :ref:`available models <vllm-benchmark-available-models>` are preconfigured
+            to collect latency and throughput performance data, you can also change the benchmarking
+            parameters. See the standalone benchmarking tab for more information.
+
+            {% if model.tunableop %}
+
+            .. note::
+
+               For improved performance, consider enabling :ref:`PyTorch TunableOp <mi300x-tunableop>`.
+               TunableOp automatically explores different implementations and configurations of certain PyTorch
+               operators to find the fastest one for your hardware.
+
+               By default, ``{{model.mad_tag}}`` runs with TunableOp disabled
+               (see
+               `<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To
+               enable it, edit the default run behavior in the ``models.json``
+               configuration before running inference -- update the model's run
+               ``args`` by changing ``--tunableop off`` to ``--tunableop on``.
+
+               Enabling TunableOp triggers a two-pass run -- a warm-up followed by the performance-collection run.
+
+            {% endif %}
+
+         .. tab-item:: Standalone benchmarking
+
+            Run the vLLM benchmark tool independently by starting the
+            `Docker container <{{ unified_docker.docker_hub_url }}>`_
+            as shown in the following snippet.
+
+            .. code-block::
+               docker pull {{ unified_docker.pull_tag }}
+               docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name test {{ unified_docker.pull_tag }}
+            In the Docker container, clone the ROCm MAD repository and navigate to the
+            benchmark scripts directory at ``~/MAD/scripts/vllm``.
+
+            .. code-block::
+               git clone https://github.com/ROCm/MAD
+               cd MAD/scripts/vllm
+            To start the benchmark, use the following command with the appropriate options.
+
+            .. code-block::
+               ./vllm_benchmark_report.sh -s $test_option -m {{model.model_repo}} -g $num_gpu -d {{model.precision}}
+            .. list-table::
+               :header-rows: 1
+               :align: center
+
+               * - Name
+                 - Options
+                 - Description
+
+               * - ``$test_option``
+                 - latency
+                 - Measure decoding token latency
+
+               * -
+                 - throughput
+                 - Measure token generation throughput
+
+               * -
+                 - all
+                 - Measure both throughput and latency
+
+               * - ``$num_gpu``
+                 - 1 or 8
+                 - Number of GPUs
+
+               * - ``$datatype``
+                 - ``float16`` or ``float8``
+                 - Data type
+
+            .. note::
+
+               The input sequence length, output sequence length, and tensor parallel (TP) size
+               are already configured. You don't need to specify them with this script.
+
+            .. note::
+
+               If you encounter the following error, the model is gated. Export a Hugging Face
+               token that is authorized to access it.
+
+               .. code-block::
+                  OSError: You are trying to access a gated repo.
+                  # pass your HF_TOKEN
+                  export HF_TOKEN=$your_personal_hf_token
+            Here are some examples of running the benchmark with various options.
+
+            * Latency benchmark
+
+              Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
+
+              .. code-block::
+                 ./vllm_benchmark_report.sh -s latency -m {{model.model_repo}} -g 8 -d {{model.precision}}
+              Find the latency report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_latency_report.csv``.
+
+            * Throughput benchmark
+
+              Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
+
+              .. code-block:: shell
+                 ./vllm_benchmark_report.sh -s throughput -m {{model.model_repo}} -g 8 -d {{model.precision}}
+              Find the throughput report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_throughput_report.csv``.
+
+            .. raw:: html
+
+               <style>
+               mjx-container[jax="CHTML"][display="true"] {
+                  text-align: left;
+                  margin: 0;
+               }
+               </style>
+
+            .. note::
+
+               Throughput is calculated as:
+
+               - .. math:: throughput\_tot = requests \times (\mathsf{\text{input lengths}} + \mathsf{\text{output lengths}}) / elapsed\_time
+
+               - .. math:: throughput\_gen = requests \times \mathsf{\text{output lengths}} / elapsed\_time
+      {% endfor %}
+   {% endfor %}
+
+Further reading
+===============
+
+- To learn more about the options for latency and throughput benchmark scripts,
+  see `<https://github.com/ROCm/vllm/tree/main/benchmarks>`_.
+
+- To learn more about system settings and management practices to configure your system for
+  MI300X accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
+
+- For application performance optimization strategies for HPC and AI workloads,
+  including inference with vLLM, see :doc:`../../../inference-optimization/workload`.
+
+- To learn how to run LLM models from Hugging Face or your own model, see
+  :doc:`Running models from Hugging Face <../../hugging-face-models>`.
+
+- To learn how to optimize inference on LLMs, see
+  :doc:`Inference optimization <../../../inference-optimization/index>`.
+
+- To learn how to fine-tune LLMs, see
+  :doc:`Fine-tuning LLMs <../../../fine-tuning/index>`.
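The throughput note and the report-path expressions in the page added above can be worked through with concrete numbers. The following is a minimal sketch, not the logic of `vllm_benchmark_report.sh`: the function names and sample values are hypothetical, and it assumes every request has the same input and output length, with lengths in tokens and elapsed time in seconds.

```python
def total_throughput(requests, input_len, output_len, elapsed_time):
    """throughput_tot = requests * (input lengths + output lengths) / elapsed_time"""
    return requests * (input_len + output_len) / elapsed_time


def generation_throughput(requests, output_len, elapsed_time):
    """throughput_gen = requests * output lengths / elapsed_time"""
    return requests * output_len / elapsed_time


def report_stem(model_repo):
    """Mirror the template expression: keep the part after the first '/', if any."""
    return model_repo.split("/", 1)[1] if "/" in model_repo else model_repo


if __name__ == "__main__":
    # Hypothetical run: 1000 requests, 128 input and 128 output tokens each,
    # finished in 64 seconds.
    print(total_throughput(1000, 128, 128, 64.0))   # 4000.0
    print(generation_throughput(1000, 128, 64.0))   # 2000.0
    # File name stem used for the summary reports of Qwen/QwQ-32B.
    print(report_stem("Qwen/QwQ-32B") + "_throughput_report.csv")
```

Under these assumptions the total throughput counts both prompt and generated tokens, so it is always at least as large as the generation throughput for the same run.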
--- a/docs/how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference.rst
+++ b/docs/how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference.rst
@@ -24,20 +24,24 @@ PyTorch inference performance testing
   Supported models
   ================

+   The following models are supported for inference performance benchmarking
+   with PyTorch and ROCm. Some instructions, commands, and recommendations in this
+   documentation might vary by model -- select one to get started.
+
   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row">
-          <div class="col-2 me-2 model-param-head">Model</div>
+          <div class="col-2 me-2 model-param-head">Model group</div>
          <div class="row col-10">
   {% for model_group in model_groups %}
-            <div class="col-6 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
+            <div class="col-4 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
   {% endfor %}
          </div>
        </div>

        <div class="row mt-1" style="display: none;">
-          <div class="col-2 me-2 model-param-head">Model variant</div>
+          <div class="col-2 me-2 model-param-head">Model</div>
          <div class="row col-10">
   {% for model_group in model_groups %}
      {% set models = model_group.models %}
@@ -99,7 +103,7 @@ PyTorch inference performance testing

         The Chai-1 benchmark uses a specifically selected Docker image using ROCm 6.2.3 and PyTorch 2.3.0 to address an accuracy issue.

-   .. container:: model-doc pyt_clip_inference
+   .. container:: model-doc pyt_clip_inference pyt_mochi_video_inference

      Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-05b55983e5154f46e7441897d0908d79877370adca4d1fff4899d9539d6c4969>`_ from Docker Hub.

@@ -162,11 +166,14 @@ Further reading
 - To learn more about system settings and management practices to configure your system for
  MI300X accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.

+- For application performance optimization strategies for HPC and AI workloads,
+  including inference with vLLM, see :doc:`../../inference-optimization/workload`.
+
 - To learn how to run LLM models from Hugging Face or your model, see
-  :doc:`Running models from Hugging Face <hugging-face-models>`.
+  :doc:`Running models from Hugging Face <../hugging-face-models>`.

 - To learn how to optimize inference on LLMs, see
-  :doc:`Inference optimization <../inference-optimization/index>`.
+  :doc:`Inference optimization <../../inference-optimization/index>`.

 - To learn how to fine-tune LLMs, see
-  :doc:`Fine-tuning LLMs <../fine-tuning/index>`.
+  :doc:`Fine-tuning LLMs <../../fine-tuning/index>`.
--- a/docs/how-to/rocm-for-ai/inference/benchmark-docker/vllm.rst
+++ b/docs/how-to/rocm-for-ai/inference/benchmark-docker/vllm.rst
@@ -24,7 +24,7 @@ vLLM inference performance testing

   * `vLLM {{ unified_docker.vllm_version }} <https://docs.vllm.ai/en/latest>`_

-   * `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/pytorch/pytorch>`_
+   * `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/ROCm/pytorch.git>`_

   * `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_

@@ -37,11 +37,15 @@ vLLM inference performance testing
   Supported models
   ================

+   The following models are supported for inference performance benchmarking
+   with vLLM and ROCm. Some instructions, commands, and recommendations in this
+   documentation might vary by model -- select one to get started.
+
   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row">
-          <div class="col-2 me-2 model-param-head">Model</div>
+          <div class="col-2 me-2 model-param-head">Model group</div>
          <div class="row col-10">
   {% for model_group in model_groups %}
            <div class="col-3 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
@@ -50,7 +54,7 @@ vLLM inference performance testing
        </div>

        <div class="row mt-1">
-          <div class="col-2 me-2 model-param-head">Model variant</div>
+          <div class="col-2 me-2 model-param-head">Model</div>
          <div class="row col-10">
   {% for model_group in model_groups %}
      {% set models = model_group.models %}
@@ -318,23 +322,23 @@ vLLM inference performance testing
 Further reading
 ===============

-- For application performance optimization strategies for HPC and AI workloads,
-  including inference with vLLM, see :doc:`../inference-optimization/workload`.
-
 - To learn more about the options for latency and throughput benchmark scripts,
  see `<https://github.com/ROCm/vllm/tree/main/benchmarks>`_.

 - To learn more about system settings and management practices to configure your system for
  MI300X accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_

+- For application performance optimization strategies for HPC and AI workloads,
+  including inference with vLLM, see :doc:`../../inference-optimization/workload`.
+
 - To learn how to run LLM models from Hugging Face or your own model, see
-  :doc:`Running models from Hugging Face <hugging-face-models>`.
+  :doc:`Running models from Hugging Face <../hugging-face-models>`.

 - To learn how to optimize inference on LLMs, see
-  :doc:`Inference optimization <../inference-optimization/index>`.
+  :doc:`Inference optimization <../../inference-optimization/index>`.

 - To learn how to fine-tune LLMs, see
-  :doc:`Fine-tuning LLMs <../fine-tuning/index>`.
+  :doc:`Fine-tuning LLMs <../../fine-tuning/index>`.

 Previous versions
 =================
@@ -352,6 +356,20 @@ for benchmarking, see the version-specific documentation.
     - PyTorch version
     - Resources

+   * - 6.3.1
+     - 0.8.5
+     - 2.7.0
+     - 
+       * :doc:`Documentation <previous-versions/vllm-0.8.5-20250513>`
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_vllm_0.8.5_20250513/images/sha256-5c8b4436dd0464119d9df2b44c745fadf81512f18ffb2f4b5dc235c71ebe26b4>`_
+
+   * - 6.3.1
+     - 0.8.3
+     - 2.7.0
+     - 
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.4.0/how-to/rocm-for-ai/inference/vllm-benchmark.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.8.3_20250415/images/sha256-ad9062dea3483d59dedb17c67f7c49f30eebd6eb37c3fac0a171fb19696cc845>`_
+
   * - 6.3.1
     - 0.7.3
     - 2.7.0
--- a/docs/how-to/rocm-for-ai/training/benchmark-docker/megatron-lm.rst
+++ b/docs/how-to/rocm-for-ai/training/benchmark-docker/megatron-lm.rst
--- a/docs/reference/api-libraries.md
+++ b/docs/reference/api-libraries.md
@@ -45,7 +45,7 @@
 (communication-libraries)=

 * {doc}`RCCL <rccl:index>`
-* [rocSHMEM](https://github.com/ROCm/rocSHMEM)
+* {doc}`rocSHMEM <rocshmem:index>`
 :::

 :::{grid-item-card} Math
--- a/docs/reference/gpu-arch-specs.rst
+++ b/docs/reference/gpu-arch-specs.rst
@@ -282,7 +282,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - GFXIP Major version
          - GFXIP Minor version
        *
-          - Radeon AI PRO R7900
+          - Radeon AI PRO R9700
          - RDNA4
          - gfx1201
          - 16
@@ -305,7 +305,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1101
          - 28
          - 54
-          - 32
+          - 32 or 64
          - 128
          - 56
          - 4
@@ -314,7 +314,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -323,7 +323,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 48
          - 96
-          - 32
+          - 32 or 64
          - 128
          - 96
          - 6
@@ -332,7 +332,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -341,7 +341,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 48
          - 96
-          - 32
+          - 32 or 64
          - 128
          - 96
          - 6
@@ -350,7 +350,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -359,7 +359,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 48
          - 70
-          - 32
+          - 32 or 64
          - 128
          - 96
          - 6
@@ -368,7 +368,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -377,7 +377,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 32
          - 70
-          - 32
+          - 32 or 64
          - 128
          - 64
          - 6
@@ -386,7 +386,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -395,7 +395,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1101
          - 16
          - 48
-          - 32
+          - 32 or 64
          - 128
          - 64
          - 4
@@ -404,7 +404,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -413,7 +413,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1030
          - 32
          - 60
-          - 32
+          - 32 or 64
          - 128
          - 128
          - 4
@@ -422,7 +422,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -431,7 +431,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1032
          - 8
          - 28
-          - 32
+          - 32 or 64
          - 128
          - 32
          - 2
@@ -440,7 +440,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -449,7 +449,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1030
          - 32
          - 72
-          - 32
+          - 32 or 64
          - 128
          - 128
          - 4
@@ -458,7 +458,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -467,7 +467,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1012
          - 8
          - 22
-          - 32
+          - 32 or 64
          - 128
          -
          - 4
@@ -525,7 +525,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
        *
          - Radeon RX 9070 XT
          - RDNA4
-          - gfx1200
+          - gfx1201
          - 16
          - 64
          - 32 or 64
@@ -540,6 +540,42 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 32
          - 12
          - 0
+        *
+          - Radeon RX 9070 GRE
+          - RDNA4
+          - gfx1201
+          - 16
+          - 48
+          - 32 or 64
+          - 128
+          - 48
+          - 6
+          - N/A
+          - 32
+          - 16
+          - 32
+          - 768
+          - 32
+          - 12
+          - 0
+        *
+          - Radeon RX 9070
+          - RDNA4
+          - gfx1201
+          - 16
+          - 56
+          - 32 or 64
+          - 128
+          - 64
+          - 8
+          - N/A
+          - 32
+          - 16
+          - 32
+          - 768
+          - 32
+          - 12
+          - 0
        *
          - Radeon RX 9060 XT
          - RDNA4
@@ -564,7 +600,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 24
          - 96
-          - 32
+          - 32 or 64
          - 128
          - 96
          - 6
@@ -573,7 +609,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -582,7 +618,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 20
          - 84
-          - 32
+          - 32 or 64
          - 128
          - 80
          - 6
@@ -591,7 +627,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -600,7 +636,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1100
          - 16
          - 80
-          - 32
+          - 32 or 64
          - 128
          - 64
          - 6
@@ -609,7 +645,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -618,7 +654,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1101
          - 16
          - 60
-          - 32
+          - 32 or 64
          - 128
          - 64
          - 4
@@ -627,7 +663,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -636,7 +672,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1101
          - 12
          - 54
-          - 32
+          - 32 or 64
          - 128
          - 48
          - 4
@@ -645,7 +681,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 768
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -654,7 +690,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1102
          - 8
          - 32
-          - 32
+          - 32 or 64
          - 128
          - 32
          - 2
@@ -663,7 +699,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 11
          - 0
        *
@@ -672,7 +708,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1030
          - 16
          - 80
-          - 32
+          - 32 or 64
          - 128
          - 128
          - 4
@@ -681,7 +717,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -690,7 +726,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1030
          - 16
          - 80
-          - 32
+          - 32 or 64
          - 128
          - 128
          - 4
@@ -699,7 +735,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -708,7 +744,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1030
          - 16
          - 72
-          - 32
+          - 32 or 64
          - 128
          - 128
          - 4
@@ -717,7 +753,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -726,7 +762,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1030
          - 16
          - 60
-          - 32
+          - 32 or 64
          - 128
          - 128
          - 4
@@ -735,7 +771,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -744,7 +780,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1031
          - 12
          - 40
-          - 32
+          - 32 or 64
          - 128
          - 96
          - 3
@@ -753,7 +789,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -762,7 +798,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1031
          - 12
          - 40
-          - 32
+          - 32 or 64
          - 128
          - 96
          - 3
@@ -771,7 +807,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -780,7 +816,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1031
          - 10
          - 36
-          - 32
+          - 32 or 64
          - 128
          - 80
          - 3
@@ -789,7 +825,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -798,7 +834,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1032
          - 8
          - 32
-          - 32
+          - 32 or 64
          - 128
          - 32
          - 2
@@ -807,7 +843,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -816,7 +852,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1032
          - 8
          - 32
-          - 32
+          - 32 or 64
          - 128
          - 32
          - 2
@@ -825,7 +861,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
@@ -834,7 +870,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - gfx1032
          - 8
          - 28
-          - 32
+          - 32 or 64
          - 128
          - 32
          - 2
@@ -843,7 +879,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
          - 16
          - 32
          - 512
-          - 16
+          - 32
          - 10
          - 3
        *
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -44,11 +44,11 @@ subtrees:
        title: Training
        subtrees:
        - entries:
-          - file: how-to/rocm-for-ai/training/benchmark-docker/megatron-lm
+          - file: how-to/rocm-for-ai/training/benchmark-docker/megatron-lm.rst
            title: Train a model with Megatron-LM
-          - file: how-to/rocm-for-ai/training/benchmark-docker/pytorch-training
+          - file: how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.rst
            title: Train a model with PyTorch
-          - file: how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext
+          - file: how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.rst
            title: Train a model with JAX MaxText
          - file: how-to/rocm-for-ai/training/benchmark-docker/mpt-llm-foundry
            title: Train a model with LLM Foundry
@@ -78,9 +78,9 @@ subtrees:
            title: Run models from Hugging Face
          - file: how-to/rocm-for-ai/inference/llm-inference-frameworks.rst
            title: LLM inference frameworks
-          - file: how-to/rocm-for-ai/inference/vllm-benchmark.rst
+          - file: how-to/rocm-for-ai/inference/benchmark-docker/vllm.rst
            title: vLLM inference performance testing
-          - file: how-to/rocm-for-ai/inference/pytorch-inference-benchmark.rst
+          - file: how-to/rocm-for-ai/inference/benchmark-docker/pytorch-inference.rst
            title: PyTorch inference performance testing
          - file: how-to/rocm-for-ai/inference/deploy-your-model.rst
            title: Deploy your model
--- a/docs/sphinx/requirements.in
+++ b/docs/sphinx/requirements.in
@@ -1,4 +1,4 @@
-rocm-docs-core==1.18.2
+rocm-docs-core==1.20.1
 sphinx-reredirects
 sphinx-sitemap
 sphinxcontrib.datatemplates==0.11.0
--- a/docs/sphinx/requirements.txt
+++ b/docs/sphinx/requirements.txt
@@ -2,7 +2,7 @@
 # This file is autogenerated by pip-compile with Python 3.10
 # by the following command:
 #
-#    pip-compile docs/sphinx/requirements.in
+#    pip-compile requirements.in
 #
 accessible-pygments==0.0.5
    # via pydata-sphinx-theme
@@ -10,74 +10,73 @@ alabaster==1.0.0
    # via sphinx
 asttokens==3.0.0
    # via stack-data
-attrs==25.1.0
+attrs==25.3.0
    # via
    #   jsonschema
    #   jupyter-cache
    #   referencing
-babel==2.16.0
+babel==2.17.0
    # via
    #   pydata-sphinx-theme
    #   sphinx
-beautifulsoup4==4.12.3
+beautifulsoup4==4.13.4
    # via pydata-sphinx-theme
-breathe==4.35.0
+breathe==4.36.0
    # via rocm-docs-core
-certifi==2024.8.30
+certifi==2025.4.26
    # via requests
 cffi==1.17.1
    # via
    #   cryptography
    #   pynacl
-charset-normalizer==3.4.0
+charset-normalizer==3.4.2
    # via requests
-click==8.1.7
+click==8.2.1
    # via
    #   jupyter-cache
    #   sphinx-external-toc
 comm==0.2.2
    # via ipykernel
-cryptography==44.0.1
+cryptography==45.0.3
    # via pyjwt
-debugpy==1.8.12
+debugpy==1.8.14
    # via ipykernel
-decorator==5.1.1
+decorator==5.2.1
    # via ipython
 defusedxml==0.7.1
    # via sphinxcontrib-datatemplates
-deprecated==1.2.15
+deprecated==1.2.18
    # via pygithub
 docutils==0.21.2
    # via
-    #   breathe
    #   myst-parser
    #   pydata-sphinx-theme
    #   sphinx
-exceptiongroup==1.2.2
+exceptiongroup==1.3.0
    # via ipython
 executing==2.2.0
    # via stack-data
-fastjsonschema==2.20.0
+fastjsonschema==2.21.1
    # via
    #   nbformat
    #   rocm-docs-core
-gitdb==4.0.11
+gitdb==4.0.12
    # via gitpython
-gitpython==3.1.43
+gitpython==3.1.44
    # via rocm-docs-core
-greenlet==3.1.1
+greenlet==3.2.3
    # via sqlalchemy
 idna==3.10
    # via requests
 imagesize==1.4.1
    # via sphinx
-importlib-metadata==8.6.1
+importlib-metadata==8.7.0
    # via
    #   jupyter-cache
    #   myst-nb
 ipykernel==6.29.5
    # via myst-nb
-ipython==8.31.0
+ipython==8.37.0
    # via
    #   ipykernel
    #   myst-nb
@@ -87,9 +86,9 @@ jinja2==3.1.6
    # via
    #   myst-parser
    #   sphinx
-jsonschema==4.23.0
+jsonschema==4.24.0
    # via nbformat
-jsonschema-specifications==2024.10.1
+jsonschema-specifications==2025.4.1
    # via jsonschema
 jupyter-cache==1.0.1
    # via myst-nb
@@ -97,7 +96,7 @@ jupyter-client==8.6.3
    # via
    #   ipykernel
    #   nbclient
-jupyter-core==5.7.2
+jupyter-core==5.8.1
    # via
    #   ipykernel
    #   jupyter-client
@@ -117,9 +116,9 @@ mdit-py-plugins==0.4.2
    # via myst-parser
 mdurl==0.1.2
    # via markdown-it-py
-myst-nb==1.1.2
+myst-nb==1.2.0
    # via rocm-docs-core
-myst-parser==4.0.0
+myst-parser==4.0.1
    # via myst-nb
 nbclient==0.10.2
    # via
@@ -132,19 +131,20 @@ nbformat==5.10.4
    #   nbclient
 nest-asyncio==1.6.0
    # via ipykernel
-packaging==24.2
+packaging==25.0
    # via
    #   ipykernel
+    #   pydata-sphinx-theme
    #   sphinx
 parso==0.8.4
    # via jedi
 pexpect==4.9.0
    # via ipython
-platformdirs==4.3.6
+platformdirs==4.3.8
    # via jupyter-core
-prompt-toolkit==3.0.50
+prompt-toolkit==3.0.51
    # via ipython
-psutil==6.1.1
+psutil==7.0.0
    # via ipykernel
 ptyprocess==0.7.0
    # via pexpect
@@ -152,19 +152,19 @@ pure-eval==0.2.3
    # via stack-data
 pycparser==2.22
    # via cffi
-pydata-sphinx-theme==0.16.0
+pydata-sphinx-theme==0.15.4
    # via
    #   rocm-docs-core
    #   sphinx-book-theme
-pygithub==2.5.0
+pygithub==2.6.1
    # via rocm-docs-core
-pygments==2.18.0
+pygments==2.19.1
    # via
    #   accessible-pygments
    #   ipython
    #   pydata-sphinx-theme
    #   sphinx
-pyjwt[crypto]==2.10.0
+pyjwt[crypto]==2.10.1
    # via pygithub
 pynacl==1.5.0
    # via pygithub
@@ -178,7 +178,7 @@ pyyaml==6.0.2
    #   rocm-docs-core
    #   sphinx-external-toc
    #   sphinxcontrib-datatemplates
-pyzmq==26.2.0
+pyzmq==26.4.0
    # via
    #   ipykernel
    #   jupyter-client
@@ -186,23 +186,23 @@ referencing==0.36.2
    # via
    #   jsonschema
    #   jsonschema-specifications
-requests==2.32.3
+requests==2.32.4
    # via
    #   pygithub
    #   sphinx
-rocm-docs-core==1.18.2
+rocm-docs-core==1.20.1
    # via -r requirements.in
-rpds-py==0.22.3
+rpds-py==0.25.1
    # via
    #   jsonschema
    #   referencing
 six==1.17.0
    # via python-dateutil
-smmap==5.0.1
+smmap==5.0.2
    # via gitdb
-snowballstemmer==2.2.0
+snowballstemmer==3.0.1
    # via sphinx
-soupsieve==2.6
+soupsieve==2.7
    # via beautifulsoup4
 sphinx==8.1.3
    # via
@@ -220,7 +220,7 @@ sphinx==8.1.3
    #   sphinx-sitemap
    #   sphinxcontrib-datatemplates
    #   sphinxcontrib-runcmd
-sphinx-book-theme==1.1.3
+sphinx-book-theme==1.1.4
    # via rocm-docs-core
 sphinx-copybutton==0.5.2
    # via rocm-docs-core
@@ -228,7 +228,7 @@ sphinx-design==0.6.1
    # via rocm-docs-core
 sphinx-external-toc==1.0.1
    # via rocm-docs-core
-sphinx-notfound-page==1.0.4
+sphinx-notfound-page==1.1.0
    # via rocm-docs-core
 sphinx-reredirects==0.1.6
    # via -r requirements.in
@@ -250,13 +250,13 @@ sphinxcontrib-runcmd==0.2.0
    # via sphinxcontrib-datatemplates
 sphinxcontrib-serializinghtml==2.0.0
    # via sphinx
-sqlalchemy==2.0.37
+sqlalchemy==2.0.41
    # via jupyter-cache
 stack-data==0.6.3
    # via ipython
 tabulate==0.9.0
    # via jupyter-cache
-tomli==2.1.0
+tomli==2.2.1
    # via sphinx
 tornado==6.4.2
    # via
@@ -272,21 +272,23 @@ traitlets==5.14.3
    #   matplotlib-inline
    #   nbclient
    #   nbformat
-typing-extensions==4.12.2
+typing-extensions==4.14.0
    # via
+    #   beautifulsoup4
+    #   exceptiongroup
    #   ipython
    #   myst-nb
    #   pydata-sphinx-theme
    #   pygithub
    #   referencing
    #   sqlalchemy
-urllib3==2.2.3
+urllib3==2.4.0
    # via
    #   pygithub
    #   requests
 wcwidth==0.2.13
    # via prompt-toolkit
-wrapt==1.17.0
+wrapt==1.17.2
    # via deprecated
-zipp==3.21.0
+zipp==3.23.0
    # via importlib-metadata
--- a/docs/what-is-rocm.rst
+++ b/docs/what-is-rocm.rst
@@ -52,7 +52,7 @@ Communication
  :header: "Component", "Description"

  ":doc:`RCCL <rccl:index>`", "Standalone library that provides multi-GPU and multi-node collective communication primitives"
-  "`rocSHMEM <https://github.com/ROCm/rocSHMEM>`_", "Runtime that provides GPU-centric networking through an OpenSHMEM-like interface. This intra-kernel networking library simplifies application code complexity and enables more fine-grained communication/computation overlap than traditional host-driven networking."
+  ":doc:`rocSHMEM <rocshmem:index>`", "An intra-kernel networking library that provides GPU-centric networking through an OpenSHMEM-like interface"

 Math
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -117,6 +117,11 @@ Performance
  ":doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`", "Toolkit for developing analysis tools for profiling and tracing GPU compute applications. This toolkit is in beta and subject to change"
  ":doc:`ROCTracer <roctracer:index>`", "Intercepts runtime API calls and traces asynchronous activity"

+.. note::
+
+  `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`.
+  Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.
+
 Development
 ^^^^^^^^^^^
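
The `requirements.in` and `requirements.txt` hunks above both move `rocm-docs-core` from 1.18.2 to 1.20.1, so the two files have to stay in step whenever the pin changes. A minimal sketch of a consistency check follows, assuming plain `name==version` lines only. The helper names (`normalize`, `parse_pins`, `mismatched_pins`) are hypothetical and are not part of pip-tools or rocm-docs-core, and the sample strings are short excerpts of the two files, not their full contents.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name (PEP 503): lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict:
    """Map normalized package name -> pinned version, or None when the entry is unpinned."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments such as "# via ..."
        if not line or line.startswith("-"):  # skip blanks and options like "-r file"
            continue
        name, _, version = line.partition("==")
        name = name.split("[", 1)[0].strip()  # drop extras such as "[crypto]"
        pins[normalize(name)] = version.strip() or None
    return pins


def mismatched_pins(requirements_in: str, requirements_txt: str) -> list:
    """List entries declared in requirements.in that are missing or differ in requirements.txt."""
    declared = parse_pins(requirements_in)
    compiled = parse_pins(requirements_txt)
    problems = []
    for name, version in declared.items():
        if name not in compiled:
            problems.append(f"{name}: missing from compiled file")
        elif version is not None and compiled[name] != version:
            problems.append(f"{name}: declared {version}, compiled {compiled[name]}")
    return problems


# Excerpts taken from the diff above (not the complete files).
REQUIREMENTS_IN = "rocm-docs-core==1.20.1\nsphinx-reredirects\n"
REQUIREMENTS_TXT = (
    "rocm-docs-core==1.20.1\n"
    "    # via -r requirements.in\n"
    "sphinx-reredirects==0.1.6\n"
    "    # via -r requirements.in\n"
)

print(mismatched_pins(REQUIREMENTS_IN, REQUIREMENTS_TXT))
```

Name normalization matters here because `requirements.in` spells one dependency `sphinxcontrib.datatemplates` while the compiled file lists it as `sphinxcontrib-datatemplates`. Version ranges, environment markers, and hashes are outside the scope of this sketch.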
Author	SHA1	Message	Date
Adel Johar	0e184e66d7	Docs: Overhaul JAX compatibility page	2025-06-12 14:35:41 +02:00
Pratik Basyal	629b9184b4	Link to 6.4.1 updated from internal to public (#4913) (#4914)	2025-06-10 17:19:45 -04:00
Peter Park	b3e8ac32e7	Merge pull request #4911 from peterjunpark/docs/6.4.1 [docs/6.4.1] Add Mochi Video to pytorch-inference-benchmark-models.yaml	2025-06-10 13:18:50 -04:00
Peter Park	419b3a02a2	add mochi video to pytorch-inference-benchmark-models.yaml fix container tag fix container tag update model selector col width in pytorch-inference.rst model name (cherry picked from commit `51fc77d7fc`)	2025-06-10 13:07:35 -04:00
Alex Xu	304809951f	upgrade rocm-docs-core to 1.20.1 (cherry picked from commit `685457834a`)	2025-06-09 14:54:01 -04:00
yugang-amd	c9f1c821eb	Update for vllm -05/27 (#4886) (#4888) * Update vLLM inference benchmark Docker page for rocm/vllm 5/27 * update repo for Pytorch	2025-06-05 13:40:56 -04:00
Pratik Basyal	876e11fc8d	KMD version updated in compatibility matrix (#4873) (#4879)	2025-06-04 06:43:49 -04:00
Pratik Basyal	1c2513b788	GPU SKU added to ROCm 6.4.1 (#4875)	2025-06-03 16:28:34 -04:00
yugang-amd	7d26eb0e6f	Fix broken link (#4867)	2025-06-03 11:01:44 -04:00
randyh62	a62f4a5296	add reference to HIP 7.0 blog for upcoming changes (#4862)	2025-05-30 19:37:06 -07:00
yugang-amd	404e91f2d9	Update compatibility-matrix.rst (#4860)	2025-05-30 17:50:33 -04:00
alexxu-amd	50cfc538ff	Change viewer link from latest to mainline in what-is-rocm page (#4856) * change viewer link from latest to mainline * correct format (cherry picked from commit `c1919faccd`)	2025-05-30 17:18:40 -04:00
Swati Rawat	a9c323e596	Docs: Add rocprof-compute-viewer (#4850) * Docs: Add rocprof-compute-viewer * update requirements.txt --------- Co-authored-by: Alex Xu <alex.xu@amd.com> (cherry picked from commit `6142df329b`)	2025-05-30 15:22:51 -04:00
Peter Park	7a81d10c1d	Add RHEL 9.6 to compat matrix (#4839) * add RHEL 9.6 to compat matrix * add os support note (cherry picked from commit `2addcb0bca`)	2025-05-30 14:57:24 -04:00
Jeffrey Novotny	43736ef655	Merge pull request #4853 from amd-jnovotny/release-notes-641-docs641 Cherry-pick to docs/6.4.1: Update release notes with RHEL 9.6 (#4848)	2025-05-30 14:54:17 -04:00
Jeffrey Novotny	d4416e2162	Update release notes with RHEL 9.6 (#4848) (cherry picked from commit `106cecba5e`)	2025-05-30 14:50:30 -04:00
yugang-amd	00f74d2d8e	Add microsoft/phi-4 vllm-benchmark-models (#4801) (#4847) * add Phi-4 to vllm-benchmark-models.yaml fix model_repo * update model group names Co-authored-by: Peter Park <peter.park@amd.com>	2025-05-30 09:20:17 -04:00
Peter Park	db9e845844	Add vLLM benchmark and ML framework Docker doc updates to docs/6.4.1 (#4844) * Add Falcon-180B to vLLM benchmark Docker doc (#4836) * add Falcon to vllm-benchmark-models.yaml * update group name (cherry picked from commit `daf2e980d9`) * Update ML framework Docker inventories for 6.4.1 (#4841) * Update tensorflow Docker compatibility table * update jax Docker compatibility table * fix py versions * update pytorch Docker compatibility table (cherry picked from commit `93fd0ef1d4`)	2025-05-29 18:50:03 -04:00
Peter Park	4963eeab00	Update ML framework Docker inventories for 6.4.1 (#4841) * Update tensorflow Docker compatibility table * update jax Docker compatibility table * fix py versions * update pytorch Docker compatibility table (cherry picked from commit `93fd0ef1d4`)	2025-05-29 18:34:47 -04:00
Peter Park	7c25ce240b	Add Falcon-180B to vLLM benchmark Docker doc (#4836) * add Falcon to vllm-benchmark-models.yaml * update group name (cherry picked from commit `daf2e980d9`)	2025-05-29 18:34:47 -04:00
Peter Park	bac2d038f7	Merge pull request #4830 from peterjunpark/docs/6.4.1 [docs/6.4.1] Fix typo in Megatron-LM Docker pull tags	2025-05-28 15:18:14 -04:00
Peter Park	fdeaacd3cc	fix megatron-lm pull tags	2025-05-28 15:12:50 -04:00
Peter Park	8e61ba4f90	Fix rocm/vllm pull tag fix	2025-05-28 14:42:35 -04:00
Peter Park	4051e985d4	Merge pull request #4826 from peterjunpark/docs/6.4.1 [6.4.1] Add latest rocm/vllm Docker details in vLLM inference benchmark guide	2025-05-28 14:27:08 -04:00
Peter Park	94ee445a8a	Add latest rocm/vllm Docker details in vLLM inference benchmark guide (#4824) * update rocm/vllm Docker details to latest release * Add previous vLLM version * fix 'further reading' xrefs * improve model grouping names * fix links * update model picker text (cherry picked from commit `cebf0f5975`)	2025-05-28 14:23:05 -04:00
Peter Park	535859ac9f	Add RDNA4 RX 9070 GRE to gpu-arch-specs.rst and RELEASE.md (#4820) (#4821) (cherry picked from commit `0acb457389`)	2025-05-28 10:26:55 -04:00
Peter Park	2e5fe544a0	Add RDNA4 RX 9070 GRE to gpu-arch-specs.rst and RELEASE.md (#4820) (cherry picked from commit `0acb457389`)	2025-05-28 10:21:50 -04:00
yugang-amd	4dae0ba84d	Update SGPR for RDNA3 and RDNA2 series (#4815)	2025-05-27 15:13:22 -04:00
yugang-amd	5ddab465c3	Bump up requirement version (#4805) * bump up requirement version * update requirements.txt * Use Python 3.10	2025-05-27 11:08:55 -04:00
yugang-amd	151e563dcb	Merge pull request #4792 from yugang-amd/wavefront-size-6-4-1 Update wavefront size	2025-05-26 14:56:38 -04:00
yugang-amd	2098af1456	Merge pull request #4803 from yugang-amd/link-fix-6-4-1 fix broken links	2025-05-26 14:42:39 -04:00
yugang-amd	ae1a330fd7	fix links	2025-05-26 14:35:36 -04:00
yugang-amd	cab805674a	update wavefront size (cherry picked from commit `230b01565f`)	2025-05-26 13:56:14 -04:00
yugang-amd	387cfab91f	fix typo	2025-05-26 12:53:18 -04:00
yugang-amd	525703a5ab	update wavefront size	2025-05-22 17:41:36 -04:00
Peter Park	ce65e6783b	Merge pull request #4783 from peterjunpark/docs/6.4.1 Document specs for Radeon RX 9070 + small fix in megatron-lm doc (#4780)	2025-05-22 16:33:33 -04:00
Peter Park	6d2b1595b3	Document specs for Radeon RX 9070 + small fix in megatron-lm doc (#4780) * Document specs for Radeon RX 9070 * fix wrong version in megatron-lm.rst (cherry picked from commit `505041d90a`)	2025-05-22 16:30:56 -04:00
yugang-amd	31e9013bdc	update rocSHMEM xrefs (cherry picked from commit `7697298f5d`)	2025-05-22 15:19:09 -04:00
Peter Park	698ac70662	Merge pull request #4779 from peterjunpark/docs/6.4.1 [6.4.1] Add Megatron-LM benchmark doc 5/2 (#4778)	2025-05-22 14:36:29 -04:00
Peter Park	9b69755b99	Add Megatron-LM benchmark doc 5/2 (#4778) * reorg files * add tabs * update template * update template * update wordlist and toc * add previous version to doc * add selector paragraph * update wordlist.txt (cherry picked from commit `9ed65a81c4`)	2025-05-22 14:29:40 -04:00
Peter Park	05773ca41e	Merge pull request #4776 from peterjunpark/docs/6.4.1 [docs/6.4.1] fix 9070 XT gfx target in gpu-arch-specs table (#4775)	2025-05-22 12:15:41 -04:00
Peter Park	4f80043312	fix 9070 XT gfx target in gpu-arch-specs table (#4775) (cherry picked from commit `6d9f430c70`)	2025-05-22 12:12:14 -04:00
Peter Park	223fbb8f28	remove HIP upcoming changes reference link (#4771) (#4772) (cherry picked from commit `f1f2b3cac2`)	2025-05-21 12:27:07 -07:00
Alex Xu	845b3c4d5a	Merge branch 'roc-6.4.x' into docs/6.4.1	2025-05-21 15:04:20 -04:00
Alex Xu	8e7d43bec2	Merge branch 'roc-6.4.x' into docs/6.4.1	2025-05-21 12:27:43 -04:00
alexxu-amd	080b15d261	Sync develop into docs/6.4.1	2025-05-20 21:24:27 -04:00