Mirror of https://github.com/ROCm/ROCm.git (synced 2026-01-10 15:18:11 -05:00)

Compare commits: docs/7.9.0 ... deprecate- (34 commits)
| SHA1 |
|---|
| 6ee29dea72 |
| 547bb41f6d |
| 4f86b2801a |
| 9c07ed1726 |
| 34ca259220 |
| d04443ac13 |
| d0c2a23d3a |
| 311b4cd62b |
| 97b3cdda9c |
| 61eb483a5e |
| f766b823c3 |
| d2ccd706a5 |
| 699f668a2b |
| 3bc09b6faa |
| 3e3b8989f8 |
| 824d760646 |
| d0862bdfc5 |
| cb412a7a7f |
| 78f5c18837 |
| 0bc0dfd8da |
| 63682eaf86 |
| 75f84536d9 |
| 50d41f633c |
| 62d20c8581 |
| 0e54b2d006 |
| d1b426f2d0 |
| 639e2dc232 |
| 5bf3f6c059 |
| abe86d3f14 |
| 5104389ab3 |
| 6b71afe8a2 |
| d2c914d477 |
| 15298c51cb |
| a9fe4dd2bb |
@@ -76,6 +76,7 @@ Concretized
Conda
ConnectX
CuPy
da
Dashboarding
DBRX
DDR

@@ -751,6 +752,7 @@ profilers
protobuf
pseudorandom
py
pytorch
recommender
recommenders
quantile
@@ -125,7 +125,7 @@ Some workaround options are as follows:
- The `pasid` field in struct `amdsmi_process_info_t` will be deprecated in a future ROCm release.

```{note}
-See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/rocm-6.4.x/CHANGELOG.md) for details, examples,
+See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples,
and in-depth descriptions.
```
@@ -678,7 +678,6 @@ The following lists the backward incompatible changes planned for upcoming major

* Roofline support for Ubuntu 24.04.
* Experimental support `rocprofv3` (not enabled as default).
* Experimental feature: Spatial multiplexing.

#### Resolved issues
@@ -737,7 +736,7 @@ The following lists the backward incompatible changes planned for upcoming major
- Fixed `rsmi_dev_target_graphics_version_get`, `rocm-smi --showhw`, and `rocm-smi --showprod` not displaying graphics version correctly for Instinct MI200 series, MI100 series, and RDNA3-based GPUs.

```{note}
-See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/rocm-6.4.x/CHANGELOG.md) for details, examples,
+See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples,
and in-depth descriptions.
```
@@ -3456,7 +3455,7 @@ See [issue #3499](https://github.com/ROCm/ROCm/issues/3499) on GitHub.

- Error when running Omniperf with an application with command line arguments. As a workaround, create an
intermediary script to call the application with the necessary arguments, then call the script with Omniperf. This
-issue is fixed in a future release of Omniperf. See [#347](https://github.com/ROCm/omniperf/issues/347).
+issue is fixed in a future release of Omniperf. See [#347](https://github.com/ROCm/rocprofiler-compute/issues/347).

- Omniperf might not work with AMD Instinct MI300 accelerators out of the box, resulting in the following error:
"*ERROR gfx942 is not enabled rocprofv1. Available profilers include: ['rocprofv2']*". As a workaround, add the
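A hedged sketch of that wrapper workaround (the application name, its arguments, and the workload name below are placeholders; the `omniperf profile -n <name> -- <command>` form is assumed from the Omniperf CLI):

```bash
# Put the application and its arguments behind a plain script...
cat > run_app.sh <<'EOF'
#!/bin/bash
exec ./my_app --input data.bin --iterations 100
EOF
chmod +x run_app.sh

# ...then profile the script instead of the application itself.
omniperf profile -n my_workload -- ./run_app.sh
```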
@@ -4333,7 +4332,7 @@ for a complete overview of this release.
* New multiple node and GPU support.
Unsmoothed and smoothed aggregations and Ruge-Stueben AMG now work with multiple nodes
and GPUs. For more information, refer to the
-[API documentation](https://rocm.docs.amd.com/projects/rocALUTION/en/latest/usermanual/solvers.html#unsmoothed-aggregation-amg).
+[API documentation](https://rocm.docs.amd.com/projects/rocALUTION/en/docs-6.1.0/usermanual/solvers.html#unsmoothed-aggregation-amg).

### **rocDecode** (0.5.0)
RELEASE.md (71 changed lines)
@@ -64,7 +64,7 @@ ROCm 6.4.0 has been tested to allow you to choose a combination of AMD Kernel-mo

### Separation of user space and driver space components documentation

-As of ROCm 6.4.0, the driver space components documentation has moved from [AMD ROCm documentation](https://rocmdocs.amd.com/) to its own documentation site, [AMD Instinct Data Center GPU Driver](instinct.docs.amd.com). The goal is to make the software for AMD Instinct GPUs more modular. This helps in having a clear understanding of the options for installation combinations of Instinct driver and multiple supported ROCm user space versions.
+As of ROCm 6.4.0, the driver space components documentation has moved from [AMD ROCm documentation](https://rocmdocs.amd.com/) to its own documentation site, [AMD Instinct Data Center GPU Driver](https://instinct.docs.amd.com/latest/). The goal is to make the software for AMD Instinct GPUs more modular. This helps in having a clear understanding of the options for installation combinations of Instinct driver and multiple supported ROCm user space versions.

Information about the variant of the `amdgpu` driver built for Instinct GPUs is available on [AMD Instinct Data Center GPU Driver](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/). See [ROCm/ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver) GitHub repository for source code, which is planned to be renamed to **instinct-driver** in a future ROCm release. For ROCm 6.4.0, the versioning scheme for the Instinct driver is parallel to the ROCm versioning; that is, 6.4.0. In future ROCm releases, the Instinct driver version is planned to be separate from the ROCm versioning.
@@ -80,23 +80,23 @@ for the complete list of PyTorch versions tested for compatibility with ROCm. Se

### VP9 support added to rocDecode and rocPyDecode

-VP9 support is added to [rocDecode](https://github.com/ROCm/rocDecode) and [rocPyDecode](https://github.com/ROCm/rocPyDecode), allowing enhanced codec support with VP9 encoding.
+VP9 support is added to [rocDecode](https://rocm.docs.amd.com/projects/rocDecode/en/latest/index.html) and [rocPyDecode](https://rocm.docs.amd.com/projects/rocPyDecode/en/latest/index.html), allowing enhanced codec support with VP9 encoding.

### Bitstream reader support added to rocDecode

-The new bitstream reader feature has been added to [rocDecode](https://github.com/ROCm/rocDecode). It contains built-in stream file parsers, including an elementary stream file parser and an IVF container file parser. It enables decoding without the requirement for FFmpeg demuxer. The reader can parse AVC, HEVC, and AV1 elementary stream files, and AV1 IVF container files. See [Using the rocDecode bitstream reader APIs](https://rocm.docs.amd.com/projects/rocDecode/en/latest/how-to/using-rocDecode-bitstream.html) for more information.
+The new bitstream reader feature has been added to [rocDecode](https://rocm.docs.amd.com/projects/rocDecode/en/latest/index.html). It contains built-in stream file parsers, including an elementary stream file parser and an IVF container file parser. It enables decoding without the requirement for FFmpeg demuxer. The reader can parse AVC, HEVC, and AV1 elementary stream files, and AV1 IVF container files. See [Using the rocDecode bitstream reader APIs](https://rocm.docs.amd.com/projects/rocDecode/en/latest/how-to/using-rocDecode-bitstream.html) for more information.

### DLPack support added to rocAL

-[rocAL](https://github.com/ROCm/rocAL) now supports DLPack, allowing rocAL GPU tensor to be exchanged with PyTorch. This allows faster data processing by leveraging DLPack tensors. It also improves the GPU based workload performance. For more details, see [DLpack github reference documentation](https://dmlc.github.io/dlpack/latest/).
+[rocAL](https://rocm.docs.amd.com/projects/rocAL/en/latest/index.html) now supports DLPack, allowing rocAL GPU tensor to be exchanged with PyTorch. This allows faster data processing by leveraging DLPack tensors. It also improves the GPU based workload performance. For more details, see [DLpack github reference documentation](https://dmlc.github.io/dlpack/latest/).

### ROCm Compute Profiler updates

-* ROCm Compute Profiler now supports:
+ROCm Compute Profiler now supports:

-* ROCprofiler-SDK (`rocprofv3`)
-* Experimental multi-nodes profiling support.
-* Roofline plot for 64-bit floating point (FP64) and 32-bit floating point (FP32) data types.
+* ROCprofiler-SDK (`rocprofv3`)
+* Experimental multi-nodes profiling support.
+* Roofline plot for 64-bit floating point (FP64) and 32-bit floating point (FP32) data types.

### ROCm Systems Profiler updates
@@ -164,9 +164,10 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
- The new [HIP complex math API](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/complex_math_api.html) topic describes HIP complex number types and usage of these types with example code.
- The new [HIP error codes](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/error_codes.html) topic list notes all HIP runtime error codes and their descriptions. HIP API functions return these error codes to indicate various runtime conditions and errors.
- The [Introduction to the HIP programming model](https://rocm.docs.amd.com/projects/HIP/en/latest/understand/programming_model.html) topic has been updated, providing a more robust introduction to HIP.
-- The [Math API](https://rocm.docs.amd.com/projects/HIP/en/latest/understand/programming_model.html) topic has been reorganized, and the ULP difference of maximum absolute error information has been added.
-- The new [Low precision floating point types](https://rocm.docs.amd.com/projects/HIP/en/latest/understand/programming_model.html) topic includes information about FP8 (Quarter Precision) and FP16 (Half Precision).
+- The [Math API](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/math_api.html) topic has been reorganized, and the ULP difference of maximum absolute error information has been added.
+- The new [Low precision floating point types](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/low_fp_types.html) topic includes information about FP8 (Quarter Precision) and FP16 (Half Precision).

* In addition to these Release Notes, see the blog [Breaking Barriers in AI, HPC, and Modular GPU Software](https://rocm.blogs.amd.com/ecosystems-and-partners/rocm-6.4-blog/README.html) for a wide-ranging discussion of the key advancements and highlights of ROCm 6.4.0.

## Operating system and hardware support changes
@@ -628,7 +629,7 @@ Some workaround options are as follows:
- The `pasid` field in struct `amdsmi_process_info_t` will be deprecated in a future ROCm release.

```{note}
-See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/rocm-6.4.x/CHANGELOG.md) for details, examples,
+See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples,
and in-depth descriptions.
```
@@ -761,10 +762,10 @@ and in-depth descriptions.

#### Changed

-* `roc-obj` tools is deprecated and will be removed in an upcoming release.
+* The `roc-obj` tools have been deprecated and will be removed in a future release.

-- Perl package installation is not required, and users will need to install this themselves if they want to.
-- Support for ROCm Object tooling has moved into `llvm-objdump` provided by package `rocm-llvm`.
+- `llvm-objdump`, `llvm-objcopy`, and `llvm-readobj` will be enhanced to provide similar functionality as that provided by the `roc-obj` tools . The LLVM tools are available in the `rocm-llvm` pkg.
+- While not related to the deprecation, also note that the `roc-obj` tools’ package dependency on Perl has been changed to recommended. It is the user’s responsibility to install Perl to use these tools.

* SDMA retainer logic is removed for engine selection in operation of runtime buffer copy.
@@ -1181,7 +1182,6 @@ The following lists the backward incompatible changes planned for upcoming major

* Roofline support for Ubuntu 24.04.
* Experimental support `rocprofv3` (not enabled as default).
* Experimental feature: Spatial multiplexing.

#### Resolved issues
@@ -1240,7 +1240,7 @@ The following lists the backward incompatible changes planned for upcoming major
- Fixed `rsmi_dev_target_graphics_version_get`, `rocm-smi --showhw`, and `rocm-smi --showprod` not displaying graphics version correctly for Instinct MI200 series, MI100 series, and RDNA3-based GPUs.

```{note}
-See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/rocm-6.4.x/CHANGELOG.md) for details, examples,
+See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples,
and in-depth descriptions.
```
@@ -1577,34 +1577,35 @@ modprobe.blacklist=ast
```bash
modprobe.blacklist=mgag200
```
+See [GitHub issue #4589](https://github.com/ROCm/ROCm/issues/4589).
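For context, a minimal sketch of how such a `modprobe.blacklist` parameter is typically applied persistently (Ubuntu-style GRUB shown; RHEL-based systems use `grub2-mkconfig` instead; adapt to your distribution):

```bash
# Prepend the parameter to the kernel command line and regenerate GRUB's config.
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&modprobe.blacklist=mgag200 /' /etc/default/grub
sudo update-grub
sudo reboot
```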

### Failure when using a generic target with compression and vice versa

-In ROCm 6.4.0, compilation for generic target with compression will fail. As a result, you won't be able to compile for a generic target and use compression simultaneously. As a workaround, it's recommended not to use compression when using generic targets and vice versa. This issue will be addressed in a future ROCm release.
+In ROCm 6.4.0, compilation for generic target with compression will fail. As a result, you won't be able to compile for a generic target and use compression simultaneously. As a workaround, it's recommended not to use compression when using generic targets and vice versa. This issue will be addressed in a future ROCm release. See [GitHub issue #4602](https://github.com/ROCm/ROCm/issues/4602).
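A hypothetical illustration of the conflict, assuming the `--offload-compress` flag and a `gfx10-3-generic` offload target (both the flag and the target name are assumptions, not taken from the diff):

```bash
# Generic target plus compression: expected to fail in ROCm 6.4.0.
hipcc --offload-arch=gfx10-3-generic --offload-compress app.cpp -o app

# Workaround: compile for the generic target without compression.
hipcc --offload-arch=gfx10-3-generic app.cpp -o app
```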

### GFX Freq information is unavailable in the rocm-smi when running in SRIOV mode enabled on MI210

-In ROCm 6.4.0, you cannot see the GFX Freq information in the guest VM. In SRIOV mode, the AMD Platform Management Firmware (PMFW) does not share the graphics frequency information with the guest VMs and is only available to Host systems. This issue will be addressed in a future ROCm release.
+In ROCm 6.4.0, you cannot see the GFX Freq information in the guest VM. In SRIOV mode, the AMD Platform Management Firmware (PMFW) does not share the graphics frequency information with the guest VMs and is only available to Host systems. This issue will be addressed in a future ROCm release. See [GitHub issue #4603](https://github.com/ROCm/ROCm/issues/4603).

### Failure to use --kokkos-trace option in ROCm Compute Profiler

-In ROCm 6.4.0, it’s not recommended to use the `--kokkos-trace` option. `--kokkos-trace` has been partially implemented in the `rocprofv3` tool, resulting in a difference between the output of `--kokkos-trace` and the `counter_collection.csv` output file. The program will exit with a warning message if the `-kokkos-trace` option is detected in the ROCm Compute Profiler. The issue will be addressed in a future ROCm release.
+In ROCm 6.4.0, it’s not recommended to use the `--kokkos-trace` option. `--kokkos-trace` has been partially implemented in the `rocprofv3` tool, resulting in a difference between the output of `--kokkos-trace` and the `counter_collection.csv` output file. The program will exit with a warning message if the `-kokkos-trace` option is detected in the ROCm Compute Profiler. The issue will be addressed in a future ROCm release. See [GitHub issue #4604](https://github.com/ROCm/ROCm/issues/4604).

### Compute partition modification is restricted with concurrent operations running in parallel

-Modification to compute partition in GPU is prohibited by design while concurrent operations run in parallel. You must ensure no concurrent operations on the device are running when attempting to modify the compute partitions. Additional checks and error messaging to inform users of correct operation for partition modification are planned for future ROCm releases.
+Modification to compute partition in GPU is prohibited by design while concurrent operations run in parallel. You must ensure no concurrent operations on the device are running when attempting to modify the compute partitions. Additional checks and error messaging to inform users of correct operation for partition modification are planned for future ROCm releases. See [GitHub issue #4605](https://github.com/ROCm/ROCm/issues/4605).
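A hedged sketch of the recommended sequence, assuming the `--showpids` and `--setcomputepartition` options of `rocm-smi` (verify against `rocm-smi --help` for your release; `CPX` is an example partition mode):

```bash
# Confirm no compute workloads are active on the device...
rocm-smi --showpids

# ...and only then change the compute partition mode.
sudo rocm-smi --setcomputepartition CPX
```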

### MIOpen generates incorrect results for particular input with FP32 data type

-In ROCm 6.4.0, MIOpen generates incorrect results on the `conv2dbackward` function for a particular input with 32-bit floating point (FP32) data types. The issue is only specific to FP32 data types with 2 * 2 kernel size and dilation 2 * 1. As a workaround, change the data type from FP32 to FP16. The issue will be addressed in a future ROCm release.
+In ROCm 6.4.0, MIOpen generates incorrect results on the `conv2dbackward` function for a particular input with 32-bit floating point (FP32) data types. The issue is only specific to FP32 data types with 2 * 2 kernel size and dilation 2 * 1. As a workaround, change the data type from FP32 to FP16. The issue will be addressed in a future ROCm release. See [GitHub issue #4606](https://github.com/ROCm/ROCm/issues/4606).

### ROCm Debugger (ROCgdb) might not work correctly on the AMD Radeon PRO W6800 SR-IOV virtualization environment

-The ROCm Debugger (ROCgdb) component needs access to some registers to fetch debugging information. These registers are blocked in the AMD Radeon PRO W6800 SR-IOV virtualization environment, resulting in the ROCm Debugger (ROCgdb) being non-functional. The issue is due to the limitation in the virtualization environment and isn't specific to ROCm. Further investigation is in progress.
+The ROCm Debugger (ROCgdb) component needs access to some registers to fetch debugging information. These registers are blocked in the AMD Radeon PRO W6800 SR-IOV virtualization environment, resulting in the ROCm Debugger (ROCgdb) being non-functional. The issue is due to the limitation in the virtualization environment and isn't specific to ROCm. Further investigation is in progress. See [GitHub issue #4607](https://github.com/ROCm/ROCm/issues/4607).

### Limited support for Sparse API and Pallas functionality in JAX

-In ROCm 6.4.0, due to limited support for Sparse API in JAX, some of the functionality of the Pallas extension is restricted. This results in issues porting existing workloads. The issue will be addressed in a future ROCm release.
+In ROCm 6.4.0, due to limited support for Sparse API in JAX, some of the functionality of the Pallas extension is restricted. This results in issues porting existing workloads. The issue will be addressed in a future ROCm release. See [GitHub issue #4608](https://github.com/ROCm/ROCm/issues/4608).

### Inconsistent log probabilities when using the Mixtral 8x7B model in vLLM and SGLang framework
@@ -1612,7 +1613,7 @@ In ROCm 6.4.0, using a Mixtral 8X7B model with different tensor parallelism (TP)

The inconsistency primarily impacts the applications that rely on consistent log probabilities, such as those involving uncertainty estimation or probabilistic decision-making. This known limitation results from how TP distributes computations across multiple GPUs, resulting in slight variations in floating-point arithmetic. Currently, there is no direct resolution as this is a framework-level characteristic rather than a defect.

-As a workaround, you can standardize the TP sizes across all the deployments to minimize the inconsistency in the log probabilities. For information on the resolution of this inconsistency in the future, see the [SGlang](https://github.com/sgl-project/sglang) and [vLLM](https://github.com/vllm-project/vllm) GitHub repositories.
+As a workaround, you can standardize the TP sizes across all the deployments to minimize the inconsistency in the log probabilities. For information on the resolution of this inconsistency in the future, see the [SGlang](https://github.com/sgl-project/sglang) and [vLLM](https://github.com/vllm-project/vllm) GitHub repositories. See [GitHub issue #4609](https://github.com/ROCm/ROCm/issues/4609).
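As a hypothetical example of standardizing the TP size (the `vllm serve` entrypoint and `--tensor-parallel-size` flag are assumed from vLLM's CLI; the value 4 is arbitrary):

```bash
# Pin the same tensor-parallel size on every deployment that must agree on log probabilities.
vllm serve mistralai/Mixtral-8x7B-Instruct-v0.1 --tensor-parallel-size 4
```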

### No module named more_itertools warning on Azure Linux 3
@@ -1621,14 +1622,15 @@ During the driver installation process on Azure Linux 3, you might encounter the
```
sudo python3 -m pip install more_itertools
```
+See [GitHub issue #4610](https://github.com/ROCm/ROCm/issues/4610).

### Rare occurrence of AMDGPU driver failing to load in a VM on Quanta system

-In a rare occurrence (1 in 500 reboots), the guest kernel might display the call trace due to the AMDGPU driver failing to load in a repeated power cycle virtual machine (VM) on a Quanta system. This issue will limit you from using the AMD GPUs in the guest kernel. As a workaround, reboot the VM to avoid the failure.
+In a rare occurrence (1 in 500 reboots), the guest kernel might display the call trace due to the AMDGPU driver failing to load in a repeated power cycle virtual machine (VM) on a Quanta system. This issue will limit you from using the AMD GPUs in the guest kernel. As a workaround, reboot the VM to avoid the failure. See [GitHub issue #4611](https://github.com/ROCm/ROCm/issues/4611).

### Clang compilation failure might occur due to incorrectly installed GNU C++ runtime

-Clang compilation failure with the error `fatal error: 'cmath' file not found` might occur if the GNU C++ runtime is not installed correctly. The error indicates that the `libstdc++-dev` package, compatible with the latest installed GNU Compiler Collection (GCC) version, is missing. This issue is a result of Clang being unable to find the newest GNU C++ runtimes it recognizes and the associated header files. As a workaround, install the `libstdc++-dev` package compatible with the installed GCC version.
+Clang compilation failure with the error `fatal error: 'cmath' file not found` might occur if the GNU C++ runtime is not installed correctly. The error indicates that the `libstdc++-dev` package, compatible with the latest installed GNU Compiler Collection (GCC) version, is missing. This issue is a result of Clang being unable to find the newest GNU C++ runtimes it recognizes and the associated header files. As a workaround, install the `libstdc++-dev` package compatible with the installed GCC version. See [GitHub issue #4612](https://github.com/ROCm/ROCm/issues/4612).
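A minimal sketch of that workaround on a Debian/Ubuntu-style system (the GCC major version 13 is an example; match it to what `gcc -dumpversion` reports):

```bash
# Find the installed GCC major version, then install the matching C++ runtime headers.
gcc -dumpversion
sudo apt-get install libstdc++-13-dev
```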

### ROCProfiler with rocprof might fail to initialize in some PyTorch applications
@@ -1644,18 +1646,27 @@ Alternatively, you can modify the `rocprof` script located at `/opt/rocm-6.x.x/b
```
ROCPROFV1_LD_PRELOAD=$MY_HSA_TOOLS_LIB
```
+See [GitHub issue #4613](https://github.com/ROCm/ROCm/issues/4613).
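A hedged sketch of that edit (the library path is a placeholder, and inserting the assignment near the top of the script is an assumption about where it takes effect):

```bash
# Back up the rocprof script, then add the preload assignment for the HSA tools
# library that the PyTorch application already loads.
sudo cp /opt/rocm-6.x.x/bin/rocprof /opt/rocm-6.x.x/bin/rocprof.bak
sudo sed -i '2i ROCPROFV1_LD_PRELOAD=/path/to/libmyhsatool.so' /opt/rocm-6.x.x/bin/rocprof
```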

### Applications using HIP runtime might stop the graph capture process

-Applications using the HIP runtime might stop the graph capture process if the HIP runtime detects an invalid stale state from a previous capture on the same HIP stream. Resetting the stale set for every new capture in the HIP runtime can resolve the issue. The issue will be fixed in a future ROCm release.
+Applications using the HIP runtime might stop the graph capture process if the HIP runtime detects an invalid stale state from a previous capture on the same HIP stream. Resetting the stale set for every new capture in the HIP runtime can resolve the issue. The issue will be fixed in a future ROCm release. See [GitHub issue #4614](https://github.com/ROCm/ROCm/issues/4614).

### Incorrect computation results in hipBLASLt for specific transpose configuration

-When running the hipBLASLt library using the transpose configuration (TT) with FP32 and XF32 data types, you might receive incorrect computation results. As a workaround, select alternative solutions from the list returned by `hipblasLtMatmulAlgoGetHeuristic()`. Verify the result to identify the correct alternative solution. The issue will be fixed in a future ROCm release.
+When running the hipBLASLt library using the transpose configuration (TT) with FP32 and XF32 data types, you might receive incorrect computation results. As a workaround, select alternative solutions from the list returned by `hipblasLtMatmulAlgoGetHeuristic()`. Verify the result to identify the correct alternative solution. The issue will be fixed in a future ROCm release. See [GitHub issue #4615](https://github.com/ROCm/ROCm/issues/4615).

### Incorrect result in RCCL when using LL protocol in graph mode with MSCCL++ enabled

-In RCCL library, you might receive incorrect results in All-Reduce collective API, when using Link Layer (LL) protocol in graph mode while MSCCL++ is enabled. This issue occurs when the protocal state information are updated in the host-side code instead of in a kernel, which is not supported in graph mode. As a workaround, you can disable MSCCL++ by setting the environment variable `RCCL_MSCCLPP_ENABLE=0`. However, consider that this might negatively impact the performance. The issue will be fixed in a future ROCm release.
+In RCCL library, you might receive incorrect results in All-Reduce collective API, when using Link Layer (LL) protocol in graph mode while MSCCL++ is enabled. This issue occurs when the protocal state information are updated in the host-side code instead of in a kernel, which is not supported in graph mode. As a workaround, you can disable MSCCL++ by setting the environment variable `RCCL_MSCCLPP_ENABLE=0`. However, consider that this might negatively impact the performance. The issue will be fixed in a future ROCm release. See [GitHub issue #4616](https://github.com/ROCm/ROCm/issues/4616).
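Applying that workaround is a one-line change (the application name below is a placeholder):

```bash
# Disable MSCCL++ for a single run; expect a possible performance cost.
RCCL_MSCCLPP_ENABLE=0 ./my_allreduce_app

# Or disable it for the whole shell session.
export RCCL_MSCCLPP_ENABLE=0
```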

+### ROCm installation might fail in some Linux distribution kernels
+
+ROCm 6.4.0 might encounter an installation issue on some Linux distribution kernels, including the [patch](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9011e49d54dcc7653ebb8a1e05b5badb5ecfa9f9) that adds more restrictions for symbol lookups. This change breaks the standard symbol lookup methods in the kernel.
+
+As a result, the AMD kernel driver Dynamic Kernel Mode Support (DKMS) package might fail to install when the symbols required to use the PeerDirect API with Mellanox NICs are not found. In the event of such a failure, the AMD DKMS package attempts to locate these symbols directly from the Mellanox installation. However, for non-standard Mellanox NIC installations, the AMD DKMS package might not be able to locate these symbols.
+
+This issue will be fixed in a future ROCm release. As a workaround, you can run the script that allows the DKMS package to locate Mellanox symbols from the Mellanox installation without you requiring to update the new DKMS package. For downloading the script and getting more details on the issue and workaround, see [GitHub issue #4671](https://github.com/ROCm/ROCm/issues/4671).

## ROCm resolved issues
@@ -1704,7 +1715,7 @@ and will be disabled in a future release.

* The `__AMDGCN_WAVEFRONT_SIZE__` macro and `__AMDGCN_WAVEFRONT_SIZE` alias will be removed in an upcoming release.
It is recommended to remove any use of this macro. For more information, see
-[AMDGPU support](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.3.2/LLVM/clang/html/AMDGPUSupport.html).
+[AMDGPU support](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.4.0/LLVM/clang/html/AMDGPUSupport.html).
* `warpSize` will only be available as a non-`constexpr` variable. Where required,
the wavefront size should be queried via the `warpSize` variable in device code,
or via `hipGetDeviceProperties` in host code. Neither of these will result in a compile-time constant.
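Since `warpSize` stops being a compile-time constant, the wavefront size has to be checked at run time; from the shell, `rocminfo` (shipped with ROCm) reports it per agent:

```bash
# Each GPU agent lists its wavefront size (typically 32 or 64).
rocminfo | grep -i "wavefront size"
```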
@@ -1,6 +1,6 @@
ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2, 6.1.1, 6.1.0, 6.0.2, 6.0.0
:ref:`Operating systems & kernels <OS-kernel-versions>`,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,Ubuntu 24.04.2,"Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04","Ubuntu 24.04.1, 24.04",Ubuntu 24.04,,,,,,
-,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
+,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5,"Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4","Ubuntu 22.04.5, 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3","Ubuntu 22.04.4, 22.04.3, 22.04.2","Ubuntu 22.04.4, 22.04.3, 22.04.2"
,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
,"RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
@@ -26,7 +26,7 @@ ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.17.0, 2.16.2, 2.15.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.16.1, 2.15.1, 2.14.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.15.0, 2.14.0, 2.13.1","2.14.0, 2.13.1, 2.12.1","2.14.0, 2.13.1, 2.12.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.31,0.4.31,0.4.31,0.4.31,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26,0.4.26
-`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.20,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
+`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
@@ -38,7 +38,7 @@ ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2
CUB,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,,,,,,,,,,
KMD & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
Tested user space versions,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.3.x, 6.2.x, 6.1.x, 6.0.x","6.3.x, 6.2.x, 6.1.x, 6.0.x","6.3.x, 6.2.x, 6.1.x, 6.0.x","6.3.x, 6.2.x, 6.1.x, 6.0.x","6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
KMD versions,"6.4.x, 6.3.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
,,,,,,,,,,,,,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
@@ -81,7 +81,7 @@ ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2
,,,,,,,,,,,,,,,
SUPPORT LIBS,,,,,,,,,,,,,,,
`hipother <https://github.com/ROCm/hipother>`_,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
-`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.2,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
+`rocm-core <https://github.com/ROCm/rocm-core>`_,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
`ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
,,,,,,,,,,,,,,,
SYSTEM MGMT TOOLS,.. _tools-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
@@ -89,15 +89,15 @@ ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2
:doc:`ROCm Data Center Tool <rdc:index>`,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0,0.3.0
:doc:`rocminfo <rocminfo:index>`,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0,1.0.0
:doc:`ROCm SMI <rocm_smi_lib:index>`,7.5.0,7.4.0,7.4.0,7.4.0,7.4.0,7.3.0,7.3.0,7.3.0,7.3.0,7.2.0,7.2.0,7.0.0,7.0.0,6.0.2,6.0.0
-:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60102,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
+:doc:`ROCm Validation Suite <rocmvalidationsuite:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.0.60204,1.0.60202,1.0.60201,1.0.60200,1.0.60105,1.0.60102,1.0.60101,1.0.60100,1.0.60002,1.0.60000
,,,,,,,,,,,,,,,
PERFORMANCE TOOLS,,,,,,,,,,,,,,,
:doc:`ROCm Bandwidth Test <rocm_bandwidth_test:index>`,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0,1.4.0
:doc:`ROCm Compute Profiler <rocprofiler-compute:index>`,3.1.0,3.0.0,3.0.0,3.0.0,3.0.0,2.0.1,2.0.1,2.0.1,2.0.1,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`ROCm Systems Profiler <rocprofiler-systems:index>`,1.0.0,0.1.2,0.1.1,0.1.0,0.1.0,1.11.2,1.11.2,1.11.2,1.11.2,N/A,N/A,N/A,N/A,N/A,N/A
-:doc:`ROCProfiler <rocprofiler:index>`,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60102,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
+:doc:`ROCProfiler <rocprofiler:index>`,2.0.60400,2.0.60303,2.0.60302,2.0.60301,2.0.60300,2.0.60204,2.0.60202,2.0.60201,2.0.60200,2.0.60105,2.0.60102,2.0.60101,2.0.60100,2.0.60002,2.0.60000
:doc:`ROCprofiler-SDK <rocprofiler-sdk:index>`,0.6.0,0.5.0,0.5.0,0.5.0,0.5.0,0.4.0,0.4.0,0.4.0,0.4.0,N/A,N/A,N/A,N/A,N/A,N/A
-:doc:`ROCTracer <roctracer:index>`,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60102,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
+:doc:`ROCTracer <roctracer:index>`,4.1.60400,4.1.60303,4.1.60302,4.1.60301,4.1.60300,4.1.60204,4.1.60202,4.1.60201,4.1.60200,4.1.60105,4.1.60102,4.1.60101,4.1.60100,4.1.60002,4.1.60000
,,,,,,,,,,,,,,,
DEVELOPMENT TOOLS,,,,,,,,,,,,,,,
:doc:`HIPIFY <hipify:index>`,19.0.0.25104,18.0.0.25012,18.0.0.25012,18.0.0.24491,18.0.0.24455,18.0.0.24392,18.0.0.24355,18.0.0.24355,18.0.0.24232,17.0.0.24193,17.0.0.24193,17.0.0.24154,17.0.0.24103,17.0.0.24012,17.0.0.23483
@@ -62,7 +62,7 @@ compatibility and system requirements.
CUB,2.5.0,2.3.2,2.2.0
,,,
KMD & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
Tested user space versions,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.3.x, 6.2.x, 6.1.x, 6.0.x"
KMD versions,"6.4.x, 6.3.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x"
,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0
@@ -58,7 +58,7 @@ Docker image compatibility
AMD validates and publishes ready-made `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
with ROCm backends on Docker Hub. The following Docker image tags and
associated inventories are validated for
-`ROCm 6.3.1 <https://repo.radeon.com/rocm/apt/6.3.1/>`_. Click the |docker-icon|
+`ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_. Click the |docker-icon|
icon to view the image on Docker Hub.

.. list-table:: JAX Docker image components
@@ -68,24 +68,26 @@ icon to view the image on Docker Hub.
- JAX
- Linux
- Python

* - .. raw:: html

-<a href="https://hub.docker.com/layers/rocm/jax/rocm6.3.1-jax0.4.31-py3.12/images/sha256-085a0cd5207110922f1fca684933a9359c66d42db6c5aba4760ed5214fdabde0"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
+<a href="https://hub.docker.com/layers/rocm/jax/rocm6.4-jax0.4.35-py3.12/images/sha256-4069398229078f3311128b6d276c6af377c7e97d3363d020b0bf7154fae619ca"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>

-- `0.4.31 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.31>`_
+- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
- Ubuntu 24.04
- `3.12.7 <https://www.python.org/downloads/release/python-3127/>`_

* - .. raw:: html

-<a href="https://hub.docker.com/layers/rocm/jax/rocm6.3.1-jax0.4.31-py3.10/images/sha256-f88eddad8f47856d8640b694da4da347ffc1750d7363175ab7dc872e82b43324"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
+<a href="https://hub.docker.com/layers/rocm/jax/rocm6.4-jax0.4.35-py3.10/images/sha256-a137f901f91ce6c13b424c40a6cf535248d4d20fd36d5daf5eee0570190a4a11"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>

-- `0.4.31 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.31>`_
+- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
- Ubuntu 22.04
- `3.10.14 <https://www.python.org/downloads/release/python-31014/>`_
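The tag in the updated link above can be pulled and run directly; a minimal sketch using the standard ROCm container flags (adjust for your system):

```bash
# Pull the validated ROCm 6.4.0 JAX image and start it with GPU access.
docker pull rocm/jax:rocm6.4-jax0.4.35-py3.12
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
  rocm/jax:rocm6.4-jax0.4.35-py3.12
```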

AMD publishes `Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
with ROCm backends on Docker Hub. The following Docker image tags and
-associated inventories are tested for `ROCm 6.2.4 <https://repo.radeon.com/rocm/apt/6.2.4/>`_.
+associated inventories are tested for `ROCm 6.3.2 <https://repo.radeon.com/rocm/apt/6.3.2/>`_.

.. list-table:: JAX community Docker image components
:header-rows: 1
@@ -94,27 +96,30 @@ associated inventories are tested for `ROCm 6.2.4 <https://repo.radeon.com/rocm/
- JAX
- Linux
- Python

* - .. raw:: html

-<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.12.7/images/sha256-a6032d89c07573b84c44e42c637bf9752b1b7cd2a222d39344e603d8f4c63beb?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.12.8/images/sha256-25dfaa0183e274bd0a3554a309af3249c6f16a1793226cb5373f418e39d3146a"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>

-- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
- Ubuntu 22.04
-- `3.12.7 <https://www.python.org/downloads/release/python-3127/>`_
+- `3.12.8 <https://www.python.org/downloads/release/python-3128/>`_

* - .. raw:: html

-<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.11.10/images/sha256-d462f7e445545fba2f3b92234a21beaa52fe6c5f550faabcfdcd1bf53486d991?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.11.11/images/sha256-ff9baeca9067d13e6c279c911e5a9e5beed0817d24fafd424367cc3d5bd381d7"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>

-- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
- Ubuntu 22.04
-- `3.11.10 <https://www.python.org/downloads/release/python-31110/>`_
+- `3.11.11 <https://www.python.org/downloads/release/python-31111/>`_

* - .. raw:: html

-<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.10.15/images/sha256-6f2d4d0f529378d9572f0e8cfdcbc101d1e1d335bd626bb3336fff87814e9d60?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.10.16/images/sha256-8bab484be1713655f74da51a191ed824bb9d03db1104fd63530a1ac3c37cf7b1"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>

-- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
- Ubuntu 22.04
-- `3.10.15 <https://www.python.org/downloads/release/python-31015/>`_
+- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_

Critical ROCm libraries for JAX
================================================================================
@@ -58,7 +58,7 @@ Docker image compatibility

AMD validates and publishes ready-made `PyTorch images <https://hub.docker.com/r/rocm/pytorch>`_
with ROCm backends on Docker Hub. The following Docker image tags and
-associated inventories are validated for `ROCm 6.3.3 <https://repo.radeon.com/rocm/apt/6.3.3/>`_.
+associated inventories are validated for `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_.
Click the |docker-icon| icon to view the image on Docker Hub.

.. list-table:: PyTorch Docker image components
@@ -79,9 +79,84 @@ Click the |docker-icon| icon to view the image on Docker Hub.

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu24.04_py3.12_pytorch_release_2.4.0/images/sha256-6c798857b2c9526b44ba535710b93a1737546acea79b53a93c646195c272f1d5"><i class="fab fa-docker fa-lg"></i></a>
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-ab1d350b818b90123cfda31363019d11c0d41a8f12a19e3cb2cb40cf0261137d"><i class="fab fa-docker fa-lg"></i></a>

- `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
- `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`_
- `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
- `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.6.0/images/sha256-130536fdfceb374626a7bcb8d00b9d796ddfc3115677d51229e5b852d96b5ef4"><i class="fab fa-docker fa-lg"></i></a>

- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
- `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`_
- `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
- `4.0.7 <https://github.com/open-mpi/ompi/tree/v4.0.7>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.5.1/images/sha256-20a2e24b4738dc1f1a44a04f23827918b56c99f7e697e6fccb90e9c4fae8ca9b"><i class="fab fa-docker fa-lg"></i></a>

- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`_
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
- `4.0.7 <https://github.com/open-mpi/ompi/tree/v4.0.7>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.11_pytorch_release_2.5.1/images/sha256-f09cb8ca39cc39222fb554060711f5c19130f7b4047aaf41fad4ba3ec470ca03"><i class="fab fa-docker fa-lg"></i></a>

- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`_
- 22.04
- `3.11.9 <https://www.python.org/downloads/release/python-3119/>`_
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`_
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
- `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.5.1/images/sha256-a91c100d1fe608dae3eb7f60a751630363d4027ac3d077d428e92945204c338e"><i class="fab fa-docker fa-lg"></i></a>

- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`_
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
- `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-66a89ce6485bb887af74bb9bd76bb613ab9834a6b1374649ea7ae379883454a4"><i class="fab fa-docker fa-lg"></i></a>

- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`_
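A minimal sketch of using the first validated tag above (standard ROCm container flags; adjust as needed):

```bash
# Pull the ROCm 6.4.0 / PyTorch 2.6.0 image and start it with GPU access.
docker pull rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video \
  rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0
```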
@@ -94,74 +169,29 @@ Click the |docker-icon| icon to view the image on Docker Hub.

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu22.04_py3.10_pytorch_release_2.4.0/images/sha256-a09b21248133876fc8912a5ff4e6ee2c8d62b14120313e426b3dadda5702713d"><i class="fab fa-docker fa-lg"></i></a>
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-c716cf167e6e49893f11de03606ed37044153aca089e74ca615065c06877f86b"><i class="fab fa-docker fa-lg"></i></a>

- `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`_
- `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
- `4.0.7 <https://github.com/open-mpi/ompi/tree/v4.0.7>`_
- `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
- `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu22.04_py3.9_pytorch_release_2.4.0/images/sha256-963187534467f0f9da77996762fc1d112a6faa5372277c348a505533e7876ec8"><i class="fab fa-docker fa-lg"></i></a>

- `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 22.04
- `3.9.21 <https://www.python.org/downloads/release/python-3921/>`_
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`_
- `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
- `4.0.7 <https://github.com/open-mpi/ompi/tree/v4.0.7>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-952f2621bd2bf3078bef19061e05b209105a82a7908e7e6cdf85014938a4d93a"><i class="fab fa-docker fa-lg"></i></a>
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.3.0/images/sha256-0434cbc9b07b2c26e39480d7447f676f9057a1054dcff00e0050c25a6eddbd3c"><i class="fab fa-docker fa-lg"></i></a>

- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
- `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`_
- `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
- `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu22.04_py3.10_pytorch_release_2.2.1/images/sha256-a2fe20e170feb9e05da3e5728bb98e40d08567e137be8e6ba797962ed2852608"><i class="fab fa-docker fa-lg"></i></a>

- `2.2.1 <https://github.com/ROCm/pytorch/tree/release/2.2>`_
- 22.04
- `3.10 <https://www.python.org/downloads/release/python-31016/>`_
- `1.2.0 <https://github.com/ROCm/apex/tree/release/1.2.0>`_
- `0.17.1 <https://github.com/pytorch/vision/tree/v0.17.1>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
- `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
- `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu20.04_py3.9_pytorch_release_2.2.1/images/sha256-7f231937c897cca5f89e360be33c70a2017d60f62d1fbe81292be48c15fe345b"><i class="fab fa-docker fa-lg"></i></a>

- `2.2.1 <https://github.com/ROCm/pytorch/tree/release/2.2>`_
- 20.04
- `3.9.21 <https://www.python.org/downloads/release/python-3921/>`_
- `1.2.0 <https://github.com/ROCm/apex/tree/release/1.2.0>`_
- `0.17.1 <https://github.com/pytorch/vision/tree/v0.17.1>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13.0>`_
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_
- `master <https://bitbucket.org/icl/magma/src/master/>`_
- `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
- `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
@@ -169,29 +199,14 @@ Click the |docker-icon| icon to view the image on Docker Hub.

* - .. raw:: html

       <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu22.04_py3.9_pytorch_release_1.13.1/images/sha256-616a47758004f91951e2da6c1fe291f903de65a7b2318d4b18359b48fe3032f4"><i class="fab fa-docker fa-lg"></i></a>
       <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-688b1c0073092615fb98778d78b16191e506097ee116a2d3d2628b264d5d367b"><i class="fab fa-docker fa-lg"></i></a>

  - `1.13.1 <https://github.com/ROCm/pytorch/tree/release/1.13>`_
  - `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
  - 22.04
  - `3.9.21 <https://www.python.org/downloads/release/python-3921/>`_
  - `1.0.0 <https://github.com/ROCm/apex/tree/release/1.0.0>`_
  - `0.14.0 <https://github.com/pytorch/vision/tree/v0.14.0>`_
  - `2.19.0 <https://github.com/tensorflow/tensorboard/tree/2.19>`_
  - `master <https://bitbucket.org/icl/magma/src/master/>`_
  - `1.14.1 <https://github.com/openucx/ucx/tree/v1.14.1>`_
  - `4.1.5 <https://github.com/open-mpi/ompi/tree/v4.1.5>`_
  - `5.3-1.0.5.0 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_

* - .. raw:: html

       <a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.3.3_ubuntu20.04_py3.9_pytorch_release_1.13.1/images/sha256-a2cfb365aea58b84595e241ffdb0d5ef3e6566e98c10b5499f4aa29983a74ea2"><i class="fab fa-docker fa-lg"></i></a>

  - `1.13.1 <https://github.com/ROCm/pytorch/tree/release/1.13>`_
  - 20.04
  - `3.9.21 <https://www.python.org/downloads/release/python-3921/>`_
  - `1.0.0 <https://github.com/ROCm/apex/tree/release/1.0.0>`_
  - `0.14.0 <https://github.com/pytorch/vision/tree/v0.14.0>`_
  - `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18>`_
  - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
  - `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`_
  - `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`_
  - `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_
  - `master <https://bitbucket.org/icl/magma/src/master/>`_
  - `1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
  - `4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
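Each Docker Hub link above embeds the pullable image tag, so any row can be fetched directly. As a sketch, for the ROCm 6.4 PyTorch 2.3.0 image referenced in the updated row (tag taken from its Docker Hub URL):

.. code-block:: shell

   docker pull rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.3.0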
@@ -56,7 +56,7 @@ Docker image compatibility

AMD validates and publishes ready-made `TensorFlow images
<https://hub.docker.com/r/rocm/tensorflow>`_ with ROCm backends on
Docker Hub. The following Docker image tags and associated inventories are
validated for `ROCm 6.3.3 <https://repo.radeon.com/rocm/apt/6.3.3/>`_. Click
validated for `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_. Click
the |docker-icon| icon to view the image on Docker Hub.

.. list-table:: TensorFlow Docker image components
@@ -64,57 +64,91 @@ the |docker-icon| icon to view the image on Docker Hub.

* - Docker image
  - TensorFlow
  - Ubuntu
  - Dev
  - Python
  - TensorBoard

* - .. raw:: html

  - `rocm/tensorflow`__
       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.18-dev/images/sha256-fa9cf5fa6c6079a7118727531ccd0056c6e3224a42c3d6e78a49e7781daafff4"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
  - dev
  - 24.04
  - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
  - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

* - .. raw:: html

  - `rocm/tensorflow`__
       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.18-runtime/images/sha256-14addca4b92a47c806b83ebaeed593fc6672cd99f0017ed8dad759fe72ed0309"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
  - runtime
  - 24.04
  - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
  - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

* - .. raw:: html

       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.18-dev/images/sha256-f5e151060df04ff5fb59f5604b49cd371931bbe75b06aec9fe7781397c4be0ce"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
  - dev
  - 22.04
  - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
  - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

* - .. raw:: html

  - `rocm/tensorflow`__
       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.18-runtime/images/sha256-5cd4c03fdb1036570c0d4929da60a65c4466998dc80f1dc8a5a0b173eae017fb"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
  - runtime
  - 22.04
  - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
  - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`_

* - .. raw:: html

       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.17-dev/images/sha256-b3add80e374a2db2d1088d746e740afa89d439aca02cacba959ad298f5cd2b3f"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
  - dev
  - 24.04
  - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
  - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

* - .. raw:: html

  - `rocm/tensorflow`__
       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.12-tf2.17-runtime/images/sha256-3a244f026c32177eff7958ffbad390de85b438b2b48b455cc39f15d70fa1270d"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
  - runtime
  - 24.04
  - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
  - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

* - .. raw:: html

       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.17-dev/images/sha256-e0cecdfacb59169335049983cdab6da578c209bb9f4d08aad97e184ae59171a6"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

  - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
  - dev
  - 22.04
  - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
  - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_

* - .. raw:: html

  - `rocm/tensorflow`__
  - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
  - dev
  - `Python 3.12.4 <https://www.python.org/downloads/release/python-3124/>`_
  - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
       <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4-py3.10-tf2.17-runtime/images/sha256-6f43de12f7eb202791b698ac51d28b72098de90034dbcd48486629b0125f7707"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

* - .. raw:: html

  - `rocm/tensorflow`__
  - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.16.2-cp310-cp310-manylinux_2_28_x86_64.whl>`__
  - dev
  - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
  - runtime
  - 22.04
  - `Python 3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
  - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`_
  - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`_
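As with the PyTorch images, each Docker Hub link embeds the pullable tag. For example, for the ROCm 6.4 TensorFlow 2.18 dev image listed above (a sketch, not an exhaustive list):

.. code-block:: shell

   docker pull rocm/tensorflow:rocm6.4-py3.12-tf2.18-dev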

Critical ROCm libraries for TensorFlow
===============================================================================
@@ -42,7 +42,7 @@ all_article_info_author = ""

# pages with specific settings
article_pages = [
    {"file": "about/release-notes", "os": ["linux"], "date": "2025-04-10"},
    {"file": "about/release-notes", "os": ["linux"], "date": "2025-04-11"},
    {"file": "release/changelog", "os": ["linux"],},
    {"file": "compatibility/compatibility-matrix", "os": ["linux"]},
    {"file": "compatibility/ml-compatibility/pytorch-compatibility", "os": ["linux"]},
@@ -70,6 +70,7 @@ article_pages = [
    {"file": "how-to/rocm-for-ai/inference/hugging-face-models", "os": ["linux"]},
    {"file": "how-to/rocm-for-ai/inference/llm-inference-frameworks", "os": ["linux"]},
    {"file": "how-to/rocm-for-ai/inference/vllm-benchmark", "os": ["linux"]},
    {"file": "how-to/rocm-for-ai/inference/pytorch-inference-benchmark", "os": ["linux"]},
    {"file": "how-to/rocm-for-ai/inference/deploy-your-model", "os": ["linux"]},

    {"file": "how-to/rocm-for-ai/inference-optimization/index", "os": ["linux"]},
@@ -0,0 +1,25 @@
pytorch_inference_benchmark:
  unified_docker:
    latest: &rocm-pytorch-docker-latest
      pull_tag: rocm/pytorch:latest
      docker_hub_url:
      rocm_version:
      pytorch_version:
      hipblaslt_version:
  model_groups:
    - group: CLIP
      tag: clip
      models:
        - model: CLIP
          mad_tag: pyt_clip_inference
          model_repo: laion/CLIP-ViT-B-32-laion2B-s34B-b79K
          url: https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K
          precision: float16
    - group: Chai-1
      tag: chai
      models:
        - model: Chai-1
          mad_tag: pyt_chai1_inference
          model_repo: meta-llama/Llama-3.1-8B-Instruct
          url: https://huggingface.co/chaidiscovery/chai-1
          precision: float16
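This inventory is what the Sphinx ``datatemplate`` page added later in this changeset iterates over. As a quick way to enumerate the benchmark tags outside Sphinx (a sketch; it assumes PyYAML is installed and that the file sits at the ``data/...`` path referenced by the docs, which may differ in your checkout):

.. code-block:: shell

   python3 - <<'EOF'
   import yaml
   with open("data/how-to/rocm-for-ai/inference/pytorch-inference-benchmark-models.yaml") as f:
       data = yaml.safe_load(f)
   for group in data["pytorch_inference_benchmark"]["model_groups"]:
       for model in group["models"]:
           print(model["mad_tag"], "->", model["model_repo"])
   EOF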
@@ -1,10 +1,10 @@
vllm_benchmark:
  unified_docker:
    latest:
      pull_tag: rocm/vllm:instinct_main
      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.7.3_20250311/images/sha256-de0a2649b735f45b7ecab8813eb7b19778ae1f40591ca1196b07bc29c42ed4a3
      pull_tag: rocm/vllm:rocm6.3.1_instinct_vllm0.8.3_20250410
      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.8.3_20250410/images/sha256-a0b55c6c0f3fa5d437fb54a66e32a108306c36d4776e570dfd0ae902719bd190
      rocm_version: 6.3.1
      vllm_version: 0.7.3
      vllm_version: 0.8.3
      pytorch_version: 2.7.0 (dev nightly)
      hipblaslt_version: 0.13
  model_groups:
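For reference, the updated ``pull_tag`` maps one-to-one onto a ``docker pull`` invocation:

.. code-block:: shell

   docker pull rocm/vllm:rocm6.3.1_instinct_vllm0.8.3_20250410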
@@ -102,19 +102,12 @@ vllm_benchmark:
          model_repo: Qwen/Qwen2-72B-Instruct
          url: https://huggingface.co/Qwen/Qwen2-72B-Instruct
          precision: float16
    - group: JAIS
      tag: jais
      models:
        - model: JAIS 13B
          mad_tag: pyt_vllm_jais-13b
          model_repo: core42/jais-13b-chat
          url: https://huggingface.co/core42/jais-13b-chat
          precision: float16
        - model: JAIS 30B
          mad_tag: pyt_vllm_jais-30b
          model_repo: core42/jais-30b-chat-v3
          url: https://huggingface.co/core42/jais-30b-chat-v3
        - model: QwQ-32B
          mad_tag: pyt_vllm_qwq-32b
          model_repo: Qwen/QwQ-32B
          url: https://huggingface.co/Qwen/QwQ-32B
          precision: float16
          tunableop: true
    - group: DBRX
      tag: dbrx
      models:
@@ -685,7 +685,7 @@ Two sample Llama scaling configuration files are in vLLM for ``llama2-70b`` and
``llama2-7b``.

If building vLLM using
`Dockerfile.rocm <https://github.com/vllm-project/vllm/blob/main/Dockerfile.rocm>`_
`Dockerfile.rocm <https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm>`_
for the ``llama2-70b`` scale config, find the file at
``/vllm-workspace/tests/fp8_kv/llama2-70b-fp8-kv/kv_cache_scales.json`` at
runtime.
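To confirm the scaling configuration actually ships in the image, it can be listed at runtime (a sketch; ``vllm-rocm`` is the image tag produced by the build instructions later in this section):

.. code-block:: shell

   docker run --rm vllm-rocm ls /vllm-workspace/tests/fp8_kv/llama2-70b-fp8-kv/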
@@ -16,8 +16,7 @@ ROCm supports vLLM and Hugging Face TGI as major LLM-serving frameworks.

Serving using vLLM
==================

vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM officially supports ROCm versions 5.7 and
6.0. AMD is actively working with the vLLM team to improve performance and support later ROCm versions.
vLLM is a fast and easy-to-use library for LLM inference and serving. AMD is actively working with the vLLM team to improve performance and support the latest ROCm versions.

See the `GitHub repository <https://github.com/vllm-project/vllm>`_ and `official vLLM documentation
<https://docs.vllm.ai/>`_ for more information.

@@ -31,9 +30,9 @@ vLLM installation

vLLM supports two ROCm-capable installation methods. Refer to the official documentation using the following links.

- `Build from source with Docker
  <https://docs.vllm.ai/en/latest/getting_started/amd-installation.html#build-from-source-docker-rocm>`_ (recommended)
  <https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=rocm#build-image-from-source>`_ (recommended)

- `Build from source <https://docs.vllm.ai/en/latest/getting_started/amd-installation.html#build-from-source-rocm>`_
- `Build from source <https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=rocm#build-wheel-from-source>`_

vLLM walkthrough
----------------
@@ -36,7 +36,7 @@ Installing vLLM

   git clone https://github.com/vllm-project/vllm.git
   cd vllm
   docker build -f Dockerfile.rocm -t vllm-rocm .
   docker build -f docker/Dockerfile.rocm -t vllm-rocm .
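Once built, the image can be launched with GPU access (a sketch; the device and IPC flags mirror the ROCm container invocations used elsewhere in these docs, and the mounts and sizes are illustrative):

.. code-block:: shell

   docker run -it --device /dev/kfd --device /dev/dri --group-add video --ipc host --shm-size 16G vllm-rocm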
.. tab-set::
@@ -0,0 +1,167 @@
.. meta::
   :description: Learn how to validate LLM inference performance on MI300X accelerators using AMD MAD and the
                 ROCm PyTorch Docker image.
   :keywords: model, MAD, automation, dashboarding, validate, pytorch

*************************************
PyTorch inference performance testing
*************************************

.. _pytorch-inference-benchmark-docker:

.. datatemplate:yaml:: /data/how-to/rocm-for-ai/inference/pytorch-inference-benchmark-models.yaml

   {% set unified_docker = data.pytorch_inference_benchmark.unified_docker.latest %}
   {% set model_groups = data.pytorch_inference_benchmark.model_groups %}

   The `ROCm PyTorch Docker <https://hub.docker.com/r/rocm/pytorch/tags>`_ image offers a prebuilt,
   optimized environment for testing model inference performance on AMD Instinct™ MI300X series
   accelerators. This guide demonstrates how to use the AMD Model Automation and Dashboarding (MAD)
   tool with the ROCm PyTorch container to test inference performance on various models efficiently.

   .. _pytorch-inference-benchmark-available-models:

   Supported models
   ================

   .. raw:: html

      <div id="vllm-benchmark-ud-params-picker" class="container-fluid">
        <div class="row">
          <div class="col-2 me-2 model-param-head">Model</div>
          <div class="row col-10">
            {% for model_group in model_groups %}
            <div class="col-6 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
            {% endfor %}
          </div>
        </div>

        <div class="row mt-1" style="display: none;">
          <div class="col-2 me-2 model-param-head">Model variant</div>
          <div class="row col-10">
            {% for model_group in model_groups %}
            {% set models = model_group.models %}
            {% for model in models %}
            <div class="col-12 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
            {% endfor %}
            {% endfor %}
          </div>
        </div>
      </div>

   {% for model_group in model_groups %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.mad_tag}}

      .. note::

         See the `{{ model.model }} model card on Hugging Face <{{ model.url }}>`_ to learn more about your selected model.
         Some models require access authorization before use via an external license agreement through a third party.

   {% endfor %}
   {% endfor %}

   Getting started
   ===============

   Use the following procedures to reproduce the benchmark results on an
   MI300X series accelerator with the prebuilt PyTorch Docker image.

   .. _pytorch-benchmark-get-started:

   1. Disable NUMA auto-balancing.

      To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
      might hang until the periodic balancing is finalized. For more information,
      see :ref:`AMD Instinct MI300X system optimization <mi300x-disable-numa>`.

      .. code-block:: shell

         # disable automatic NUMA balancing
         sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
         # check if NUMA balancing is disabled (returns 0 if disabled)
         cat /proc/sys/kernel/numa_balancing
         0
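      Equivalently, if ``sysctl`` is available on the host, the same non-persistent change can be applied with (a sketch, not part of the original procedure):

      .. code-block:: shell

         sudo sysctl kernel.numa_balancing=0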
   .. container:: model-doc pyt_chai1_inference

      2. Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue/images/sha256-b736a4239ab38a9d0e448af6d4adca83b117debed00bfbe33846f99c4540f79b>`_ from Docker Hub.

         .. code-block:: shell

            docker pull rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue

         .. note::

            The Chai-1 benchmark uses a specifically selected Docker image using ROCm 6.2.3 and PyTorch 2.3.0 to address an accuracy issue.

   .. container:: model-doc pyt_clip_inference

      2. Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-05b55983e5154f46e7441897d0908d79877370adca4d1fff4899d9539d6c4969>`_ from Docker Hub.

         .. code-block:: shell

            docker pull rocm/pytorch:latest
   Benchmarking
   ============

   .. _pytorch-inference-benchmark-mad:

   {% for model_group in model_groups %}
   {% for model in model_group.models %}

   .. container:: model-doc {{model.mad_tag}}

      To simplify performance testing, the ROCm Model Automation and Dashboarding
      (`<https://github.com/ROCm/MAD>`__) project provides ready-to-use scripts and configuration.
      To start, clone the MAD repository to a local directory and install the required packages on the
      host machine.

      .. code-block:: shell

         git clone https://github.com/ROCm/MAD
         cd MAD
         pip install -r requirements.txt

      Use this command to run the performance benchmark test on the `{{model.model}} <{{ model.url }}>`_ model
      using one GPU with the ``{{model.precision}}`` data type on the host machine.

      .. code-block:: shell

         export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
         python3 tools/run_models.py --tags {{model.mad_tag}} --keep-model-dir --live-output --timeout 28800

      MAD launches a Docker container with the name
      ``container_ci-{{model.mad_tag}}``. The latency and throughput reports of the
      model are collected in ``perf.csv``.
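      To skim that report from the host, plain shell tooling is enough (a sketch; nothing MAD-specific):

      .. code-block:: shell

         # align the comma-separated columns for reading
         column -s, -t < perf.csv | less -S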
      .. note::

         For improved performance, consider enabling TunableOp. By default,
         ``{{model.mad_tag}}`` runs with TunableOp disabled (see
         `<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To enable
         it, edit the default run behavior in ``tools/run_models.py`` -- update the model's
         run ``args`` by changing ``--tunableop off`` to ``--tunableop on``.

         Enabling TunableOp triggers a two-pass run -- a warm-up followed by the performance-collection run.
         Although this might increase the initial run time, it can result in a performance gain.
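      Outside of MAD's ``--tunableop`` switch, TunableOp can also be toggled directly through upstream PyTorch environment variables (a sketch; these variables come from PyTorch's TunableOp documentation, not from MAD):

      .. code-block:: shell

         # enable TunableOp and let it tune during this run
         export PYTORCH_TUNABLEOP_ENABLED=1
         export PYTORCH_TUNABLEOP_TUNING=1
         # tuning results are written to, and reused from, this CSV
         export PYTORCH_TUNABLEOP_FILENAME=tunableop_results.csv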
   {% endfor %}
   {% endfor %}
   Further reading
   ===============

   - To learn more about system settings and management practices to configure your system for
     MI300X accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.

   - To learn how to run LLM models from Hugging Face or your own model, see
     :doc:`Running models from Hugging Face <hugging-face-models>`.

   - To learn how to optimize inference on LLMs, see
     :doc:`Inference optimization <../inference-optimization/index>`.

   - To learn how to fine-tune LLMs, see
     :doc:`Fine-tuning LLMs <../fine-tuning/index>`.
@@ -3,9 +3,9 @@
   ROCm vLLM Docker image.
   :keywords: model, MAD, automation, dashboarding, validate

********************************************************
LLM inference performance testing on AMD Instinct MI300X
********************************************************
**********************************
vLLM inference performance testing
**********************************

.. _vllm-benchmark-unified-docker:

@@ -16,7 +16,7 @@ LLM inference performance testing on AMD Instinct MI300X

   The `ROCm vLLM Docker <{{ unified_docker.docker_hub_url }}>`_ image offers
   a prebuilt, optimized environment for validating large language model (LLM)
   inference performance on AMD Instinct™ MI300X series accelerator. This ROCm vLLM
   inference performance on AMD Instinct™ MI300X series accelerators. This ROCm vLLM
   Docker image integrates vLLM and PyTorch tailored specifically for MI300X series
   accelerators and includes the following components:

@@ -34,7 +34,7 @@ LLM inference performance testing on AMD Instinct MI300X

   .. _vllm-benchmark-available-models:

   Available models
   Supported models
   ================

   .. raw:: html
@@ -183,6 +183,25 @@ LLM inference performance testing on AMD Instinct MI300X
            to collect latency and throughput performance data, you can also change the benchmarking
            parameters. See the standalone benchmarking tab for more information.

            {% if model.tunableop %}

            .. note::

               For improved performance, consider enabling :ref:`PyTorch TunableOp <mi300x-tunableop>`.
               TunableOp automatically explores different implementations and configurations of certain PyTorch
               operators to find the fastest one for your hardware.

               By default, ``{{model.mad_tag}}`` runs with TunableOp disabled
               (see `<https://github.com/ROCm/MAD/blob/develop/models.json>`__). To
               enable it, edit the default run behavior in the ``models.json``
               configuration before running inference -- update the model's run
               ``args`` by changing ``--tunableop off`` to ``--tunableop on``.

               Enabling TunableOp triggers a two-pass run -- a warm-up followed by the performance-collection run.

            {% endif %}

      .. tab-item:: Standalone benchmarking

         Run the vLLM benchmark tool independently by starting the
@@ -331,11 +350,18 @@ for benchmarking, see the version-specific documentation.
     - PyTorch version
     - Resources

   * - 6.3.1
     - 0.7.3
     - 2.7.0
     -
       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.3/how-to/rocm-for-ai/inference/vllm-benchmark.html>`_
       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.7.3_20250325/images/sha256-25245924f61750b19be6dcd8e787e46088a496c1fe17ee9b9e397f3d84d35640>`_

   * - 6.3.1
     - 0.6.6
     - 2.7.0
     -
       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.2/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html>`_
       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.2/how-to/rocm-for-ai/inference/vllm-benchmark.html>`_
       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9>`_

   * - 6.2.1
@@ -9,7 +9,8 @@ Training a model with PyTorch for ROCm

PyTorch is an open-source machine learning framework that is widely used for
model training with GPU-optimized components for transformer-based models.

The PyTorch for ROCm training Docker (``rocm/pytorch-training:v25.4``) image
The `PyTorch for ROCm training Docker <https://hub.docker.com/layers/rocm/pytorch-training/v25.5/images/sha256-d47850a9b25b4a7151f796a8d24d55ea17bba545573f0d50d54d3852f96ecde5>`_
(``rocm/pytorch-training:v25.5``) image
provides a prebuilt optimized environment for fine-tuning and pretraining a
model on AMD Instinct MI325X and MI300X accelerators. It includes the following
software components to accelerate training workloads:

@@ -17,19 +18,19 @@ software components to accelerate training workloads:

+--------------------------+--------------------------------+
| Software component       | Version                        |
+==========================+================================+
| ROCm                     | 6.3.0                          |
| ROCm                     | 6.3.4                          |
+--------------------------+--------------------------------+
| PyTorch                  | 2.7.0a0+git637433              |
+--------------------------+--------------------------------+
| Python                   | 3.10                           |
+--------------------------+--------------------------------+
| Transformer Engine       | 1.11                           |
| Transformer Engine       | 1.12.0.dev0+25a33da            |
+--------------------------+--------------------------------+
| Flash Attention          | 3.0.0                          |
+--------------------------+--------------------------------+
| hipBLASLt                | git258a2162                    |
| hipBLASLt                | git53b53bf                     |
+--------------------------+--------------------------------+
| Triton                   | 3.1                            |
| Triton                   | 3.2.0                          |
+--------------------------+--------------------------------+
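To spot-check that a pulled image matches this inventory, a couple of the components can be queried directly (a sketch; it assumes ``python``, ``torch``, and ``triton`` are on the image's default path, which is not guaranteed for every tag):

.. code-block:: shell

   docker run --rm rocm/pytorch-training:v25.5 python -c 'import torch, triton; print(torch.__version__, triton.__version__)'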
.. _amd-pytorch-training-model-support:

@@ -39,6 +40,8 @@ Supported models

The following models are pre-optimized for performance on the AMD Instinct MI325X and MI300X accelerators.

* Llama 3.3 70B

* Llama 3.1 8B

* Llama 3.1 70B
@@ -79,309 +82,346 @@ auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
and optimization steps <train-a-model-system-validation>` to set up your system
before starting training.

Environment setup
=================

This Docker image is optimized for specific model configurations outlined
below. Performance can vary for other training workloads, as AMD
doesn’t validate configurations and run conditions outside those described.

Download the Docker image
-------------------------

1. Use the following command to pull the Docker image from Docker Hub.

   .. code-block:: shell

      docker pull rocm/pytorch-training:v25.4

2. Run the Docker container.

   .. code-block:: shell

      docker run -it --device /dev/dri --device /dev/kfd --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 64G --name training_env rocm/pytorch-training:v25.4

3. Use these commands if you exit the ``training_env`` container and need to return to it.

   .. code-block:: shell

      docker start training_env
      docker exec -it training_env bash

4. In the Docker container, clone the `<https://github.com/ROCm/MAD>`__
   repository and navigate to the benchmark scripts directory
   ``/workspace/MAD/scripts/pytorch_train``.

   .. code-block:: shell

      git clone https://github.com/ROCm/MAD
      cd MAD/scripts/pytorch_train

Prepare training datasets and dependencies
------------------------------------------

The following benchmarking examples require downloading models and datasets
from Hugging Face. To ensure successful access to gated repos, set your
``HF_TOKEN``.

.. code-block:: shell

   export HF_TOKEN=$your_personal_hugging_face_access_token

Run the setup script to install libraries and datasets needed for benchmarking.

.. code-block:: shell

   ./pytorch_benchmark_setup.sh

``pytorch_benchmark_setup.sh`` installs the following libraries:

.. list-table::
   :header-rows: 1

   * - Library
     - Benchmark model
     - Reference
   * - ``accelerate``
     - Llama 3.1 8B, FLUX
     - `Hugging Face Accelerate <https://huggingface.co/docs/accelerate/en/index>`_
   * - ``datasets``
     - Llama 3.1 8B, 70B, FLUX
     - `Hugging Face Datasets <https://huggingface.co/docs/datasets/v3.2.0/en/index>`_ 3.2.0
   * - ``torchdata``
     - Llama 3.1 70B
     - `TorchData <https://pytorch.org/data/beta/index.html>`_
   * - ``tomli``
     - Llama 3.1 70B
     - `Tomli <https://pypi.org/project/tomli/>`_
   * - ``tiktoken``
     - Llama 3.1 70B
     - `tiktoken <https://github.com/openai/tiktoken>`_
   * - ``blobfile``
     - Llama 3.1 70B
     - `blobfile <https://pypi.org/project/blobfile/>`_
   * - ``tabulate``
     - Llama 3.1 70B
     - `tabulate <https://pypi.org/project/tabulate/>`_
   * - ``wandb``
     - Llama 3.1 70B
     - `Weights & Biases <https://github.com/wandb/wandb>`_
   * - ``sentencepiece``
     - Llama 3.1 70B, FLUX
     - `SentencePiece <https://github.com/google/sentencepiece>`_ 0.2.0
   * - ``tensorboard``
     - Llama 3.1 70B, FLUX
     - `TensorBoard <https://www.tensorflow.org/tensorboard>`_ 2.18.0
   * - ``csvkit``
     - FLUX
     - `csvkit <https://csvkit.readthedocs.io/en/latest/>`_ 2.0.1
   * - ``deepspeed``
     - FLUX
     - `DeepSpeed <https://github.com/deepspeedai/DeepSpeed>`_ 0.16.2
   * - ``diffusers``
     - FLUX
     - `Hugging Face Diffusers <https://huggingface.co/docs/diffusers/en/index>`_ 0.31.0
   * - ``GitPython``
     - FLUX
     - `GitPython <https://github.com/gitpython-developers/GitPython>`_ 3.1.44
   * - ``opencv-python-headless``
     - FLUX
     - `opencv-python-headless <https://pypi.org/project/opencv-python-headless/>`_ 4.10.0.84
   * - ``peft``
     - FLUX
     - `PEFT <https://huggingface.co/docs/peft/en/index>`_ 0.14.0
   * - ``protobuf``
     - FLUX
     - `Protocol Buffers <https://github.com/protocolbuffers/protobuf>`_ 5.29.2
   * - ``pytest``
     - FLUX
     - `PyTest <https://docs.pytest.org/en/stable/>`_ 8.3.4
   * - ``python-dotenv``
     - FLUX
     - `python-dotenv <https://pypi.org/project/python-dotenv/>`_ 1.0.1
   * - ``seaborn``
     - FLUX
     - `Seaborn <https://seaborn.pydata.org/>`_ 0.13.2
   * - ``transformers``
     - FLUX
     - `Transformers <https://huggingface.co/docs/transformers/en/index>`_ 4.47.0

``pytorch_benchmark_setup.sh`` downloads the following models from Hugging Face:

* `meta-llama/Llama-3.1-70B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_

* `black-forest-labs/FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_

Along with the following datasets:

* `WikiText <https://huggingface.co/datasets/Salesforce/wikitext>`_

* `UltraChat 200k <https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k>`_

* `bghira/pseudo-camera-10k <https://huggingface.co/datasets/bghira/pseudo-camera-10k>`_

Getting started
===============

The prebuilt PyTorch with ROCm training environment allows users to quickly validate
system performance, conduct training benchmarks, and achieve superior
performance for models like Llama 3.1 and Llama 2. This container should not be
expected to provide generalized performance across all training workloads. You
can expect the container to perform in the model configurations described in
the following section, but other configurations are not validated by AMD.

Use the following instructions to set up the environment, configure the script
to train models, and reproduce the benchmark results on MI325X and MI300X
accelerators with the AMD PyTorch training Docker image.

Once your environment is set up, use the following commands and examples to start benchmarking.

Pretraining
-----------

To start the pretraining benchmark, use the following command with the
appropriate options. See the following list of options and their descriptions.

.. code-block:: shell

   ./pytorch_benchmark_report.sh -t $training_mode -m $model_repo -p $datatype -s $sequence_length

Options and available models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1

   * - Name
     - Options
     - Description
   * - ``$training_mode``
     - ``pretrain``
     - Benchmark pretraining
   * -
     - ``finetune_fw``
     - Benchmark full weight fine-tuning (Llama 3.1 70B with BF16)
   * -
     - ``finetune_lora``
     - Benchmark LoRA fine-tuning (Llama 3.1 70B with BF16)
   * -
     - ``HF_finetune_lora``
     - Benchmark LoRA fine-tuning with Hugging Face PEFT (Llama 2 70B with BF16)
   * - ``$datatype``
     - ``FP8`` or ``BF16``
     - Only Llama 3.1 8B supports FP8 precision.
   * - ``$model_repo``
     - ``Llama-3.1-8B``
     - `Llama 3.1 8B <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>`_
   * -
     - ``Llama-3.1-70B``
     - `Llama 3.1 70B <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_
   * -
     - ``Llama-2-70B``
     - `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70B>`_
   * -
     - ``Flux``
     - `FLUX.1 [dev] <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_
   * - ``$sequence_length``
     - Between 2048 and 8192. 8192 by default.
     - Sequence length for the language model.

.. note::

   Occasionally, downloading the Flux dataset might fail. In the event of this
   error, manually download it from Hugging Face at
   `black-forest-labs/FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_
   and save it to `/workspace/FluxBenchmark`. This ensures that the test script can access
   the required dataset.

Fine-tuning
-----------

To start the fine-tuning benchmark, use the following command. It will run the benchmarking example of Llama 3.1 70B
with the WikiText dataset using the AMD fork of `torchtune <https://github.com/AMD-AIG-AIMA/torchtune>`_.

.. code-block:: shell

   ./pytorch_benchmark_report.sh -t {finetune_fw, finetune_lora} -p BF16 -m Llama-3.1-70B

Use the following command to run the benchmarking example of Llama 2 70B with the UltraChat 200k dataset using
`Hugging Face PEFT <https://huggingface.co/docs/peft/en/index>`_.

.. code-block:: shell

   ./pytorch_benchmark_report.sh -t HF_finetune_lora -p BF16 -m Llama-2-70B

Benchmarking examples
---------------------

Here are some examples of how to use the command.

* Example 1: Llama 3.1 70B with BF16 precision with `torchtitan <https://github.com/ROCm/torchtitan>`_.

  .. code-block:: shell

     ./pytorch_benchmark_report.sh -t pretrain -p BF16 -m Llama-3.1-70B -s 8192

* Example 2: Llama 3.1 8B with FP8 precision using Transformer Engine (TE) and Hugging Face Accelerator.

  .. code-block:: shell

     ./pytorch_benchmark_report.sh -t pretrain -p FP8 -m Llama-3.1-8B -s 8192

* Example 3: FLUX.1-dev with BF16 precision with FluxBenchmark.

  .. code-block:: shell

     ./pytorch_benchmark_report.sh -t pretrain -p BF16 -m Flux

* Example 4: Torchtune full weight fine-tuning with Llama 3.1 70B

  .. code-block:: shell

     ./pytorch_benchmark_report.sh -t finetune_fw -p BF16 -m Llama-3.1-70B

* Example 5: Torchtune LoRA fine-tuning with Llama 3.1 70B

  .. code-block:: shell

     ./pytorch_benchmark_report.sh -t finetune_lora -p BF16 -m Llama-3.1-70B

* Example 6: Hugging Face PEFT LoRA fine-tuning with Llama 2 70B

  .. code-block:: shell

     ./pytorch_benchmark_report.sh -t HF_finetune_lora -p BF16 -m Llama-2-70B

Benchmarking
============

Once the setup is complete, choose between two options to start benchmarking:

.. tab-set::

   .. tab-item:: MAD-integrated benchmarking

      Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
      directory and install the required packages on the host machine.

      .. code-block:: shell

         git clone https://github.com/ROCm/MAD
         cd MAD
         pip install -r requirements.txt

      For example, use this command to run the performance benchmark test on the Llama 3.1 8B model
      using one GPU with the float16 data type on the host machine.

      .. code-block:: shell

         export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
         python3 tools/run_models.py --tags pyt_train_llama-3.1-8b --keep-model-dir --live-output --timeout 28800

      The available models for MAD-integrated benchmarking are:

      * ``pyt_train_llama-3.3-70b``

      * ``pyt_train_llama-3.1-8b``

      * ``pyt_train_llama-3.1-70b``

      * ``pyt_train_flux``

      MAD launches a Docker container with the name
      ``container_ci-pyt_train_llama-3.1-8b``, for example. The latency and throughput reports of the
      model are collected in the following path: ``~/MAD/perf.csv``.

   .. tab-item:: Standalone benchmarking

      .. rubric:: Download the Docker image and required packages

      Use the following command to pull the Docker image from Docker Hub.

      .. code-block:: shell

         docker pull rocm/pytorch-training:v25.5

      Run the Docker container.

      .. code-block:: shell

         docker run -it --device /dev/dri --device /dev/kfd --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 64G --name training_env rocm/pytorch-training:v25.5

      Use these commands if you exit the ``training_env`` container and need to return to it.

      .. code-block:: shell

         docker start training_env
         docker exec -it training_env bash

      In the Docker container, clone the `<https://github.com/ROCm/MAD>`__
      repository and navigate to the benchmark scripts directory
      ``/workspace/MAD/scripts/pytorch_train``.

      .. code-block:: shell

         git clone https://github.com/ROCm/MAD
         cd MAD/scripts/pytorch_train

      .. rubric:: Prepare training datasets and dependencies

      The following benchmarking examples require downloading models and datasets
      from Hugging Face. To ensure successful access to gated repos, set your
      ``HF_TOKEN``.

      .. code-block:: shell

         export HF_TOKEN=$your_personal_hugging_face_access_token

      Run the setup script to install libraries and datasets needed for benchmarking.

      .. code-block:: shell

         ./pytorch_benchmark_setup.sh

      ``pytorch_benchmark_setup.sh`` installs the following libraries:

      .. list-table::
         :header-rows: 1

         * - Library
           - Benchmark model
           - Reference
         * - ``accelerate``
           - Llama 3.1 8B, FLUX
           - `Hugging Face Accelerate <https://huggingface.co/docs/accelerate/en/index>`_
         * - ``datasets``
           - Llama 3.1 8B, 70B, FLUX
           - `Hugging Face Datasets <https://huggingface.co/docs/datasets/v3.2.0/en/index>`_ 3.2.0
         * - ``torchdata``
           - Llama 3.1 70B
           - `TorchData <https://pytorch.org/data/beta/index.html>`_
         * - ``tomli``
           - Llama 3.1 70B
           - `Tomli <https://pypi.org/project/tomli/>`_
         * - ``tiktoken``
           - Llama 3.1 70B
           - `tiktoken <https://github.com/openai/tiktoken>`_
         * - ``blobfile``
           - Llama 3.1 70B
           - `blobfile <https://pypi.org/project/blobfile/>`_
         * - ``tabulate``
           - Llama 3.1 70B
           - `tabulate <https://pypi.org/project/tabulate/>`_
         * - ``wandb``
           - Llama 3.1 70B
           - `Weights & Biases <https://github.com/wandb/wandb>`_
         * - ``sentencepiece``
           - Llama 3.1 70B, FLUX
           - `SentencePiece <https://github.com/google/sentencepiece>`_ 0.2.0
         * - ``tensorboard``
           - Llama 3.1 70B, FLUX
           - `TensorBoard <https://www.tensorflow.org/tensorboard>`_ 2.18.0
         * - ``csvkit``
           - FLUX
           - `csvkit <https://csvkit.readthedocs.io/en/latest/>`_ 2.0.1
         * - ``deepspeed``
           - FLUX
           - `DeepSpeed <https://github.com/deepspeedai/DeepSpeed>`_ 0.16.2
         * - ``diffusers``
           - FLUX
           - `Hugging Face Diffusers <https://huggingface.co/docs/diffusers/en/index>`_ 0.31.0
         * - ``GitPython``
           - FLUX
           - `GitPython <https://github.com/gitpython-developers/GitPython>`_ 3.1.44
         * - ``opencv-python-headless``
           - FLUX
           - `opencv-python-headless <https://pypi.org/project/opencv-python-headless/>`_ 4.10.0.84
         * - ``peft``
           - FLUX
           - `PEFT <https://huggingface.co/docs/peft/en/index>`_ 0.14.0
         * - ``protobuf``
           - FLUX
           - `Protocol Buffers <https://github.com/protocolbuffers/protobuf>`_ 5.29.2
         * - ``pytest``
           - FLUX
           - `PyTest <https://docs.pytest.org/en/stable/>`_ 8.3.4
         * - ``python-dotenv``
           - FLUX
           - `python-dotenv <https://pypi.org/project/python-dotenv/>`_ 1.0.1
         * - ``seaborn``
           - FLUX
           - `Seaborn <https://seaborn.pydata.org/>`_ 0.13.2
         * - ``transformers``
           - FLUX
           - `Transformers <https://huggingface.co/docs/transformers/en/index>`_ 4.47.0

      ``pytorch_benchmark_setup.sh`` downloads the following models from Hugging Face:

      * `meta-llama/Llama-3.1-70B-Instruct <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_

      * `black-forest-labs/FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_

      Along with the following datasets:

      * `WikiText <https://huggingface.co/datasets/Salesforce/wikitext>`_

      * `UltraChat 200k <https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k>`_

      * `bghira/pseudo-camera-10k <https://huggingface.co/datasets/bghira/pseudo-camera-10k>`_

      .. rubric:: Pretraining

      To start the pretraining benchmark, use the following command with the
      appropriate options. See the following list of options and their descriptions.

      .. code-block:: shell

         ./pytorch_benchmark_report.sh -t $training_mode -m $model_repo -p $datatype -s $sequence_length

      .. list-table::
         :header-rows: 1

         * - Name
           - Options
           - Description
         * - ``$training_mode``
           - ``pretrain``
           - Benchmark pretraining
         * -
           - ``finetune_fw``
           - Benchmark full weight fine-tuning (Llama 3.1 70B with BF16)
         * -
           - ``finetune_lora``
           - Benchmark LoRA fine-tuning (Llama 3.1 70B with BF16)
         * -
           - ``HF_finetune_lora``
           - Benchmark LoRA fine-tuning with Hugging Face PEFT (Llama 2 70B with BF16)
         * - ``$datatype``
           - ``FP8`` or ``BF16``
           - Only Llama 3.1 8B supports FP8 precision.
         * - ``$model_repo``
           - ``Llama-3.3-70B``
           - `Llama 3.3 70B <https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct>`_
         * -
           - ``Llama-3.1-8B``
           - `Llama 3.1 8B <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>`_
         * -
           - ``Llama-3.1-70B``
           - `Llama 3.1 70B <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_
         * -
           - ``Llama-2-70B``
           - `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70B>`_
         * -
           - ``Flux``
           - `FLUX.1 [dev] <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_
         * - ``$sequence_length``
           - Between 2048 and 8192. 8192 by default.
           - Sequence length for the language model.

      .. note::

         Occasionally, downloading the Flux dataset might fail. In the event of this
         error, manually download it from Hugging Face at
         `black-forest-labs/FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_
         and save it to `/workspace/FluxBenchmark`. This ensures that the test script can access
         the required dataset.

      .. rubric:: Fine-tuning

      To start the fine-tuning benchmark, use the following command. It will run the benchmarking example of Llama 3.1 70B
      with the WikiText dataset using the AMD fork of `torchtune <https://github.com/AMD-AIG-AIMA/torchtune>`_.

      .. code-block:: shell

         ./pytorch_benchmark_report.sh -t {finetune_fw, finetune_lora} -p BF16 -m Llama-3.1-70B

      Use the following command to run the benchmarking example of Llama 2 70B with the UltraChat 200k dataset using
      `Hugging Face PEFT <https://huggingface.co/docs/peft/en/index>`_.

      .. code-block:: shell

         ./pytorch_benchmark_report.sh -t HF_finetune_lora -p BF16 -m Llama-2-70B

      .. rubric:: Benchmarking examples

      Here are some example commands to get started pretraining and fine-tuning with various model configurations.

      * Example 1: Llama 3.1 70B with BF16 precision with `torchtitan <https://github.com/ROCm/torchtitan>`_.

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t pretrain -p BF16 -m Llama-3.1-70B -s 8192

      * Example 2: Llama 3.1 8B with FP8 precision using Transformer Engine (TE) and Hugging Face Accelerator.

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t pretrain -p FP8 -m Llama-3.1-8B -s 8192

      * Example 3: FLUX.1-dev with BF16 precision with FluxBenchmark.

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t pretrain -p BF16 -m Flux

      * Example 4: Torchtune full weight fine-tuning with Llama 3.1 70B

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t finetune_fw -p BF16 -m Llama-3.1-70B

      * Example 5: Torchtune LoRA fine-tuning with Llama 3.1 70B

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t finetune_lora -p BF16 -m Llama-3.1-70B

      * Example 6: Torchtune full weight fine-tuning with Llama-3.3-70B

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t finetune_fw -p BF16 -m Llama-3.3-70B

      * Example 7: Torchtune LoRA fine-tuning with Llama-3.3-70B

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t finetune_lora -p BF16 -m Llama-3.3-70B

      * Example 8: Torchtune QLoRA fine-tuning with Llama-3.3-70B

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t finetune_qlora -p BF16 -m Llama-3.3-70B

      * Example 9: Hugging Face PEFT LoRA fine-tuning with Llama 2 70B

        .. code-block:: shell

           ./pytorch_benchmark_report.sh -t HF_finetune_lora -p BF16 -m Llama-2-70B

Previous versions
=================
@@ -399,6 +439,13 @@ benchmarking, see the version-specific documentation.
     - PyTorch version
     - Resources

   * - v25.4
     - 6.3.0
     - 2.7.0a0+git637433
     -
       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.4/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html>`_
       * `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.4/images/sha256-fa98a9aa69968e654466c06f05aaa12730db79b48b113c1ab4f7a5fe6920a20b>`_

   * - v25.3
     - 6.3.0
     - 2.7.0a0+git637433
@@ -75,7 +75,9 @@ subtrees:
        - file: how-to/rocm-for-ai/inference/llm-inference-frameworks.rst
          title: LLM inference frameworks
        - file: how-to/rocm-for-ai/inference/vllm-benchmark.rst
          title: Performance testing
          title: vLLM inference performance testing
        - file: how-to/rocm-for-ai/inference/pytorch-inference-benchmark.rst
          title: PyTorch inference performance testing
        - file: how-to/rocm-for-ai/inference/deploy-your-model.rst
          title: Deploy your model
@@ -2,54 +2,55 @@
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile docs/sphinx/requirements.in
#    pip-compile requirements.in
#
accessible-pygments==0.0.5
    # via pydata-sphinx-theme
alabaster==1.0.0
    # via sphinx
appnope==0.1.4
    # via ipykernel
asttokens==3.0.0
    # via stack-data
attrs==25.1.0
attrs==25.3.0
    # via
    #   jsonschema
    #   jupyter-cache
    #   referencing
babel==2.16.0
babel==2.17.0
    # via
    #   pydata-sphinx-theme
    #   sphinx
beautifulsoup4==4.12.3
beautifulsoup4==4.13.3
    # via pydata-sphinx-theme
breathe==4.35.0
breathe==4.36.0
    # via rocm-docs-core
certifi==2024.8.30
certifi==2025.1.31
    # via requests
cffi==1.17.1
    # via
    #   cryptography
    #   pynacl
charset-normalizer==3.4.0
charset-normalizer==3.4.1
    # via requests
click==8.1.7
click==8.1.8
    # via
    #   jupyter-cache
    #   sphinx-external-toc
comm==0.2.2
    # via ipykernel
cryptography==44.0.1
cryptography==44.0.2
    # via pyjwt
debugpy==1.8.12
debugpy==1.8.14
    # via ipykernel
decorator==5.1.1
decorator==5.2.1
    # via ipython
defusedxml==0.7.1
    # via sphinxcontrib-datatemplates
deprecated==1.2.15
deprecated==1.2.18
    # via pygithub
docutils==0.21.2
    # via
    #   breathe
    #   myst-parser
    #   pydata-sphinx-theme
    #   sphinx
@@ -57,16 +58,14 @@ exceptiongroup==1.2.2
    # via ipython
executing==2.2.0
    # via stack-data
fastjsonschema==2.20.0
fastjsonschema==2.21.1
    # via
    #   nbformat
    #   rocm-docs-core
gitdb==4.0.11
gitdb==4.0.12
    # via gitpython
gitpython==3.1.43
gitpython==3.1.44
    # via rocm-docs-core
greenlet==3.1.1
    # via sqlalchemy
idna==3.10
    # via requests
imagesize==1.4.1
@@ -77,7 +76,7 @@ importlib-metadata==8.6.1
    #   myst-nb
ipykernel==6.29.5
    # via myst-nb
ipython==8.31.0
ipython==8.35.0
    # via
    #   ipykernel
    #   myst-nb
@@ -117,9 +116,9 @@ mdit-py-plugins==0.4.2
    # via myst-parser
mdurl==0.1.2
    # via markdown-it-py
myst-nb==1.1.2
myst-nb==1.2.0
    # via rocm-docs-core
myst-parser==4.0.0
myst-parser==4.0.1
    # via myst-nb
nbclient==0.10.2
    # via
@@ -135,16 +134,17 @@ nest-asyncio==1.6.0
packaging==24.2
    # via
    #   ipykernel
    #   pydata-sphinx-theme
    #   sphinx
parso==0.8.4
    # via jedi
pexpect==4.9.0
    # via ipython
platformdirs==4.3.6
platformdirs==4.3.7
    # via jupyter-core
prompt-toolkit==3.0.50
    # via ipython
psutil==6.1.1
psutil==7.0.0
    # via ipykernel
ptyprocess==0.7.0
    # via pexpect
@@ -152,19 +152,19 @@ pure-eval==0.2.3
    # via stack-data
pycparser==2.22
    # via cffi
pydata-sphinx-theme==0.16.0
pydata-sphinx-theme==0.15.4
    # via
    #   rocm-docs-core
    #   sphinx-book-theme
pygithub==2.5.0
pygithub==2.6.1
    # via rocm-docs-core
pygments==2.18.0
pygments==2.19.1
    # via
    #   accessible-pygments
    #   ipython
    #   pydata-sphinx-theme
    #   sphinx
pyjwt[crypto]==2.10.0
pyjwt[crypto]==2.10.1
    # via pygithub
pynacl==1.5.0
    # via pygithub
@@ -178,7 +178,7 @@ pyyaml==6.0.2
    #   rocm-docs-core
    #   sphinx-external-toc
    #   sphinxcontrib-datatemplates
pyzmq==26.2.0
pyzmq==26.4.0
    # via
    #   ipykernel
    #   jupyter-client
@@ -192,13 +192,13 @@ requests==2.32.3
    #   sphinx
rocm-docs-core==1.18.2
    # via -r requirements.in
rpds-py==0.22.3
rpds-py==0.24.0
    # via
    #   jsonschema
    #   referencing
six==1.17.0
    # via python-dateutil
smmap==5.0.1
smmap==5.0.2
    # via gitdb
snowballstemmer==2.2.0
    # via sphinx
@@ -220,7 +220,7 @@ sphinx==8.1.3
    #   sphinx-sitemap
    #   sphinxcontrib-datatemplates
    #   sphinxcontrib-runcmd
sphinx-book-theme==1.1.3
sphinx-book-theme==1.1.4
    # via rocm-docs-core
sphinx-copybutton==0.5.2
    # via rocm-docs-core
@@ -228,7 +228,7 @@ sphinx-design==0.6.1
    # via rocm-docs-core
sphinx-external-toc==1.0.1
    # via rocm-docs-core
sphinx-notfound-page==1.0.4
sphinx-notfound-page==1.1.0
    # via rocm-docs-core
sphinx-reredirects==0.1.6
    # via -r requirements.in
@@ -250,13 +250,13 @@ sphinxcontrib-runcmd==0.2.0
    # via sphinxcontrib-datatemplates
sphinxcontrib-serializinghtml==2.0.0
    # via sphinx
sqlalchemy==2.0.37
sqlalchemy==2.0.40
    # via jupyter-cache
stack-data==0.6.3
    # via ipython
tabulate==0.9.0
    # via jupyter-cache
tomli==2.1.0
tomli==2.2.1
    # via sphinx
tornado==6.4.2
    # via
@@ -272,21 +272,22 @@ traitlets==5.14.3
    #   matplotlib-inline
    #   nbclient
    #   nbformat
typing-extensions==4.12.2
typing-extensions==4.13.2
    # via
    #   beautifulsoup4
    #   ipython
    #   myst-nb
    #   pydata-sphinx-theme
    #   pygithub
    #   referencing
    #   sqlalchemy
urllib3==2.2.3
urllib3==2.4.0
    # via
    #   pygithub
    #   requests
wcwidth==0.2.13
    # via prompt-toolkit
wrapt==1.17.0
wrapt==1.17.2
    # via deprecated
zipp==3.21.0
    # via importlib-metadata