[External CI] Add SIMDe dev package to HIP runtime pipeline

Bump pynacl from 1.6.1 to 1.6.2 in /docs/sphinx (#5836)

Bumps [pynacl](https://github.com/pyca/pynacl) from 1.6.1 to 1.6.2.
- [Changelog](https://github.com/pyca/pynacl/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/pynacl/compare/1.6.1...1.6.2)

---
updated-dependencies:
- dependency-name: pynacl
  dependency-version: 1.6.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-11 15:47:59 -05:00 · 2026-01-07 10:25:18 -05:00 · 2026-01-06 14:10:42 -05:00 · 2026-01-02 16:06:27 -05:00 · 2025-12-29 10:26:25 -05:00 · 2025-12-29 08:44:45 -05:00
24 changed files with 527 additions and 439 deletions
--- a/.azuredevops/components/HIP.yml
+++ b/.azuredevops/components/HIP.yml
@@ -34,6 +34,7 @@ parameters:
  default:
    - cmake
    - libnuma-dev
+    - libsimde-dev
    - mesa-common-dev
    - ninja-build
    - ocl-icd-libopencl1
--- a/.azuredevops/components/origami.yml
+++ b/.azuredevops/components/origami.yml
@@ -39,6 +39,7 @@ parameters:
    - python3
    - python3-dev
    - python3-pip
+    - python3-venv
    - libgtest-dev
    - libboost-filesystem-dev
    - libboost-program-options-dev
@@ -46,6 +47,8 @@ parameters:
  type: object
  default:
    - nanobind>=2.0.0
+    - pytest
+    - pytest-cov
 - name: rocmDependencies
  type: object
  default:
@@ -72,8 +75,10 @@ parameters:
      - { os: ubuntu2204, packageManager: apt }
      - { os: almalinux8, packageManager: dnf }
    testJobs:
-      - { os: ubuntu2204, packageManager: apt, target: gfx942 }
      - { os: ubuntu2204, packageManager: apt, target: gfx90a }
+      # - { os: ubuntu2204, packageManager: apt, target: gfx1100 }
+      # - { os: ubuntu2204, packageManager: apt, target: gfx1151 }
+      # - { os: ubuntu2204, packageManager: apt, target: gfx1201 }
 - name: downstreamComponentMatrix
  type: object
  default:
@@ -116,6 +121,11 @@ jobs:
      parameters:
        dependencyList:
          - gtest
+    - ${{ if ne(job.os, 'almalinux8') }}:
+      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
+        parameters:
+          dependencyList:
+            - catch2
    - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
      parameters:
        checkoutRepo: ${{ parameters.checkoutRepo }}
@@ -137,6 +147,7 @@ jobs:
          -DORIGAMI_BUILD_SHARED_LIBS=ON
          -DORIGAMI_ENABLE_PYTHON=ON
          -DORIGAMI_BUILD_TESTING=ON
+          -DORIGAMI_ENABLE_FETCH=ON
          -GNinja
    - ${{ if ne(job.os, 'almalinux8') }}:
      - task: PublishPipelineArtifact@1
@@ -169,7 +180,6 @@ jobs:
      dependsOn: origami_build_${{ job.os }}
      condition:
        and(succeeded(),
-          eq(variables['ENABLE_${{ upper(job.target) }}_TESTS'], 'true'),
          not(containsValue(split(variables['DISABLED_${{ upper(job.target) }}_TESTS'], ','), '${{ parameters.componentName }}')),
          eq(${{ parameters.aggregatePipeline }}, False)
        )
@@ -180,30 +190,30 @@ jobs:
      workspace:
        clean: all
      steps:
-      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
-        parameters:
-          checkoutRepo: ${{ parameters.checkoutRepo }}
-          sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-other.yml
        parameters:
          aptPackages: ${{ parameters.aptPackages }}
          pipModules: ${{ parameters.pipModules }}
          packageManager: ${{ job.packageManager }}
+      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-cmake-custom.yml
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/preamble.yml
+      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/checkout.yml
+        parameters:
+          checkoutRepo: ${{ parameters.checkoutRepo }}
+          sparseCheckoutDir: ${{ parameters.sparseCheckoutDir }}
+      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
+        parameters:
+          dependencyList:
+            - gtest
+      - ${{ if ne(job.os, 'almalinux8') }}:
+        - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-vendor.yml
+          parameters:
+            dependencyList:
+              - catch2
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/local-artifact-download.yml
        parameters:
          preTargetFilter: ${{ parameters.componentName }}
          os: ${{ job.os }}
-      - task: DownloadPipelineArtifact@2
-        displayName: 'Download Build Directory Artifact'
-        inputs:
-          artifact: '${{ parameters.componentName }}_${{ job.os }}_build_dir'
-          path: '$(Agent.BuildDirectory)/s/build'
-      - task: DownloadPipelineArtifact@2
-        displayName: 'Download Python Source Artifact'
-        inputs:
-          artifact: '${{ parameters.componentName }}_${{ job.os }}_python_src'
-          path: '$(Agent.BuildDirectory)/s/python'
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/dependencies-rocm.yml
        parameters:
          checkoutRef: ${{ parameters.checkoutRef }}
@@ -212,25 +222,72 @@ jobs:
          gpuTarget: ${{ job.target }}
          ${{ if parameters.triggerDownstreamJobs }}:
            downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
+      - task: CMake@1
+        displayName: 'Origami Test CMake Configuration'
+        inputs:
+          cmakeArgs: >-
+            -DCMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm;$(Agent.BuildDirectory)/vendor
+            -DCMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
+            -DORIGAMI_BUILD_SHARED_LIBS=ON
+            -DORIGAMI_ENABLE_PYTHON=ON
+            -DORIGAMI_BUILD_TESTING=ON
+            -GNinja
+            $(Agent.BuildDirectory)/s
+      - task: Bash@3
+        displayName: 'Build Origami Tests and Python Bindings'
+        inputs:
+          targetType: inline
+          workingDirectory: build
+          script: |
+            cmake --build . --target origami-tests origami_python -- -j$(nproc)
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
+      # Run tests using CTest (discovers and runs both C++ and Python tests)
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/test.yml
        parameters:
          componentName: ${{ parameters.componentName }}
          os: ${{ job.os }}
-          testDir: '$(Agent.BuildDirectory)/rocm/bin'
-          testExecutable: './origami-tests'
-          testParameters: '--yaml origami-tests.yaml --gtest_output=xml:./test_output.xml --gtest_color=yes'
-      - script: |
-          set -e
-          export PYTHONPATH=$(Agent.BuildDirectory)/s/build/python:$PYTHONPATH
-
-          echo "--- Running origami_test.py ---"
-          python3 $(Agent.BuildDirectory)/s/python/origami_test.py
-          
-          echo "--- Running origami_grid_test.py ---"
-          python3 $(Agent.BuildDirectory)/s/python/origami_grid_test.py
-        displayName: 'Run Python Binding Tests'
-        condition: succeeded()
+          testDir: 'build'
+          testParameters: '--output-on-failure --force-new-ctest-process --output-junit test_output.xml'
+      # Test pip install workflow
+      # - task: Bash@3
+      #   displayName: 'Test Pip Install'
+      #   inputs:
+      #     targetType: inline
+      #     script: |
+      #       set -e
+            
+      #       echo "==================================================================="
+      #       echo "Testing pip install workflow (pip install -e .)"
+      #       echo "==================================================================="
+            
+      #       # Set environment variables for pip install CMake build
+      #       export ROCM_PATH=$(Agent.BuildDirectory)/rocm
+      #       export CMAKE_PREFIX_PATH=$(Agent.BuildDirectory)/rocm:$(Agent.BuildDirectory)/vendor
+      #       export CMAKE_CXX_COMPILER=$(Agent.BuildDirectory)/rocm/llvm/bin/amdclang++
+            
+      #       echo "ROCM_PATH: $ROCM_PATH"
+      #       echo "CMAKE_PREFIX_PATH: $CMAKE_PREFIX_PATH"
+      #       echo "CMAKE_CXX_COMPILER: $CMAKE_CXX_COMPILER"
+      #       echo ""
+            
+      #       # Install from source directory
+      #       cd "$(Agent.BuildDirectory)/s/python"
+      #       pip install -e .
+            
+      #       # Verify import works
+      #       echo ""
+      #       echo "Verifying origami can be imported..."
+      #       python3 -c "import origami; print('✓ Successfully imported origami')"
+            
+      #       # Run pytest on installed package
+      #       echo ""
+      #       echo "Running pytest tests..."
+      #       python3 -m pytest tests/ -v -m "not slow" --tb=short
+            
+      #       echo ""
+      #       echo "==================================================================="
+      #       echo "Pip install test completed successfully"
+      #       echo "==================================================================="
      - template: ${{ variables.CI_TEMPLATE_PATH }}/steps/docker-container.yml
        parameters:
          aptPackages: ${{ parameters.aptPackages }}
--- a/.azuredevops/components/rocm-examples.yml
+++ b/.azuredevops/components/rocm-examples.yml
@@ -30,6 +30,7 @@ parameters:
    - python3-pip
    - protobuf-compiler
    - libprotoc-dev
+    - libopencv-dev
 - name: pipModules
  type: object
  default:
@@ -64,6 +65,7 @@ parameters:
    - MIVisionX
    - rocm_smi_lib
    - rccl
+    - rocAL
    - rocALUTION
    - rocBLAS
    - rocDecode
@@ -103,6 +105,7 @@ parameters:
    - MIVisionX
    - rocm_smi_lib
    - rccl
+    - rocAL
    - rocALUTION
    - rocBLAS
    - rocDecode
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -36,7 +36,6 @@ Andrej
 Arb
 Autocast
 autograd
-Backported
 BARs
 BatchNorm
 BLAS
@@ -204,11 +203,9 @@ GenAI
 GenZ
 GitHub
 Gitpod
-hardcoded
 HBM
 HCA
 HGX
-HLO
 HIPCC
 hipDataType
 HIPExtension
@@ -336,7 +333,6 @@ MoEs
 Mooncake
 Mpops
 Multicore
-multihost
 Multithreaded
 mx
 MXFP
@@ -1031,7 +1027,6 @@ uncacheable
 uncorrectable
 underoptimized
 unhandled
-unfused
 uninstallation
 unmapped
 unsqueeze
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -270,26 +270,26 @@ The [ROCm examples repository](https://github.com/ROCm/rocm-examples) has been e
 :margin: auto 0 auto auto
 :::{grid}
 :margin: auto 0 auto auto
-* [hipBLASLt](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/hipBLASLt)
-* [hipSPARSE](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/hipSPARSE)
-* [hipSPARSELt](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/hipSPARSELt)
-* [hipTensor](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/hipTensor)
+* [hipBLASLt](https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/)
+* [hipSPARSE](https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/)
+* [hipSPARSELt](https://rocm.docs.amd.com/projects/hipSPARSELt/en/latest/)
+* [hipTensor](https://rocm.docs.amd.com/projects/hipTensor/en/latest/)
 :::
 :::{grid}
 :margin: auto 0 auto auto
-* [rocALUTION](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/rocALUTION)
-* [ROCprofiler-SDK](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/rocProfiler-SDK)
-* [rocWMMA](https://github.com/ROCm/rocm-examples/tree/amd-staging/Libraries/rocWMMA)
+* [rocALUTION](https://rocm.docs.amd.com/projects/rocALUTION/en/latest/)
+* [ROCprofiler-SDK](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/)
+* [rocWMMA](https://rocm.docs.amd.com/projects/rocWMMA/en/latest/)
 :::
 ::::

 Usage examples are now available for the following performance analysis tools:

-* [ROCm Compute Profiler](https://github.com/ROCm/rocm-examples/tree/amd-staging/Tools/rocprof-compute)
-* [ROCm Systems Profiler](https://github.com/ROCm/rocm-examples/tree/amd-staging/Tools/rocprof-systems)
-* [rocprofv3](https://github.com/ROCm/rocm-examples/tree/amd-staging/Tools/rocprofv3)
+* [ROCm Compute Profiler](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/index.html)
+* [ROCm Systems Profiler](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/index.html)
+* [rocprofv3](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/how-to/using-rocprofv3.html)

-The complete source code for the [HIP Graph Tutorial](https://github.com/ROCm/rocm-examples/tree/amd-staging/HIP-Doc/Tutorials/graph_api) is also available as part of the ROCm examples.
+The complete source code for the [HIP Graph Tutorial](https://rocm.docs.amd.com/projects/HIP/en/latest/tutorial/graph_api.html) is also available as part of the ROCm examples.

 ### ROCm documentation updates

--- a/docs/compatibility/compatibility-matrix-historical-6.0.csv
+++ b/docs/compatibility/compatibility-matrix-historical-6.0.csv
@@ -37,7 +37,7 @@ ROCm Version,7.1.1,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6
      :doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>` [#stanford-megatron-lm_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,85f95ae,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
      :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat-past-60]_,N/A,N/A,N/A,2.4.0,2.4.0,N/A,N/A,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
      :doc:`Megablocks <../compatibility/ml-compatibility/megablocks-compatibility>` [#megablocks_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.7.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
-      :doc:`Ray <../compatibility/ml-compatibility/ray-compatibility>` [#ray_compat-past-60]_,N/A,N/A,N/A,2.51.1,N/A,N/A,2.48.0.post0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
+      :doc:`Ray <../compatibility/ml-compatibility/ray-compatibility>` [#ray_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,2.48.0.post0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
      :doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat-past-60]_,N/A,N/A,N/A,b6652,b6356,b6356,b6356,b5997,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
      :doc:`FlashInfer <../compatibility/ml-compatibility/flashinfer-compatibility>` [#flashinfer_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,v0.2.5,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
      `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.23.1,1.22.0,1.22.0,1.22.0,1.20.0,1.20.0,1.20.0,1.20.0,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
--- a/docs/compatibility/compatibility-matrix.rst
+++ b/docs/compatibility/compatibility-matrix.rst
@@ -157,8 +157,8 @@ compatibility and system requirements.

 .. [#os-compatibility] Some operating systems are supported on limited GPUs. For detailed information, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-operating-systems>`__, `ROCm 7.1.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
 .. [#gpu-compatibility] Some GPUs have limited operating system support. For detailed information, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-gpus>`__, `ROCm 7.1.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
-.. [#dgl_compat] DGL is only supported on ROCm 7.0.0, ROCm 6.4.3 and ROCm 6.4.0.
-.. [#llama-cpp_compat] llama.cpp is only supported on ROCm 7.0.0 and ROCm 6.4.x.
+.. [#dgl_compat] DGL is supported only on ROCm 7.0.0, ROCm 6.4.3 and ROCm 6.4.0.
+.. [#llama-cpp_compat] llama.cpp is supported only on ROCm 7.0.0 and ROCm 6.4.x.
 .. [#mi325x_KVM] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.
 .. [#driver_patch] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
 .. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
@@ -204,13 +204,13 @@ Expand for full historical view of:
   .. [#os-compatibility-past-60] Some operating systems are supported on limited GPUs. For detailed information, see the latest :ref:`supported_distributions`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-operating-systems>`__, `ROCm 7.1.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/reference/system-requirements.html#supported-operating-systems>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-operating-systems>`__.
   .. [#gpu-compatibility-past-60] Some GPUs have limited operating system support. For detailed information, see the latest :ref:`supported_GPUs`. For version specific information, see `ROCm 7.1.1 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.1/reference/system-requirements.html#supported-gpus>`__, `ROCm 7.1.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.1.0/reference/system-requirements.html#supported-gpus>`__, and `ROCm 6.4.0 <https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.0/reference/system-requirements.html#supported-gpus>`__.
   .. [#tf-mi350-past-60] TensorFlow 2.17.1 is not supported on AMD Instinct MI350 Series GPUs. Use TensorFlow 2.19.1 or 2.18.1 with MI350 Series GPUs instead.
-   .. [#verl_compat-past-60] verl is only supported on ROCm 7.0.0 and 6.2.0.
-   .. [#stanford-megatron-lm_compat-past-60] Stanford Megatron-LM is only supported on ROCm 6.3.0.
-   .. [#dgl_compat-past-60] DGL is only supported on ROCm 7.0.0, ROCm 6.4.3 and ROCm 6.4.0.
-   .. [#megablocks_compat-past-60] Megablocks is only supported on ROCm 6.3.0.
-   .. [#ray_compat-past-60] Ray is only supported on ROCm 7.0.0 and 6.4.1.
-   .. [#llama-cpp_compat-past-60] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.
-   .. [#flashinfer_compat-past-60] FlashInfer is only supported on ROCm 6.4.1.
+   .. [#verl_compat-past-60] verl is supported only on ROCm 7.0.0 and 6.2.0.
+   .. [#stanford-megatron-lm_compat-past-60] Stanford Megatron-LM is supported only on ROCm 6.3.0.
+   .. [#dgl_compat-past-60] DGL is supported only on ROCm 7.0.0, ROCm 6.4.3 and ROCm 6.4.0.
+   .. [#megablocks_compat-past-60] Megablocks is supported only on ROCm 6.3.0.
+   .. [#ray_compat-past-60] Ray is supported only on ROCm 6.4.1.
+   .. [#llama-cpp_compat-past-60] llama.cpp is supported only on ROCm 7.0.0 and 6.4.x.
+   .. [#flashinfer_compat-past-60] FlashInfer is supported only on ROCm 6.4.1.
   .. [#mi325x_KVM-past-60] For AMD Instinct MI325X KVM SR-IOV users, do not use AMD GPU Driver (amdgpu) 30.20.0.
   .. [#driver_patch-past-60] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
   .. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
--- a/docs/compatibility/ml-compatibility/dgl-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/dgl-compatibility.rst
@@ -36,9 +36,63 @@ Support overview
  - You can also consult the upstream `Installation guide <https://www.dgl.ai/pages/start.html>`__ 
    for additional context.

+Version support
+--------------------------------------------------------------------------------
+
+DGL is supported on `ROCm 7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__, 
+`ROCm 6.4.3 <https://repo.radeon.com/rocm/apt/6.4.3/>`__, and `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`__.
+
+Supported devices
+--------------------------------------------------------------------------------
+
+**Officially Supported**: AMD Instinct™ MI300X, MI250X
+
+.. _dgl-recommendations:
+
+Use cases and recommendations
+================================================================================
+
+DGL can be used for graph learning and for building popular graph models such as
+GAT, GCN, and GraphSAGE. These models support a variety of use cases:
+
+- Recommender systems
+- Network optimization and analysis
+- 1D (temporal) and 2D (image) classification
+- Drug discovery
+
+For use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
+where you can search for DGL examples and best practices to optimize your workloads on AMD GPUs.
+
+* Multiple DGL use cases have been tested and verified. Several of them are
+  outlined in the `DGL in the Real World: Running GNNs on Real Use Cases
+  <https://rocm.blogs.amd.com/artificial-intelligence/dgl_blog2/README.html>`__ blog
+  post, which walks through four real-world graph neural network (GNN) workloads
+  implemented with the Deep Graph Library on ROCm. It covers tasks ranging from
+  heterogeneous e-commerce graphs and multiplex networks (GATNE) to molecular graph
+  regression (GNN-FiLM) and EEG-based neurological diagnosis (EEG-GCNN). For each use
+  case, the authors describe the dataset and task, how DGL is used, and their experience
+  porting to ROCm. The post shows that DGL codebases often run without modification, with
+  graph operations, message passing, sampling, and convolution integrating seamlessly.
+
+* The `Graph Neural Networks (GNNs) at Scale: DGL with ROCm on AMD Hardware
+  <https://rocm.blogs.amd.com/artificial-intelligence/why-graph-neural/README.html>`__
+  blog post introduces the Deep Graph Library (DGL) and its enablement on the AMD ROCm platform,
+  which brings high-performance GNN training to AMD GPUs. DGL bridges
+  the gap between dense tensor frameworks and the irregular nature of graph data through a
+  graph-first, message-passing abstraction. Its design emphasizes scalability, flexibility, and
+  interoperability across frameworks like PyTorch and TensorFlow. AMD's ROCm integration
+  lets DGL run efficiently on AMD GPUs through HIP, supported by prebuilt Docker containers
+  and open-source repositories. This marks a major step in AMD's mission to advance open,
+  scalable AI ecosystems beyond traditional architectures.
+
+You can pre-process datasets and begin training on AMD GPUs through:
+
+* Single-GPU training/inference
+* Multi-GPU training
+
 .. _dgl-docker-compat:

-Compatibility matrix
+Docker image compatibility
 ================================================================================

 .. |docker-icon| raw:: html
@@ -60,7 +114,6 @@ Click the |docker-icon| to view the image on Docker Hub.
      - PyTorch
      - Ubuntu
      - Python
-      - GPU

    * - .. raw:: html

@@ -71,7 +124,6 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.8.0 <https://github.com/pytorch/pytorch/releases/tag/v2.8.0>`__
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`__
-      - MI300X, MI250X

    * - .. raw:: html

@@ -82,7 +134,6 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.6.0 <https://github.com/pytorch/pytorch/releases/tag/v2.6.0>`__
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`__
-      - MI300X, MI250X

    * - .. raw:: html

@@ -93,7 +144,6 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.7.1 <https://github.com/pytorch/pytorch/releases/tag/v2.7.1>`__
      - 22.04
      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`__
-      - MI300X, MI250X

    * - .. raw:: html

@@ -104,7 +154,6 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.6.0 <https://github.com/pytorch/pytorch/releases/tag/v2.6.0>`__
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`__
-      - MI300X, MI250X

    * - .. raw:: html

@@ -115,7 +164,6 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.6.0 <https://github.com/pytorch/pytorch/releases/tag/v2.6.0>`__
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`__
-      - MI300X, MI250X

    * - .. raw:: html

@@ -126,7 +174,7 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.4.1 <https://github.com/pytorch/pytorch/releases/tag/v2.4.1>`__
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`__
-      - MI300X, MI250X
+

    * - .. raw:: html

@@ -137,7 +185,7 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.4.1 <https://github.com/pytorch/pytorch/releases/tag/v2.4.1>`__
      - 22.04
      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`__
-      - MI300X, MI250X
+

    * - .. raw:: html

@@ -148,10 +196,7 @@ Click the |docker-icon| to view the image on Docker Hub.
      - `2.3.0 <https://github.com/pytorch/pytorch/releases/tag/v2.3.0>`__
      - 22.04
      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`__
-      - MI300X, MI250X
-
-
-.. _dgl-key-rocm-libraries:
+      

 Key ROCm libraries for DGL
 ================================================================================
@@ -265,9 +310,8 @@ If you prefer to build it yourself, ensure the following dependencies are instal
        multiplication (GEMM) and accumulation operations with mixed precision
        support.

-.. _dgl-supported-features-latest:

-Supported features with ROCm 7.0.0
+Supported features
 ================================================================================

 Many functions and methods available upstream are also supported in DGL on ROCm.
@@ -291,17 +335,14 @@ Instead of listing them all, support is grouped into the following categories to
 * DGL Sparse
 * GraphBolt

-.. _dgl-unsupported-features-latest:
-
-Unsupported features with ROCm 7.0.0
+Unsupported features
 ================================================================================

 * TF32 Support (only supported for PyTorch 2.7 and above)
 * Kineto/ROCTracer integration

-.. _dgl-unsupported-functions:

-Unsupported functions with ROCm 7.0.0
+Unsupported functions
 ================================================================================

 * ``bfs``
@@ -314,50 +355,6 @@ Unsupported functions with ROCm 7.0.0
 * ``sample_labors_noprob``
 * ``sparse_admin``

-.. _dgl-recommendations:
-
-Use cases and recommendations
-================================================================================
-
-DGL can be used for Graph Learning, and building popular graph models like  
-GAT, GCN, and GraphSage. Using these models, a variety of use cases are supported:
-
- Recommender systems
- Network Optimization and Analysis
- 1D (Temporal) and 2D (Image) Classification
- Drug Discovery
-
-For use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
-where you can search for DGL examples and best practices to optimize your workloads on AMD GPUs.
-
-* Although multiple use cases of DGL have been tested and verified, a few have been  
-  outlined in the `DGL in the Real World: Running GNNs on Real Use Cases 
-  <https://rocm.blogs.amd.com/artificial-intelligence/dgl_blog2/README.html>`__ blog 
-  post, which walks through four real-world graph neural network (GNN) workloads 
-  implemented with the Deep Graph Library on ROCm. It covers tasks ranging from 
-  heterogeneous e-commerce graphs and multiplex networks (GATNE) to molecular graph 
-  regression (GNN-FiLM) and EEG-based neurological diagnosis (EEG-GCNN). For each use 
-  case, the authors detail: the dataset and task, how DGL is used, and their experience 
-  porting to ROCm. It is shown that DGL codebases often run without modification, with 
-  seamless integration of graph operations, message passing, sampling, and convolution. 
-
-* The `Graph Neural Networks (GNNs) at Scale: DGL with ROCm on AMD Hardware 
-  <https://rocm.blogs.amd.com/artificial-intelligence/why-graph-neural/README.html>`__ 
-  blog post introduces the Deep Graph Library (DGL) and its enablement on the AMD ROCm platform, 
-  bringing high-performance graph neural network (GNN) training to AMD GPUs. DGL bridges 
-  the gap between dense tensor frameworks and the irregular nature of graph data through a 
-  graph-first, message-passing abstraction. Its design ensures scalability, flexibility, and 
-  interoperability across frameworks like PyTorch and TensorFlow. AMD’s ROCm integration 
-  enables DGL to run efficiently on HIP-based GPUs, supported by prebuilt Docker containers 
-  and open-source repositories. This marks a major step in AMD's mission to advance open, 
-  scalable AI ecosystems beyond traditional architectures.
-
-You can pre-process datasets and begin training on AMD GPUs through:
-
-* Single-GPU training/inference
-* Multi-GPU training
-
-
 Previous versions
 ===============================================================================
 See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/dgl-history` to find documentation for previous releases
--- a/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
@@ -42,9 +42,38 @@ Support overview
  - You can also consult the upstream `Installation guide <https://docs.flashinfer.ai/installation.html>`__ 
    for additional context.

+Version support
+--------------------------------------------------------------------------------
+
+FlashInfer is supported on `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__.
+
+Supported devices
+--------------------------------------------------------------------------------
+
+**Officially Supported**: AMD Instinct™ MI300X
+
+
+.. _flashinfer-recommendations:
+
+Use cases and recommendations
+================================================================================
+
+This release of FlashInfer on ROCm provides the decode functionality for LLM inference.
+In the decode phase, tokens are generated sequentially, with the model predicting each new 
+token based on the previously generated tokens and the input context.
+
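The sequential decode phase described above can be sketched as a simple loop. This is a minimal illustration, not FlashInfer's API: `greedy_decode` and `toy_model` are hypothetical names, and the toy model stands in for a real LLM forward pass. Production engines also keep a KV cache, so each step only computes keys and values for the newest token instead of reprocessing the whole context.

```python
from typing import Callable, List


def greedy_decode(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Generate tokens one at a time (the decode phase).

    Each step conditions on the prompt plus every token generated so far,
    which is why decoding is inherently sequential.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # one model step per generated token
        tokens.append(tok)
        if tok == eos_id:  # stop early at the end-of-sequence token
            break
    return tokens[len(prompt):]


def toy_model(context: List[int]) -> int:
    """Hypothetical stand-in for an LLM: predicts (last token + 1) mod 10."""
    return (context[-1] + 1) % 10


print(greedy_decode([3, 4], toy_model, max_new_tokens=5, eos_id=7))  # [5, 6, 7]
```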
+FlashInfer on ROCm brings over upstream features such as load balancing, sparse and dense 
+attention optimizations, and batching support, enabling efficient execution on AMD Instinct™ MI300X GPUs.
+
+Because LLMs often require substantial KV caches or long context windows, FlashInfer on ROCm
+also implements cascade attention from upstream to reduce memory usage.
+
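Cascade attention builds on the fact that attention over a set of keys can be split into partial results and merged exactly, as long as each partial result carries the log-sum-exp (LSE) of its scores. The following plain-Python sketch, using hypothetical function names that are not FlashInfer's API, attends once over a shared prefix and once over a request-specific suffix, then merges the two states. Roughly speaking, this is what lets a shared-prefix KV cache be stored once and reused across requests; the sketch does not model that memory layout.

```python
import math
from typing import List, Tuple

Vec = List[float]


def partial_attention(q: Vec, keys: List[Vec], values: List[Vec]) -> Tuple[Vec, float]:
    """Scaled dot-product attention of one query over a subset of keys/values.

    Returns the normalized output and the log-sum-exp of the scaled scores.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]
    return out, m + math.log(total)


def merge_states(o1: Vec, lse1: float, o2: Vec, lse2: float) -> Tuple[Vec, float]:
    """Combine two partial results as if attention had run over the union of keys."""
    m = max(lse1, lse2)
    w1, w2 = math.exp(lse1 - m), math.exp(lse2 - m)
    out = [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(o1, o2)]
    return out, m + math.log(w1 + w2)


# Shared prefix (reused by many requests) and a per-request suffix.
q = [0.5, -1.0]
prefix_k, prefix_v = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
suffix_k, suffix_v = [[1.0, 1.0]], [[5.0, 6.0]]

full, _ = partial_attention(q, prefix_k + suffix_k, prefix_v + suffix_v)
o_prefix, lse_prefix = partial_attention(q, prefix_k, prefix_v)
o_suffix, lse_suffix = partial_attention(q, suffix_k, suffix_v)
merged, _ = merge_states(o_prefix, lse_prefix, o_suffix, lse_suffix)
print(full, merged)  # the two outputs agree up to floating-point rounding
```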
+For currently supported use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
+where you can search for examples and best practices to optimize your workloads on AMD GPUs.
+
 .. _flashinfer-docker-compat:

-Compatibility matrix
+Docker image compatibility
 ================================================================================

 .. |docker-icon| raw:: html
@@ -66,7 +95,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - PyTorch
      - Ubuntu
      - Python
-      - GPU

    * - .. raw:: html

@@ -76,23 +104,5 @@ Click |docker-icon| to view the image on Docker Hub.
      - `2.7.1 <https://github.com/ROCm/pytorch/releases/tag/v2.7.1>`__
      - 24.04
      - `3.12 <https://www.python.org/downloads/release/python-3129/>`__
-      - MI300X

-.. _flashinfer-recommendations:
-
-Use cases and recommendations
-================================================================================
-
-The release of FlashInfer on ROCm provides the decode functionality for LLM inferencing.
-In the decode phase, tokens are generated sequentially, with the model predicting each new 
-token based on the previously generated tokens and the input context.
-
-FlashInfer on ROCm brings over upstream features such as load balancing, sparse and dense 
-attention optimizations, and batching support, enabling efficient execution on AMD Instinct™ MI300X GPUs.
-
-Because large LLMs often require substantial KV caches or long context windows, FlashInfer on ROCm 
-also implements cascade attention from upstream to reduce memory usage. 
-
-For currently supported use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
-where you can search for examples and best practices to optimize your workloads on AMD GPUs.

--- a/docs/compatibility/ml-compatibility/jax-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/jax-compatibility.rst
@@ -269,33 +269,6 @@ For a complete and up-to-date list of JAX public modules (for example, ``jax.num
  JAX API modules are maintained by the JAX project and are subject to change.
  Refer to the official JAX documentation for the most up-to-date information.

-Key features and enhancements for ROCm 7.1
-===============================================================================
-
- Enabled compilation of multihost HLO runner Python bindings.
-
-  - Backported multihost HLO runner bindings and some related changes to
-    :code:`FunctionalHloRunner`.
-
-  - Added :code:`requirements_lock_3_12` to enable building for Python 3.12.
-
- Removed hardcoded NHWC convolution layout for ``fp16`` precision to address the performance drops for ``fp16`` precision on gfx12xx GPUs.
-
-
- ROCprofiler-SDK integration:
-
-  - Integrated ROCprofiler-SDK (v3) to XLA to improve profiling of GPU events,
-    support both time-based and step-based profiling.
-
-  - Added unit tests for :code:`rocm_collector` and :code:`rocm_tracer`.
-
- Added Triton unsupported conversion from ``f8E4M3FNUZ`` to ``fp16`` with
-  rounding mode.
-
- Introduced :code:`CudnnFusedConvDecomposer` to revert fused convolutions
-  when :code:`ConvAlgorithmPicker` fails to find a fused algorithm, and removed
-  unfused fallback paths from :code:`RocmFusedConvRunner`.
-
 Key features and enhancements for ROCm 7.0
 ===============================================================================

--- a/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst
@@ -36,9 +36,47 @@ Support overview
  - You can also consult the upstream `Installation guide <https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md>`__ 
    for additional context.

+Version support
+--------------------------------------------------------------------------------
+
+llama.cpp is supported on `ROCm 7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__ and 
+`ROCm 6.4.x <https://repo.radeon.com/rocm/apt/6.4/>`__.
+
+Supported devices
+--------------------------------------------------------------------------------
+
+**Officially Supported**: AMD Instinct™ MI325X, MI300X, MI210
+
+Use cases and recommendations
+================================================================================
+
+llama.cpp can be applied in a variety of scenarios, particularly when you need to meet one or more of the following requirements:
+
+- Plain C/C++ implementation with no external dependencies
+- Support for 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
+- Custom HIP (Heterogeneous-compute Interface for Portability) kernels for running large language models (LLMs) on AMD GPUs (graphics processing units)
+- CPU (central processing unit) + GPU (graphics processing unit) hybrid inference for partially accelerating models larger than the total available VRAM (video random-access memory)
+
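The integer-quantization bullet above can be illustrated with a simplified symmetric 4-bit block scheme. This is a hypothetical sketch loosely in the spirit of llama.cpp's block-quantized types, not the actual GGUF layout or rounding rules: each block of 32 weights (an assumed block size) shares one floating-point scale, and each weight is stored as a small signed integer.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumption: one scale per 32 weights

Block = Tuple[float, List[int]]  # (scale, quantized values)


def quantize_q4(weights: List[float]) -> List[Block]:
    """Quantize weights to 4-bit signed integers, one scale per block."""
    blocks: List[Block] = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        # Map the largest magnitude in the block to 7; avoid dividing by zero.
        scale = amax / 7 if amax > 0 else 1.0
        q = [max(-8, min(7, round(w / scale))) for w in block]  # 4-bit range
        blocks.append((scale, q))
    return blocks


def dequantize_q4(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate weights from the quantized blocks."""
    out: List[float] = []
    for scale, q in blocks:
        out.extend(scale * v for v in q)
    return out


weights = [0.12, -0.5, 0.33, 0.9] * 8 + [0.01, -0.02]
restored = dequantize_q4(quantize_q4(weights))
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Assuming a 16-bit scale per block, storage works out to (32 * 4 + 16) / 32 = 4.5 bits per weight instead of 16 bits for FP16, at the cost of a reconstruction error of at most half the block scale per weight.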
+llama.cpp is also used in a range of real-world applications, including:
+
+- Games such as `Lucy's Labyrinth <https://github.com/MorganRO8/Lucys_Labyrinth>`__:
+  A simple maze game where AI-controlled agents attempt to trick the player.
+- Tools such as `Styled Lines <https://marketplace.unity.com/packages/tools/ai-ml-integration/style-text-webgl-ios-stand-alone-llm-llama-cpp-wrapper-292902>`__:
+  A proprietary, asynchronous inference wrapper for Unity3D game development, including pre-built mobile and web platform wrappers and a model example.
+- Various other AI applications use llama.cpp as their inference engine;  
+  for a detailed list, see the `user interfaces (UIs) section <https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#description>`__.
+
+For more use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
+where you can search for llama.cpp examples and best practices to optimize your workloads on AMD GPUs.
+
+- The `Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration <https://rocm.blogs.amd.com/ecosystems-and-partners/llama-cpp/README.html>`__ 
+  blog post outlines how the open-source llama.cpp framework enables efficient LLM inference—including interactive inference with ``llama-cli``, 
+  server deployment with ``llama-server``, GGUF model preparation and quantization, performance benchmarking, and optimizations tailored for 
+  AMD Instinct GPUs within the ROCm ecosystem. 
+
 .. _llama-cpp-docker-compat:

-Compatibility matrix
+Docker image compatibility
 ================================================================================

 .. |docker-icon| raw:: html
@@ -68,7 +106,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - llama.cpp
      - ROCm
      - Ubuntu
-      - GPU

    * - .. raw:: html

@@ -82,7 +119,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6652 <https://github.com/ROCm/llama.cpp/tree/release/b6652>`__
      - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
      - 24.04
-      - MI325X, MI300X, MI210

    * - .. raw:: html

@@ -96,7 +132,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6652 <https://github.com/ROCm/llama.cpp/tree/release/b6652>`__
      - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
      - 22.04
-      - MI325X, MI300X, MI210

    * - .. raw:: html

@@ -110,7 +145,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
      - `6.4.3 <https://repo.radeon.com/rocm/apt/6.4.3/>`__
      - 24.04
-      - MI325X, MI300X, MI210

    * - .. raw:: html

@@ -124,7 +158,7 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
      - `6.4.3 <https://repo.radeon.com/rocm/apt/6.4.3/>`__
      - 22.04
-      - MI325X, MI300X, MI210
+

    * - .. raw:: html

@@ -138,7 +172,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
      - `6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`__
      - 24.04
-      - MI325X, MI300X, MI210

    * - .. raw:: html

@@ -152,7 +185,7 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
      - `6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`__
      - 22.04
-      - MI325X, MI300X, MI210
+

    * - .. raw:: html

@@ -166,7 +199,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
      - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
      - 24.04
-      - MI325X, MI300X, MI210

    * - .. raw:: html

@@ -180,7 +212,6 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b6356 <https://github.com/ROCm/llama.cpp/tree/release/b6356>`__
      - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
      - 22.04
-      - MI325X, MI300X, MI210

    * - .. raw:: html

@@ -194,9 +225,7 @@ Click |docker-icon| to view the image on Docker Hub.
      - `b5997 <https://github.com/ROCm/llama.cpp/tree/release/b5997>`__
      - `6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`__
      - 24.04
-      - MI300X, MI210

-.. _llama-cpp-key-rocm-libraries:

 Key ROCm libraries for llama.cpp
 ================================================================================
@@ -239,36 +268,6 @@ your corresponding ROCm version.
      - Can be used to enhance the flash attention performance on AMD compute, by enabling
        the flag during compile time.

-.. _llama-cpp-uses-recommendations:
-
-Use cases and recommendations
-================================================================================
-
-llama.cpp can be applied in a variety of scenarios, particularly when you need to meet one or more of the following requirements:
-
- Plain C/C++ implementation with no external dependencies
- Support for 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage
- Custom HIP (Heterogeneous-compute Interface for Portability) kernels for running large language models (LLMs) on AMD GPUs (graphics processing units)
- CPU (central processing unit) + GPU (graphics processing unit) hybrid inference for partially accelerating models larger than the total available VRAM (video random-access memory)
-
-llama.cpp is also used in a range of real-world applications, including:
-
- Games such as `Lucy's Labyrinth <https://github.com/MorganRO8/Lucys_Labyrinth>`__:
-  A simple maze game where AI-controlled agents attempt to trick the player.
- Tools such as `Styled Lines <https://marketplace.unity.com/packages/tools/ai-ml-integration/style-text-webgl-ios-stand-alone-llm-llama-cpp-wrapper-292902>`__:
-  A proprietary, asynchronous inference wrapper for Unity3D game development, including pre-built mobile and web platform wrappers and a model example.
- Various other AI applications use llama.cpp as their inference engine;  
-  for a detailed list, see the `user interfaces (UIs) section <https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#description>`__.
-
-For more use cases and recommendations, refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
-where you can search for llama.cpp examples and best practices to optimize your workloads on AMD GPUs.
-
- The `Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration <https://rocm.blogs.amd.com/ecosystems-and-partners/llama-cpp/README.html>`__ 
-  blog post outlines how the open-source llama.cpp framework enables efficient LLM inference—including interactive inference with ``llama-cli``, 
-  server deployment with ``llama-server``, GGUF model preparation and quantization, performance benchmarking, and optimizations tailored for 
-  AMD Instinct GPUs within the ROCm ecosystem. 
-
-
 Previous versions
 ===============================================================================
 See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/llama-cpp-history` to find documentation for previous releases.
--- a/docs/compatibility/ml-compatibility/megablocks-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/megablocks-compatibility.rst
@@ -33,44 +33,19 @@ Support overview
  - You can also consult the upstream `Installation guide <https://github.com/databricks/megablocks>`__ 
    for additional context.

-.. _megablocks-docker-compat:
+Version support
+--------------------------------------------------------------------------------

-Compatibility matrix
-================================================================================
+Megablocks is supported on `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`__.

-.. |docker-icon| raw:: html
+Supported devices
+--------------------------------------------------------------------------------

-   <i class="fab fa-docker"></i>
+- **Officially Supported**: AMD Instinct™ MI300X
+- **Partially Supported** (functionality or performance limitations): AMD Instinct™ MI250X, MI210

-AMD validates and publishes `Megablocks images <https://hub.docker.com/r/rocm/megablocks/tags>`__
-with ROCm backends on Docker Hub. The following Docker image tag and associated
-inventories represent the latest available Megablocks version from the official Docker Hub. 
-Click |docker-icon| to view the image on Docker Hub.
-
-.. list-table:: 
-    :header-rows: 1
-    :class: docker-image-compatibility
-
-    * - Docker image
-      - ROCm
-      - Megablocks
-      - PyTorch
-      - Ubuntu
-      - Python
-      - GPU
-
-    * - .. raw:: html
-
-           <a href="https://hub.docker.com/layers/rocm/megablocks/megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-372ff89b96599019b8f5f9db469c84add2529b713456781fa62eb9a148659ab4"><i class="fab fa-docker fa-lg"></i> rocm/megablocks</a>
-      - `6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_
-      - `0.7.0 <https://github.com/databricks/megablocks/releases/tag/v0.7.0>`_
-      - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
-      - 24.04
-      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
-      - MI300X
-
-Supported models and features with ROCm 6.3.0
-================================================================================
+Supported models and features
+--------------------------------------------------------------------------------

 This section summarizes the Megablocks features supported by ROCm.

@@ -102,3 +77,38 @@ It features how to pre-process datasets and how to begin pre-training on AMD GPU
 * Single-GPU pre-training
 * Multi-GPU pre-training

+.. _megablocks-docker-compat:
+
+Docker image compatibility
+================================================================================
+
+.. |docker-icon| raw:: html
+
+   <i class="fab fa-docker"></i>
+
+AMD validates and publishes `Megablocks images <https://hub.docker.com/r/rocm/megablocks/tags>`__
+with ROCm backends on Docker Hub. The following Docker image tag and associated
+inventories represent the latest available Megablocks version from the official Docker Hub. 
+Click |docker-icon| to view the image on Docker Hub.
+
+.. list-table:: 
+    :header-rows: 1
+    :class: docker-image-compatibility
+
+    * - Docker image
+      - ROCm
+      - Megablocks
+      - PyTorch
+      - Ubuntu
+      - Python
+
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/megablocks/megablocks-0.7.0_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-372ff89b96599019b8f5f9db469c84add2529b713456781fa62eb9a148659ab4"><i class="fab fa-docker fa-lg"></i> rocm/megablocks</a>
+      - `6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_
+      - `0.7.0 <https://github.com/databricks/megablocks/releases/tag/v0.7.0>`_
+      - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
+      - 24.04
+      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
+
+
--- a/docs/compatibility/ml-compatibility/ray-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/ray-compatibility.rst
@@ -12,8 +12,8 @@ Ray compatibility

 Ray is a unified framework for scaling AI and Python applications from your laptop 
 to a full cluster, without changing your code. Ray consists of `a core distributed 
-runtime  <https://docs.ray.io/en/latest/ray-core/walkthrough.html>`__ and a set of 
-`AI libraries <https://docs.ray.io/en/latest/ray-air/getting-started.html>`__ for 
+runtime <https://docs.ray.io/en/latest/ray-core/walkthrough.html>`_ and a set of
+`AI libraries <https://docs.ray.io/en/latest/ray-air/getting-started.html>`_ for 
 simplifying machine learning computations.

 Ray is a general-purpose framework that runs many types of workloads efficiently. 
@@ -29,57 +29,25 @@ Support overview
 - To get started and install Ray on ROCm, use the prebuilt :ref:`Docker image <ray-docker-compat>`, 
  which includes ROCm, Ray, and all required dependencies.

-  - See the :doc:`ROCm Ray installation guide <rocm-install-on-linux:install/3rd-party/ray-install>`
+  - The Docker image provided is based on the upstream Ray `Daily Release (Nightly) wheels 
+    <https://docs.ray.io/en/latest/ray-overview/installation.html#daily-releases-nightlies>`__ 
+    corresponding to commit `005c372 <https://github.com/ray-project/ray/commit/005c372262e050d5745f475e22e64305fa07f8b8>`__.
+
+  - See the :doc:`ROCm Ray installation guide <rocm-install-on-linux:install/3rd-party/ray-install>` 
    for installation and setup instructions.

  - You can also consult the upstream `Installation guide <https://docs.ray.io/en/latest/ray-overview/installation.html>`__ 
    for additional context.

-.. _ray-docker-compat:
+Version support
+--------------------------------------------------------------------------------

-Compatibility matrix
-================================================================================
+Ray is supported on `ROCm 6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__.

-.. |docker-icon| raw:: html
+Supported devices
+--------------------------------------------------------------------------------

-   <i class="fab fa-docker"></i>
-
-AMD validates and publishes `ROCm Ray Docker images <https://hub.docker.com/r/rocm/ray/tags>`__
-with ROCm backends on Docker Hub. The following Docker image tags and
-associated inventories represent the latest Ray version from the official Docker Hub.
-Click |docker-icon| to view the image on Docker Hub.
-
-.. list-table::
-    :header-rows: 1
-    :class: docker-image-compatibility
-
-    * - Docker image
-      - ROCm
-      - Ray
-      - Pytorch
-      - Ubuntu
-      - Python
-      - GPU
-
-    * - .. raw:: html
-
-           <a href="https://hub.docker.com/layers/rocm/ray/ray-2.51.1_rocm7.0.0_ubuntu22.04_py3.12_pytorch2.9.0/images/sha256-a02f6766b4ba406f88fd7e85707ec86c04b569834d869a08043ec9bcbd672168"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
-      - `7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__
-      - `2.51.1 <https://github.com/ROCm/ray/tree/release/2.51.1>`__
-      - 2.9.0a0+git1c57644
-      - 22.04
-      - `3.12.12 <https://www.python.org/downloads/release/python-31212/>`__
-      - MI300X
-
-    * - .. raw:: html
-
-           <a href="https://hub.docker.com/layers/rocm/ray/ray-2.48.0.post0_rocm6.4.1_ubuntu24.04_py3.12_pytorch2.6.0/images/sha256-0d166fe6bdced38338c78eedfb96eff92655fb797da3478a62dd636365133cc0"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
-      - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
-      - `2.48.0.post0 <https://github.com/ROCm/ray/tree/release/2.48.0.post0>`__
-      - 2.6.0+git684f6f2
-      - 24.04
-      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`__
-      - MI300X, MI210
+**Officially Supported**: AMD Instinct™ MI300X, MI210

 Use cases and recommendations
 ================================================================================
@@ -108,7 +76,36 @@ topic <https://docs.ray.io/en/latest/ray-core/scheduling/accelerators.html#accel
 of the Ray core documentation and refer to the `AMD ROCm blog <https://rocm.blogs.amd.com/>`__, 
 where you can search for Ray examples and best practices to optimize your workloads on AMD GPUs.

-Previous versions
-===============================================================================
-See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/ray-history` to find documentation for previous releases
-of the ``ROCm/ray`` Docker image.
+.. _ray-docker-compat:
+
+Docker image compatibility
+================================================================================
+
+.. |docker-icon| raw:: html
+
+   <i class="fab fa-docker"></i>
+
+AMD validates and publishes ready-made `ROCm Ray Docker images <https://hub.docker.com/r/rocm/ray/tags>`__
+with ROCm backends on Docker Hub. The following Docker image tags and
+associated inventories represent the latest Ray version from the official Docker Hub.
+Click the |docker-icon| icon to view the image on Docker Hub.
+
+.. list-table::
+    :header-rows: 1
+    :class: docker-image-compatibility
+
+    * - Docker image
+      - ROCm
+      - Ray
+      - PyTorch
+      - Ubuntu
+      - Python
+
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/ray/ray-2.48.0.post0_rocm6.4.1_ubuntu24.04_py3.12_pytorch2.6.0/images/sha256-0d166fe6bdced38338c78eedfb96eff92655fb797da3478a62dd636365133cc0"><i class="fab fa-docker fa-lg"></i> rocm/ray</a>
+      - `6.4.1 <https://repo.radeon.com/rocm/apt/6.4.1/>`__
+      - `2.48.0.post0 <https://github.com/ROCm/ray/tree/release/2.48.0.post0>`_
+      - 2.6.0+git684f6f2
+      - 24.04
+      - `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
--- a/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
@@ -35,45 +35,19 @@ Support overview
  - You can also consult the upstream `Installation guide <https://github.com/NVIDIA/Megatron-LM>`__ 
    for additional context.

-.. _megatron-lm-docker-compat:
+Version support
+--------------------------------------------------------------------------------

-Compatibility matrix
-================================================================================
+Stanford Megatron-LM is supported on `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`__.

-.. |docker-icon| raw:: html
+Supported devices
+--------------------------------------------------------------------------------

-   <i class="fab fa-docker"></i>
+- **Officially Supported**: AMD Instinct™ MI300X
+- **Partially Supported** (functionality or performance limitations): AMD Instinct™ MI250X, MI210

-AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/stanford-megatron-lm/tags>`_
-with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated
-inventories represent the latest Stanford Megatron-LM version from the official Docker Hub.
-Click |docker-icon| to view the image on Docker Hub.
-
-.. list-table:: 
-    :header-rows: 1
-    :class: docker-image-compatibility
-
-    * - Docker image
-      - ROCm
-      - Stanford Megatron-LM
-      - PyTorch
-      - Ubuntu
-      - Python
-      - GPU
-
-    * - .. raw:: html
-
-           <a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i> rocm/stanford-megatron-lm</a>
-
-      - `6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_
-      - `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
-      - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
-      - 24.04
-      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
-      - MI300X
-
-Supported models and features with ROCm 6.3.0
-================================================================================
+Supported models and features
+--------------------------------------------------------------------------------

 This section details the models and features that ROCm supports for Stanford Megatron-LM.

@@ -114,3 +88,41 @@ It features how to pre-process datasets and how to begin pre-training on AMD GPU

 * Single-GPU pre-training
 * Multi-GPU pre-training
+
+.. _megatron-lm-docker-compat:
+
+Docker image compatibility
+================================================================================
+
+.. |docker-icon| raw:: html
+
+   <i class="fab fa-docker"></i>
+
+AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/stanford-megatron-lm/tags>`_
+with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
+inventories represent the latest Stanford Megatron-LM version from the official Docker Hub.
+Click |docker-icon| to view the image on Docker Hub.
+
+.. list-table:: 
+    :header-rows: 1
+    :class: docker-image-compatibility
+
+    * - Docker image
+      - ROCm
+      - Stanford Megatron-LM
+      - PyTorch
+      - Ubuntu
+      - Python
+
+    * - .. raw:: html
+
+           <a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i> rocm/stanford-megatron-lm</a>
+
+      - `6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_
+      - `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
+      - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
+      - 24.04
+      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
+
+      
+
--- a/docs/compatibility/ml-compatibility/verl-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/verl-compatibility.rst
@@ -37,9 +37,67 @@ Support overview
  - You can also consult the upstream `verl documentation <https://verl.readthedocs.io/en/latest/>`__ 
    for additional context.

+Version support
+--------------------------------------------------------------------------------
+
+verl is supported on `ROCm 7.0.0 <https://repo.radeon.com/rocm/apt/7.0/>`__ and
+`ROCm 6.2.0 <https://repo.radeon.com/rocm/apt/6.2/>`__.
+
+Supported devices
+--------------------------------------------------------------------------------
+
+**Officially Supported**: AMD Instinct™ MI300X
+
+.. _verl-recommendations:
+
+Use cases and recommendations
+================================================================================
+
+* The benefits of verl in large-scale reinforcement learning from human feedback 
+  (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD 
+  GPUs with verl and ROCm Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`__ 
+  blog. The blog post outlines how the Volcano Engine Reinforcement Learning 
+  (verl) framework integrates with the AMD ROCm platform to optimize training on 
+  AMD Instinct™ GPUs. The guide details the process of building a Docker image, 
+  setting up single-node and multi-node training environments, and highlights 
+  performance benchmarks demonstrating improved throughput and convergence accuracy. 
+  This resource serves as a comprehensive starting point for deploying verl on AMD GPUs, 
+  facilitating efficient RLHF training workflows.
+
+.. _verl-supported_features:
+
+Supported features
+===============================================================================
+
+The following table lists the GPU-accelerated modules that verl supports on ROCm, with the verl and ROCm versions for each.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Description
+      - verl version
+      - ROCm version
+    * - ``FSDP``
+      - Training engine
+      - 
+       * 0.6.0
+       * 0.3.0.post0
+      - 
+       * 7.0.0
+       * 6.2.0
+    * - ``vllm``
+      - Inference engine
+      - 
+       * 0.6.0
+       * 0.3.0.post0
+      - 
+       * 7.0.0
+       * 6.2.0
+
 .. _verl-docker-compat:

-Compatibility matrix
+Docker image compatibility
 ================================================================================

 .. |docker-icon| raw:: html
@@ -62,7 +120,6 @@ Click |docker-icon| to view the image on Docker Hub.
     - PyTorch
     - Python
     - vllm
-     - GPU

   * - .. raw:: html

@@ -73,7 +130,6 @@ Click |docker-icon| to view the image on Docker Hub.
     - `2.9.0 <https://github.com/ROCm/pytorch/tree/release/2.9-rocm7.x-gfx115x>`__
     - `3.12.11 <https://www.python.org/downloads/release/python-31211/>`__
     - `0.11.0 <https://github.com/vllm-project/vllm/releases/tag/v0.11.0>`__
-     - MI300X

   * - .. raw:: html

@@ -84,33 +140,7 @@ Click |docker-icon| to view the image on Docker Hub.
     - `2.5.0 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
     - `3.9.19 <https://www.python.org/downloads/release/python-3919/>`__
     - `0.6.3 <https://github.com/vllm-project/vllm/releases/tag/v0.6.3>`__
-     - MI300X

-.. _verl-supported_features:
-
-Supported modules with verl on ROCm
-===============================================================================
-
-The following GPU-accelerated modules are supported with verl on ROCm:
-
- ``FSDP``: Training engine
- ``vllm``: Inference engine
-
-.. _verl-recommendations:
-
-Use cases and recommendations
-================================================================================
-
-* The benefits of verl in large-scale reinforcement learning from human feedback 
-  (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD 
-  GPUs with verl and ROCm Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`__ 
-  blog. The blog post outlines how the Volcano Engine Reinforcement Learning 
-  (verl) framework integrates with the AMD ROCm platform to optimize training on 
-  AMD Instinct™ GPUs. The guide details the process of building a Docker image, 
-  setting up single-node and multi-node training environments, and highlights 
-  performance benchmarks demonstrating improved throughput and convergence accuracy. 
-  This resource serves as a comprehensive starting point for deploying verl on AMD GPUs, 
-  facilitating efficient RLHF training workflows.

 Previous versions
 ===============================================================================
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -268,3 +268,6 @@ html_context = {
    "granularity_type" : [('Coarse-grained', 'coarse-grained'), ('Fine-grained', 'fine-grained')],
    "scope_type" : [('Device', 'device'), ('System', 'system')]
 }
+
+# Disable figure and table numbering
+numfig = False
--- a/docs/how-to/rocm-for-ai/fine-tuning/multi-gpu-fine-tuning-and-inference.rst
+++ b/docs/how-to/rocm-for-ai/fine-tuning/multi-gpu-fine-tuning-and-inference.rst
@@ -44,7 +44,7 @@ Setting up the base implementation environment

   .. code-block:: shell

-      amd-smi static --board
+      rocm-smi --showproductname

 #. Check that your GPUs are available to PyTorch.

@@ -65,8 +65,8 @@ Setting up the base implementation environment

 .. tip::

-   During training and inference, you can check the memory usage by running the ``amd-smi`` command in your terminal.
-   This tool helps you see which GPUs are involved.
+   During training and inference, you can check the memory usage by running the ``rocm-smi`` command in your terminal.
+   This tool shows which GPUs are involved.


 .. _fine-tuning-llms-multi-gpu-hugging-face-accelerate:
@@ -91,10 +91,10 @@ Now, it's important to adjust how you load the model. Add the ``device_map`` par

   ...
   base_model_name = "meta-llama/Llama-2-7b-chat-hf"
-
+   
   # Load base model to GPU memory
   base_model = AutoModelForCausalLM.from_pretrained(
-           base_model_name,
+           base_model_name, 
           device_map = "auto",
           trust_remote_code = True)
   ...
@@ -130,7 +130,7 @@ After loading the model in this way, the model is fully ready to use the resourc
 torchtune for fine-tuning and inference
 =============================================

-`torchtune <https://pytorch.org/torchtune/main/>`_ is a PyTorch-native library for easy single and multi-GPU
+`torchtune <https://meta-pytorch.org/torchtune/main/>`_ is a PyTorch-native library for easy single and multi-GPU 
 model fine-tuning and inference with LLMs.

 #. Install torchtune using pip.
@@ -139,7 +139,7 @@ model fine-tuning and inference with LLMs.

      # Install torchtune with PyTorch release 2.2.2+
      pip install torchtune
-
+      
      # To confirm that the package is installed correctly
      tune --help

@@ -148,12 +148,12 @@ model fine-tuning and inference with LLMs.
   .. code-block:: shell

      usage: tune [-h] {download,ls,cp,run,validate} ...
-
+      
      Welcome to the TorchTune CLI!
-
+      
      options:
        -h, --help            show this help message and exit
-
+      
      subcommands:
        {download,ls,cp,run,validate}

@@ -194,11 +194,11 @@ model fine-tuning and inference with LLMs.
        apply_lora_to_output: False
        lora_rank: 8
        lora_alpha: 16
-
+      
      tokenizer:
        _component_: torchtune.models.llama2.llama2_tokenizer
        path: /tmp/Llama-2-7b-hf/tokenizer.model
-
+      
      # Dataset and sampler
      dataset:
        _component_: torchtune.datasets.alpaca_cleaned_dataset
--- a/docs/how-to/rocm-for-ai/fine-tuning/single-gpu-fine-tuning-and-inference.rst
+++ b/docs/how-to/rocm-for-ai/fine-tuning/single-gpu-fine-tuning-and-inference.rst
@@ -44,19 +44,20 @@ Setting up the base implementation environment

   .. code-block:: shell

-      amd-smi static --board
+      rocm-smi --showproductname

   Your output should look like this:

   .. code-block:: shell

-      GPU: 0
-         BOARD:
-            MODEL_NUMBER: 102-G39203-0B
-            PRODUCT_SERIAL: PCB079220-1150
-            FRU_ID: 113-AMDG392030B04-100-300000097H
-            PRODUCT_NAME: AMD Instinct MI325 OAM
-            MANUFACTURER_NAME: AMD
+      ============================ ROCm System Management Interface ============================
+      ====================================== Product Info ======================================
+      GPU[0]          : Card Series:          AMD Instinct MI300X OAM
+      GPU[0]          : Card model:           0x74a1
+      GPU[0]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
+      GPU[0]          : Card SKU:             MI3SRIOV
+      ==========================================================================================
+      ================================== End of ROCm SMI Log ===================================

 #. Check that your GPUs are available to PyTorch.

@@ -93,13 +94,13 @@ Setting up the base implementation environment
      pip install -r requirements-dev.txt
      cmake -DBNB_ROCM_ARCH="gfx942" -DCOMPUTE_BACKEND=hip -S .
      python setup.py install
-
+      
      # To leverage the SFTTrainer in TRL for model fine-tuning.
      pip install trl
-
+      
      # To leverage PEFT for efficiently adapting pre-trained language models .
      pip install peft
-
+      
      # Install the other dependencies.
      pip install transformers datasets huggingface-hub scipy

@@ -131,7 +132,7 @@ Download the base model and fine-tuning dataset

   .. note::

-      You can also use the `NousResearch Llama-2-7b-chat-hf <https://huggingface.co/NousResearch/Llama-2-7b-chat-hf>`_
+      You can also use the `NousResearch Llama-2-7b-chat-hf <https://huggingface.co/NousResearch/Llama-2-7b-chat-hf>`_ 
      as a substitute. It has the same model weights as the original.

 #. Run the following code to load the base model and tokenizer.
@@ -140,14 +141,14 @@ Download the base model and fine-tuning dataset

      # Base model and tokenizer names.
      base_model_name = "meta-llama/Llama-2-7b-chat-hf"
-
+      
      # Load base model to GPU memory.
      device = "cuda:0"
      base_model = AutoModelForCausalLM.from_pretrained(base_model_name, trust_remote_code = True).to(device)
-
+      
      # Load tokenizer.
      tokenizer = AutoTokenizer.from_pretrained(
-              base_model_name,
+              base_model_name, 
              trust_remote_code = True)
      tokenizer.pad_token = tokenizer.eos_token
      tokenizer.padding_side = "right"
@@ -161,10 +162,10 @@ Download the base model and fine-tuning dataset
      # Dataset for fine-tuning.
      training_dataset_name = "mlabonne/guanaco-llama2-1k"
      training_dataset = load_dataset(training_dataset_name, split = "train")
-
+      
      # Check the data.
      print(training_dataset)
-
+      
      # Dataset 11 is a QA sample in English.
      print(training_dataset[11])

@@ -251,8 +252,8 @@ Compare the number of trainable parameters and training time under the two diffe
                    dataset_text_field = "text",
                    tokenizer = tokenizer,
                    args = training_arguments
-            )
-
+            ) 
+            
            # Run the trainer.
            sft_trainer.train()

@@ -285,7 +286,7 @@ Compare the number of trainable parameters and training time under the two diffe
                    if param.requires_grad:
                        trainable_params += param.numel()
                print(f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}")
-
+            
            sft_trainer.peft_config = None
            print_trainable_parameters(sft_trainer.model)

@@ -308,8 +309,8 @@ Compare the number of trainable parameters and training time under the two diffe
                    dataset_text_field = "text",
                    tokenizer = tokenizer,
                    args = training_arguments
-            )
-
+            ) 
+            
            # Training.
            trainer_full.train()

@@ -348,7 +349,7 @@ store, and load.

         # PEFT adapter name.
         adapter_name = "llama-2-7b-enhanced-adapter"
-
+         
         # Save PEFT adapter.
         sft_trainer.model.save_pretrained(adapter_name)

@@ -358,21 +359,21 @@ store, and load.

         # Access adapter directory.
         cd llama-2-7b-enhanced-adapter
-
+         
         # List all adapter files.
         README.md  adapter_config.json  adapter_model.safetensors

   .. tab-item:: Saving a fully fine-tuned model
      :sync: without

-      If you're not using LoRA and PEFT so there is no PEFT LoRA configuration used for training, use the following code
+      If you're not using LoRA and PEFT so there is no PEFT LoRA configuration used for training, use the following code 
      to save your fine-tuned model to your system.

      .. code-block:: python

         # Fully fine-tuned model name.
         new_model_name = "llama-2-7b-enhanced"
-
+         
         # Save the fully fine-tuned model.
         full_trainer.model.save_pretrained(new_model_name)

@@ -382,7 +383,7 @@ store, and load.

         # Access new model directory.
         cd llama-2-7b-enhanced
-
+         
         # List all model files.
         config.json                       model-00002-of-00006.safetensors  model-00005-of-00006.safetensors
         generation_config.json            model-00003-of-00006.safetensors  model-00006-of-00006.safetensors
@@ -411,26 +412,26 @@ Let's look at achieving model inference using these types of models.

   .. tab-item:: Inference using PEFT adapters

-      To use PEFT adapters like a normal transformer model, you can run the generation by loading a base model along with PEFT
+      To use PEFT adapters like a normal transformer model, you can run the generation by loading a base model along with PEFT 
      adapters as follows.

      .. code-block:: python

         from peft import PeftModel
         from transformers import AutoModelForCausalLM
-
+         
         # Set the path of the model or the name on Hugging face hub
         base_model_name = "meta-llama/Llama-2-7b-chat-hf"
-
+         
         # Set the path of the adapter
         adapter_name = "Llama-2-7b-enhanced-adpater"
-
-         # Load base model
+         
+         # Load base model 
         base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
-
-         # Adapt the base model with the adapter
+         
+         # Adapt the base model with the adapter 
         new_model = PeftModel.from_pretrained(base_model, adapter_name)
-
+         
         # Then, run generation as the same with a normal model outlined in 2.1

      The PEFT library provides a ``merge_and_unload`` method, which merges the adapter layers into the base model. This is
@@ -438,13 +439,13 @@ Let's look at achieving model inference using these types of models.

      .. code-block:: python

-         # Load base model
+         # Load base model 
         base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
-
-         # Adapt the base model with the adapter
+         
+         # Adapt the base model with the adapter 
         new_model = PeftModel.from_pretrained(base_model, adapter_name)
-
-         # Merge adapter
+         
+         # Merge adapter 
         model = model.merge_and_unload()

         # Save the merged model into local
@@ -460,25 +461,25 @@ Let's look at achieving model inference using these types of models.

         # Import relevant class for loading model and tokenizer
         from transformers import AutoTokenizer, AutoModelForCausalLM
-
+         
         # Set the pre-trained model name on Hugging face hub
         model_name = "meta-llama/Llama-2-7b-chat-hf"
-
-         # Set device type
+         
+         # Set device type 
         device = "cuda:0"
-
-         # Load model and tokenizer
+         
+         # Load model and tokenizer 
         model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
         tokenizer = AutoTokenizer.from_pretrained(model_name)
-
-         # Input prompt encoding
+         
+         # Input prompt encoding 
         query = "What is a large language model?"
         inputs = tokenizer.encode(query, return_tensors="pt").to(device)
-
-         # Token generation
-         outputs = model.generate(inputs)
-
-         # Outputs decoding
+         
+         # Token generation  
+         outputs = model.generate(inputs) 
+         
+         # Outputs decoding 
         print(tokenizer.decode(outputs[0]))

      In addition, pipelines from Transformers offer simple APIs to use pre-trained models for different tasks, including
@@ -489,14 +490,14 @@ Let's look at achieving model inference using these types of models.

         # Import relevant class for loading model and tokenizer
         from transformers import pipeline
-
+         
         # Set the path of your model or the name on Hugging face hub
         model_name_or_path = "meta-llama/Llama-2-7b-chat-hf"
-
-         # Set pipeline
+         
+         # Set pipeline 
         # A positive device value will run the model on associated CUDA device id
         pipe = pipeline("text-generation", model=model_name_or_path, device=0)
-
+         
         # Token generation
         print(pipe("What is a large language model?")[0]["generated_text"])

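The single-GPU guide now expects `rocm-smi --showproductname` output in the `GPU[N] : Field: value` layout shown in the hunk above. The Python sketch below pulls those fields out per GPU. It assumes the exact layout from the sample, which may change between ROCm releases. `parse_product_names` is a hypothetical helper and is not part of ROCm.

```python
import re

# Matches lines such as:
#   GPU[0]          : Card Series:          AMD Instinct MI300X OAM
# Assumption: the layout matches the sample rocm-smi output in this diff.
_LINE = re.compile(r"^GPU\[(\d+)\]\s*:\s*([^:]+?):\s*(.+?)\s*$")


def parse_product_names(text: str) -> dict[int, dict[str, str]]:
    """Group 'GPU[N] : Field: value' lines by GPU index."""
    gpus: dict[int, dict[str, str]] = {}
    for raw in text.splitlines():
        match = _LINE.match(raw.strip())
        if match is None:
            continue  # skip banner lines such as "=== Product Info ==="
        index, field, value = match.groups()
        gpus.setdefault(int(index), {})[field.strip()] = value
    return gpus


sample = """\
============================ ROCm System Management Interface ============================
====================================== Product Info ======================================
GPU[0]          : Card Series:          AMD Instinct MI300X OAM
GPU[0]          : Card model:           0x74a1
GPU[0]          : Card vendor:          Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]          : Card SKU:             MI3SRIOV
==========================================================================================
================================== End of ROCm SMI Log ===================================
"""

print(parse_product_names(sample)[0]["Card Series"])  # → AMD Instinct MI300X OAM
```

In practice, the `sample` string would be replaced by the captured stdout of `rocm-smi --showproductname`.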
--- a/docs/how-to/rocm-for-ai/system-setup/prerequisite-system-validation.rst
+++ b/docs/how-to/rocm-for-ai/system-setup/prerequisite-system-validation.rst
@@ -31,16 +31,16 @@ in the Instinct documentation for more information.
 Hardware verification with ROCm
 -------------------------------

-Use the command ``amd-smi set --perf-determinism 1900`` to set the max clock speed up to 1900 MHz
+Use the command ``rocm-smi --setperfdeterminism 1900`` to set the max clock speed up to 1900 MHz
 instead of the default 2100 MHz. This can reduce the chance of a PCC event lowering the attainable
 GPU clocks. This setting will not be required for new IFWI releases with the production PRC feature.
-You can restore this setting to its default value with the ``amd-smi reset --clocks`` command.
+You can restore this setting to its default value with the ``rocm-smi -r`` command.

 Run the command:

 .. code-block:: shell

-   amd-smi set --perf-determinism 1900
+   rocm-smi --setperfdeterminism 1900

 See `Hardware verfication for ROCm <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#hardware-verification-with-rocm>`_
 in the Instinct documentation for more information.
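The hunks above replace each `amd-smi` invocation with a `rocm-smi` counterpart. A script that has to work with either tool can keep those pairs in a lookup table. The sketch below is hypothetical. The table covers only the three command pairs that appear in this diff, and the action names (`product`, `perf_determinism`, `reset`) are illustrative. It prefers `rocm-smi` when it is on `PATH`, matching the new text, and it assumes the chosen tool accepts these flags. Setting clocks typically requires elevated privileges.

```python
import shutil
import subprocess

# Command pairs taken from this diff; the action names are illustrative only.
COMMANDS: dict[str, dict[str, list[str]]] = {
    "product": {
        "amd-smi": ["amd-smi", "static", "--board"],
        "rocm-smi": ["rocm-smi", "--showproductname"],
    },
    "perf_determinism": {
        "amd-smi": ["amd-smi", "set", "--perf-determinism", "{mhz}"],
        "rocm-smi": ["rocm-smi", "--setperfdeterminism", "{mhz}"],
    },
    "reset": {
        "amd-smi": ["amd-smi", "reset", "--clocks"],
        "rocm-smi": ["rocm-smi", "-r"],
    },
}


def build_command(action: str, tool: str, **params: str) -> list[str]:
    """Return the argv for `action` with `tool`, filling placeholders such as {mhz}."""
    return [part.format(**params) for part in COMMANDS[action][tool]]


def run(action: str, **params: str) -> subprocess.CompletedProcess:
    """Use rocm-smi if it is installed, otherwise fall back to amd-smi."""
    tool = "rocm-smi" if shutil.which("rocm-smi") else "amd-smi"
    argv = build_command(action, tool, **params)
    return subprocess.run(argv, check=True, capture_output=True, text=True)


print(build_command("perf_determinism", "rocm-smi", mhz="1900"))
# → ['rocm-smi', '--setperfdeterminism', '1900']
```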
--- a/docs/how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v24.12-dev.rst
+++ b/docs/how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v24.12-dev.rst
@@ -108,16 +108,16 @@ for more information.
 Hardware verification with ROCm
 -------------------------------

-Use the command ``amd-smi set --perf-determinism 1900`` to set the max clock speed up to 1900 MHz
+Use the command ``rocm-smi --setperfdeterminism 1900`` to set the max clock speed up to 1900 MHz
 instead of the default 2100 MHz. This can reduce the chance of a PCC event lowering the attainable
 GPU clocks. This setting will not be required for new IFWI releases with the production PRC feature.
-You can restore this setting to its default value with the ``amd-smi reset --clocks`` command.
+You can restore this setting to its default value with the ``rocm-smi -r`` command.

 Run the command:

 .. code-block:: shell

-   amd-smi set --perf-determinism 1900
+   rocm-smi --setperfdeterminism 1900

 See `Hardware verification with ROCm <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#hardware-verification-with-rocm>`_ for more information.

@@ -248,7 +248,7 @@ Download the Docker image and required packages
      Checking out this specific commit is recommended for a stable and reproducible environment.

      .. code-block:: shell
-
+         
         git checkout bb93ccbfeae6363c67b361a97a27c74ab86e7e92

 Prepare training datasets
--- a/docs/how-to/rocm-for-ai/training/benchmark-docker/primus-pytorch.rst
+++ b/docs/how-to/rocm-for-ai/training/benchmark-docker/primus-pytorch.rst
@@ -285,7 +285,7 @@ tweak some configurations (such as batch sizes).

                     .. code-block:: shell

-                        EXP=examples/torchtitan/configs/MI355X/llama3.1_8B-FP8-pretrain.yaml \
+                        EXP=examples/torchtitan/configs/MI355X/llama3.1_8B-BF16-pretrain.yaml \
                        bash examples/run_pretrain.sh

                  .. tab-item:: MI325X
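The MI355X example now sets `EXP` to the BF16 config instead of the FP8 one. The Python sketch below drives the same shell invocation by passing `EXP` through the environment. It assumes the working directory is the Primus checkout that contains `examples/run_pretrain.sh`. `pretrain_invocation` and `launch` are hypothetical helpers.

```python
import os
import subprocess

# Config path from the MI355X example in this diff (BF16 replaces FP8).
MI355X_BF16_CONFIG = "examples/torchtitan/configs/MI355X/llama3.1_8B-BF16-pretrain.yaml"


def pretrain_invocation(exp_config: str) -> tuple[list[str], dict[str, str]]:
    """Return argv and env equivalent to `EXP=<config> bash examples/run_pretrain.sh`."""
    env = dict(os.environ)
    env["EXP"] = exp_config
    return ["bash", "examples/run_pretrain.sh"], env


def launch(exp_config: str = MI355X_BF16_CONFIG) -> None:
    """Run the pretraining script; assumes the cwd is the Primus repository root."""
    argv, env = pretrain_invocation(exp_config)
    subprocess.run(argv, env=env, check=True)
```

Keeping the argv and environment construction separate from `launch` allows the command to be inspected or logged before anything runs inside the container.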
--- a/docs/sphinx/requirements.in
+++ b/docs/sphinx/requirements.in
@@ -1,4 +1,4 @@
-rocm-docs-core==1.30.0
+rocm-docs-core==1.31.1
 sphinx-reredirects
 sphinx-sitemap
 sphinxcontrib.datatemplates==0.11.0
--- a/docs/sphinx/requirements.txt
+++ b/docs/sphinx/requirements.txt
@@ -132,6 +132,7 @@ nest-asyncio==1.6.0
 packaging==25.0
    # via
    #   ipykernel
+    #   pydata-sphinx-theme
    #   sphinx
 parso==0.8.5
    # via jedi
@@ -149,7 +150,7 @@ pure-eval==0.2.3
    # via stack-data
 pycparser==2.23
    # via cffi
-pydata-sphinx-theme==0.16.1
+pydata-sphinx-theme==0.15.4
    # via
    #   rocm-docs-core
    #   sphinx-book-theme
@@ -163,7 +164,7 @@ pygments==2.19.2
    #   sphinx
 pyjwt[crypto]==2.10.1
    # via pygithub
-pynacl==1.6.1
+pynacl==1.6.2
    # via pygithub
 python-dateutil==2.9.0.post0
    # via jupyter-client
@@ -187,7 +188,7 @@ requests==2.32.5
    # via
    #   pygithub
    #   sphinx
-rocm-docs-core==1.30.0
+rocm-docs-core==1.31.1
    # via -r requirements.in
 rpds-py==0.29.0
    # via
@@ -217,7 +218,7 @@ sphinx==8.1.3
    #   sphinx-reredirects
    #   sphinxcontrib-datatemplates
    #   sphinxcontrib-runcmd
-sphinx-book-theme==1.1.3
+sphinx-book-theme==1.1.4
    # via rocm-docs-core
 sphinx-copybutton==0.5.2
    # via rocm-docs-core
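These hunks move the docs pins to `rocm-docs-core==1.31.1`, `pynacl==1.6.2`, `sphinx-book-theme==1.1.4`, and `pydata-sphinx-theme==0.15.4`. The sketch below checks whether a local docs environment matches the pins. It uses `importlib.metadata` from the standard library. It assumes `requirements.txt` uses exact `name==version` lines. Extras such as `pyjwt[crypto]` are stripped, and other specifiers are skipped. Package-name normalization is kept minimal (lowercasing only), which is a simplifying assumption.

```python
import re
from importlib import metadata
from typing import Optional

# Parses exact pins such as "pynacl==1.6.2" or "pyjwt[crypto]==2.10.1".
_PIN = re.compile(r"^([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?==([^\s;#]+)")


def parse_pins(text: str) -> dict[str, str]:
    """Map package name to pinned version, ignoring comments and non-exact specifiers."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        match = _PIN.match(line.strip())
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def mismatches(pins: dict[str, str]) -> dict[str, Optional[str]]:
    """Return packages whose installed version differs from the pin (None if missing)."""
    result: dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            result[name] = installed
    return result


example = """\
rocm-docs-core==1.31.1
    # via -r requirements.in
pynacl==1.6.2
    # via pygithub
sphinx-reredirects
"""
print(parse_pins(example))  # → {'rocm-docs-core': '1.31.1', 'pynacl': '1.6.2'}
```

Running `mismatches(parse_pins(open("docs/sphinx/requirements.txt").read()))` inside the docs virtual environment returns an empty dict when every exact pin is satisfied.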
--- a/docs/what-is-rocm.rst
+++ b/docs/what-is-rocm.rst
@@ -123,8 +123,7 @@ Performance

 .. note::

-  `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`.
-  Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.
+  `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`. Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.

 Development
 ^^^^^^^^^^^
Author	SHA1	Message	Date
Joseph Macaranas	febbf385c4	[External CI] Add SIMDe dev package to HIP runtime pipeline	2026-01-07 10:25:18 -05:00
dependabot[bot]	ba95e0e689	Bump pynacl from 1.6.1 to 1.6.2 in /docs/sphinx (#5836 ) Bumps [pynacl](https://github.com/pyca/pynacl) from 1.6.1 to 1.6.2. - [Changelog](https://github.com/pyca/pynacl/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/pynacl/compare/1.6.1...1.6.2) --- updated-dependencies: - dependency-name: pynacl dependency-version: 1.6.2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-01-06 14:10:42 -05:00
Pratik Basyal	1691d369e9	ROCM-core version fixed (#5827 )	2026-01-02 16:06:27 -05:00
peterjunpark	172b0f7c08	Fix inconsistency in xDiT doc Fix inconsistency in xDiT doc	2025-12-29 10:26:25 -05:00
peterjunpark	c67fac78bd	Update docs for xDiT diffusion inference 25.13 Docker release (#5820 ) * archive previous version * add xdit 25.13 * update history index * add perf results section	2025-12-29 08:44:45 -05:00
peterjunpark	e0b8ec4dfb	Update training docs for Primus/25.11 (#5819 ) * update conf and toc.yml.in * archive previous versions archive data files update anchors * primus pytorch: remove training batch size args * update primus megatron run cmds multi-node * update primus pytorch update * update update * update docker tag	2025-12-29 08:05:47 -05:00
Pratik Basyal	38f2d043dc	OS table removed from compatibility table [develop] (#5810 ) * OS table removed from compatibility table * Feedback added * Azure Linux 3.0 and compatibility version update * Version fix * Review feedback added * Minor change	2025-12-23 16:28:19 -05:00
peterjunpark	3a43bacdda	Update xdit diffusion inference history (#5808 ) * Update xdit diffusion inference history * fix	2025-12-22 11:05:32 -05:00
peterjunpark	48d8fe139b	fix link to ROCm PyT docker image (#5803 )	2025-12-19 15:47:55 -05:00
peterjunpark	7455fe57b8	clean up formatting in FA2 page (#5795 )	2025-12-19 09:21:41 -05:00
peterjunpark	52c0a47e84	Update Flash Attention guidance in "Model acceleration libraries" (#5793 ) * flash attention update Signed-off-by: seungrok.jung <seungrok.jung@amd.com> flash attention update Signed-off-by: seungrok.jung <seungrok.jung@amd.com> flash attention update Signed-off-by: seungrok.jung <seungrok.jung@amd.com> sentence-case heading * Update docs/how-to/rocm-for-ai/inference-optimization/model-acceleration-libraries.rst Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> --------- Co-authored-by: seungrok.jung <seungrok.jung@amd.com> Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>	2025-12-19 08:48:52 -05:00
peterjunpark	cbab9a465d	Update documentation for JAX training MaxText 25.11 release (#5789 )	2025-12-18 11:23:58 -05:00
peterjunpark	459283da3c	xDiT diffusion inference v25.12 documentation update (#5786 ) * Add xdit-diffusion ROCm docs page. * Update template formatting and fix sphinx warnings * Add System Validation section. * Add sw component versions/commits. * Update to use latest v25.10 image instead of v25.9 * Update commands and add FLUX instructions. * Update Flux instructions. Change image tag. Describe as diffusion inference instead of specifically video. * git rm xdit-video-diffusion.rst * Docs for v25.12 * Add hyperlinks to components * Command fixes * -Diffusers suffix * Simplify yaml file and cleanup main rst page. * Spelling, added 'js' * fix merge conflict fix --------- Co-authored-by: Kristoffer <kristoffer.torp@amd.com>	2025-12-17 10:20:10 -05:00
peterjunpark	1b4f25733d	vLLM inference benchmark 1210 (#5776 ) * Archive previous ver fix anchors * Update vllm.rst and data yaml for 20251210	2025-12-17 09:21:57 -05:00
Ibrahim Wani	b287372be5	[origami] Test update (#5768 ) * Fix the skipping of origami tests * Update dependencies for origami refactor * test * Unsupress test output. * Ctest implementation * Test ctest * Test ctest 2 * Add pip install test * Fix python version * Add python dep * test * test 2 * Debug for readme * Fix pip install * Fix pip install 2 * Clean up * Run tests on 950 * Replace 950 with 1201 * 1101 * Add more archs * Add more archs 2 * Comment out archs * Move pip install script to ./azuredevops/scripts * Fix path * Fix path 2 * Fix path 3 * Fix path 4 * Remove pip install testing: * Use inline script * Add old deps	2025-12-16 15:37:41 -07:00
Pratik Basyal	78e8baf147	Taichi removed from ROCm docs [Develop] (#5779 ) * Taichi removed from ROCm docs * Warnings fixed	2025-12-16 13:12:40 -05:00
Matt Williams	3e0c8b47e3	Merge pull request #5771 from ROCm/mattwill-amd-patch-4 Reverting Optiq note	2025-12-12 17:53:41 -05:00
Matt Williams	c3f0b99cc0	Reverting Optiq note	2025-12-12 17:47:33 -05:00
dependabot[bot]	c9d1679486	Bump rocm-docs-core from 1.31.0 to 1.31.1 in /docs/sphinx Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.31.0 to 1.31.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.31.0...v1.31.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-version: 1.31.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2025-12-12 16:15:26 -05:00
Pratik Basyal	fdbef17d7b	Onnx and rocshmem version updated (#5760 )	2025-12-11 17:05:25 -05:00
Matt Williams	6592a41a7f	Adding ROCm-Optiq note to What is ROCm page (#5709 ) * Adding ROCm-Optiq note to What is ROCm page Adding a note for a link to the Optiq docs * Apply suggestion from @mattwill-amd * Apply suggestion from @mattwill-amd * Apply suggestion from @mattwill-amd * Update what-is-rocm.rst * Update what-is-rocm.rst * Apply suggestion from @mattwill-amd * Apply suggestion from @mattwill-amd * Apply suggestion from @mattwill-amd * Apply suggestion from @mattwill-amd	2025-12-10 12:56:33 -08:00
Matt Williams	65a936023b	Fixing link redirects (#5758 ) * Update multi-gpu-fine-tuning-and-inference.rst * Update pytorch-training-v25.6.rst * Update pytorch-compatibility.rst	2025-12-10 11:17:59 -05:00
anisha-amd	2a64949081	Docs: update verl compatibility - fix (#5756 )	2025-12-09 19:51:37 -05:00
anisha-amd	0a17434517	Docs: update verl compatibility - fix (#5754 )	2025-12-09 18:36:16 -05:00
anisha-amd	2be7e5ac1e	Docs: verl framework - compatibility - 25.11 release (#5752 )	2025-12-09 11:41:43 -05:00
dependabot[bot]	ae80c4a31c	Bump rocm-docs-core from 1.30.1 to 1.31.0 in /docs/sphinx (#5751 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.30.1 to 1.31.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.30.1...v1.31.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-version: 1.31.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-12-09 08:25:16 -05:00
Adel Johar	dd89a692e1	[Ex CI] Add rocAL dependencies	2025-12-09 10:56:23 +01:00
peterjunpark	bf74351e5a	Fix Primus PyTorch doc: training.batch_size -> training.local_batch_size (#5748 )	2025-12-08 13:35:22 -05:00
yugang-amd	f2067767e0	xdit-diffusion v25.11 docs (#5744 )	2025-12-05 17:09:48 -05:00
Pratik Basyal	effd4174fb	PyTorch 2.7 support added (#5740 )	2025-12-04 15:49:23 -05:00
peterjunpark	453751a86f	fix docker hub links for primus:v25.10 (#5738 )	2025-12-04 09:17:33 -05:00
peterjunpark	fb644412d5	Update training Docker docs for Primus 25.10 (#5737 )	2025-12-04 09:08:00 -05:00
Pratik Basyal	e8fdc34b71	711 hipBLASLT performance decline known issue added (#5730 ) * hipBLASLT performance decline known issue added * Update RELEASE.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * GitHub Issue added * Ram's feedback incorporated * GitHub Issue added * Update RELEASE.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>	2025-12-03 08:50:25 -05:00
Pratik Basyal	b4031ef23c	7.1.1 known issues post GA (#5721 ) * rocblas known issues added * Minor change * Update RELEASE.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Resolved * Update RELEASE.md Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>	2025-11-28 16:34:47 -05:00
dependabot[bot]	d0bd4e6f03	Bump rocm-docs-core from 1.29.0 to 1.30.1 in /docs/sphinx (#5712 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.29.0 to 1.30.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.29.0...v1.30.1) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-version: 1.30.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-28 08:18:23 -05:00
Jan Stephan	0056b9453e	Remove continuous numbering of tables and figures Signed-off-by: Jan Stephan <jan.stephan@amd.com>	2025-11-28 10:29:01 +01:00
Pratik Basyal	3d1ad79766	Merged cell removed for coloring issue (#5713 )	2025-11-27 19:52:36 -05:00
Pratik Basyal	8683bed11b	Known issue from 7.1.0 removed (#5702 )	2025-11-26 12:27:22 -05:00
Pratik Basyal	847cd7c423	Link and PyTorch version updated (#5700 )	2025-11-26 11:52:47 -05:00