Merge Verl, DGL, Megatron changes. (#5047)

* Verl compatibility

* verl compatibility

* add Supported features

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* updated and edited verl compat doc

* added links to verl

* add future release for sglang and megatron inference eng.

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* fix lint

Signed-off-by: Vicky Tsang <vtsang@amd.com>

* fixed a typo and a table

* Spolifroni amd/add to compat matrix (#430)

* added verl to compatibility matrix

* small change

* fixed an error in csv

* edited the verl compat based on leo's recommendations

* updated compat matrix (#435)

* Added a hardcoded link to the verl install

This is a link to an RTD build and MUST be removed before publishing.

* Update verl-compatibility.rst

* Added a hardcoded link to the verl install

This link is to an RTD build and it WILL break at publishing. It MUST be changed before publishing.

* Added version support note (#448)

* small fixes

* Update verl-compatibility.rst

* Update verl-compatibility.rst

---------

Signed-off-by: Vicky Tsang <vtsang@amd.com>
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
(cherry picked from commit f9bd22626b)

* Stanford Megatron-LM Compatibility

* Create stanford-megatron-lm-compatibility.rst

* toc and wordlist

* Update deep-learning-rocm.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* fixes and adding to main compat matrix

* formatting fix

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

* Update stanford-megatron-lm-compatibility.rst

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
(cherry picked from commit f4f096b44e)

* Framework: DGL Compatibility

* Introducing new file for DGL Compatibility

* Update dgl-compatibility.rst

* Update .wordlist.txt

* Update .wordlist.txt

* Update deep-learning-rocm.rst

* compatibility fixes

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/compatibility/ml-compatibility/dgl-compatibility.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* additions to use-cases and system support

* wording and fixes

* Update dgl-compatibility.rst

* Update dgl-compatibility.rst

* remove table heading

* Update compatibility-matrix-historical-6.0.csv

---------

Co-authored-by: anisha-amd <anisha.sankar@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
(cherry picked from commit 2a7554c0b9)

* Manually resolve merge conflict

* Further merge conflict adjustments

---------

Signed-off-by: Vicky Tsang <vtsang@amd.com>
Co-authored-by: vickytsang <vtsang@amd.com>
Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: anisha-amd <anisha.sankar@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Mukhil M S <167260682+mukh1l@users.noreply.github.com>
This commit is contained in:
Jeffrey Novotny
2025-07-15 18:57:31 -04:00
committed by GitHub
parent 505698ed3f
commit b431415ade
6 changed files with 469 additions and 1 deletions

View File

@@ -6,6 +6,7 @@ ACS
AccVGPR
AccVGPRs
ALU
AllReduce
AMD
AMDGPU
AMDGPUs
@@ -13,6 +14,7 @@ AMDMIGraphX
AMI
AOCC
AOMP
AOT
AOTriton
APBDIS
APIC
@@ -79,10 +81,13 @@ ConnectX
CuPy
da
Dashboarding
Dataloading
DBRX
DDR
DF
DGEMM
DGL
DGLGraph
dGPU
dGPUs
DIMM
@@ -100,6 +105,7 @@ DataFrame
DataLoader
DataParallel
Debian
decompositions
DeepSeek
DeepSpeed
Dependabot
@@ -129,6 +135,7 @@ FluxBenchmark
Fortran
Fuyu
GALB
GAT
GCC
GCD
GCDs
@@ -156,6 +163,8 @@ GPT
GPU
GPU's
GPUs
Graphbolt
GraphSage
GRBM
GenAI
GenZ
@@ -168,6 +177,7 @@ HIPCC
HIPExtension
HIPIFY
HIPification
hipification
HIPify
HPC
HPCG
@@ -182,6 +192,7 @@ Higgs
Hyperparameters
Huggingface
ICD
ICT
ICV
IDE
IDEs
@@ -216,6 +227,7 @@ KV
KVM
Karpathy's
KiB
Kineto
Keras
Khronos
LAPACK
@@ -264,6 +276,7 @@ Miniconda
MirroredStrategy
Mixtral
MosaicML
Mpops
Multicore
Multithreaded
MyEnvironment
@@ -277,6 +290,7 @@ NIC
NICs
NLI
NLP
NN
NPKit
NPS
NSP
@@ -313,6 +327,7 @@ OpenMPI
OpenSSL
OpenVX
OpenXLA
Optim
Oversubscription
PagedAttention
Pallas
@@ -351,6 +366,7 @@ RDC's
RDMA
RDNA
README
Recomputation
RHEL
RMW
RNN
@@ -781,6 +797,7 @@ reStructuredText
redirections
refactorization
reformats
reinforcememt
repo
repos
representativeness
@@ -788,6 +805,7 @@ req
resampling
rescaling
reusability
RLHF
roadmap
roc
rocAL
@@ -899,6 +917,7 @@ vectorize
vectorized
vectorizer
vectorizes
verl
virtualize
virtualized
vjxb

View File

@@ -54,7 +54,9 @@ compatibility and system requirements.
FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
:doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 2.1, 2.0, 1.13"
:doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1"
:doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.31
:doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`,N/A,N/A,`85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
:doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`,2.4.0,2.4.0,N/A
`ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.17.3
,,,
THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
@@ -235,6 +237,7 @@ Expand for full historical view of:
.. [#mi300_610-past-60] **For ROCm 6.1.0** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_602-past-60] **For ROCm 6.0.2** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#mi300_600-past-60] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#verl_compat] verl is only supported on ROCm 6.2.0.
.. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
.. [#RDNA-OS-past-60] Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.5, and RHEL 9.4.

View File

@@ -0,0 +1,255 @@
:orphan:
.. meta::
:description: Deep Graph Library (DGL) compatibility
:keywords: GPU, DGL compatibility
.. version-set:: rocm_version latest
********************************************************************************
DGL compatibility
********************************************************************************
Deep Graph Library (`DGL <https://www.dgl.ai/>`_) is an easy-to-use, high-performance, and scalable
Python package for deep learning on graphs. DGL is framework agnostic: when a deep graph model is one
component of an end-to-end application, the rest of the logic can be implemented in PyTorch.
* ROCm support for DGL is hosted in the `https://github.com/ROCm/dgl <https://github.com/ROCm/dgl>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/dmlc/dgl <https://github.com/dmlc/dgl>`_ upstream repository.
* Use the prebuilt :ref:`Docker images <dgl-docker-compat>` with DGL, PyTorch, and ROCm preinstalled.
* See the :doc:`ROCm DGL installation guide <rocm-install-on-linux:install/3rd-party/dgl-install>`
to install and get started.
Supported devices
================================================================================
- **Officially Supported**: TF32 with AMD Instinct MI300X (through hipBLASLt)
- **Partially Supported**: TF32 with AMD Instinct MI250X
.. _dgl-recommendations:
Use cases and recommendations
================================================================================
DGL can be used for graph learning and for building popular graph models such as
GAT, GCN, and GraphSage. These models support a variety of use cases, including:
- Recommender systems
- Network optimization and analysis
- 1D (temporal) and 2D (image) classification
- Drug discovery
Refer to the `ROCm DGL blog posts <https://rocm.blogs.amd.com/blog/tag/dgl.html>`_
for examples and best practices to optimize your training workflows on AMD GPUs.
Coverage includes:
- Single-GPU training/inference
- Multi-GPU training
Benchmarking details are included in the :doc:`Benchmarks` section.
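The following minimal sketch shows how a two-layer GCN can be defined and run with DGL on the
PyTorch backend. The toy graph, feature sizes, and class count are illustrative placeholders,
not values from a validated workload.

.. code-block:: python

   import dgl
   import torch
   import torch.nn as nn
   from dgl.nn import GraphConv

   # Toy 4-node cycle graph; self-loops keep GraphConv well defined for every node.
   g = dgl.add_self_loop(dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0])))
   feats = torch.randn(4, 8)  # 8 input features per node (placeholder size)

   class GCN(nn.Module):
       def __init__(self, in_feats, hidden_feats, num_classes):
           super().__init__()
           self.conv1 = GraphConv(in_feats, hidden_feats)
           self.conv2 = GraphConv(hidden_feats, num_classes)

       def forward(self, graph, x):
           h = torch.relu(self.conv1(graph, x))
           return self.conv2(graph, h)

   model = GCN(in_feats=8, hidden_feats=16, num_classes=2)
   logits = model(g, feats)  # move the model, graph, and features to "cuda" to run on a ROCm GPU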
.. _dgl-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `DGL images <https://hub.docker.com/r/rocm/dgl>`_
with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
inventories were tested on `ROCm 6.4.0 <https://repo.radeon.com/rocm/apt/6.4/>`_.
Click the |docker-icon| to view the image on Docker Hub.
.. list-table:: DGL Docker image components
:header-rows: 1
:class: docker-image-compatibility
* - Docker
- DGL
- PyTorch
- Ubuntu
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-8ce2c3bcfaa137ab94a75f9e2ea711894748980f57417739138402a542dd5564"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-cf1683283b8eeda867b690229c8091c5bbf1edb9f52e8fb3da437c49a612ebe4"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-4834f178c3614e2d09e89e32041db8984c456d45dfd20286e377ca8635686554"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/dgl/dgl-2.4_rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.3.0/images/sha256-88740a2c8ab4084b42b10c3c6ba984cab33dd3a044f479c6d7618e2b2cb05e69"><i class="fab fa-docker fa-lg"></i></a>
- `2.4.0 <https://github.com/dmlc/dgl/releases/tag/v2.4.0>`_
- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`_
- 22.04
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
Key ROCm libraries for DGL
================================================================================
DGL on ROCm depends on specific libraries that affect its features and performance.
Using the prebuilt DGL Docker container, or building DGL with the provided Dockerfile or a ROCm base image, is recommended.
If you prefer to build it yourself, ensure the following dependencies are installed:
.. list-table::
:header-rows: 1
* - ROCm library
- Version
- Purpose
* - `Composable Kernel <https://github.com/ROCm/composable_kernel>`_
- :version-ref:`"Composable Kernel" rocm_version`
- Enables faster execution of core operations like matrix multiplication
(GEMM), convolutions and transformations.
* - `hipBLAS <https://github.com/ROCm/hipBLAS>`_
- :version-ref:`hipBLAS rocm_version`
- Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for
matrix and vector operations.
* - `hipBLASLt <https://github.com/ROCm/hipBLASLt>`_
- :version-ref:`hipBLASLt rocm_version`
- hipBLASLt is an extension of the hipBLAS library, providing additional
features like epilogues fused into the matrix multiplication kernel or
use of integer tensor cores.
* - `hipCUB <https://github.com/ROCm/hipCUB>`_
- :version-ref:`hipCUB rocm_version`
- Provides a C++ template library for parallel algorithms for reduction,
scan, sort and select.
* - `hipFFT <https://github.com/ROCm/hipFFT>`_
- :version-ref:`hipFFT rocm_version`
- Provides GPU-accelerated Fast Fourier Transform (FFT) operations.
* - `hipRAND <https://github.com/ROCm/hipRAND>`_
- :version-ref:`hipRAND rocm_version`
- Provides fast random number generation for GPUs.
* - `hipSOLVER <https://github.com/ROCm/hipSOLVER>`_
- :version-ref:`hipSOLVER rocm_version`
- Provides GPU-accelerated solvers for linear systems, eigenvalues, and
singular value decompositions (SVD).
* - `hipSPARSE <https://github.com/ROCm/hipSPARSE>`_
- :version-ref:`hipSPARSE rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipSPARSELt <https://github.com/ROCm/hipSPARSELt>`_
- :version-ref:`hipSPARSELt rocm_version`
- Accelerates operations on sparse matrices, such as sparse matrix-vector
or matrix-matrix products.
* - `hipTensor <https://github.com/ROCm/hipTensor>`_
- :version-ref:`hipTensor rocm_version`
- Optimizes for high-performance tensor operations, such as contractions.
* - `MIOpen <https://github.com/ROCm/MIOpen>`_
- :version-ref:`MIOpen rocm_version`
- Optimizes deep learning primitives such as convolutions, pooling,
normalization, and activation functions.
* - `MIGraphX <https://github.com/ROCm/AMDMIGraphX>`_
- :version-ref:`MIGraphX rocm_version`
- Adds graph-level optimizations, support for ONNX models and mixed precision,
and enables ahead-of-time (AOT) compilation.
* - `MIVisionX <https://github.com/ROCm/MIVisionX>`_
- :version-ref:`MIVisionX rocm_version`
- Optimizes acceleration for computer vision and AI workloads like
preprocessing, augmentation, and inferencing.
* - `rocAL <https://github.com/ROCm/rocAL>`_
- :version-ref:`rocAL rocm_version`
- Accelerates the data pipeline by offloading intensive preprocessing and
augmentation tasks. rocAL is part of MIVisionX.
* - `RCCL <https://github.com/ROCm/rccl>`_
- :version-ref:`RCCL rocm_version`
- Optimizes for multi-GPU communication for operations like AllReduce and
Broadcast.
* - `rocDecode <https://github.com/ROCm/rocDecode>`_
- :version-ref:`rocDecode rocm_version`
- Provides hardware-accelerated data decoding capabilities, particularly
for image, video, and other dataset formats.
* - `rocJPEG <https://github.com/ROCm/rocJPEG>`_
- :version-ref:`rocJPEG rocm_version`
- Provides hardware-accelerated JPEG image decoding and encoding.
* - `RPP <https://github.com/ROCm/RPP>`_
- :version-ref:`RPP rocm_version`
- Speeds up data augmentation, transformation, and other preprocessing steps.
* - `rocThrust <https://github.com/ROCm/rocThrust>`_
- :version-ref:`rocThrust rocm_version`
- Provides a C++ template library for parallel algorithms like sorting,
reduction, and scanning.
* - `rocWMMA <https://github.com/ROCm/rocWMMA>`_
- :version-ref:`rocWMMA rocm_version`
- Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix
multiplication (GEMM) and accumulation operations with mixed precision
support.
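If you build the stack yourself, a quick check such as the following (a sketch; it assumes a ROCm
build of PyTorch and the DGL package from the ROCm repository are installed) confirms that the GPU
is visible to both libraries before running larger workloads.

.. code-block:: python

   import torch
   import dgl

   # On ROCm builds of PyTorch, torch.version.hip is set and the GPU is still
   # addressed through the "cuda" device string.
   print("HIP runtime:", torch.version.hip)
   print("GPU available:", torch.cuda.is_available())
   print("DGL version:", dgl.__version__)

   # Move a tiny graph to the GPU to exercise the DGL + ROCm path end to end.
   g = dgl.graph(([0, 1], [1, 2])).to("cuda")
   print("Graph device:", g.device)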
Supported features
================================================================================
Many functions and methods available in upstream DGL are also supported in DGL on ROCm.
Instead of listing them all, support is grouped into the following categories to provide a general overview.
* DGL Base
* DGL Backend
* DGL Data
* DGL Dataloading
* DGL DGLGraph
* DGL Function
* DGL Ops
* DGL Sampling
* DGL Transforms
* DGL Utils
* DGL Distributed
* DGL Geometry
* DGL Mpops
* DGL NN
* DGL Optim
* DGL Sparse
Unsupported features
================================================================================
* Graphbolt
* Partial TF32 support (MI250X only)
* Kineto/ROCTracer integration
Unsupported functions
================================================================================
* ``more_nnz``
* ``format``
* ``multiprocess_sparse_adam_state_dict``
* ``record_stream_ndarray``
* ``half_spmm``
* ``segment_mm``
* ``gather_mm_idx_b``
* ``pgexplainer``
* ``sample_labors_prob``
* ``sample_labors_noprob``

View File

@@ -0,0 +1,100 @@
:orphan:
.. meta::
:description: Stanford Megatron-LM compatibility
:keywords: Stanford, Megatron-LM, compatibility
.. version-set:: rocm_version latest
********************************************************************************
Stanford Megatron-LM compatibility
********************************************************************************
Stanford Megatron-LM is a large-scale language model training framework based on NVIDIA's Megatron-LM (`https://github.com/NVIDIA/Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_). It is
designed to train massive transformer-based language models efficiently using model and data parallelism.
* ROCm support for Stanford Megatron-LM is hosted in the official `https://github.com/ROCm/Stanford-Megatron-LM <https://github.com/ROCm/Stanford-Megatron-LM>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`_ upstream repository.
* Use the prebuilt :ref:`Docker image <megatron-lm-docker-compat>` with ROCm, PyTorch, and Megatron-LM preinstalled.
* See the :doc:`ROCm Stanford Megatron-LM installation guide <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>` to install and get started.
.. note::
Stanford Megatron-LM is supported on ROCm 6.3.0.
Supported devices
================================================================================
- **Officially Supported**: AMD Instinct MI300X
- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210
Supported models and features
================================================================================
This section details the models and features supported by the ROCm-enabled version of Stanford Megatron-LM.
Models:
* BERT
* GPT
* T5
* ICT
Features:
* Distributed Pre-training
* Activation Checkpointing and Recomputation
* Distributed Optimizer
* Mixture-of-Experts
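Distributed pre-training combines data parallelism with model (tensor) parallelism. As a purely
conceptual illustration of the tensor-parallel idea (plain PyTorch, not the Megatron-LM API; the
shapes are arbitrary), a linear layer's weight can be split column-wise across workers, with each
worker computing a slice of the output:

.. code-block:: python

   import torch

   torch.manual_seed(0)
   x = torch.randn(4, 8)    # a batch of activations
   w = torch.randn(8, 16)   # full weight of one linear layer

   # Column-parallel split across two hypothetical workers.
   w0, w1 = w.chunk(2, dim=1)
   y_parallel = torch.cat([x @ w0, x @ w1], dim=1)

   # Single-device reference; the sharded computation reproduces it exactly.
   y_full = x @ w
   assert torch.allclose(y_parallel, y_full)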
.. _megatron-lm-recommendations:
Use cases and recommendations
================================================================================
See the `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`_ blog post
to learn how to leverage the ROCm platform for pre-training with the Stanford Megatron-LM framework, including pre-processing datasets on AMD GPUs.
Coverage includes:
* Single-GPU pre-training
* Multi-GPU pre-training
.. _megatron-lm-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/megatron-lm>`_
with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest Megatron-LM version from the official Docker Hub.
The Docker images have been validated for `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_.
Click |docker-icon| to view the image on Docker Hub.
.. list-table::
:header-rows: 1
:class: docker-image-compatibility
* - Docker image
- Stanford Megatron-LM
- PyTorch
- Ubuntu
- Python
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i></a>
- `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
- `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
- 24.04
- `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_

View File

@@ -0,0 +1,85 @@
:orphan:
.. meta::
:description: verl compatibility
:keywords: GPU, verl compatibility
.. version-set:: rocm_version latest
*******************************************************************************
verl compatibility
*******************************************************************************
Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs).
verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support.
* See the `verl documentation <https://verl.readthedocs.io/en/latest/>`_ for more information about verl.
* The official verl GitHub repository is `https://github.com/volcengine/verl <https://github.com/volcengine/verl>`_.
* Use the AMD-validated :ref:`Docker images <verl-docker-compat>` with ROCm and verl preinstalled.
* See the :doc:`ROCm verl installation guide <rocm-install-on-linux:install/3rd-party/verl-install>` to get started.
.. note::
verl is supported on ROCm 6.2.0.
.. _verl-recommendations:
Use cases and recommendations
================================================================================
The benefits of verl in large-scale reinforcement learning from human feedback (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration <https://rocm.blogs.amd.com/artificial-intelligence/verl-large-scale/README.html>`_ blog post.
.. _verl-docker-compat:
Docker image compatibility
================================================================================
.. |docker-icon| raw:: html
<i class="fab fa-docker"></i>
AMD validates and publishes ready-made `ROCm verl Docker images <https://hub.docker.com/r/rocm/verl>`_
with ROCm backends on Docker Hub. The following Docker image tags and associated inventories represent the latest verl version from the official Docker Hub. The Docker images have been validated for `ROCm 6.2.0 <https://repo.radeon.com/rocm/apt/6.2/>`_.
.. list-table::
:header-rows: 1
* - Docker image
- verl
- Linux
- PyTorch
- Python
- vLLM
* - .. raw:: html
<a href="https://hub.docker.com/layers/rocm/verl/verl-0.3.0.post0_rocm6.2_vllm0.6.3/images/sha256-cbe423803fd7850448b22444176bee06f4dcf22cd3c94c27732752d3a39b04b2"><i class="fab fa-docker fa-lg"></i> rocm/verl</a>
- `0.3.0.post0 <https://github.com/volcengine/verl/releases/tag/v0.3.0.post0>`_
- Ubuntu 20.04
- `2.5.0 <https://download.pytorch.org/whl/cu118/torch-2.5.0%2Bcu118-cp39-cp39-linux_x86_64.whl#sha256=1ee24b267418c37b297529ede875b961e382c1c365482f4142af2398b92ed127>`_
- `3.9.19 <https://www.python.org/downloads/release/python-3919/>`_
- `0.6.4 <https://github.com/vllm-project/vllm/releases/tag/v0.6.4>`_
Supported features
===============================================================================
The following table shows verl and ROCm support for GPU-accelerated modules.
.. list-table::
:header-rows: 1
* - Module
- Description
- verl version
- ROCm version
* - ``FSDP``
- Training engine
- 0.3.0.post0
- 6.2
* - ``vllm``
- Inference engine
- 0.3.0.post0
- 6.2

View File

@@ -17,6 +17,9 @@ features for these ROCm-enabled deep learning frameworks.
* :doc:`PyTorch compatibility <../compatibility/ml-compatibility/pytorch-compatibility>`
* :doc:`TensorFlow compatibility <../compatibility/ml-compatibility/tensorflow-compatibility>`
* :doc:`JAX compatibility <../compatibility/ml-compatibility/jax-compatibility>`
* :doc:`verl compatibility <../compatibility/ml-compatibility/verl-compatibility>`
* :doc:`Stanford Megatron-LM compatibility <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`
* :doc:`DGL compatibility <../compatibility/ml-compatibility/dgl-compatibility>`
This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
@@ -29,6 +32,9 @@ See the installation instructions to get started.
* :doc:`PyTorch for ROCm <rocm-install-on-linux:install/3rd-party/pytorch-install>`
* :doc:`TensorFlow for ROCm <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
* :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
* :doc:`verl for ROCm <rocm-install-on-linux:install/3rd-party/verl-install>`
* :doc:`Stanford Megatron-LM for ROCm <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>`
* :doc:`DGL for ROCm <rocm-install-on-linux:install/3rd-party/dgl-install>`
.. note::