Stanford Megatron-LM Compatibility

* Create stanford-megatron-lm-compatibility.rst
* toc and wordlist
* Update deep-learning-rocm.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* fixes and adding to main compat matrix
* formatting fix
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
  Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
  Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Update docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
  Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst
* Update stanford-megatron-lm-compatibility.rst

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2026-01-09 22:58:17 -05:00 · 2025-07-15 16:23:50 -04:00
parent 2a7554c0b9
commit f4f096b44e
4 changed files with 105 additions and 0 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -194,6 +194,7 @@ Higgs
 Hyperparameters
 Huggingface
 ICD
 ICT
 ICV
 IDE
 IDEs
@@ -368,6 +369,7 @@ RDC's
 RDMA
 RDNA
 README
 Recomputation
 RHEL
 RMW
 RNN
--- a/docs/compatibility/compatibility-matrix.rst
+++ b/docs/compatibility/compatibility-matrix.rst
@@ -55,6 +55,7 @@ compatibility and system requirements.
      :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 2.1, 2.0, 1.13"
      :doc:`TensorFlow <../compatibility/ml-compatibility/tensorflow-compatibility>`,"2.18.1, 2.17.1, 2.16.2","2.18.1, 2.17.1, 2.16.2","2.17.0, 2.16.2, 2.15.1"
      :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.4.35,0.4.35,0.4.31
      :doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`,N/A,N/A,`85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
      :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>`,2.4.0,2.4.0,N/A
      `ONNX Runtime <https://onnxruntime.ai/docs/build/eps.html#amd-migraphx>`_,1.2,1.2,1.17.3
      ,,,
--- a/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst
@@ -0,0 +1,100 @@
 :orphan:
 .. meta::
    :description: Stanford Megatron-LM compatibility
    :keywords: Stanford, Megatron-LM, compatibility
 .. version-set:: rocm_version latest
 ********************************************************************************
 Stanford Megatron-LM compatibility
 ********************************************************************************
 Stanford Megatron-LM is a large-scale language model training framework developed by NVIDIA `https://github.com/NVIDIA/Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_. It is
 designed to train massive transformer-based language models efficiently using model and data parallelism.
 * ROCm support for Stanford Megatron-LM is hosted in the official `https://github.com/ROCm/Stanford-Megatron-LM <https://github.com/ROCm/Stanford-Megatron-LM>`_ repository. 
 * Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`_ upstream repository. 
 * Use the prebuilt :ref:`Docker image <megatron-lm-docker-compat>` with ROCm, PyTorch, and Megatron-LM preinstalled. 
 * See the :doc:`ROCm Stanford Megatron-LM installation guide <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>` to install and get started.
 .. note::
 	Stanford Megatron-LM is supported on ROCm 6.3.0.
 Supported devices
 ================================================================================
 - **Officially Supported**: AMD Instinct MI300X
 - **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210
 Supported models and features
 ================================================================================
 This section lists the models and features supported by the ROCm version of Stanford Megatron-LM.
 Models:
 * BERT
 * GPT
 * T5
 * ICT
 Features:
 * Distributed Pre-training
 * Activation Checkpointing and Recomputation
 * Distributed Optimizer
 * Mixture-of-Experts
 .. _megatron-lm-recommendations:
 Use cases and recommendations
 ================================================================================
 See the `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`_ blog post
 to learn how to pre-process datasets and run pre-training with the Stanford Megatron-LM framework on AMD GPUs.
 Coverage includes:
  * Single-GPU pre-training
  * Multi-GPU pre-training
 .. _megatron-lm-docker-compat:
 Docker image compatibility
 ================================================================================
 .. |docker-icon| raw:: html
   <i class="fab fa-docker"></i>
 AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/megatron-lm>`_
 with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
 inventories represent the latest Megatron-LM version from the official Docker Hub.
 The Docker images have been validated for `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_.
 Click |docker-icon| to view the image on Docker Hub.
 .. list-table:: 
    :header-rows: 1
    :class: docker-image-compatibility
    * - Docker image
      - Stanford Megatron-LM
      - PyTorch
      - Ubuntu
      - Python
    * - .. raw:: html
           <a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i></a>
      - `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
      - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_
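The image tag in the table row above appears to encode the same inventory as the other columns (Megatron-LM commit, ROCm, Ubuntu, Python, and PyTorch versions). As a minimal sketch, assuming every published tag follows the layout of the single tag shown here, the components can be recovered with a regular expression. The ``parse_image_tag`` helper and ``TAG_PATTERN`` are hypothetical names introduced for illustration; they are not part of ROCm or Megatron-LM.

```python
import re

# Assumption: the tag layout is inferred from the one tag listed in the table;
# other tags on Docker Hub may not follow it.
TAG_PATTERN = re.compile(
    r"^stanford-megatron-lm(?P<commit>[0-9a-f]+)"
    r"_rocm(?P<rocm>[\d.]+)"
    r"_ubuntu(?P<ubuntu>[\d.]+)"
    r"_py(?P<python>[\d.]+)"
    r"_pytorch(?P<pytorch>[\d.]+)$"
)


def parse_image_tag(tag: str) -> dict[str, str]:
    """Split an image tag into its component versions (hypothetical helper)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized image tag: {tag!r}")
    return match.groupdict()


# Tag taken from the Docker Hub link in the compatibility table.
TAG = "stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0"
info = parse_image_tag(TAG)
print(info)
```

A check like this could be used to confirm that a pulled image matches the validated ROCm 6.3.0 and PyTorch 2.4.0 combination before starting a training run. Note that the tag carries only the Python minor version (3.12), while the table lists the full patch release.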
--- a/docs/how-to/deep-learning-rocm.rst
+++ b/docs/how-to/deep-learning-rocm.rst
@@ -17,6 +17,7 @@ features for these ROCm-enabled deep learning frameworks.
 * :doc:`PyTorch compatibility <../compatibility/ml-compatibility/pytorch-compatibility>`
 * :doc:`TensorFlow compatibility <../compatibility/ml-compatibility/tensorflow-compatibility>`
 * :doc:`JAX compatibility <../compatibility/ml-compatibility/jax-compatibility>`
 * :doc:`Stanford Megatron-LM compatibility <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>`
 * :doc:`DGL compatibility <../compatibility/ml-compatibility/dgl-compatibility>`
 This chart steps through typical installation workflows for installing deep learning frameworks for ROCm.
@@ -30,6 +31,7 @@ See the installation instructions to get started.
 * :doc:`PyTorch for ROCm <rocm-install-on-linux:install/3rd-party/pytorch-install>`
 * :doc:`TensorFlow for ROCm <rocm-install-on-linux:install/3rd-party/tensorflow-install>`
 * :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
 * :doc:`Stanford Megatron-LM for ROCm <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>`
 * :doc:`DGL for ROCm <rocm-install-on-linux:install/3rd-party/dgl-install>`
 .. note::