:orphan:

.. meta::
    :description: Stanford Megatron-LM compatibility
    :keywords: Stanford, Megatron-LM, compatibility

.. version-set:: rocm_version latest

********************************************************************************
Stanford Megatron-LM compatibility
********************************************************************************

Stanford Megatron-LM is a large-scale language model training framework built on NVIDIA's Megatron-LM (`https://github.com/NVIDIA/Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_). It is
designed to train massive transformer-based language models efficiently by using model and data parallelism.

* ROCm support for Stanford Megatron-LM is hosted in the official `https://github.com/ROCm/Stanford-Megatron-LM <https://github.com/ROCm/Stanford-Megatron-LM>`_ repository.
* Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM <https://github.com/stanford-futuredata/Megatron-LM>`_ upstream repository.
* Use the prebuilt :ref:`Docker image <megatron-lm-docker-compat>` with ROCm, PyTorch, and Megatron-LM preinstalled.
* See the :doc:`ROCm Stanford Megatron-LM installation guide <rocm-install-on-linux:install/3rd-party/stanford-megatron-lm-install>` to install and get started.

.. note::

   Stanford Megatron-LM is supported on ROCm 6.3.0.

Supported devices
================================================================================

- **Officially Supported**: AMD Instinct MI300X
- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210

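
As a quick sanity check before training, the following minimal sketch reports
whether the installed PyTorch is a ROCm (HIP) build and which GPUs it can see.
It assumes PyTorch may or may not be installed and handles both cases. The helper
name ``describe_rocm_devices`` is hypothetical and is not part of Stanford
Megatron-LM or ROCm.

.. code-block:: python

   def describe_rocm_devices():
       """Summarize the PyTorch build and the GPUs visible to it."""
       try:
           import torch
       except ImportError:
           # PyTorch is not installed in this environment.
           return {"torch_installed": False, "hip_version": None, "devices": []}

       # torch.version.hip is a version string on ROCm builds and None otherwise.
       hip_version = getattr(torch.version, "hip", None)

       devices = []
       # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
       if torch.cuda.is_available():
           devices = [
               torch.cuda.get_device_name(index)
               for index in range(torch.cuda.device_count())
           ]

       return {"torch_installed": True, "hip_version": hip_version, "devices": devices}


   if __name__ == "__main__":
       info = describe_rocm_devices()
       print(f"PyTorch installed: {info['torch_installed']}")
       print(f"HIP version: {info['hip_version']}")
       for index, name in enumerate(info["devices"]):
           print(f"GPU {index}: {name}")

If ``hip_version`` is ``None`` or the device list is empty, the environment is
probably not a ROCm-enabled PyTorch installation with a visible GPU.
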
Supported models and features
================================================================================

This section lists the models and features supported by the ROCm version of Stanford Megatron-LM.

Models:

* BERT
* GPT
* T5
* ICT

Features:

* Distributed Pre-training
* Activation Checkpointing and Recomputation
* Distributed Optimizer
* Mixture-of-Experts

.. _megatron-lm-recommendations:

Use cases and recommendations
================================================================================

See the `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog <https://rocm.blogs.amd.com/artificial-intelligence/megablocks/README.html>`_ post
to learn how to use the ROCm platform for pre-training with the Stanford Megatron-LM framework, including pre-processing datasets on AMD GPUs.
Coverage includes:

* Single-GPU pre-training
* Multi-GPU pre-training

.. _megatron-lm-docker-compat:

Docker image compatibility
================================================================================

.. |docker-icon| raw:: html

   <i class="fab fa-docker"></i>

AMD validates and publishes `Stanford Megatron-LM images <https://hub.docker.com/r/rocm/megatron-lm>`_
with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
inventories represent the latest Stanford Megatron-LM version from the official Docker Hub.
The Docker images have been validated for `ROCm 6.3.0 <https://repo.radeon.com/rocm/apt/6.3/>`_.
Click |docker-icon| to view the image on Docker Hub.

.. list-table::
    :header-rows: 1
    :class: docker-image-compatibility

    * - Docker image
      - Stanford Megatron-LM
      - PyTorch
      - Ubuntu
      - Python

    * - .. raw:: html

           <a href="https://hub.docker.com/layers/rocm/stanford-megatron-lm/stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0/images/sha256-070556f078be10888a1421a2cb4f48c29f28b02bfeddae02588d1f7fc02a96a6"><i class="fab fa-docker fa-lg"></i></a>

      - `85f95ae <https://github.com/stanford-futuredata/Megatron-LM/commit/85f95aef3b648075fe6f291c86714fdcbd9cd1f5>`_
      - `2.4.0 <https://github.com/ROCm/pytorch/tree/release/2.4>`_
      - 24.04
      - `3.12.9 <https://www.python.org/downloads/release/python-3129/>`_

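
The image tag linked from the table appears to encode the component versions it
ships. The following minimal sketch splits such a tag into its parts. The naming
pattern is an assumption inferred from the single tag listed here, not a
documented scheme, and ``parse_image_tag`` is a hypothetical helper.

.. code-block:: python

   import re

   # Assumed layout, inferred from the one tag in the compatibility table:
   # stanford-megatron-lm<commit>_rocm<ver>_ubuntu<ver>_py<ver>_pytorch<ver>
   TAG_PATTERN = re.compile(
       r"^stanford-megatron-lm(?P<commit>[0-9a-f]+)"
       r"_rocm(?P<rocm>[\d.]+)"
       r"_ubuntu(?P<ubuntu>[\d.]+)"
       r"_py(?P<python>[\d.]+)"
       r"_pytorch(?P<pytorch>[\d.]+)$"
   )


   def parse_image_tag(tag):
       """Return the component versions encoded in an image tag as a dict."""
       match = TAG_PATTERN.match(tag)
       if match is None:
           raise ValueError(f"unrecognized tag format: {tag!r}")
       return match.groupdict()


   if __name__ == "__main__":
       tag = "stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0"
       for component, version in parse_image_tag(tag).items():
           print(f"{component}: {version}")

Note that the tag carries only the Python minor version (``3.12``), while the
table lists the full patch release (``3.12.9``).
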