mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-11 07:38:17 -05:00
Compare commits
6 Commits
target-ext
...
users/ibra
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
b259d55bae | ||
|
|
d92e5b6c12 | ||
|
|
91fce2e134 | ||
|
|
27d53cf082 | ||
|
|
bc084246be | ||
|
|
9827ba7ff2 |
@@ -207,6 +207,7 @@ jobs:
|
||||
downstreamAggregateNames: ${{ parameters.downstreamAggregateNames }}
|
||||
- template: ${{ variables.CI_TEMPLATE_PATH }}/steps/gpu-diagnostics.yml
|
||||
- script: |
|
||||
set -e
|
||||
export PYTHONPATH=$(Agent.BuildDirectory)/s/build/python:$PYTHONPATH
|
||||
|
||||
echo "--- Running origami_test.py ---"
|
||||
|
||||
@@ -249,7 +249,7 @@ AMD ROCm has officially added support for the following Deep learning and AI fra
|
||||
|
||||
#### AMD GPU Driver/ROCm packaging separation
|
||||
|
||||
The AMD GPU Driver (amdgpu) is now distributed separately from the ROCm software stack and is stored under in its own location ``/amdgpu/`` in the package repository at [repo.radeon.com](https://repo.radeon.com/amdgpu/). The first release is designated as AMD GPU Driver (amdgpu) version 30.10. See the [User and kernel-space support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html) for more information.
|
||||
The AMD GPU Driver (amdgpu) is now distributed separately from the ROCm software stack and is stored under in its own location ``/amdgpu/`` in the package repository at [repo.radeon.com](https://repo.radeon.com/amdgpu/). The first release is designated as [AMD GPU Driver (amdgpu) version 30.10](https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/documentation/change-logs/30.10.1.html#amd-gpu-driver-amdgpu-30-10-release-notes). See the [User and kernel-space support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html) for more information.
|
||||
|
||||
[AMD SMI](https://github.com/ROCm/amdsmi) continues to stay with the ROCm software stack under the ROCm organization repository.
|
||||
|
||||
@@ -417,9 +417,7 @@ which facilitates profiling wavefronts at the instruction timing level.
|
||||
|
||||
###### rocpd
|
||||
|
||||
The ROCm Profiling Data (``rocpd``) is now the default output format for ``rocprofv3``.
|
||||
A subproject of the ROCprofiler-SDK, ``rocpd`` enables saving profiling results to a SQLite3 database, providing a structured and
|
||||
efficient foundation for analysis and post-processing.
|
||||
As a subcomponent of the ROCprofiler-SDK, ``rocpd`` enables storing the profiling results in a SQLite3 database, providing a structured and efficient foundation for analysis and post-processing. For details, see [Using rocpd Output Format](https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/docs-7.0.1/how-to/using-rocpd-output-format.html#using-rocpd-output-format).
|
||||
|
||||
###### rocprofv3 CLI tool enhancements
|
||||
|
||||
|
||||
@@ -90,75 +90,15 @@ For more use cases and recommendations, see `ROCm JAX blog posts <https://rocm.b
|
||||
Docker image compatibility
|
||||
================================================================================
|
||||
|
||||
.. |docker-icon| raw:: html
|
||||
AMD provides preconfigured Docker images with JAX and the ROCm backend.
|
||||
These images are published on `Docker Hub <https://hub.docker.com/r/rocm/jax>`__ and are the
|
||||
recommended way to get started with deep learning with JAX on ROCm.
|
||||
For ``jax-community`` images, see `rocm/jax-community
|
||||
<https://hub.docker.com/r/rocm/jax-community/tags>`__ on Docker Hub.
|
||||
|
||||
<i class="fab fa-docker"></i>
|
||||
|
||||
AMD validates and publishes ready-made `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
|
||||
with ROCm backends on Docker Hub. The following Docker image tags and
|
||||
associated inventories represent the latest JAX version from the official Docker Hub and are validated for
|
||||
`ROCm 6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`_. Click the |docker-icon|
|
||||
icon to view the image on Docker Hub.
|
||||
|
||||
.. list-table:: JAX Docker image components
|
||||
:header-rows: 1
|
||||
|
||||
* - Docker image
|
||||
- JAX
|
||||
- Linux
|
||||
- Python
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/jax/rocm6.4.2-jax0.4.35-py3.12/images/sha256-8918fa806a172c1a10eb2f57131eb31b5d7c8fa1656b8729fe7d3d736112de83"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
|
||||
|
||||
- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
|
||||
- Ubuntu 24.04
|
||||
- `3.12.10 <https://www.python.org/downloads/release/python-31210/>`_
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/jax/rocm6.4.2-jax0.4.35-py3.10/images/sha256-a394be13c67b7fc602216abee51233afd4b6cb7adaa57ca97e688fba82f9ad79"><i class="fab fa-docker fa-lg"></i> rocm/jax</a>
|
||||
|
||||
- `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
|
||||
- Ubuntu 22.04
|
||||
- `3.10.17 <https://www.python.org/downloads/release/python-31017/>`_
|
||||
|
||||
AMD publishes `Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
|
||||
with ROCm backends on Docker Hub. The following Docker image tags and
|
||||
associated inventories are tested for `ROCm 6.3.2 <https://repo.radeon.com/rocm/apt/6.3.2/>`_.
|
||||
|
||||
.. list-table:: JAX community Docker image components
|
||||
:header-rows: 1
|
||||
|
||||
* - Docker image
|
||||
- JAX
|
||||
- Linux
|
||||
- Python
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.12.8/images/sha256-25dfaa0183e274bd0a3554a309af3249c6f16a1793226cb5373f418e39d3146a"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
|
||||
|
||||
- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
|
||||
- Ubuntu 22.04
|
||||
- `3.12.8 <https://www.python.org/downloads/release/python-3128/>`_
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.11.11/images/sha256-ff9baeca9067d13e6c279c911e5a9e5beed0817d24fafd424367cc3d5bd381d7"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
|
||||
|
||||
- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
|
||||
- Ubuntu 22.04
|
||||
- `3.11.11 <https://www.python.org/downloads/release/python-31111/>`_
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.2-jax0.5.0-py3.10.16/images/sha256-8bab484be1713655f74da51a191ed824bb9d03db1104fd63530a1ac3c37cf7b1"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
|
||||
|
||||
- `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
|
||||
- Ubuntu 22.04
|
||||
- `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_
|
||||
To find the right image tag, see the :ref:`JAX on ROCm installation
|
||||
documentation <rocm-install-on-linux:jax-docker-support>` for a list of
|
||||
available ``rocm/jax`` images.
|
||||
|
||||
.. _key_rocm_libraries:
|
||||
|
||||
|
||||
@@ -89,141 +89,13 @@ For more use cases and recommendations, see `ROCm PyTorch blog posts <https://ro
|
||||
Docker image compatibility
|
||||
================================================================================
|
||||
|
||||
.. |docker-icon| raw:: html
|
||||
AMD provides preconfigured Docker images with PyTorch and the ROCm backend.
|
||||
These images are published on `Docker Hub <https://hub.docker.com/r/rocm/pytorch>`__ and are the
|
||||
recommended way to get started with deep learning with PyTorch on ROCm.
|
||||
|
||||
<i class="fab fa-docker"></i>
|
||||
|
||||
AMD validates and publishes `PyTorch images <https://hub.docker.com/r/rocm/pytorch>`__
|
||||
with ROCm backends on Docker Hub. The following Docker image tags and associated
|
||||
inventories were tested on `ROCm 6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`__.
|
||||
Click |docker-icon| to view the image on Docker Hub.
|
||||
|
||||
.. list-table:: PyTorch Docker image components
|
||||
:header-rows: 1
|
||||
:class: docker-image-compatibility
|
||||
|
||||
* - Docker
|
||||
- PyTorch
|
||||
- Ubuntu
|
||||
- Python
|
||||
- Apex
|
||||
- torchvision
|
||||
- TensorBoard
|
||||
- MAGMA
|
||||
- UCX
|
||||
- OMPI
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu24.04_py3.12_pytorch_release_2.6.0/images/sha256-6a287591500b4048a9556c1ecc92bc411fd3d552f6c8233bc399f18eb803e8d6"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`__
|
||||
- 24.04
|
||||
- `3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`__
|
||||
- `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`__
|
||||
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu22.04_py3.10_pytorch_release_2.6.0/images/sha256-06b967629ba6657709f04169832cd769a11e6b491e8b1394c361d42d7a0c8b43"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.6.0 <https://github.com/ROCm/pytorch/tree/release/2.6>`__
|
||||
- 22.04
|
||||
- `3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `1.6.0 <https://github.com/ROCm/apex/tree/release/1.6.0>`__
|
||||
- `0.21.0 <https://github.com/pytorch/vision/tree/v0.21.0>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
|
||||
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu24.04_py3.12_pytorch_release_2.5.1/images/sha256-62022414217ef6de33ac5b1341e57db8a48e8573fa2ace12d48aa5edd4b99ef0"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
|
||||
- 24.04
|
||||
- `3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`__
|
||||
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.10.0>`__
|
||||
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu22.04_py3.11_pytorch_release_2.5.1/images/sha256-469a7f74fc149aff31797e011ee41978f6a190adc69fa423b3c6a718a77bd985"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
|
||||
- 22.04
|
||||
- `3.11 <https://www.python.org/downloads/release/python-31113/>`__
|
||||
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`__
|
||||
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
|
||||
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu22.04_py3.10_pytorch_release_2.5.1/images/sha256-37f41a1cd94019688669a1b20d33ea74156e0c129ef6b8270076ef214a6a1a2c"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.5.1 <https://github.com/ROCm/pytorch/tree/release/2.5>`__
|
||||
- 22.04
|
||||
- `3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `1.5.0 <https://github.com/ROCm/apex/tree/release/1.5.0>`__
|
||||
- `0.20.1 <https://github.com/pytorch/vision/tree/v0.20.1>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
|
||||
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu24.04_py3.12_pytorch_release_2.4.1/images/sha256-60824ba83dc1b9d94164925af1f81c0235c105dd555091ec04c57e05177ead1b"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`__
|
||||
- 24.04
|
||||
- `3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`__
|
||||
- `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`__
|
||||
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu22.04_py3.10_pytorch_release_2.4.1/images/sha256-fe944fe083312f901be6891ab4d3ffebf2eaf2cf4f5f0f435ef0b76ec714fabd"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.4.1 <https://github.com/ROCm/pytorch/tree/release/2.4>`__
|
||||
- 22.04
|
||||
- `3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `1.4.0 <https://github.com/ROCm/apex/tree/release/1.4.0>`__
|
||||
- `0.19.0 <https://github.com/pytorch/vision/tree/v0.19.0>`__
|
||||
- `2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.12.1~rc2-1 <https://github.com/openucx/ucx/tree/v1.12.1>`__
|
||||
- `4.1.2-2ubuntu1 <https://github.com/open-mpi/ompi/tree/v4.1.2>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/pytorch/rocm6.4.2_ubuntu24.04_py3.12_pytorch_release_2.3.0/images/sha256-1d59251c47170c5b8960d1172a4dbe52f5793d8966edd778f168eaf32d56661a"><i class="fab fa-docker fa-lg"></i></a>
|
||||
|
||||
- `2.3.0 <https://github.com/ROCm/pytorch/tree/release/2.3>`__
|
||||
- 24.04
|
||||
- `3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `1.3.0 <https://github.com/ROCm/apex/tree/release/1.3.0>`__
|
||||
- `0.18.0 <https://github.com/pytorch/vision/tree/v0.18.0>`__
|
||||
- `2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`__
|
||||
- `master <https://bitbucket.org/icl/magma/src/master/>`__
|
||||
- `1.16.0+ds-5ubuntu1 <https://github.com/openucx/ucx/tree/v1.16.0>`__
|
||||
- `4.1.6-7ubuntu2 <https://github.com/open-mpi/ompi/tree/v4.1.6>`__
|
||||
To find the right image tag, see the :ref:`PyTorch on ROCm installation
|
||||
documentation <rocm-install-on-linux:pytorch-docker-support>` for a list of
|
||||
available ``rocm/pytorch`` images.
|
||||
|
||||
Key ROCm libraries for PyTorch
|
||||
================================================================================
|
||||
|
||||
@@ -47,80 +47,15 @@ fixes, updates, and support for the latest ROCM versions.
|
||||
.. _tensorflow-docker-compat:
|
||||
|
||||
Docker image compatibility
|
||||
===============================================================================
|
||||
================================================================================
|
||||
|
||||
.. |docker-icon| raw:: html
|
||||
AMD provides preconfigured Docker images with TensorFlow and the ROCm backend.
|
||||
These images are published on `Docker Hub <https://hub.docker.com/r/rocm/tensorflow>`__ and are the
|
||||
recommended way to get started with deep learning with TensorFlow on ROCm.
|
||||
|
||||
<i class="fab fa-docker"></i>
|
||||
|
||||
AMD validates and publishes ready-made `TensorFlow images
|
||||
<https://hub.docker.com/r/rocm/tensorflow>`__ with ROCm backends on
|
||||
Docker Hub. The following Docker image tags and associated inventories are
|
||||
validated for `ROCm 6.4.2 <https://repo.radeon.com/rocm/apt/6.4.2/>`__. Click
|
||||
the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
.. list-table:: TensorFlow Docker image components
|
||||
:header-rows: 1
|
||||
|
||||
* - Docker image
|
||||
- TensorFlow
|
||||
- Ubuntu
|
||||
- Python
|
||||
- TensorBoard
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.18-dev/images/sha256-96754ce2d30f729e19b497279915b5212ba33d5e408e7e5dd3f2304d87e3441e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 24.04
|
||||
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.18-dev/images/sha256-fa741508d383858e86985a9efac85174529127408102558ae2e3a4ac894eea1e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 22.04
|
||||
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.17-dev/images/sha256-3a0aef09f2a8833c2b64b85874dd9449ffc2ad257351857338ff5b706c03a418"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 24.04
|
||||
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.17-dev/images/sha256-bc7341a41ebe7ab261aa100732874507c452421ef733e408ac4f05ed453b0bc5"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 22.04
|
||||
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.16-dev/images/sha256-4841a8df7c340dab79bf9362dad687797649a00d594e0832eb83ea6880a40d3b"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 24.04
|
||||
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
|
||||
|
||||
* - .. raw:: html
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.16-dev/images/sha256-883fa95aba960c58a3e46fceaa18f03ede2c7df89b8e9fd603ab2d47e0852897"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 22.04
|
||||
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
|
||||
To find the right image tag, see the :ref:`TensorFlow on ROCm installation
|
||||
documentation <rocm-install-on-linux:tensorflow-docker-support>` for a list of
|
||||
available ``rocm/tensorflow`` images.
|
||||
|
||||
|
||||
Critical ROCm libraries for TensorFlow
|
||||
|
||||
@@ -127,7 +127,9 @@ article_pages = [
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.4", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.5", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.6", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-v25.7", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/megatron-lm-primus-migration-guide", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/primus-megatron-v25.7", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/primus-megatron", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/pytorch-training", "os": ["linux"]},
|
||||
{"file": "how-to/rocm-for-ai/training/benchmark-docker/previous-versions/pytorch-training-history", "os": ["linux"]},
|
||||
|
||||
@@ -1,12 +1,4 @@
|
||||
dockers:
|
||||
- pull_tag: rocm/jax-training:maxtext-v25.7
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
|
||||
components:
|
||||
ROCm: 6.4.1
|
||||
JAX: 0.5.0
|
||||
Python: 3.10.12
|
||||
Transformer Engine: 2.1.0+90d703dd
|
||||
hipBLASLt: 1.x.x
|
||||
- pull_tag: rocm/jax-training:maxtext-v25.7-jax060
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
|
||||
components:
|
||||
@@ -15,6 +7,14 @@ dockers:
|
||||
Python: 3.10.12
|
||||
Transformer Engine: 2.1.0+90d703dd
|
||||
hipBLASLt: 1.1.0-499ece1c21
|
||||
- pull_tag: rocm/jax-training:maxtext-v25.7
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.7/images/sha256-45f4c727d4019a63fc47313d3a5f5a5105569539294ddfd2d742218212ae9025
|
||||
components:
|
||||
ROCm: 6.4.1
|
||||
JAX: 0.5.0
|
||||
Python: 3.10.12
|
||||
Transformer Engine: 2.1.0+90d703dd
|
||||
hipBLASLt: 1.x.x
|
||||
model_groups:
|
||||
- group: Meta Llama
|
||||
tag: llama
|
||||
|
||||
@@ -1,13 +1,12 @@
|
||||
dockers:
|
||||
- pull_tag: rocm/megatron-lm:v25.7_py310
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.7_py310/images/sha256-6189df849feeeee3ae31bb1e97aef5006d69d2b90c134e97708c19632e20ab5a
|
||||
- pull_tag: rocm/megatron-lm:v25.8_py310
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.8_py310/images/sha256-50fc824361054e445e86d5d88d5f58817f61f8ec83ad4a7e43ea38bbc4a142c0
|
||||
components:
|
||||
ROCm: 6.4.2
|
||||
Primus: v0.1.0-rc1
|
||||
ROCm: 6.4.3
|
||||
PyTorch: 2.8.0a0+gitd06a406
|
||||
Python: "3.10"
|
||||
Transformer Engine: 2.1.0.dev0+ba586519
|
||||
hipBLASLt: 37ba1d36
|
||||
Transformer Engine: 2.2.0.dev0+54dd2bdc
|
||||
hipBLASLt: d1b517fc7a
|
||||
Triton: 3.3.0
|
||||
RCCL: 2.22.3
|
||||
model_groups:
|
||||
|
||||
@@ -0,0 +1,49 @@
|
||||
dockers:
|
||||
- pull_tag: rocm/megatron-lm:v25.7_py310
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.7_py310/images/sha256-6189df849feeeee3ae31bb1e97aef5006d69d2b90c134e97708c19632e20ab5a
|
||||
components:
|
||||
ROCm: 6.4.2
|
||||
Primus: v0.1.0-rc1
|
||||
PyTorch: 2.8.0a0+gitd06a406
|
||||
Python: "3.10"
|
||||
Transformer Engine: 2.1.0.dev0+ba586519
|
||||
hipBLASLt: 37ba1d36
|
||||
Triton: 3.3.0
|
||||
RCCL: 2.22.3
|
||||
model_groups:
|
||||
- group: Meta Llama
|
||||
tag: llama
|
||||
models:
|
||||
- model: Llama 3.3 70B
|
||||
mad_tag: pyt_megatron_lm_train_llama-3.3-70b
|
||||
- model: Llama 3.1 8B
|
||||
mad_tag: pyt_megatron_lm_train_llama-3.1-8b
|
||||
- model: Llama 3.1 70B
|
||||
mad_tag: pyt_megatron_lm_train_llama-3.1-70b
|
||||
- model: Llama 3.1 70B (proxy)
|
||||
mad_tag: pyt_megatron_lm_train_llama-3.1-70b-proxy
|
||||
- model: Llama 2 7B
|
||||
mad_tag: pyt_megatron_lm_train_llama-2-7b
|
||||
- model: Llama 2 70B
|
||||
mad_tag: pyt_megatron_lm_train_llama-2-70b
|
||||
- group: DeepSeek
|
||||
tag: deepseek
|
||||
models:
|
||||
- model: DeepSeek-V3 (proxy)
|
||||
mad_tag: pyt_megatron_lm_train_deepseek-v3-proxy
|
||||
- model: DeepSeek-V2-Lite
|
||||
mad_tag: pyt_megatron_lm_train_deepseek-v2-lite-16b
|
||||
- group: Mistral AI
|
||||
tag: mistral
|
||||
models:
|
||||
- model: Mixtral 8x7B
|
||||
mad_tag: pyt_megatron_lm_train_mixtral-8x7b
|
||||
- model: Mixtral 8x22B (proxy)
|
||||
mad_tag: pyt_megatron_lm_train_mixtral-8x22b-proxy
|
||||
- group: Qwen
|
||||
tag: qwen
|
||||
models:
|
||||
- model: Qwen 2.5 7B
|
||||
mad_tag: pyt_megatron_lm_train_qwen2.5-7b
|
||||
- model: Qwen 2.5 72B
|
||||
mad_tag: pyt_megatron_lm_train_qwen2.5-72b
|
||||
@@ -0,0 +1,58 @@
|
||||
dockers:
|
||||
- pull_tag: rocm/megatron-lm:v25.7_py310
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.7_py310/images/sha256-6189df849feeeee3ae31bb1e97aef5006d69d2b90c134e97708c19632e20ab5a
|
||||
components:
|
||||
ROCm: 6.4.2
|
||||
Primus: v0.1.0-rc1
|
||||
PyTorch: 2.8.0a0+gitd06a406
|
||||
Python: "3.10"
|
||||
Transformer Engine: 2.1.0.dev0+ba586519
|
||||
hipBLASLt: 37ba1d36
|
||||
Triton: 3.3.0
|
||||
RCCL: 2.22.3
|
||||
model_groups:
|
||||
- group: Meta Llama
|
||||
tag: llama
|
||||
models:
|
||||
- model: Llama 3.3 70B
|
||||
mad_tag: primus_pyt_megatron_lm_train_llama-3.3-70b
|
||||
config_name: llama3.3_70B-pretrain.yaml
|
||||
- model: Llama 3.1 70B
|
||||
mad_tag: primus_pyt_megatron_lm_train_llama-3.1-70b
|
||||
config_name: llama3.1_70B-pretrain.yaml
|
||||
- model: Llama 3.1 8B
|
||||
mad_tag: primus_pyt_megatron_lm_train_llama-3.1-8b
|
||||
config_name: llama3.1_8B-pretrain.yaml
|
||||
- model: Llama 2 7B
|
||||
mad_tag: primus_pyt_megatron_lm_train_llama-2-7b
|
||||
config_name: llama2_7B-pretrain.yaml
|
||||
- model: Llama 2 70B
|
||||
mad_tag: primus_pyt_megatron_lm_train_llama-2-70b
|
||||
config_name: llama2_70B-pretrain.yaml
|
||||
- group: DeepSeek
|
||||
tag: deepseek
|
||||
models:
|
||||
- model: DeepSeek-V3 (proxy)
|
||||
mad_tag: primus_pyt_megatron_lm_train_deepseek-v3-proxy
|
||||
config_name: deepseek_v3-pretrain.yaml
|
||||
- model: DeepSeek-V2-Lite
|
||||
mad_tag: primus_pyt_megatron_lm_train_deepseek-v2-lite-16b
|
||||
config_name: deepseek_v2_lite-pretrain.yaml
|
||||
- group: Mistral AI
|
||||
tag: mistral
|
||||
models:
|
||||
- model: Mixtral 8x7B
|
||||
mad_tag: primus_pyt_megatron_lm_train_mixtral-8x7b
|
||||
config_name: mixtral_8x7B_v0.1-pretrain.yaml
|
||||
- model: Mixtral 8x22B (proxy)
|
||||
mad_tag: primus_pyt_megatron_lm_train_mixtral-8x22b-proxy
|
||||
config_name: mixtral_8x22B_v0.1-pretrain.yaml
|
||||
- group: Qwen
|
||||
tag: qwen
|
||||
models:
|
||||
- model: Qwen 2.5 7B
|
||||
mad_tag: primus_pyt_megatron_lm_train_qwen2.5-7b
|
||||
config_name: primus_qwen2.5_7B-pretrain.yaml
|
||||
- model: Qwen 2.5 72B
|
||||
mad_tag: primus_pyt_megatron_lm_train_qwen2.5-72b
|
||||
config_name: qwen2.5_72B-pretrain.yaml
|
||||
@@ -1,13 +1,13 @@
|
||||
dockers:
|
||||
- pull_tag: rocm/megatron-lm:v25.7_py310
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.7_py310/images/sha256-6189df849feeeee3ae31bb1e97aef5006d69d2b90c134e97708c19632e20ab5a
|
||||
- pull_tag: rocm/megatron-lm:v25.8_py310
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/megatron-lm/v25.8_py310/images/sha256-50fc824361054e445e86d5d88d5f58817f61f8ec83ad4a7e43ea38bbc4a142c0
|
||||
components:
|
||||
ROCm: 6.4.2
|
||||
Primus: v0.1.0-rc1
|
||||
ROCm: 6.4.3
|
||||
Primus: 927a717
|
||||
PyTorch: 2.8.0a0+gitd06a406
|
||||
Python: "3.10"
|
||||
Transformer Engine: 2.1.0.dev0+ba586519
|
||||
hipBLASLt: 37ba1d36
|
||||
Transformer Engine: 2.2.0.dev0+54dd2bdc
|
||||
hipBLASLt: d1b517fc7a
|
||||
Triton: 3.3.0
|
||||
RCCL: 2.22.3
|
||||
model_groups:
|
||||
|
||||
@@ -47,10 +47,6 @@ It includes the following software components:
|
||||
``shardy=False`` during the training run. You can also follow the `migration
|
||||
guide <https://docs.jax.dev/en/latest/shardy_jax_migration.html>`__ to enable
|
||||
it.
|
||||
|
||||
The provided multi-node training scripts in this documentation are
|
||||
not currently supported with JAX 0.6.0. For multi-node training, use the JAX 0.5.0
|
||||
Docker image.
|
||||
{% endif %}
|
||||
|
||||
{% endfor %}
|
||||
@@ -361,12 +357,6 @@ benchmark results:
|
||||
|
||||
./jax-maxtext_benchmark_report.sh -m {{ model.model_repo }} -q nanoo_fp8
|
||||
|
||||
.. important::
|
||||
|
||||
Quantized training is not supported with the JAX 0.6.0 Docker image; support
|
||||
will be added in a future release. For quantized training, use the JAX 0.5.0
|
||||
Docker image: ``rocm/jax-training:maxtext-v25.7``.
|
||||
|
||||
{% endif %}
|
||||
{% if model.multinode_training_script and "multi-node" in model.doc_options %}
|
||||
.. rubric:: Multi-node training
|
||||
@@ -383,7 +373,7 @@ benchmark results:
|
||||
for more details on downloading the Llama models before running the
|
||||
benchmark.
|
||||
|
||||
2. To run multi-node training for {{ model.model }},
|
||||
2. To run multi-node training for {{ model.model }},
|
||||
use the
|
||||
`multi-node training script <https://github.com/ROCm/MAD/blob/develop/scripts/jax-maxtext/gpu-rocm/{{ model.multinode_training_script }}>`__
|
||||
under the ``scripts/jax-maxtext/gpu-rocm/`` directory.
|
||||
|
||||
@@ -16,12 +16,22 @@ previous releases of the ``ROCm/megatron-lm`` Docker image on `Docker Hub <https
|
||||
- Components
|
||||
- Resources
|
||||
|
||||
* - v25.7 (latest)
|
||||
* - v25.8 (latest)
|
||||
-
|
||||
* ROCm
|
||||
* PyTorch
|
||||
* ROCm 6.4.3
|
||||
* PyTorch 2.8.0a0+gitd06a406
|
||||
-
|
||||
* :doc:`Documentation <../megatron-lm>`
|
||||
* :doc:`Primus Megatron documentation <../primus-megatron>`
|
||||
* :doc:`Megatron-LM (legacy) documentation <../megatron-lm>`
|
||||
* `Docker Hub (py310) <https://hub.docker.com/r/rocm/megatron-lm/tags>`__
|
||||
|
||||
* - v25.7
|
||||
-
|
||||
* ROCm 6.4.2
|
||||
* PyTorch 2.8.0a0+gitd06a406
|
||||
-
|
||||
* :doc:`Primus Megatron documentation <primus-megatron-v25.7>`
|
||||
* :doc:`Megatron-LM (legacy) documentation <megatron-lm-v25.7>`
|
||||
* `Docker Hub (py310) <https://hub.docker.com/layers/rocm/megatron-lm/v25.7_py310/images/sha256-6189df849feeeee3ae31bb1e97aef5006d69d2b90c134e97708c19632e20ab5a>`__
|
||||
|
||||
* - v25.6
|
||||
|
||||
@@ -1,12 +1,12 @@
|
||||
:orphan:
|
||||
|
||||
**********************************************************************
|
||||
Migrating workloads to Primus (Megatron-Core backend) from Megatron-LM
|
||||
**********************************************************************
|
||||
*****************************************************************
|
||||
Migrating workloads to Primus (Megatron backend) from Megatron-LM
|
||||
*****************************************************************
|
||||
|
||||
Primus supports Megatron-Core as backend optimization library,
|
||||
replacing ROCm Megatron-LM. This document outlines the steps to migrate
|
||||
workload from ROCm Megatron-LM to Primus with the Megatron-Core backend.
|
||||
workload from ROCm Megatron-LM to Primus with the Megatron backend.
|
||||
|
||||
Model architecture
|
||||
==================
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,604 @@
|
||||
:orphan:
|
||||
|
||||
.. meta::
|
||||
:description: How to train a model using Megatron-LM for ROCm.
|
||||
:keywords: ROCm, AI, LLM, train, Megatron-LM, megatron, Llama, tutorial, docker, torch
|
||||
|
||||
********************************************
|
||||
Training a model with Primus and Megatron-LM
|
||||
********************************************
|
||||
|
||||
.. caution::
|
||||
|
||||
This documentation does not reflect the latest version of ROCm Megatron-LM
|
||||
training performance documentation. See :doc:`../primus-megatron` for the latest version.
|
||||
|
||||
`Primus <https://github.com/AMD-AGI/Primus>`__ is a unified and flexible
|
||||
LLM training framework designed to streamline training. It streamlines LLM
|
||||
training on AMD Instinct accelerators using a modular, reproducible configuration paradigm.
|
||||
Primus is backend-agnostic and supports multiple training engines -- including Megatron.
|
||||
|
||||
.. note::
|
||||
|
||||
Primus with the Megatron backend is intended to replace ROCm
|
||||
Megatron-LM in this Dockerized training environment. To learn how to migrate
|
||||
workloads from Megatron-LM to Primus with Megatron, see
|
||||
:doc:`megatron-lm-primus-migration-guide`.
|
||||
|
||||
For ease of use, AMD provides a ready-to-use Docker image for MI300 series accelerators
|
||||
containing essential components for Primus and Megatron-LM.
|
||||
|
||||
.. note::
|
||||
|
||||
This Docker environment is based on Python 3.10 and Ubuntu 22.04. For an alternative environment with
|
||||
Python 3.12 and Ubuntu 24.04, see the :doc:`previous ROCm Megatron-LM v25.6 Docker release <megatron-lm-v25.6>`.
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.7-benchmark-models.yaml
|
||||
|
||||
{% set dockers = data.dockers %}
|
||||
{% set docker = dockers[0] %}
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
|
||||
* - Software component
|
||||
- Version
|
||||
|
||||
{% for component_name, component_version in docker.components.items() %}
|
||||
* - {{ component_name }}
|
||||
- {{ component_version }}
|
||||
{% endfor %}
|
||||
|
||||
.. _amd-primus-megatron-lm-model-support-v257:
|
||||
|
||||
Supported models
|
||||
================
|
||||
|
||||
The following models are pre-optimized for performance on AMD Instinct MI300X series accelerators.
|
||||
Some instructions, commands, and training examples in this documentation might
|
||||
vary by model -- select one to get started.
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.7-benchmark-models.yaml
|
||||
|
||||
{% set model_groups = data.model_groups %}
|
||||
.. raw:: html
|
||||
|
||||
<div id="vllm-benchmark-ud-params-picker" class="container-fluid">
|
||||
<div class="row gx-0">
|
||||
<div class="col-2 me-1 px-2 model-param-head">Model</div>
|
||||
<div class="row col-10 pe-0">
|
||||
{% for model_group in model_groups %}
|
||||
<div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
|
||||
{% endfor %}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="row gx-0 pt-1">
|
||||
<div class="col-2 me-1 px-2 model-param-head">Variant</div>
|
||||
<div class="row col-10 pe-0">
|
||||
{% for model_group in model_groups %}
|
||||
{% set models = model_group.models %}
|
||||
{% for model in models %}
|
||||
{% if models|length % 3 == 0 %}
|
||||
<div class="col-4 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
|
||||
{% else %}
|
||||
<div class="col-6 px-2 model-param" data-param-k="model" data-param-v="{{ model.mad_tag }}" data-param-group="{{ model_group.tag }}" tabindex="0">{{ model.model }}</div>
|
||||
{% endif %}
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
.. note::
|
||||
|
||||
Some models, such as Llama, require an external license agreement through
|
||||
a third party (for example, Meta).
|
||||
|
||||
System validation
|
||||
=================
|
||||
|
||||
Before running AI workloads, it's important to validate that your AMD hardware is configured
|
||||
correctly and performing optimally.
|
||||
|
||||
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
|
||||
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
|
||||
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
|
||||
before starting training.
|
||||
|
||||
To test for optimal performance, consult the recommended :ref:`System health benchmarks
|
||||
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
|
||||
system's configuration.
|
||||
|
||||
.. _mi300x-amd-primus-megatron-lm-training-v257:
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.7-benchmark-models.yaml
|
||||
|
||||
{% set dockers = data.dockers %}
|
||||
{% set docker = dockers[0] %}
|
||||
|
||||
Environment setup
|
||||
=================
|
||||
|
||||
Use the following instructions to set up the environment, configure the script to train models, and
|
||||
reproduce the benchmark results on MI300X series accelerators with the ``{{ docker.pull_tag }}`` image.
|
||||
|
||||
.. _amd-primus-megatron-lm-requirements-v257:
|
||||
|
||||
Download the Docker image
|
||||
-------------------------
|
||||
|
||||
1. Use the following command to pull the Docker image from Docker Hub.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker pull {{ docker.pull_tag }}
|
||||
|
||||
2. Launch the Docker container.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker run -it \
|
||||
--device /dev/dri \
|
||||
--device /dev/kfd \
|
||||
--device /dev/infiniband \
|
||||
--network host --ipc host \
|
||||
--group-add video \
|
||||
--cap-add SYS_PTRACE \
|
||||
--security-opt seccomp=unconfined \
|
||||
--privileged \
|
||||
-v $HOME:$HOME \
|
||||
--shm-size 128G \
|
||||
--name primus_training_env \
|
||||
{{ docker.pull_tag }}
|
||||
|
||||
3. Use these commands if you exit the ``primus_training_env`` container and need to return to it.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
docker start primus_training_env
|
||||
docker exec -it primus_training_env bash
|
||||
|
||||
The Docker container hosts verified release tag ``v0.1.0-rc1`` of the `Primus
|
||||
<https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1>`__ repository.
|
||||
|
||||
.. _amd-primus-megatron-lm-environment-setup-v257:
|
||||
|
||||
Configuration
|
||||
=============
|
||||
|
||||
Primus defines a training configuration in YAML for each model in
|
||||
`examples/megatron/configs <https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1/examples/megatron/configs>`__.
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.7-benchmark-models.yaml
|
||||
|
||||
{% set model_groups = data.model_groups %}
|
||||
{% for model_group in model_groups %}
|
||||
{% for model in model_group.models %}
|
||||
.. container:: model-doc {{ model.mad_tag }}
|
||||
|
||||
To update training parameters for {{ model.model }}, you can update ``examples/megatron/configs/{{ model.config_name }}``.
|
||||
Note that training configuration YAML files for other models follow this naming convention.
|
||||
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
|
||||
.. note::
|
||||
|
||||
See :ref:`Key options <amd-primus-megatron-lm-benchmark-test-vars>` for more information on configuration options.
|
||||
|
||||
Dataset options
|
||||
---------------
|
||||
|
||||
You can use either mock data or real data for training.
|
||||
|
||||
* Mock data can be useful for testing and validation. Use the ``mock_data`` field to toggle between mock and real data. The default
|
||||
value is ``true`` for enabled.
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
mock_data: true
|
||||
|
||||
* If you're using a real dataset, update the ``train_data_path`` field to point to the location of your dataset.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
mock_data: false
|
||||
train_data_path: /path/to/your/dataset
|
||||
|
||||
Ensure that the files are accessible inside the Docker container.
|
||||
|
||||
.. _amd-primus-megatron-lm-tokenizer-v257:
|
||||
|
||||
Tokenizer
|
||||
---------
|
||||
|
||||
In Primus, each model uses a tokenizer from Hugging Face. For example, Llama
|
||||
3.1 8B model uses ``tokenizer_model: meta-llama/Llama-3.1-8B`` and
|
||||
``tokenizer_type: Llama3Tokenizer`` defined in the `llama3.1-8B model
|
||||
<https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1/primus/configs/models/megatron/llama3.1_8B.yaml>`__
|
||||
definition. As such, you need to set the ``HF_TOKEN`` environment variable with
|
||||
right permissions to access the tokenizer for each model.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Export your HF_TOKEN in the workspace
|
||||
export HF_TOKEN=<your_hftoken>
|
||||
|
||||
.. _amd-primus-megatron-lm-run-training-v257:
|
||||
|
||||
Run training
|
||||
============
|
||||
|
||||
Use the following example commands to set up the environment, configure
|
||||
:ref:`key options <amd-primus-megatron-lm-benchmark-test-vars>`, and run training on
|
||||
MI300X series accelerators with the AMD Megatron-LM environment.
|
||||
|
||||
Single node training
|
||||
--------------------
|
||||
|
||||
To run training on a single node, navigate to ``/workspace/Primus`` and use the following setup command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
pip install -r requirements.txt
|
||||
export HSA_NO_SCRATCH_RECLAIM=1
|
||||
export NVTE_CK_USES_BWD_V3=1
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.3-70b
|
||||
|
||||
To run pre-training for Llama 3.3 70B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh \
|
||||
--micro_batch_size 2 \
|
||||
--global_batch_size 16 \
|
||||
--train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-8b
|
||||
|
||||
To run pre-training for Llama 3.1 8B FP8, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh \
|
||||
--train_iters 50 \
|
||||
--fp8 hybrid
|
||||
|
||||
For Llama 3.1 8B BF16, use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh --train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-70b
|
||||
|
||||
To run pre-training for Llama 3.1 70B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh \
|
||||
--train_iters 50
|
||||
|
||||
To run the training on a single node for Llama 3.1 70B FP8 with proxy, use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh \
|
||||
--train_iters 50 \
|
||||
--num_layers 40 \
|
||||
--fp8 hybrid \
|
||||
--no_fp8_weight_transpose_cache true
|
||||
|
||||
.. note::
|
||||
|
||||
Use two or more nodes to run the *full* Llama 70B model with FP8 precision.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-7b
|
||||
|
||||
To run pre-training for Llama 2 7B FP8, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama2_7B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh \
|
||||
--train_iters 50 \
|
||||
--fp8 hybrid
|
||||
|
||||
To run pre-training for Llama 2 7B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama2_7B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh --train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-70b
|
||||
|
||||
To run pre-training for Llama 2 70B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
|
||||
bash ./examples/run_pretrain.sh --train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_deepseek-v3-proxy
|
||||
|
||||
To run training on a single node for DeepSeek-V3 (MoE with expert parallel) with 3-layer proxy,
|
||||
use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/deepseek_v3-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh \
|
||||
--num_layers 3 \
|
||||
--moe_layer_freq 1 \
|
||||
--train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_deepseek-v2-lite-16b
|
||||
|
||||
To run training on a single node for DeepSeek-V2-Lite (MoE with expert parallel),
|
||||
use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/deepseek_v2_lite-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh \
|
||||
--global_batch_size 256 \
|
||||
--train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x7b
|
||||
|
||||
To run training on a single node for Mixtral 8x7B (MoE with expert parallel),
|
||||
use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh --train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x22b-proxy
|
||||
|
||||
To run training on a single node for Mixtral 8x7B (MoE with expert parallel) with 4-layer proxy,
|
||||
use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/mixtral_8x22B_v0.1-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh \
|
||||
--num_layers 4 \
|
||||
--pipeline_model_parallel_size 1 \
|
||||
--micro_batch_size 1 \
|
||||
--global_batch_size 16 \
|
||||
--train_iters 50
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-7b
|
||||
|
||||
To run training on a single node for Qwen 2.5 7B BF16, use the following
|
||||
command:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/qwen2.5_7B-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh --train_iters 50
|
||||
|
||||
For FP8, use the following command.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/qwen2.5_7B-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh \
|
||||
--train_iters 50 \
|
||||
--fp8 hybrid
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-72b
|
||||
|
||||
To run the training on a single node for Qwen 2.5 72B BF16, use the following command.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
EXP=examples/megatron/configs/qwen2.5_72B-pretrain.yaml \
|
||||
bash examples/run_pretrain.sh --train_iters 50
|
||||
|
||||
Multi-node training examples
|
||||
----------------------------
|
||||
|
||||
To run training on multiple nodes, you can use the
|
||||
`run_slurm_pretrain.sh <https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1/examples/run_slurm_pretrain.sh>`__
|
||||
to launch the multi-node workload. Use the following steps to setup your environment:
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/previous-versions/primus-megatron-v25.7-benchmark-models.yaml
|
||||
|
||||
{% set dockers = data.dockers %}
|
||||
{% set docker = dockers[0] %}
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
cd /workspace/Primus/
|
||||
export DOCKER_IMAGE={{ docker.pull_tag }}
|
||||
export HF_TOKEN=<your_HF_token>
|
||||
export HSA_NO_SCRATCH_RECLAIM=1
|
||||
export NVTE_CK_USES_BWD_V3=1
|
||||
export NCCL_IB_HCA=<your_NCCL_IB_HCA> # specify which RDMA interfaces to use for communication
|
||||
export NCCL_SOCKET_IFNAME=<your_NCCL_SOCKET_IFNAME> # your Network Interface
|
||||
export GLOO_SOCKET_IFNAME=<your_GLOO_SOCKET_IFNAME> # your Network Interface
|
||||
export NCCL_IB_GID_INDEX=3 # Set InfiniBand GID index for NCCL communication. Default is 3 for ROCE
|
||||
|
||||
.. note::
|
||||
|
||||
* Make sure correct network drivers are installed on the nodes. If inside a Docker, either install the drivers inside the Docker container or pass the network drivers from the host while creating Docker container.
|
||||
* If ``NCCL_IB_HCA`` and ``NCCL_SOCKET_IFNAME`` are not set, Primus will try to auto-detect. However, since NICs can vary accross different cluster, it is encouraged to explicitly export your NCCL parameters for the cluster.
|
||||
* To find your network interface, you can use ``ip a``.
|
||||
* To find RDMA interfaces, you can use ``ibv_devices`` to get the list of all the RDMA/IB devices.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.3-70b
|
||||
|
||||
To train Llama 3.3 70B FP8 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 4 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
To train Llama 3.3 70B BF16 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 1 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 12
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-8b
|
||||
|
||||
To train Llama 3.1 8B FP8 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
# Adjust the training parameters. For e.g., `global_batch_size: 8 * #single_node_bs` for 8 nodes in this case
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.1_8B-pretrain.yaml \
|
||||
bash ./examples/run_slurm_pretrain.sh \
|
||||
--global_batch_size 1024 \
|
||||
--fp8 hybrid
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-70b
|
||||
|
||||
To train Llama 3.1 70B FP8 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 4 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
To train Llama 3.1 70B BF16 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 1 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 12
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-7b
|
||||
|
||||
To train Llama 2 8B FP8 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
# Adjust the training parameters. For e.g., `global_batch_size: 8 * #single_node_bs` for 8 nodes in this case
|
||||
NNODES=8 EXP=examples/megatron/configs/llama2_7B-pretrain.yaml bash ./examples/run_slurm_pretrain.sh --global_batch_size 2048 --fp8 hybrid
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-70b
|
||||
|
||||
To train Llama 2 70B FP8 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 10 \
|
||||
--global_batch_size 640 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
To train Llama 2 70B BF16 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
|
||||
bash ./examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 2 \
|
||||
--global_batch_size 1536 \
|
||||
--recompute_num_layers 12
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x7b
|
||||
|
||||
To train Mixtral 8x7B BF16 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/mixtral_8x7B_v0.1-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 2 \
|
||||
--global_batch_size 256
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-72b
|
||||
|
||||
To train Qwen2.5 72B FP8 on 8 nodes, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/qwen2.5_72B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 8 \
|
||||
--global_batch_size 512 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
.. _amd-primus-megatron-lm-benchmark-test-vars-v257:
|
||||
|
||||
Key options
|
||||
-----------
|
||||
|
||||
The following are key options to take note of
|
||||
|
||||
fp8
|
||||
``hybrid`` enables FP8 GEMMs.
|
||||
|
||||
use_torch_fsdp2
|
||||
``use_torch_fsdp2: 1`` enables torch fsdp-v2. If FSDP is enabled,
|
||||
set ``use_distributed_optimizer`` and ``overlap_param_gather`` to ``false``.
|
||||
|
||||
profile
|
||||
To enable PyTorch profiling, set these parameters:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
profile: true
|
||||
use_pytorch_profiler: true
|
||||
profile_step_end: 7
|
||||
profile_step_start: 6
|
||||
|
||||
train_iters
|
||||
The total number of iterations (default: 50).
|
||||
|
||||
mock_data
|
||||
True by default.
|
||||
|
||||
micro_batch_size
|
||||
Micro batch size.
|
||||
|
||||
global_batch_size
|
||||
Global batch size.
|
||||
|
||||
recompute_granularity
|
||||
For activation checkpointing.
|
||||
|
||||
num_layers
|
||||
For using a reduced number of layers as with proxy models.
|
||||
|
||||
Previous versions
|
||||
=================
|
||||
|
||||
See :doc:`megatron-lm-history` to find documentation for previous releases
|
||||
of the ``ROCm/megatron-lm`` Docker image.
|
||||
@@ -2,24 +2,25 @@
|
||||
:description: How to train a model using Megatron-LM for ROCm.
|
||||
:keywords: ROCm, AI, LLM, train, Megatron-LM, megatron, Llama, tutorial, docker, torch
|
||||
|
||||
**********************************************
|
||||
Training a model with Primus and Megatron-Core
|
||||
**********************************************
|
||||
********************************************
|
||||
Training a model with Primus and Megatron-LM
|
||||
********************************************
|
||||
|
||||
`Primus <https://github.com/AMD-AIG-AIMA/Primus>`__ is a unified and flexible
|
||||
`Primus <https://github.com/AMD-AGI/Primus>`__ is a unified and flexible
|
||||
LLM training framework designed to streamline training. It streamlines LLM
|
||||
training on AMD Instinct accelerators using a modular, reproducible configuration paradigm.
|
||||
Primus is backend-agnostic and supports multiple training engines -- including Megatron-Core.
|
||||
Primus is backend-agnostic and supports multiple training engines -- including Megatron.
|
||||
|
||||
.. note::
|
||||
|
||||
Primus with the Megatron-Core backend is intended to replace ROCm
|
||||
Megatron-LM in this Dockerized training environment. To learn how to migrate
|
||||
workloads from Megatron-LM to Primus with Megatron-Core, see
|
||||
:doc:`previous-versions/megatron-lm-primus-migration-guide`.
|
||||
Primus with Megatron supersedes the :doc:`ROCm Megatron-LM training <megatron-lm>` workflow.
|
||||
To learn how to migrate workloads from Megatron-LM to Primus with Megatron,
|
||||
see :doc:`previous-versions/megatron-lm-primus-migration-guide`.
|
||||
|
||||
For ease of use, AMD provides a ready-to-use Docker image for MI300 series accelerators
|
||||
containing essential components for Primus and Megatron-Core.
|
||||
containing essential components for Primus and Megatron-LM. This Docker is powered by Primus
|
||||
Turbo optimizations for performance; this release adds support for Primus Turbo
|
||||
with optimized attention and grouped GEMM kernels.
|
||||
|
||||
.. note::
|
||||
|
||||
@@ -151,8 +152,8 @@ system's configuration.
|
||||
docker start primus_training_env
|
||||
docker exec -it primus_training_env bash
|
||||
|
||||
The Docker container hosts verified release tag ``v0.1.0-rc1`` of the `Primus
|
||||
<https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1>`__ repository.
|
||||
The Docker container hosts verified commit ``927a717`` of the `Primus
|
||||
<https://github.com/AMD-AGI/Primus/tree/927a71702784347a311ca48fd45f0f308c6ef6dd>`__ repository.
|
||||
|
||||
.. _amd-primus-megatron-lm-environment-setup:
|
||||
|
||||
@@ -160,7 +161,7 @@ Configuration
|
||||
=============
|
||||
|
||||
Primus defines a training configuration in YAML for each model in
|
||||
`examples/megatron/configs <https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1/examples/megatron/configs>`__.
|
||||
`examples/megatron/configs <https://github.com/AMD-AGI/Primus/tree/927a71702784347a311ca48fd45f0f308c6ef6dd/examples/megatron/configs>`__.
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/primus-megatron-benchmark-models.yaml
|
||||
|
||||
@@ -205,11 +206,7 @@ You can use either mock data or real data for training.
|
||||
Tokenizer
|
||||
---------
|
||||
|
||||
In Primus, each model uses a tokenizer from Hugging Face. For example, Llama
|
||||
3.1 8B model uses ``tokenizer_model: meta-llama/Llama-3.1-8B`` and
|
||||
``tokenizer_type: Llama3Tokenizer`` defined in the `llama3.1-8B model
|
||||
<https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1/primus/configs/models/megatron/llama3.1_8B.yaml>`__
|
||||
definition. As such, you need to set the ``HF_TOKEN`` environment variable with
|
||||
Set the ``HF_TOKEN`` environment variable with
|
||||
right permissions to access the tokenizer for each model.
|
||||
|
||||
.. code-block:: bash
|
||||
@@ -217,6 +214,14 @@ right permissions to access the tokenizer for each model.
|
||||
# Export your HF_TOKEN in the workspace
|
||||
export HF_TOKEN=<your_hftoken>
|
||||
|
||||
.. note::
|
||||
|
||||
In Primus, each model uses a tokenizer from Hugging Face. For example, Llama
|
||||
3.1 8B model uses ``tokenizer_model: meta-llama/Llama-3.1-8B`` and
|
||||
``tokenizer_type: Llama3Tokenizer`` defined in the `llama3.1-8B model
|
||||
<https://github.com/AMD-AGI/Primus/blob/927a71702784347a311ca48fd45f0f308c6ef6dd/examples/megatron/configs/llama3.1_8B-pretrain.yaml>`__
|
||||
definition.
|
||||
|
||||
.. _amd-primus-megatron-lm-run-training:
|
||||
|
||||
Run training
|
||||
@@ -237,10 +242,12 @@ To run training on a single node, navigate to ``/workspace/Primus`` and use the
|
||||
export HSA_NO_SCRATCH_RECLAIM=1
|
||||
export NVTE_CK_USES_BWD_V3=1
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.3-70b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Llama 3.3 70B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run pre-training for Llama 3.3 70B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -253,6 +260,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-8b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Llama 3.1 8B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run pre-training for Llama 3.1 8B FP8, run:
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -271,6 +282,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-3.1-70b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Llama 3.1 70B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run pre-training for Llama 3.1 70B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -287,8 +302,7 @@ Once setup is complete, run the appropriate training command.
|
||||
bash ./examples/run_pretrain.sh \
|
||||
--train_iters 50 \
|
||||
--num_layers 40 \
|
||||
--fp8 hybrid \
|
||||
--no_fp8_weight_transpose_cache true
|
||||
--fp8 hybrid
|
||||
|
||||
.. note::
|
||||
|
||||
@@ -296,6 +310,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-7b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Llama 2 7B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run pre-training for Llama 2 7B FP8, run:
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -314,6 +332,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_llama-2-70b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Llama 2 70B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run pre-training for Llama 2 70B BF16, run:
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -323,6 +345,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_deepseek-v3-proxy
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to DeepSeek-V3.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run training on a single node for DeepSeek-V3 (MoE with expert parallel) with 3-layer proxy,
|
||||
use the following command:
|
||||
|
||||
@@ -336,6 +362,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_deepseek-v2-lite-16b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to DeepSeek-V2-Lite.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run training on a single node for DeepSeek-V2-Lite (MoE with expert parallel),
|
||||
use the following command:
|
||||
|
||||
@@ -348,6 +378,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x7b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Mixtral 8x7B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run training on a single node for Mixtral 8x7B (MoE with expert parallel),
|
||||
use the following command:
|
||||
|
||||
@@ -358,7 +392,11 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_mixtral-8x22b-proxy
|
||||
|
||||
To run training on a single node for Mixtral 8x7B (MoE with expert parallel) with 4-layer proxy,
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Mixtral 8x22B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run training on a single node for Mixtral 8x22B (MoE with expert parallel) with 4-layer proxy,
|
||||
use the following command:
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -373,6 +411,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-7b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Qwen 2.5 7B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run training on a single node for Qwen 2.5 7B BF16, use the following
|
||||
command:
|
||||
|
||||
@@ -392,6 +434,10 @@ Once setup is complete, run the appropriate training command.
|
||||
|
||||
.. container:: model-doc primus_pyt_megatron_lm_train_qwen2.5-72b
|
||||
|
||||
Once setup is complete, run the appropriate training command.
|
||||
The following run commands are tailored to Qwen 2.5 72B.
|
||||
See :ref:`amd-primus-megatron-lm-model-support` to switch to another available model.
|
||||
|
||||
To run the training on a single node for Qwen 2.5 72B BF16, use the following command.
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -403,7 +449,7 @@ Multi-node training examples
|
||||
----------------------------
|
||||
|
||||
To run training on multiple nodes, you can use the
|
||||
`run_slurm_pretrain.sh <https://github.com/AMD-AIG-AIMA/Primus/tree/v0.1.0-rc1/examples/run_slurm_pretrain.sh>`__
|
||||
`run_slurm_pretrain.sh <https://github.com/AMD-AGI/Primus/blob/927a71702784347a311ca48fd45f0f308c6ef6dd/examples/run_slurm_pretrain.sh>`__
|
||||
to launch the multi-node workload. Use the following steps to setup your environment:
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/primus-megatron-benchmark-models.yaml
|
||||
@@ -438,10 +484,9 @@ to launch the multi-node workload. Use the following steps to setup your environ
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.3_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 4 \
|
||||
--micro_batch_size 1 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
To train Llama 3.3 70B BF16 on 8 nodes, run:
|
||||
@@ -474,10 +519,9 @@ to launch the multi-node workload. Use the following steps to setup your environ
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama3.1_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 4 \
|
||||
--micro_batch_size 1 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
To train Llama 3.1 70B BF16 on 8 nodes, run:
|
||||
@@ -507,10 +551,9 @@ to launch the multi-node workload. Use the following steps to setup your environ
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/llama2_70B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 10 \
|
||||
--global_batch_size 640 \
|
||||
--micro_batch_size 2 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
To train Llama 2 70B BF16 on 8 nodes, run:
|
||||
@@ -542,10 +585,9 @@ to launch the multi-node workload. Use the following steps to setup your environ
|
||||
|
||||
NNODES=8 EXP=examples/megatron/configs/qwen2.5_72B-pretrain.yaml \
|
||||
bash examples/run_slurm_pretrain.sh \
|
||||
--micro_batch_size 8 \
|
||||
--global_batch_size 512 \
|
||||
--micro_batch_size 4 \
|
||||
--global_batch_size 256 \
|
||||
--recompute_num_layers 80 \
|
||||
--no_fp8_weight_transpose_cache true \
|
||||
--fp8 hybrid
|
||||
|
||||
.. _amd-primus-megatron-lm-benchmark-test-vars:
|
||||
@@ -590,6 +632,18 @@ recompute_granularity
|
||||
num_layers
|
||||
For using a reduced number of layers as with proxy models.
|
||||
|
||||
Further reading
|
||||
===============
|
||||
|
||||
- For an introduction to Primus, see `Primus: A Lightweight, Unified Training
|
||||
Framework for Large Models on AMD GPUs <https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html>`__.
|
||||
|
||||
- To learn more about system settings and management practices to configure your system for
|
||||
AMD Instinct MI300X series accelerators, see `AMD Instinct MI300X system optimization <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html>`_.
|
||||
|
||||
- For a list of other ready-made Docker images for AI with ROCm, see
|
||||
`AMD Infinity Hub <https://www.amd.com/en/developer/resources/infinity-hub.html#f-amd_hub_category=AI%20%26%20ML%20Models>`_.
|
||||
|
||||
Previous versions
|
||||
=================
|
||||
|
||||
@@ -598,5 +652,4 @@ of the ``ROCm/megatron-lm`` Docker image.
|
||||
|
||||
This training environment now uses Primus with Megatron as the primary
|
||||
configuration. Limited support for the legacy ROCm Megatron-LM is still
|
||||
available. For instructions on using ROCm Megatron-LM, see the
|
||||
:doc:`megatron-lm` document.
|
||||
available; see the :doc:`megatron-lm` documentation.
|
||||
|
||||
Reference in New Issue
Block a user