From 8eb5fef37c7db3c29dc90d85832468c142e24982 Mon Sep 17 00:00:00 2001 From: anisha-amd Date: Tue, 21 Oct 2025 16:12:18 -0400 Subject: [PATCH] Docs: frameworks compatibility standardization (#5488) --- .wordlist.txt | 6 +- .../ml-compatibility/dgl-compatibility.rst | 133 ++++++++++-------- .../flashinfer-compatibility.rst | 39 ++--- .../ml-compatibility/jax-compatibility.rst | 64 ++++----- .../llama-cpp-compatibility.rst | 55 ++++---- .../megablocks-compatibility.rst | 63 ++++++--- .../pytorch-compatibility.rst | 53 +++---- .../ml-compatibility/ray-compatibility.rst | 64 ++++----- .../stanford-megatron-lm-compatibility.rst | 78 ++++++---- .../ml-compatibility/taichi-compatibility.rst | 57 +++++--- .../tensorflow-compatibility.rst | 52 ++++--- .../ml-compatibility/verl-compatibility.rst | 60 ++++++-- 12 files changed, 426 insertions(+), 298 deletions(-) diff --git a/.wordlist.txt b/.wordlist.txt index 8e7c9ba62..294016553 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -34,6 +34,7 @@ AlexNet Andrej Arb Autocast +autograd BARs BatchNorm BLAS @@ -86,9 +87,11 @@ Conda ConnectX CountOnes CuPy +customizable da Dashboarding Dataloading +dataflows DBRX DDR DF @@ -182,7 +185,7 @@ GPT GPU GPU's GPUs -Graphbolt +GraphBolt GraphSage GRBM GRE @@ -212,6 +215,7 @@ Haswell Higgs href Hyperparameters +HybridEngine Huggingface IB ICD diff --git a/docs/compatibility/ml-compatibility/dgl-compatibility.rst b/docs/compatibility/ml-compatibility/dgl-compatibility.rst index 7c61515ec..3c18ce100 100644 --- a/docs/compatibility/ml-compatibility/dgl-compatibility.rst +++ b/docs/compatibility/ml-compatibility/dgl-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: Deep Graph Library (DGL) compatibility - :keywords: GPU, DGL compatibility + :keywords: GPU, CPU, deep graph library, DGL, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -10,24 +10,42 @@ DGL compatibility ******************************************************************************** -Deep Graph Library `(DGL) `_ is an easy-to-use, high-performance and scalable +Deep Graph Library (`DGL `__) is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning -if a deep graph model is a component in an end-to-end application, the rest of +that if a deep graph model is a component in an end-to-end application, the rest of the logic is implemented using PyTorch. -* ROCm support for DGL is hosted in the `https://github.com/ROCm/dgl `_ repository. -* Due to independent compatibility considerations, this location differs from the `https://github.com/dmlc/dgl `_ upstream repository. -* Use the prebuilt :ref:`Docker images ` with DGL, PyTorch, and ROCm preinstalled. -* See the :doc:`ROCm DGL installation guide ` - to install and get started. +DGL provides a high-performance graph object that can reside on either CPUs or GPUs. +It bundles structural data features for better control and provides a variety of functions +for computing with graph objects, including efficient and customizable message passing +primitives for Graph Neural Networks. 
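+
+As a minimal sketch of this programming model (assuming the PyTorch backend
+bundled in the ROCm images and one visible GPU; the tiny graph and feature
+sizes below are purely illustrative), a graph can be built, moved to the GPU,
+and updated with one round of message passing:
+
+.. code-block:: python
+
+   import dgl
+   import dgl.function as fn
+   import torch
+
+   # Build a small directed graph from source/destination node ID tensors.
+   g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))
+   g.ndata["h"] = torch.randn(4, 8)
+
+   # On ROCm, the HIP device is exposed through the familiar "cuda" alias.
+   g = g.to("cuda")
+
+   # One message-passing step: copy each source feature, sum at the target.
+   g.update_all(fn.copy_u("h", "m"), fn.sum("m", "h_sum"))
+   print(g.ndata["h_sum"].shape)  # torch.Size([4, 8])
+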
- -Supported devices +Support overview ================================================================================ -- **Officially Supported**: TF32 with AMD Instinct MI300X (through hipblaslt) -- **Partially Supported**: TF32 with AMD Instinct MI250X +- The ROCm-supported version of DGL is maintained in the official `https://github.com/ROCm/dgl + `__ repository, which differs from the + `https://github.com/dmlc/dgl `__ upstream repository. +- To get started and install DGL on ROCm, use the prebuilt :ref:`Docker images `, + which include ROCm, DGL, and all required dependencies. + + - See the :doc:`ROCm DGL installation guide ` + for installation and setup instructions. + + - You can also consult the upstream `Installation guide `__ + for additional context. + +Version support +-------------------------------------------------------------------------------- + +DGL is supported on `ROCm 6.4.0 `__. + +Supported devices +-------------------------------------------------------------------------------- + +- **Officially Supported**: AMD Instinct™ MI300X (through `hipBLASlt `__) +- **Partially Supported**: AMD Instinct™ MI250X .. _dgl-recommendations: @@ -35,7 +53,7 @@ Use cases and recommendations ================================================================================ DGL can be used for Graph Learning, and building popular graph models like -GAT, GCN and GraphSage. Using these we can support a variety of use-cases such as: +GAT, GCN, and GraphSage. Using these models, a variety of use cases are supported: - Recommender systems - Network Optimization and Analysis @@ -62,16 +80,17 @@ Docker image compatibility -AMD validates and publishes `DGL images `_ -with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated -inventories were tested on `ROCm 6.4.0 `_. +AMD validates and publishes `DGL images `__ +with ROCm backends on Docker Hub. The following Docker image tags and associated +inventories represent the latest available DGL version from the official Docker Hub. Click the |docker-icon| to view the image on Docker Hub. .. list-table:: DGL Docker image components :header-rows: 1 :class: docker-image-compatibility - * - Docker + * - Docker image + - ROCm - DGL - PyTorch - Ubuntu @@ -81,102 +100,106 @@ Click the |docker-icon| to view the image on Docker Hub. - - `2.4.0 `_ - - `2.6.0 `_ + - `6.4.0 `__. + - `2.4.0 `__ + - `2.6.0 `__ - 24.04 - - `3.12.9 `_ + - `3.12.9 `__ * - .. raw:: html - - `2.4.0 `_ - - `2.4.1 `_ + - `6.4.0 `__. + - `2.4.0 `__ + - `2.4.1 `__ - 24.04 - - `3.12.9 `_ + - `3.12.9 `__ * - .. raw:: html - - `2.4.0 `_ - - `2.4.1 `_ + - `6.4.0 `__. + - `2.4.0 `__ + - `2.4.1 `__ - 22.04 - - `3.10.16 `_ + - `3.10.16 `__ * - .. raw:: html - - `2.4.0 `_ - - `2.3.0 `_ + - `6.4.0 `__. + - `2.4.0 `__ + - `2.3.0 `__ - 22.04 - - `3.10.16 `_ + - `3.10.16 `__ Key ROCm libraries for DGL ================================================================================ DGL on ROCm depends on specific libraries that affect its features and performance. -Using the DGL Docker container or building it with the provided docker file or a ROCm base image is recommended. +Using the DGL Docker container or building it with the provided Docker file or a ROCm base image is recommended. If you prefer to build it yourself, ensure the following dependencies are installed: .. 
list-table:: :header-rows: 1 * - ROCm library - - Version + - ROCm 6.4.0 Version - Purpose * - `Composable Kernel `_ - - :version-ref:`"Composable Kernel" rocm_version` + - 1.1.0 - Enables faster execution of core operations like matrix multiplication (GEMM), convolutions and transformations. * - `hipBLAS `_ - - :version-ref:`hipBLAS rocm_version` + - 2.4.0 - Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for matrix and vector operations. * - `hipBLASLt `_ - - :version-ref:`hipBLASLt rocm_version` + - 0.12.0 - hipBLASLt is an extension of the hipBLAS library, providing additional features like epilogues fused into the matrix multiplication kernel or use of integer tensor cores. * - `hipCUB `_ - - :version-ref:`hipCUB rocm_version` + - 3.4.0 - Provides a C++ template library for parallel algorithms for reduction, scan, sort and select. * - `hipFFT `_ - - :version-ref:`hipFFT rocm_version` + - 1.0.18 - Provides GPU-accelerated Fast Fourier Transform (FFT) operations. * - `hipRAND `_ - - :version-ref:`hipRAND rocm_version` + - 2.12.0 - Provides fast random number generation for GPUs. * - `hipSOLVER `_ - - :version-ref:`hipSOLVER rocm_version` + - 2.4.0 - Provides GPU-accelerated solvers for linear systems, eigenvalues, and singular value decompositions (SVD). * - `hipSPARSE `_ - - :version-ref:`hipSPARSE rocm_version` + - 3.2.0 - Accelerates operations on sparse matrices, such as sparse matrix-vector or matrix-matrix products. * - `hipSPARSELt `_ - - :version-ref:`hipSPARSELt rocm_version` + - 0.2.3 - Accelerates operations on sparse matrices, such as sparse matrix-vector or matrix-matrix products. * - `hipTensor `_ - - :version-ref:`hipTensor rocm_version` + - 1.5.0 - Optimizes for high-performance tensor operations, such as contractions. * - `MIOpen `_ - - :version-ref:`MIOpen rocm_version` + - 3.4.0 - Optimizes deep learning primitives such as convolutions, pooling, normalization, and activation functions. * - `MIGraphX `_ - - :version-ref:`MIGraphX rocm_version` + - 2.12.0 - Adds graph-level optimizations, ONNX models and mixed precision support and enable Ahead-of-Time (AOT) Compilation. * - `MIVisionX `_ - - :version-ref:`MIVisionX rocm_version` + - 3.2.0 - Optimizes acceleration for computer vision and AI workloads like preprocessing, augmentation, and inferencing. * - `rocAL `_ @@ -184,25 +207,25 @@ If you prefer to build it yourself, ensure the following dependencies are instal - Accelerates the data pipeline by offloading intensive preprocessing and augmentation tasks. rocAL is part of MIVisionX. * - `RCCL `_ - - :version-ref:`RCCL rocm_version` + - 2.2.0 - Optimizes for multi-GPU communication for operations like AllReduce and Broadcast. * - `rocDecode `_ - - :version-ref:`rocDecode rocm_version` + - 0.10.0 - Provides hardware-accelerated data decoding capabilities, particularly for image, video, and other dataset formats. * - `rocJPEG `_ - - :version-ref:`rocJPEG rocm_version` + - 0.8.0 - Provides hardware-accelerated JPEG image decoding and encoding. * - `RPP `_ - - :version-ref:`RPP rocm_version` + - 1.9.10 - Speeds up data augmentation, transformation, and other preprocessing steps. * - `rocThrust `_ - - :version-ref:`rocThrust rocm_version` + - 3.3.0 - Provides a C++ template library for parallel algorithms like sorting, reduction, and scanning. 
   * - `rocWMMA `_
-     - :version-ref:`rocWMMA rocm_version`
+     - 1.7.0
      - Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix
        multiplication (GEMM) and accumulation operations with mixed precision
        support.
@@ -211,14 +234,14 @@ If you prefer to build it yourself, ensure the following dependencies are instal
 
 Supported features
 ================================================================================
 
-Many functions and methods available in DGL Upstream are also supported in DGL ROCm.
+Many functions and methods available upstream are also supported in DGL on ROCm.
 Instead of listing them all, support is grouped into the following categories to provide a general overview.
 
 * DGL Base
 * DGL Backend
 * DGL Data
 * DGL Dataloading
-* DGL DGLGraph
+* DGL Graph
 * DGL Function
 * DGL Ops
 * DGL Sampling
@@ -235,9 +258,9 @@ Instead of listing them all, support is grouped into the following categories to
 Unsupported features
 ================================================================================
 
-* Graphbolt
-* Partial TF32 Support (MI250x only)
-* Kineto/ ROCTracer integration
+* GraphBolt
+* Partial TF32 Support (MI250X only)
+* Kineto/ROCTracer integration
 
 
 Unsupported functions
diff --git a/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst b/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
index 45ecc6a75..700186c73 100644
--- a/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
@@ -1,8 +1,8 @@
 :orphan:
 
 .. meta::
-   :description: FlashInfer deep learning framework compatibility
-   :keywords: GPU, LLM, FlashInfer, compatibility
+   :description: FlashInfer compatibility
+   :keywords: GPU, LLM, FlashInfer, deep learning, framework compatibility
 
 .. version-set:: rocm_version latest
 
@@ -11,7 +11,7 @@ FlashInfer compatibility
 ********************************************************************************
 
 `FlashInfer `__ is a library and kernel generator
-for Large Language Models (LLMs) that provides high-performance implementation of graphics
+for Large Language Models (LLMs) that provides a high-performance implementation of graphics
 processing unit (GPU) kernels. FlashInfer focuses on LLM serving and inference,
 as well as advanced performance across diverse scenarios.
 
@@ -25,28 +25,30 @@ offers high-performance LLM-specific operators, with easy integration through Py
 For the latest feature compatibility matrix, refer to the ``README`` of the
 `https://github.com/ROCm/flashinfer `__ repository.
 
-Support for the ROCm port of FlashInfer is available as follows:
+Support overview
+================================================================================
 
-- ROCm support for FlashInfer is hosted in the `https://github.com/ROCm/flashinfer
-  `__ repository. This location differs from the
-  `https://github.com/flashinfer-ai/flashinfer `_
+- The ROCm-supported version of FlashInfer is maintained in the official `https://github.com/ROCm/flashinfer
+  `__ repository, which differs from the
+  `https://github.com/flashinfer-ai/flashinfer `__
   upstream repository.
 
-- To install FlashInfer, use the prebuilt :ref:`Docker image `,
-  which includes ROCm, FlashInfer, and all required dependencies.
+- To get started and install FlashInfer on ROCm, use the prebuilt :ref:`Docker images `,
+  which include ROCm, FlashInfer, and all required dependencies.
 
   - See the :doc:`ROCm FlashInfer installation guide `
-    to install and get started.
+ for installation and setup instructions. - - See the `Installation guide `__ - in the upstream FlashInfer documentation. + - You can also consult the upstream `Installation guide `__ + for additional context. -.. note:: +Version support +-------------------------------------------------------------------------------- - Flashinfer is supported on ROCm 6.4.1. +FlashInfer is supported on `ROCm 6.4.1 `__. Supported devices -================================================================================ +-------------------------------------------------------------------------------- **Officially Supported**: AMD Instinct™ MI300X @@ -78,10 +80,9 @@ Docker image compatibility -AMD validates and publishes `ROCm FlashInfer images `__ -with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated -inventories represent the FlashInfer version from the official Docker Hub. -The Docker images have been validated for `ROCm 6.4.1 `__. +AMD validates and publishes `FlashInfer images `__ +with ROCm backends on Docker Hub. The following Docker image tag and associated +inventories represent the latest available FlashInfer version from the official Docker Hub. Click |docker-icon| to view the image on Docker Hub. .. list-table:: diff --git a/docs/compatibility/ml-compatibility/jax-compatibility.rst b/docs/compatibility/ml-compatibility/jax-compatibility.rst index 121e4d126..8308a8efc 100644 --- a/docs/compatibility/ml-compatibility/jax-compatibility.rst +++ b/docs/compatibility/ml-compatibility/jax-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: JAX compatibility - :keywords: GPU, JAX compatibility + :keywords: GPU, JAX, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -10,42 +10,38 @@ JAX compatibility ******************************************************************************* -JAX provides a NumPy-like API, which combines automatic differentiation and the -Accelerated Linear Algebra (XLA) compiler to achieve high-performance machine -learning at scale. +`JAX `__ is a library +for array-oriented numerical computation (similar to NumPy), with automatic differentiation +and just-in-time (JIT) compilation to enable high-performance machine learning research. -JAX uses composable transformations of Python and NumPy through just-in-time -(JIT) compilation, automatic vectorization, and parallelization. To learn about -JAX, including profiling and optimizations, see the official `JAX documentation -`_. +JAX provides an API that combines automatic differentiation and the +Accelerated Linear Algebra (XLA) compiler to achieve high-performance machine +learning at scale. JAX uses composable transformations of Python and NumPy through +JIT compilation, automatic vectorization, and parallelization. -ROCm support for JAX is upstreamed, and users can build the official source code -with ROCm support: +Support overview +================================================================================ -- ROCm JAX release: +- The ROCm-supported version of JAX is maintained in the official `https://github.com/ROCm/rocm-jax + `__ repository, which differs from the + `https://github.com/jax-ml/jax `__ upstream repository. - - Offers AMD-validated and community :ref:`Docker images ` - with ROCm and JAX preinstalled. +- To get started and install JAX on ROCm, use the prebuilt :ref:`Docker images `, + which include ROCm, JAX, and all required dependencies. 
- - ROCm JAX repository: `ROCm/rocm-jax `_ + - See the :doc:`ROCm JAX installation guide ` + for installation and setup instructions. - - See the :doc:`ROCm JAX installation guide ` - to get started. + - You can also consult the upstream `Installation guide `__ + for additional context. -- Official JAX release: +Version support +-------------------------------------------------------------------------------- - - Official JAX repository: `jax-ml/jax `_ - - - See the `AMD GPU (Linux) installation section - `_ in - the JAX documentation. - -.. note:: - - AMD releases official `ROCm JAX Docker images `_ - quarterly alongside new ROCm releases. These images undergo full AMD testing. - `Community ROCm JAX Docker images `_ - follow upstream JAX releases and use the latest available ROCm version. +AMD releases official `ROCm JAX Docker images `_ +quarterly alongside new ROCm releases. These images undergo full AMD testing. +`Community ROCm JAX Docker images `_ +follow upstream JAX releases and use the latest available ROCm version. Use cases and recommendations ================================================================================ @@ -71,7 +67,7 @@ Use cases and recommendations * The `Distributed fine-tuning with JAX on AMD GPUs `_ outlines the process of fine-tuning a Bidirectional Encoder Representations from Transformers (BERT)-based large language model (LLM) using JAX for a text - classification task. The blog post discuss techniques for parallelizing the + classification task. The blog post discusses techniques for parallelizing the fine-tuning across multiple AMD GPUs and assess the model's performance on a holdout dataset. During the fine-tuning, a BERT-base-cased transformer model and the General Language Understanding Evaluation (GLUE) benchmark dataset was @@ -90,9 +86,9 @@ For more use cases and recommendations, see `ROCm JAX blog posts `__ and are the -recommended way to get started with deep learning with JAX on ROCm. +AMD validates and publishes `JAX images `__ +with ROCm backends on Docker Hub. + For ``jax-community`` images, see `rocm/jax-community `__ on Docker Hub. @@ -234,7 +230,7 @@ The ROCm supported data types in JAX are collected in the following table. .. note:: - JAX data type support is effected by the :ref:`key_rocm_libraries` and it's + JAX data type support is affected by the :ref:`key_rocm_libraries` and it's collected on :doc:`ROCm data types and precision support ` page. diff --git a/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst index 902c61a2a..b79baf253 100644 --- a/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst +++ b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst @@ -1,8 +1,8 @@ :orphan: .. meta:: - :description: llama.cpp deep learning framework compatibility - :keywords: GPU, GGML, llama.cpp compatibility + :description: llama.cpp compatibility + :keywords: GPU, GGML, llama.cpp, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -20,33 +20,32 @@ to accelerate inference and reduce memory usage. Originally built as a CPU-first llama.cpp is easy to integrate with other programming environments and is widely adopted across diverse platforms, including consumer devices. -ROCm support for llama.cpp is upstreamed, and you can build the official source code -with ROCm support: - -- ROCm support for llama.cpp is hosted in the official `https://github.com/ROCm/llama.cpp - `_ repository. 
- -- Due to independent compatibility considerations, this location differs from the - `https://github.com/ggml-org/llama.cpp `_ upstream repository. - -- To install llama.cpp, use the prebuilt :ref:`Docker image `, - which includes ROCm, llama.cpp, and all required dependencies. - - - See the :doc:`ROCm llama.cpp installation guide ` - to install and get started. - - - See the `Installation guide `__ - in the upstream llama.cpp documentation. - -.. note:: - - llama.cpp is supported on ROCm 7.0.0 and ROCm 6.4.x. - -Supported devices +Support overview ================================================================================ -**Officially Supported**: AMD Instinct™ MI300X, MI325X, MI210 +- The ROCm-supported version of llama.cpp is maintained in the official `https://github.com/ROCm/llama.cpp + `__ repository, which differs from the + `https://github.com/ggml-org/llama.cpp `__ upstream repository. +- To get started and install llama.cpp on ROCm, use the prebuilt :ref:`Docker images `, + which include ROCm, llama.cpp, and all required dependencies. + + - See the :doc:`ROCm llama.cpp installation guide ` + for installation and setup instructions. + + - You can also consult the upstream `Installation guide `__ + for additional context. + +Version support +-------------------------------------------------------------------------------- + +llama.cpp is supported on `ROCm 7.0.0 `__ and +`ROCm 6.4.x `__. + +Supported devices +-------------------------------------------------------------------------------- + +**Officially Supported**: AMD Instinct™ MI300X, MI325X, MI210 Use cases and recommendations ================================================================================ @@ -84,9 +83,9 @@ Docker image compatibility -AMD validates and publishes `ROCm llama.cpp Docker images `__ +AMD validates and publishes `llama.cpp images `__ with ROCm backends on Docker Hub. The following Docker image tags and associated -inventories represent the available llama.cpp versions from the official Docker Hub. +inventories represent the latest available llama.cpp versions from the official Docker Hub. Click |docker-icon| to view the image on Docker Hub. .. important:: diff --git a/docs/compatibility/ml-compatibility/megablocks-compatibility.rst b/docs/compatibility/ml-compatibility/megablocks-compatibility.rst index 50c2c3821..5716ececb 100644 --- a/docs/compatibility/ml-compatibility/megablocks-compatibility.rst +++ b/docs/compatibility/ml-compatibility/megablocks-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: Megablocks compatibility - :keywords: GPU, megablocks, compatibility + :keywords: GPU, megablocks, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -10,28 +10,42 @@ Megablocks compatibility ******************************************************************************** -Megablocks is a light-weight library for mixture-of-experts (MoE) training. +`Megablocks `__ is a lightweight library +for mixture-of-experts `(MoE) `__ training. The core of the system is efficient "dropless-MoE" and standard MoE layers. -Megablocks is integrated with `https://github.com/stanford-futuredata/Megatron-LM `_, +Megablocks is integrated with `https://github.com/stanford-futuredata/Megatron-LM +`__, where data and pipeline parallel training of MoEs is supported. -* ROCm support for Megablocks is hosted in the official `https://github.com/ROCm/megablocks `_ repository. 
-* Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM `_ upstream repository. -* Use the prebuilt :ref:`Docker image ` with ROCm, PyTorch, and Megablocks preinstalled. -* See the :doc:`ROCm Megablocks installation guide ` to install and get started. +Support overview +================================================================================ -.. note:: +- The ROCm-supported version of Megablocks is maintained in the official `https://github.com/ROCm/megablocks + `__ repository, which differs from the + `https://github.com/stanford-futuredata/Megatron-LM `__ upstream repository. - Megablocks is supported on ROCm 6.3.0. +- To get started and install Megablocks on ROCm, use the prebuilt :ref:`Docker image `, + which includes ROCm, Megablocks, and all required dependencies. + + - See the :doc:`ROCm Megablocks installation guide ` + for installation and setup instructions. + + - You can also consult the upstream `Installation guide `__ + for additional context. + +Version support +-------------------------------------------------------------------------------- + +Megablocks is supported on `ROCm 6.3.0 `__. Supported devices -================================================================================ +-------------------------------------------------------------------------------- -- **Officially Supported**: AMD Instinct MI300X -- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210 +- **Officially Supported**: AMD Instinct™ MI300X +- **Partially Supported** (functionality or performance limitations): AMD Instinct™ MI250X, MI210 Supported models and features -================================================================================ +-------------------------------------------------------------------------------- This section summarizes the Megablocks features supported by ROCm. @@ -41,20 +55,28 @@ This section summarizes the Megablocks features supported by ROCm. * Mixture-of-Experts * dropless-Mixture-of-Experts - .. _megablocks-recommendations: Use cases and recommendations ================================================================================ -The `ROCm Megablocks blog posts `_ -guide how to leverage the ROCm platform for pre-training using the Megablocks framework. +* The `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs + `__ + blog post guides how to leverage the ROCm platform for pre-training using the + Megablocks framework. It introduces a streamlined approach for training Mixture-of-Experts + (MoE) models using the Megablocks library on AMD hardware. Focusing on GPT-2, it + demonstrates how block-sparse computations can enhance scalability and efficiency in MoE + training. The guide provides step-by-step instructions for setting up the environment, + including cloning the repository, building the Docker image, and running the training container. + Additionally, it offers insights into utilizing the ``oscar-1GB.json`` dataset for pre-training + language models. By leveraging Megablocks and the ROCm platform, you can optimize your MoE + training workflows for large-scale transformer models. + It features how to pre-process datasets and how to begin pre-training on AMD GPUs through: * Single-GPU pre-training * Multi-GPU pre-training - .. 
_megablocks-docker-compat: Docker image compatibility @@ -64,10 +86,9 @@ Docker image compatibility -AMD validates and publishes `ROCm Megablocks images `_ -with ROCm and Pytorch backends on Docker Hub. The following Docker image tags and associated -inventories represent the latest Megatron-LM version from the official Docker Hub. -The Docker images have been validated for `ROCm 6.3.0 `_. +AMD validates and publishes `Megablocks images `__ +with ROCm backends on Docker Hub. The following Docker image tag and associated +inventories represent the latest available Megablocks version from the official Docker Hub. Click |docker-icon| to view the image on Docker Hub. .. list-table:: diff --git a/docs/compatibility/ml-compatibility/pytorch-compatibility.rst b/docs/compatibility/ml-compatibility/pytorch-compatibility.rst index 19901b7cd..54365a72f 100644 --- a/docs/compatibility/ml-compatibility/pytorch-compatibility.rst +++ b/docs/compatibility/ml-compatibility/pytorch-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: PyTorch compatibility - :keywords: GPU, PyTorch compatibility + :keywords: GPU, PyTorch, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -15,40 +15,42 @@ deep learning. PyTorch on ROCm provides mixed-precision and large-scale training using `MIOpen `__ and `RCCL `__ libraries. -ROCm support for PyTorch is upstreamed into the official PyTorch repository. Due -to independent compatibility considerations, this results in two distinct -release cycles for PyTorch on ROCm: +PyTorch provides two high-level features: -- ROCm PyTorch release: +- Tensor computation (like NumPy) with strong GPU acceleration - - Provides the latest version of ROCm but might not necessarily support the - latest stable PyTorch version. +- Deep neural networks built on a tape-based autograd system (rapid computation + of multiple partial derivatives or gradients) - - Offers :ref:`Docker images ` with ROCm and PyTorch - preinstalled. +Support overview +================================================================================ - - ROCm PyTorch repository: ``__ +ROCm support for PyTorch is upstreamed into the official PyTorch repository. +ROCm development is aligned with the stable release of PyTorch, while upstream +PyTorch testing uses the stable release of ROCm to maintain consistency: - - See the :doc:`ROCm PyTorch installation guide ` - to get started. +- The ROCm-supported version of PyTorch is maintained in the official `https://github.com/ROCm/pytorch + `__ repository, which differs from the + `https://github.com/pytorch/pytorch `__ upstream repository. -- Official PyTorch release: +- To get started and install PyTorch on ROCm, use the prebuilt :ref:`Docker images `, + which include ROCm, PyTorch, and all required dependencies. - - Provides the latest stable version of PyTorch but might not necessarily - support the latest ROCm version. + - See the :doc:`ROCm PyTorch installation guide ` + for installation and setup instructions. - - Official PyTorch repository: ``__ - - - See the `Nightly and latest stable version installation guide `__ - or `Previous versions `__ - to get started. + - You can also consult the upstream `Installation guide `__ or + `Previous versions `__ for additional context. PyTorch includes tooling that generates HIP source code from the CUDA backend. This approach allows PyTorch to support ROCm without requiring manual code modifications. For more information, see :doc:`HIPIFY `. 
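+
+Because ROCm builds keep the familiar ``torch.cuda`` interface, existing
+CUDA-style PyTorch code runs unmodified. The following is a minimal sketch
+(assuming one visible GPU; the tensor shapes are illustrative) exercising both
+GPU tensor computation and tape-based autograd:
+
+.. code-block:: python
+
+   import torch
+
+   # ROCm builds report AMD GPUs through the standard CUDA-style device API.
+   device = "cuda" if torch.cuda.is_available() else "cpu"
+
+   # Tensor computation with strong GPU acceleration.
+   x = torch.randn(64, 128, device=device)
+   w = torch.randn(128, 32, device=device, requires_grad=True)
+
+   # Tape-based autograd: backward() records the graph and computes d(loss)/dw.
+   loss = (x @ w).pow(2).mean()
+   loss.backward()
+   print(w.grad.shape)  # torch.Size([128, 32])
+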
-ROCm development is aligned with the stable release of PyTorch, while upstream -PyTorch testing uses the stable release of ROCm to maintain consistency. +Version support +-------------------------------------------------------------------------------- + +AMD releases official `ROCm PyTorch Docker images `_ +quarterly alongside new ROCm releases. These images undergo full AMD testing. .. _pytorch-recommendations: @@ -78,7 +80,7 @@ Use cases and recommendations GPU. * The :doc:`Inception with PyTorch documentation ` - describes how PyTorch integrates with ROCm for AI workloads It outlines the + describes how PyTorch integrates with ROCm for AI workloads. It outlines the use of PyTorch on the ROCm platform and focuses on efficiently leveraging AMD GPU hardware for training and inference tasks in AI applications. @@ -89,9 +91,8 @@ For more use cases and recommendations, see `ROCm PyTorch blog posts `__ and are the -recommended way to get started with deep learning with PyTorch on ROCm. +AMD validates and publishes `PyTorch images `__ +with ROCm backends on Docker Hub. To find the right image tag, see the :ref:`PyTorch on ROCm installation documentation ` for a list of diff --git a/docs/compatibility/ml-compatibility/ray-compatibility.rst b/docs/compatibility/ml-compatibility/ray-compatibility.rst index 2f5c83589..428d750b3 100644 --- a/docs/compatibility/ml-compatibility/ray-compatibility.rst +++ b/docs/compatibility/ml-compatibility/ray-compatibility.rst @@ -1,8 +1,8 @@ :orphan: .. meta:: - :description: Ray deep learning framework compatibility - :keywords: GPU, Ray compatibility + :description: Ray compatibility + :keywords: GPU, Ray, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -19,36 +19,35 @@ simplifying machine learning computations. Ray is a general-purpose framework that runs many types of workloads efficiently. Any Python application can be scaled with Ray, without extra infrastructure. -ROCm support for Ray is upstreamed, and you can build the official source code -with ROCm support: - -- ROCm support for Ray is hosted in the official `https://github.com/ROCm/ray - `_ repository. - -- Due to independent compatibility considerations, this location differs from the - `https://github.com/ray-project/ray `_ upstream repository. - -- To install Ray, use the prebuilt :ref:`Docker image ` - which includes ROCm, Ray, and all required dependencies. - - - See the :doc:`ROCm Ray installation guide ` - for instructions to get started. - - - See the `Installation section `_ - in the upstream Ray documentation. - - - The Docker image provided is based on the upstream Ray `Daily Release (Nightly) wheels `__ - corresponding to commit `005c372 `__. - -.. note:: - - Ray is supported on ROCm 6.4.1. - -Supported devices +Support overview ================================================================================ -**Officially Supported**: AMD Instinct™ MI300X, MI210 +- The ROCm-supported version of Ray is maintained in the official `https://github.com/ROCm/ray + `__ repository, which differs from the + `https://github.com/ray-project/ray `__ upstream repository. +- To get started and install Ray on ROCm, use the prebuilt :ref:`Docker image `, + which includes ROCm, Ray, and all required dependencies. + + - The Docker image provided is based on the upstream Ray `Daily Release (Nightly) wheels + `__ + corresponding to commit `005c372 `__. + + - See the :doc:`ROCm Ray installation guide ` + for installation and setup instructions. 
+ + - You can also consult the upstream `Installation guide `__ + for additional context. + +Version support +-------------------------------------------------------------------------------- + +Ray is supported on `ROCm 6.4.1 `__. + +Supported devices +-------------------------------------------------------------------------------- + +**Officially Supported**: AMD Instinct™ MI300X, MI210 Use cases and recommendations ================================================================================ @@ -88,15 +87,15 @@ Docker image compatibility AMD validates and publishes ready-made `ROCm Ray Docker images `__ with ROCm backends on Docker Hub. The following Docker image tags and -associated inventories represent the latest Ray version from the official Docker Hub and are validated for -`ROCm 6.4.1 `_. Click the |docker-icon| -icon to view the image on Docker Hub. +associated inventories represent the latest Ray version from the official Docker Hub. +Click the |docker-icon| icon to view the image on Docker Hub. .. list-table:: :header-rows: 1 :class: docker-image-compatibility * - Docker image + - ROCm - Ray - Pytorch - Ubuntu @@ -105,6 +104,7 @@ icon to view the image on Docker Hub. * - .. raw:: html rocm/ray + - `6.4.1 `__. - `2.48.0.post0 `_ - 2.6.0+git684f6f2 - 24.04 diff --git a/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst b/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst index 1550a82d1..f3e2badb7 100644 --- a/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst +++ b/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: Stanford Megatron-LM compatibility - :keywords: Stanford, Megatron-LM, compatibility + :keywords: Stanford, Megatron-LM, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -10,34 +10,50 @@ Stanford Megatron-LM compatibility ******************************************************************************** -Stanford Megatron-LM is a large-scale language model training framework developed by NVIDIA `https://github.com/NVIDIA/Megatron-LM `_. It is -designed to train massive transformer-based language models efficiently by model and data parallelism. +Stanford Megatron-LM is a large-scale language model training framework developed +by NVIDIA at `https://github.com/NVIDIA/Megatron-LM `_. +It is designed to train massive transformer-based language models efficiently by model +and data parallelism. -* ROCm support for Stanford Megatron-LM is hosted in the official `https://github.com/ROCm/Stanford-Megatron-LM `_ repository. -* Due to independent compatibility considerations, this location differs from the `https://github.com/stanford-futuredata/Megatron-LM `_ upstream repository. -* Use the prebuilt :ref:`Docker image ` with ROCm, PyTorch, and Megatron-LM preinstalled. -* See the :doc:`ROCm Stanford Megatron-LM installation guide ` to install and get started. +It provides efficient tensor, pipeline, and sequence-based model parallelism for +pre-training transformer-based language models such as GPT (Decoder Only), BERT +(Encoder Only), and T5 (Encoder-Decoder). -.. note:: - - Stanford Megatron-LM is supported on ROCm 6.3.0. 
- - -Supported Devices +Support overview ================================================================================ -- **Officially Supported**: AMD Instinct MI300X -- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210 +- The ROCm-supported version of Stanford Megatron-LM is maintained in the official `https://github.com/ROCm/Stanford-Megatron-LM + `__ repository, which differs from the + `https://github.com/stanford-futuredata/Megatron-LM `__ upstream repository. +- To get started and install Stanford Megatron-LM on ROCm, use the prebuilt :ref:`Docker image `, + which includes ROCm, Stanford Megatron-LM, and all required dependencies. + + - See the :doc:`ROCm Stanford Megatron-LM installation guide ` + for installation and setup instructions. + + - You can also consult the upstream `Installation guide `__ + for additional context. + +Version support +-------------------------------------------------------------------------------- + +Stanford Megatron-LM is supported on `ROCm 6.3.0 `__. + +Supported devices +-------------------------------------------------------------------------------- + +- **Officially Supported**: AMD Instinct™ MI300X +- **Partially Supported** (functionality or performance limitations): AMD Instinct™ MI250X, MI210 Supported models and features -================================================================================ +-------------------------------------------------------------------------------- This section details models & features that are supported by the ROCm version on Stanford Megatron-LM. Models: -* Bert +* BERT * GPT * T5 * ICT @@ -54,13 +70,24 @@ Features: Use cases and recommendations ================================================================================ -See the `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs blog `_ post -to leverage the ROCm platform for pre-training by using the Stanford Megatron-LM framework of pre-processing datasets on AMD GPUs. -Coverage includes: +The following blog post mentions Megablocks, but you can run Stanford Megatron-LM with the same steps to pre-process datasets on AMD GPUs: - * Single-GPU pre-training - * Multi-GPU pre-training +* The `Efficient MoE training on AMD ROCm: How-to use Megablocks on AMD GPUs + `__ + blog post guides how to leverage the ROCm platform for pre-training using the + Megablocks framework. It introduces a streamlined approach for training Mixture-of-Experts + (MoE) models using the Megablocks library on AMD hardware. Focusing on GPT-2, it + demonstrates how block-sparse computations can enhance scalability and efficiency in MoE + training. The guide provides step-by-step instructions for setting up the environment, + including cloning the repository, building the Docker image, and running the training container. + Additionally, it offers insights into utilizing the ``oscar-1GB.json`` dataset for pre-training + language models. By leveraging Megablocks and the ROCm platform, you can optimize your MoE + training workflows for large-scale transformer models. +It features how to pre-process datasets and how to begin pre-training on AMD GPUs through: + +* Single-GPU pre-training +* Multi-GPU pre-training .. _megatron-lm-docker-compat: @@ -71,10 +98,9 @@ Docker image compatibility -AMD validates and publishes `Stanford Megatron-LM images `_ +AMD validates and publishes `Stanford Megatron-LM images `_ with ROCm and Pytorch backends on Docker Hub. 
 The following Docker image tags and associated
-inventories represent the latest Megatron-LM version from the official Docker Hub.
-The Docker images have been validated for `ROCm 6.3.0 `_.
+inventories represent the latest Stanford Megatron-LM version from the official Docker Hub.
 Click |docker-icon| to view the image on Docker Hub.
 
 .. list-table::
@@ -82,6 +108,7 @@ Click |docker-icon| to view the image on Docker Hub.
    :class: docker-image-compatibility
 
    * - Docker image
+     - ROCm
      - Stanford Megatron-LM
      - PyTorch
      - Ubuntu
@@ -91,6 +118,7 @@ Click |docker-icon| to view the image on Docker Hub.
 
 
+     - `6.3.0 `_
      - `85f95ae `_
      - `2.4.0 `_
      - 24.04
diff --git a/docs/compatibility/ml-compatibility/taichi-compatibility.rst b/docs/compatibility/ml-compatibility/taichi-compatibility.rst
index 5fb2b9708..3da4a3776 100644
--- a/docs/compatibility/ml-compatibility/taichi-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/taichi-compatibility.rst
@@ -2,7 +2,7 @@
 
 .. meta::
    :description: Taichi compatibility
-   :keywords: GPU, Taichi compatibility
+   :keywords: GPU, Taichi, deep learning, framework compatibility
 
 .. version-set:: rocm_version latest
 
@@ -19,28 +19,52 @@ Taichi is widely used across various domains, including real-time physical simul
 numerical computing, augmented reality, artificial intelligence, computer vision,
 robotics, visual effects in film and gaming, and general-purpose computing.
 
-* ROCm support for Taichi is hosted in the official `https://github.com/ROCm/taichi `_ repository.
-* Due to independent compatibility considerations, this location differs from the `https://github.com/taichi-dev `_ upstream repository.
-* Use the prebuilt :ref:`Docker image ` with ROCm, PyTorch, and Taichi preinstalled.
-* See the :doc:`ROCm Taichi installation guide ` to install and get started.
+Support overview
+================================================================================
 
-.. note::
+- The ROCm-supported version of Taichi is maintained in the official `https://github.com/ROCm/taichi
+  `__ repository, which differs from the
+  `https://github.com/taichi-dev/taichi `__ upstream repository.
 
-   Taichi is supported on ROCm 6.3.2.
+- To get started and install Taichi on ROCm, use the prebuilt :ref:`Docker image `,
+  which includes ROCm, Taichi, and all required dependencies.
 
-Supported devices and features
-===============================================================================
-There is support through the ROCm software stack for all Taichi GPU features on AMD Instinct MI250X and MI210X Series GPUs with the exception of Taichi’s GPU rendering system, CGUI.
-AMD Instinct MI300X Series GPUs will be supported by November.
+  - See the :doc:`ROCm Taichi installation guide `
+    for installation and setup instructions.
+
+  - You can also consult the upstream `Installation guide `__
+    for additional context.
+
+Version support
+--------------------------------------------------------------------------------
+
+Taichi is supported on `ROCm 6.3.2 `__.
+
+Supported devices
+--------------------------------------------------------------------------------
+
+- **Officially Supported**: AMD Instinct™ MI250X, MI210 (with the exception of Taichi’s GPU rendering system, CGUI)
+- **Upcoming Support**: AMD Instinct™ MI300X
 
 ..
_taichi-recommendations: Use cases and recommendations ================================================================================ -To fully leverage Taichi's performance capabilities in compute-intensive tasks, it is best to adhere to specific coding patterns and utilize Taichi decorators. -A collection of example use cases is available in the `https://github.com/ROCm/taichi_examples `_ repository, -providing practical insights and foundational knowledge for working with the Taichi programming language. -You can also refer to the `AMD ROCm blog `_ to search for Taichi examples and best practices to optimize your workflows on AMD GPUs. + +* The `Accelerating Parallel Programming in Python with Taichi Lang on AMD GPUs + `__ + blog highlights Taichi as an open-source programming language designed for high-performance + numerical computation, particularly in domains like real-time physical simulation, + artificial intelligence, computer vision, robotics, and visual effects. Taichi + is embedded in Python and uses just-in-time (JIT) compilation frameworks like + LLVM to optimize execution on GPUs and CPUs. The blog emphasizes the versatility + of Taichi in enabling complex simulations and numerical algorithms, making + it ideal for developers working on compute-intensive tasks. Developers are + encouraged to follow recommended coding patterns and utilize Taichi decorators + for performance optimization, with examples available in the `https://github.com/ROCm/taichi_examples + `_ repository. Prebuilt Docker images + integrating ROCm, PyTorch, and Taichi are provided for simplified installation + and deployment, making it easier to leverage Taichi for advanced computational workloads. .. _taichi-docker-compat: @@ -52,9 +76,8 @@ Docker image compatibility AMD validates and publishes ready-made `ROCm Taichi Docker images `_ -with ROCm backends on Docker Hub. The following Docker image tags and associated inventories +with ROCm backends on Docker Hub. The following Docker image tag and associated inventories represent the latest Taichi version from the official Docker Hub. -The Docker images have been validated for `ROCm 6.3.2 `_. Click |docker-icon| to view the image on Docker Hub. .. list-table:: diff --git a/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst b/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst index 4f48d9f4c..485980d13 100644 --- a/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst +++ b/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: TensorFlow compatibility - :keywords: GPU, TensorFlow compatibility + :keywords: GPU, TensorFlow, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -12,37 +12,33 @@ TensorFlow compatibility `TensorFlow `__ is an open-source library for solving machine learning, deep learning, and AI problems. It can solve many -problems across different sectors and industries but primarily focuses on -neural network training and inference. It is one of the most popular and -in-demand frameworks and is very active in open-source contribution and -development. +problems across different sectors and industries, but primarily focuses on +neural network training and inference. It is one of the most popular deep +learning frameworks and is very active in open-source development. 
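+
+As a quick sanity check (a minimal sketch, assuming one of the prebuilt ROCm
+TensorFlow Docker images with a visible GPU), TensorFlow on ROCm reports AMD
+GPUs through the standard device API:
+
+.. code-block:: python
+
+   import tensorflow as tf
+
+   # ROCm builds of TensorFlow expose AMD GPUs as ordinary GPU devices.
+   print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
+
+   # Place a small matrix multiply explicitly on the first GPU.
+   with tf.device("/GPU:0"):
+       a = tf.random.normal([256, 256])
+       b = tf.random.normal([256, 256])
+       print(tf.reduce_sum(tf.matmul(a, b)))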
+ +Support overview +================================================================================ + +- The ROCm-supported version of TensorFlow is maintained in the official `https://github.com/ROCm/tensorflow-upstream + `__ repository, which differs from the + `https://github.com/tensorflow/tensorflow `__ upstream repository. + +- To get started and install TensorFlow on ROCm, use the prebuilt :ref:`Docker images `, + which include ROCm, TensorFlow, and all required dependencies. + + - See the :doc:`ROCm TensorFlow installation guide ` + for installation and setup instructions. + + - You can also consult the `TensorFlow API versions `__ list + for additional context. + +Version support +-------------------------------------------------------------------------------- The `official TensorFlow repository `__ includes full ROCm support. AMD maintains a TensorFlow `ROCm repository `__ in order to quickly add bug -fixes, updates, and support for the latest ROCM versions. - -- ROCm TensorFlow release: - - - Offers :ref:`Docker images ` with - ROCm and TensorFlow pre-installed. - - - ROCm TensorFlow repository: ``__ - - - See the :doc:`ROCm TensorFlow installation guide ` - to get started. - -- Official TensorFlow release: - - - Official TensorFlow repository: ``__ - - - See the `TensorFlow API versions `__ list. - - .. note:: - - The official TensorFlow documentation does not cover ROCm support. Use the - ROCm documentation for installation instructions for Tensorflow on ROCm. - See :doc:`rocm-install-on-linux:install/3rd-party/tensorflow-install`. +fixes, updates, and support for the latest ROCm versions. .. _tensorflow-docker-compat: diff --git a/docs/compatibility/ml-compatibility/verl-compatibility.rst b/docs/compatibility/ml-compatibility/verl-compatibility.rst index 0351384e5..d4936a0ec 100644 --- a/docs/compatibility/ml-compatibility/verl-compatibility.rst +++ b/docs/compatibility/ml-compatibility/verl-compatibility.rst @@ -2,7 +2,7 @@ .. meta:: :description: verl compatibility - :keywords: GPU, verl compatibility + :keywords: GPU, verl, deep learning, framework compatibility .. version-set:: rocm_version latest @@ -10,24 +10,58 @@ verl compatibility ******************************************************************************* -Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs). -verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support. +Volcano Engine Reinforcement Learning for LLMs (`verl `__) +is a reinforcement learning framework designed for large language models (LLMs). +verl offers a scalable, open-source fine-tuning solution by using a hybrid programming model +that makes it easy to define and run complex post-training dataflows efficiently. -* See the `verl documentation `_ for more information about verl. -* The official verl GitHub repository is `https://github.com/volcengine/verl `_. -* Use the AMD-validated :ref:`Docker images ` with ROCm and verl preinstalled. -* See the :doc:`ROCm verl installation guide ` to install and get started. +Its modular APIs separate computation from data, allowing smooth integration with other frameworks. +It also supports flexible model placement across GPUs for efficient scaling on different cluster sizes. +verl achieves high training and generation throughput by building on existing LLM frameworks. 
+Its 3D-HybridEngine reduces memory use and communication overhead when switching between training +and inference, improving overall performance. -.. note:: +Support overview +================================================================================ - verl is supported on ROCm 6.2.0. +- The ROCm-supported version of verl is maintained in the official `https://github.com/ROCm/verl + `__ repository, which differs from the + `https://github.com/volcengine/verl `__ upstream repository. + +- To get started and install verl on ROCm, use the prebuilt :ref:`Docker image `, + which includes ROCm, verl, and all required dependencies. + + - See the :doc:`ROCm verl installation guide ` + for installation and setup instructions. + + - You can also consult the upstream `verl documentation `__ + for additional context. + +Version support +-------------------------------------------------------------------------------- + +verl is supported on `ROCm 6.2.0 `__. + +Supported devices +-------------------------------------------------------------------------------- + +**Officially Supported**: AMD Instinct™ MI300X .. _verl-recommendations: Use cases and recommendations ================================================================================ -The benefits of verl in large-scale reinforcement learning from human feedback (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm Integration `_ blog. +* The benefits of verl in large-scale reinforcement learning from human feedback + (RLHF) are discussed in the `Reinforcement Learning from Human Feedback on AMD + GPUs with verl and ROCm Integration `__ + blog. The blog post outlines how the Volcano Engine Reinforcement Learning + (verl) framework integrates with the AMD ROCm platform to optimize training on + Instinct™ MI300X GPUs. The guide details the process of building a Docker image, + setting up single-node and multi-node training environments, and highlights + performance benchmarks demonstrating improved throughput and convergence accuracy. + This resource serves as a comprehensive starting point for deploying verl on AMD GPUs, + facilitating efficient RLHF training workflows. .. _verl-supported_features: @@ -61,8 +95,10 @@ Docker image compatibility -AMD validates and publishes ready-made `ROCm verl Docker images `_ -with ROCm backends on Docker Hub. The following Docker image tags and associated inventories represent the available verl versions from the official Docker Hub. +AMD validates and publishes ready-made `verl Docker images `_ +with ROCm backends on Docker Hub. The following Docker image tag and associated inventories +represent the latest verl version from the official Docker Hub. +Click |docker-icon| to view the image on Docker Hub. .. list-table:: :header-rows: 1