From db43d18c3725ba53c00544971139ba9b743f1536 Mon Sep 17 00:00:00 2001 From: anisha-amd Date: Tue, 9 Sep 2025 11:02:30 -0400 Subject: [PATCH] Docs: frameworks compatibility- ray and llama.cpp (#5273) --- .wordlist.txt | 1 + .../compatibility-matrix-historical-6.0.csv | 2 + docs/compatibility/compatibility-matrix.rst | 2 + .../llama-cpp-compatibility.rst | 151 ++++++++++++++++++ .../ml-compatibility/ray-compatibility.rst | 105 ++++++++++++ docs/conf.py | 2 + docs/how-to/deep-learning-rocm.rst | 22 +++ docs/sphinx/_toc.yml.in | 12 +- 8 files changed, 293 insertions(+), 4 deletions(-) create mode 100644 docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst create mode 100644 docs/compatibility/ml-compatibility/ray-compatibility.rst diff --git a/.wordlist.txt b/.wordlist.txt index 289fc276e..5370f4752 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -501,6 +501,7 @@ Unhandled VALU VBIOS VCN +verl's VGPR VGPRs VM diff --git a/docs/compatibility/compatibility-matrix-historical-6.0.csv b/docs/compatibility/compatibility-matrix-historical-6.0.csv index b8f7b6ba2..54f5ceb50 100644 --- a/docs/compatibility/compatibility-matrix-historical-6.0.csv +++ b/docs/compatibility/compatibility-matrix-historical-6.0.csv @@ -35,6 +35,8 @@ ROCm Version,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6 :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat]_,N/A,N/A,N/A,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A, :doc:`Megablocks <../compatibility/ml-compatibility/megablocks-compatibility>` [#megablocks_compat]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.7.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A :doc:`Taichi <../compatibility/ml-compatibility/taichi-compatibility>` [#taichi_compat]_,N/A,N/A,N/A,N/A,N/A,1.8.0b1,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A + :doc:`Ray <../compatibility/ml-compatibility/ray-compatibility>` [#ray_compat]_,N/A,N/A,2.48.0.post0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A + :doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat]_,N/A,N/A,N/A,b5997,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A `ONNX Runtime `_,1.2,1.2,1.2,1.2,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1 ,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,, diff --git a/docs/compatibility/compatibility-matrix.rst b/docs/compatibility/compatibility-matrix.rst index 797e2894e..fb1ffad43 100644 --- a/docs/compatibility/compatibility-matrix.rst +++ b/docs/compatibility/compatibility-matrix.rst @@ -246,6 +246,8 @@ Expand for full historical view of: .. [#dgl_compat] DGL is only supported on ROCm 6.4.0. .. [#megablocks_compat] Megablocks is only supported on ROCm 6.3.0. .. [#taichi_compat] Taichi is only supported on ROCm 6.3.2. + .. [#ray_compat] Ray is only supported on ROCm 6.4.1. + .. [#llama-cpp_compat] llama.cpp is only supported on ROCm 6.4.0. .. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix `_. .. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package. 
diff --git a/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst new file mode 100644 index 000000000..fd1356d32 --- /dev/null +++ b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst @@ -0,0 +1,151 @@ +:orphan: + +.. meta:: + :description: llama.cpp deep learning framework compatibility + :keywords: GPU, GGML, llama.cpp compatibility + +.. version-set:: rocm_version latest + +******************************************************************************** +llama.cpp compatibility +******************************************************************************** + +`llama.cpp `__ is an open-source framework +for Large Language Model (LLM) inference that runs on both central processing units +(CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing +a simple, dependency-free setup. + +The framework supports multiple quantization options, from 1.5-bit to 8-bit integers, +to speed up inference and reduce memory usage. Originally built as a CPU-first library, +llama.cpp is easy to integrate with other programming environments and is widely +adopted across diverse platforms, including consumer devices. + +ROCm support for llama.cpp is upstreamed, and you can build the official source code +with ROCm support: + +- ROCm support for llama.cpp is hosted in the official `https://github.com/ROCm/llama.cpp + `_ repository. + +- Due to independent compatibility considerations, this location differs from the + `https://github.com/ggml-org/llama.cpp `_ upstream repository. + +- To install llama.cpp, use the prebuilt :ref:`Docker image `, + which includes ROCm, llama.cpp, and all required dependencies. + + - See the :doc:`ROCm llama.cpp installation guide ` + to install and get started. + + - See the `Installation guide `__ + in the upstream llama.cpp documentation. + +.. note:: + + llama.cpp is supported on ROCm 6.4.0. + +Supported devices +================================================================================ + +**Officially Supported**: AMD Instinct™ MI300X, MI210 + + +Use cases and recommendations +================================================================================ + +llama.cpp can be applied in a variety of scenarios, particularly when you need to meet one or more of the following requirements: + +- Plain C/C++ implementation with no external dependencies +- Support for 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory usage +- Custom HIP (Heterogeneous-compute Interface for Portability) kernels for running large language models (LLMs) on AMD GPUs (graphics processing units) +- CPU (central processing unit) + GPU (graphics processing unit) hybrid inference for partially accelerating models larger than the total available VRAM (video random-access memory) + +llama.cpp is also used in a range of real-world applications, including: + +- Games such as `Lucy's Labyrinth `__: + A simple maze game where AI-controlled agents attempt to trick the player. +- Tools such as `Styled Lines `__: + A proprietary, asynchronous inference wrapper for Unity3D game development, including pre-built mobile and web platform wrappers and a model example. +- Various other AI applications use llama.cpp as their inference engine; + for a detailed list, see the `user interfaces (UIs) section `__. 
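+
+Many of the applications above interact with llama.cpp through its built-in HTTP
+server (``llama-server``), which exposes an OpenAI-compatible API. The following
+is a minimal sketch of querying such a server from Python; it assumes a server
+(for example, one started from the prebuilt ROCm Docker image) is already
+listening on ``localhost:8080`` with a model loaded, and the prompt and token
+limit are illustrative placeholders.
+
+.. code-block:: python
+
+   # Minimal sketch: query a running llama.cpp server through its
+   # OpenAI-compatible chat completions endpoint (default port 8080).
+   import json
+   import urllib.request
+
+   payload = {
+       "messages": [
+           {"role": "user", "content": "Briefly explain 4-bit quantization."}
+       ],
+       "max_tokens": 128,  # cap the length of the generated reply
+   }
+
+   request = urllib.request.Request(
+       "http://localhost:8080/v1/chat/completions",  # adjust host and port as needed
+       data=json.dumps(payload).encode("utf-8"),
+       headers={"Content-Type": "application/json"},
+   )
+
+   with urllib.request.urlopen(request) as response:
+       reply = json.loads(response.read())
+
+   print(reply["choices"][0]["message"]["content"])
+
+Because the endpoint follows the OpenAI schema, existing OpenAI client libraries
+can usually be pointed at the same URL without code changes.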
+ +Refer to the `AMD ROCm blog `_, +where you can search for llama.cpp examples and best practices to optimize your workloads on AMD GPUs. + +.. _llama-cpp-docker-compat: + +Docker image compatibility +================================================================================ + +.. |docker-icon| raw:: html + + + +AMD validates and publishes `ROCm llama.cpp Docker images `__ +with ROCm backends on Docker Hub. The following Docker image tags and associated +inventories were tested on `ROCm 6.4.0 `__. +Click |docker-icon| to view the image on Docker Hub. + +.. important:: + + Tag endings of ``_full``, ``_server``, and ``_light`` correspond to different entrypoints, as follows: + + - Full: This image includes both the main executable file and the tools to convert ``LLaMA`` models into ``ggml`` format and quantize them to 4-bit. + - Server: This image only includes the server executable file. + - Light: This image only includes the main executable file. + +.. list-table:: + :header-rows: 1 + :class: docker-image-compatibility + + * - Full Docker + - Server Docker + - Light Docker + - llama.cpp + - Ubuntu + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b5997 `__ + - 24.04 + +Key ROCm libraries for llama.cpp +================================================================================ + +llama.cpp functionality on ROCm is determined by its underlying library +dependencies. These ROCm components affect the capabilities, performance, and +feature set available to developers. + +.. list-table:: + :header-rows: 1 + + * - ROCm library + - Version + - Purpose + - Usage + * - `hipBLAS `__ + - :version-ref:`hipBLAS rocm_version` + - Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for + matrix and vector operations. + - Supports operations such as matrix multiplication, matrix-vector + products, and tensor contractions. Utilized in both dense and batched + linear algebra operations. + * - `hipBLASLt `__ + - :version-ref:`hipBLASLt rocm_version` + - hipBLASLt is an extension of the hipBLAS library, providing additional + features like epilogues fused into the matrix multiplication kernel or + use of integer tensor cores. + - By setting the environment variable ``ROCBLAS_USE_HIPBLASLT``, you can dispatch hipBLASLt + kernels where possible. + * - `rocWMMA `__ + - :version-ref:`rocWMMA rocm_version` + - Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix + multiplication (GEMM) and accumulation operations with mixed precision + support. + - Can be used to improve flash attention performance on AMD GPUs by enabling + the corresponding build flag at compile time. \ No newline at end of file diff --git a/docs/compatibility/ml-compatibility/ray-compatibility.rst b/docs/compatibility/ml-compatibility/ray-compatibility.rst new file mode 100644 index 000000000..c5a2ed39f --- /dev/null +++ b/docs/compatibility/ml-compatibility/ray-compatibility.rst @@ -0,0 +1,105 @@ +:orphan: + +.. meta:: + :description: Ray deep learning framework compatibility + :keywords: GPU, Ray compatibility + +.. version-set:: rocm_version latest + +******************************************************************************* +Ray compatibility +******************************************************************************* + +Ray is a unified framework for scaling AI and Python applications from your laptop +to a full cluster, without changing your code.
Ray consists of `a core distributed +runtime `_ and a set of +`AI libraries `_ for +simplifying machine learning computations. + +Ray is a general-purpose framework that runs many types of workloads efficiently. +Any Python application can be scaled with Ray without extra infrastructure. + +ROCm support for Ray is upstreamed, and you can build the official source code +with ROCm support: + +- ROCm support for Ray is hosted in the official `https://github.com/ROCm/ray + `_ repository. + +- Due to independent compatibility considerations, this location differs from the + `https://github.com/ray-project/ray `_ upstream repository. + +- To install Ray, use the prebuilt :ref:`Docker image `, + which includes ROCm, Ray, and all required dependencies. + + - See the :doc:`ROCm Ray installation guide ` + for instructions to get started. + + - See the `Installation section `_ + in the upstream Ray documentation. + + - The Docker image provided is based on the upstream Ray `Daily Release (Nightly) wheels `__ + corresponding to commit `005c372 `__. + +.. note:: + + Ray is supported on ROCm 6.4.1. + +Supported devices +================================================================================ + +**Officially Supported**: AMD Instinct™ MI300X, MI210 + + +Use cases and recommendations +================================================================================ + +* The `Reinforcement Learning from Human Feedback on AMD GPUs with verl and ROCm + Integration `__ + blog provides an overview of Volcano Engine Reinforcement Learning (verl) + for large language models (LLMs) and discusses its benefits in large-scale + reinforcement learning from human feedback (RLHF). It uses Ray as part of a + hybrid orchestration engine to schedule and coordinate training and inference + tasks in parallel, enabling optimized resource utilization and potential overlap + between these phases. This dynamic resource allocation strategy significantly + improves overall system efficiency. The blog presents verl’s performance results, + focusing on throughput and convergence accuracy achieved on AMD Instinct™ MI300X + GPUs. Follow this guide to get started with verl on AMD Instinct GPUs and + accelerate your RLHF training with ROCm-optimized performance. + +For more use cases and recommendations, see the AMD GPU tabs in the `Accelerator Support +topic `_ +of the Ray core documentation and refer to the `AMD ROCm blog `_, +where you can search for Ray examples and best practices to optimize your workloads on AMD GPUs. +A minimal sketch of scheduling a task onto an AMD GPU with Ray is also included at the end of this document. + +.. _ray-docker-compat: + +Docker image compatibility +================================================================================ + +.. |docker-icon| raw:: html + + + +AMD validates and publishes ready-made `ROCm Ray Docker images `__ +with ROCm backends on Docker Hub. The following Docker image tags and +associated inventories reflect the latest Ray version published on Docker Hub and are validated for +`ROCm 6.4.1 `_. Click the |docker-icon| +icon to view the image on Docker Hub. + +.. list-table:: + :header-rows: 1 + :class: docker-image-compatibility + + * - Docker image + - Ray + - PyTorch + - Ubuntu + - Python + + * - ..
raw:: html + + rocm/ray + - `2.48.0.post0 `_ + - 2.6.0+git684f6f2 + - 24.04 + - `3.12.10 `_ diff --git a/docs/conf.py b/docs/conf.py index 6e7fa5e61..f852b6697 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -108,6 +108,8 @@ article_pages = [ {"file": "compatibility/ml-compatibility/dgl-compatibility", "os": ["linux"]}, {"file": "compatibility/ml-compatibility/megablocks-compatibility", "os": ["linux"]}, {"file": "compatibility/ml-compatibility/taichi-compatibility", "os": ["linux"]}, + {"file": "compatibility/ml-compatibility/ray-compatibility", "os": ["linux"]}, + {"file": "compatibility/ml-compatibility/llama-cpp-compatibility", "os": ["linux"]}, {"file": "how-to/deep-learning-rocm", "os": ["linux"]}, {"file": "how-to/rocm-for-ai/index", "os": ["linux"]}, diff --git a/docs/how-to/deep-learning-rocm.rst b/docs/how-to/deep-learning-rocm.rst index fb1d55a3c..accb2e546 100644 --- a/docs/how-to/deep-learning-rocm.rst +++ b/docs/how-to/deep-learning-rocm.rst @@ -110,6 +110,28 @@ The table below summarizes information about ROCm-enabled deep learning framewor + * - `Ray `__ + - .. raw:: html + + + - + - `Docker image `__ + - `Wheels package `__ + - `ROCm Base Docker image `__ + - .. raw:: html + + + + * - `llama.cpp `__ + - .. raw:: html + + + - + - `Docker image `__ + - .. raw:: html + + + Learn how to use your ROCm deep learning environment for training, fine-tuning, inference, and performance optimization through the following guides. diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 1bb9177f0..732aab15e 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -32,19 +32,23 @@ subtrees: - file: compatibility/ml-compatibility/pytorch-compatibility.rst title: PyTorch compatibility - file: compatibility/ml-compatibility/tensorflow-compatibility.rst - title: TensorFlow compatibility + title: TensorFlow compatibility - file: compatibility/ml-compatibility/jax-compatibility.rst title: JAX compatibility - file: compatibility/ml-compatibility/verl-compatibility.rst - title: verl compatibility + title: verl compatibility - file: compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst title: Stanford Megatron-LM compatibility - file: compatibility/ml-compatibility/dgl-compatibility.rst - title: DGL compatibility + title: DGL compatibility - file: compatibility/ml-compatibility/megablocks-compatibility.rst title: Megablocks compatibility - file: compatibility/ml-compatibility/taichi-compatibility.rst - title: Taichi compatibility + title: Taichi compatibility + - file: compatibility/ml-compatibility/ray-compatibility.rst + title: Ray compatibility + - file: compatibility/ml-compatibility/llama-cpp-compatibility.rst + title: llama.cpp compatibility - file: how-to/build-rocm.rst title: Build ROCm from source
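To complement the Ray compatibility page added above, here is a minimal sketch of the basic pattern for scheduling work onto an AMD GPU with Ray. It assumes a ROCm-enabled Ray and PyTorch installation (such as the Docker image listed on that page) with at least one GPU visible to Ray; the matrix size is an arbitrary placeholder.

.. code-block:: python

   # Minimal sketch: reserve an AMD GPU through Ray's resource scheduler
   # and run a small PyTorch workload on it.
   import ray
   import torch

   ray.init()  # connect to an existing cluster or start a local one

   @ray.remote(num_gpus=1)  # ask Ray to reserve one GPU for this task
   def matmul_on_gpu(size: int = 1024) -> float:
       # ROCm builds of PyTorch expose AMD GPUs through the usual "cuda" device name.
       x = torch.randn(size, size, device="cuda")
       return torch.matmul(x, x).sum().item()

   print(ray.get(matmul_on_gpu.remote()))
   print(ray.cluster_resources())  # shows the GPUs Ray detected

Because Ray schedules by logical resources, the same ``num_gpus=1`` annotation works unchanged on a single workstation or a multi-node MI300X cluster.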