diff --git a/.wordlist.txt b/.wordlist.txt
index 70cdba47a..6d0e2d49e 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -147,6 +147,8 @@ Filesystem
 FindDb
 Flang
 FlashAttention
+FlashInfer’s
+FlashInfer
 FluxBenchmark
 Fortran
 Fuyu
@@ -481,6 +483,7 @@ TCI
 TCIU
 TCP
 TCR
+TVM
 THREADGROUPS
 threadgroups
 TensorRT
diff --git a/docs/compatibility/compatibility-matrix-historical-6.0.csv b/docs/compatibility/compatibility-matrix-historical-6.0.csv
index 696ae3b6d..5c2462234 100644
--- a/docs/compatibility/compatibility-matrix-historical-6.0.csv
+++ b/docs/compatibility/compatibility-matrix-historical-6.0.csv
@@ -38,8 +38,9 @@ ROCm Version,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6
 :doc:`DGL <../compatibility/ml-compatibility/dgl-compatibility>` [#dgl_compat-past-60]_,N/A,N/A,N/A,N/A,2.4.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
 :doc:`Megablocks <../compatibility/ml-compatibility/megablocks-compatibility>` [#megablocks_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,0.7.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
 :doc:`Taichi <../compatibility/ml-compatibility/taichi-compatibility>` [#taichi_compat-past-60]_,N/A,N/A,N/A,N/A,N/A,N/A,1.8.0b1,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
 :doc:`Ray <../compatibility/ml-compatibility/ray-compatibility>` [#ray_compat-past-60]_,N/A,N/A,N/A,2.48.0.post0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
-:doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat-past-60]_,N/A,N/A,N/A,N/A,b5997,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
+:doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat-past-60]_,b6356,b6356,b6356,b6356,b5997,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
+:doc:`FlashInfer <../compatibility/ml-compatibility/flashinfer-compatibility>` [#flashinfer_compat-past-60]_,N/A,N/A,N/A,v0.2.5,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
 `ONNX Runtime `_,1.22.0,1.20.0,1.20.0,1.20.0,1.20.0,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.17.3,1.14.1,1.14.1
 ,,,,,,,,,,,,,,,,,,,
 ,,,,,,,,,,,,,,,,,,,
diff --git a/docs/compatibility/compatibility-matrix.rst b/docs/compatibility/compatibility-matrix.rst
index abcf6e05e..ff4c90a1d 100644
--- a/docs/compatibility/compatibility-matrix.rst
+++ b/docs/compatibility/compatibility-matrix.rst
@@ -60,6 +60,7 @@ compatibility and system requirements.
 :doc:`JAX <../compatibility/ml-compatibility/jax-compatibility>`,0.6.0,0.4.35,0.4.31
 :doc:`Stanford Megatron-LM <../compatibility/ml-compatibility/stanford-megatron-lm-compatibility>` [#stanford-megatron-lm_compat]_,N/A,N/A,85f95ae
 :doc:`Megablocks <../compatibility/ml-compatibility/megablocks-compatibility>` [#megablocks_compat]_,N/A,N/A,0.7.0
+:doc:`llama.cpp <../compatibility/ml-compatibility/llama-cpp-compatibility>` [#llama-cpp_compat]_,b6356,b6356,N/A
 `ONNX Runtime `_,1.22.0,1.20.0,1.17.3
 ,,,
 THIRD PARTY COMMS,.. _thirdpartycomms-support-compatibility-matrix:,,
@@ -175,6 +176,7 @@ compatibility and system requirements.
 .. [#7700XT-OS] **Prior ROCm 7.0.0** - Radeon RX 7700 XT (gfx1101) is supported only on Ubuntu 24.04.2 and RHEL 9.6.
 .. [#stanford-megatron-lm_compat] Stanford Megatron-LM is only supported on ROCm 6.3.0.
 .. [#megablocks_compat] Megablocks is only supported on ROCm 6.3.0.
+.. [#llama-cpp_compat] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.
 .. [#driver_patch] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
 .. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix `_.
 .. [#ROCT-rocr] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
@@ -282,7 +284,8 @@ Expand for full historical view of:
    .. [#megablocks_compat-past-60] Megablocks is only supported on ROCm 6.3.0.
    .. [#taichi_compat-past-60] Taichi is only supported on ROCm 6.3.2.
    .. [#ray_compat-past-60] Ray is only supported on ROCm 6.4.1.
-   .. [#llama-cpp_compat-past-60] llama.cpp is only supported on ROCm 6.4.0.
+   .. [#llama-cpp_compat-past-60] llama.cpp is only supported on ROCm 7.0.0 and 6.4.x.
+   .. [#flashinfer_compat-past-60] FlashInfer is only supported on ROCm 6.4.1.
    .. [#driver_patch-past-60] AMD GPU Driver (amdgpu) 30.10.1 is a quality release that resolves an issue identified in the 30.10 release. There are no other significant changes or feature additions in ROCm 7.0.1 from ROCm 7.0.0. AMD GPU Driver (amdgpu) 30.10.1 is compatible with ROCm 7.0.1 and ROCm 7.0.0.
    .. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD GPU Driver (amdgpu) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The supported user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and AMD GPU Driver support matrix `_.
    .. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
diff --git a/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst b/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
new file mode 100644
index 000000000..45ecc6a75
--- /dev/null
+++ b/docs/compatibility/ml-compatibility/flashinfer-compatibility.rst
@@ -0,0 +1,107 @@
+:orphan:
+
+.. meta::
+   :description: FlashInfer deep learning framework compatibility
+   :keywords: GPU, LLM, FlashInfer, compatibility
+
+.. version-set:: rocm_version latest
+
+********************************************************************************
+FlashInfer compatibility
+********************************************************************************
+
+`FlashInfer `__ is a library and kernel generator
+for Large Language Models (LLMs) that provides high-performance implementations
+of graphics processing unit (GPU) kernels. FlashInfer focuses on LLM serving and
+inference, delivering strong performance across diverse scenarios.
+
+FlashInfer features highly efficient attention kernels, load-balanced scheduling, and memory-optimized
+techniques, while supporting customized attention variants. It is compatible with ``torch.compile`` and
+offers high-performance LLM-specific operators, with easy integration through PyTorch and C++ APIs.
+
+.. note::
+
+   The ROCm port of FlashInfer is under active development, and some features are not yet available.
+   For the latest feature compatibility matrix, refer to the ``README`` of the
+   `https://github.com/ROCm/flashinfer `__ repository.
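+
+A minimal decode-attention sketch of the Python API is shown below. This is an
+illustration only: it assumes the upstream single-query decode operator,
+``flashinfer.single_decode_with_kv_cache``, is enabled in the ROCm port, so
+check the feature matrix in the repository ``README`` first. On ROCm, PyTorch
+exposes AMD GPUs through the ``cuda`` device.
+
+.. code-block:: python
+
+   import torch
+   import flashinfer
+
+   num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 8, 128, 4096
+
+   # Decode phase: one new query token attends over the accumulated KV cache.
+   q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
+   k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
+   v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
+
+   # Fused grouped-query decode attention; returns [num_qo_heads, head_dim].
+   o = flashinfer.single_decode_with_kv_cache(q, k, v)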
+
+Support for the ROCm port of FlashInfer is available as follows:
+
+- ROCm support for FlashInfer is hosted in the `https://github.com/ROCm/flashinfer
+  `__ repository. This location differs from the
+  `https://github.com/flashinfer-ai/flashinfer `__
+  upstream repository.
+
+- To install FlashInfer, use the prebuilt :ref:`Docker image `,
+  which includes ROCm, FlashInfer, and all required dependencies.
+
+  - See the :doc:`ROCm FlashInfer installation guide `
+    to install and get started.
+
+  - See the `Installation guide `__
+    in the upstream FlashInfer documentation.
+
+.. note::
+
+   FlashInfer is supported on ROCm 6.4.1.
+
+Supported devices
+================================================================================
+
+**Officially Supported**: AMD Instinct™ MI300X
+
+
+.. _flashinfer-recommendations:
+
+Use cases and recommendations
+================================================================================
+
+This release of FlashInfer on ROCm provides decode functionality for LLM inference.
+In the decode phase, tokens are generated sequentially, with the model predicting each new
+token based on the previously generated tokens and the input context.
+
+FlashInfer on ROCm brings over upstream features such as load balancing, sparse and dense
+attention optimizations, and batching support, enabling efficient execution on AMD Instinct™ MI300X GPUs.
+
+Because large LLMs often require substantial KV caches or long context windows, FlashInfer on ROCm
+also implements cascade attention from upstream to reduce memory usage.
+
+For currently supported use cases and recommendations, refer to the `AMD ROCm blog `__,
+where you can search for examples and best practices to optimize your workloads on AMD GPUs.
+
+.. _flashinfer-docker-compat:
+
+Docker image compatibility
+================================================================================
+
+.. |docker-icon| raw:: html
+
+
+AMD validates and publishes `ROCm FlashInfer images `__
+with ROCm and PyTorch backends on Docker Hub. The following Docker image tags and associated
+inventories represent the FlashInfer version from the official Docker Hub.
+The Docker images have been validated for `ROCm 6.4.1 `__.
+Click |docker-icon| to view the image on Docker Hub.
+
+.. list-table::
+   :header-rows: 1
+   :class: docker-image-compatibility
+
+   * - Docker image
+     - ROCm
+     - FlashInfer
+     - PyTorch
+     - Ubuntu
+     - Python
+
+   * - .. raw:: html
+
+          rocm/flashinfer
+     - `6.4.1 `__
+     - `v0.2.5 `__
+     - `2.7.1 `__
+     - 24.04
+     - `3.12 `__
+
+
diff --git a/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst
index 1ae246931..902c61a2a 100644
--- a/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/llama-cpp-compatibility.rst
@@ -16,7 +16,7 @@ for Large Language Model (LLM) inference that runs on both central processing un
 a simple, dependency-free setup.
 
 The framework supports multiple quantization options, from 1.5-bit to 8-bit integers,
-to speed up inference and reduce memory usage.
Originally built as a CPU-first library, +to accelerate inference and reduce memory usage. Originally built as a CPU-first library, llama.cpp is easy to integrate with other programming environments and is widely adopted across diverse platforms, including consumer devices. @@ -40,12 +40,12 @@ with ROCm support: .. note:: - llama.cpp is supported on ROCm 6.4.0. + llama.cpp is supported on ROCm 7.0.0 and ROCm 6.4.x. Supported devices ================================================================================ -**Officially Supported**: AMD Instinct™ MI300X, MI210 +**Officially Supported**: AMD Instinct™ MI300X, MI325X, MI210 Use cases and recommendations @@ -70,7 +70,7 @@ llama.cpp is also used in a range of real-world applications, including: For more use cases and recommendations, refer to the `AMD ROCm blog `__, where you can search for llama.cpp examples and best practices to optimize your workloads on AMD GPUs. -- The `Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration `__, +- The `Llama.cpp Meets Instinct: A New Era of Open-Source AI Acceleration `__ blog post outlines how the open-source llama.cpp framework enables efficient LLM inference—including interactive inference with ``llama-cli``, server deployment with ``llama-server``, GGUF model preparation and quantization, performance benchmarking, and optimizations tailored for AMD Instinct GPUs within the ROCm ecosystem. @@ -84,9 +84,9 @@ Docker image compatibility -AMD validates and publishes `ROCm llama.cpp Docker images `__ +AMD validates and publishes `ROCm llama.cpp Docker images `__ with ROCm backends on Docker Hub. The following Docker image tags and associated -inventories were tested on `ROCm 6.4.0 `__. +inventories represent the available llama.cpp versions from the official Docker Hub. Click |docker-icon| to view the image on Docker Hub. .. important:: @@ -105,8 +105,115 @@ Click |docker-icon| to view the image on Docker Hub. - Server Docker - Light Docker - llama.cpp + - ROCm - Ubuntu + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `7.0.0 `__ + - 24.04 + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `7.0.0 `__ + - 22.04 + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `6.4.3 `__ + - 24.04 + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `6.4.3 `__ + - 22.04 + + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `6.4.2 `__ + - 24.04 + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `6.4.2 `__ + - 22.04 + + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `6.4.1 `__ + - 24.04 + + * - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - .. raw:: html + + rocm/llama.cpp + - `b6356 `__ + - `6.4.1 `__ + - 22.04 + * - .. raw:: html rocm/llama.cpp @@ -117,40 +224,52 @@ Click |docker-icon| to view the image on Docker Hub. 
rocm/llama.cpp
     - `b5997 `__
+    - `6.4.0 `__
     - 24.04
+
 
 Key ROCm libraries for llama.cpp
 ================================================================================
 
 llama.cpp functionality on ROCm is determined by its underlying library
 dependencies. These ROCm components affect the capabilities, performance, and
-feature set available to developers.
+feature set available to developers. Ensure you have the required libraries for
+your corresponding ROCm version.
 
 .. list-table::
    :header-rows: 1
 
    * - ROCm library
-     - Version
+     - ROCm 7.0.0 version
+     - ROCm 6.4.x version
      - Purpose
      - Usage
 
    * - `hipBLAS `__
-     - :version-ref:`hipBLAS rocm_version`
+     - 3.0.0
+     - 2.4.0
      - Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for matrix and
        vector operations.
      - Supports operations such as matrix multiplication, matrix-vector products, and tensor contractions.
        Utilized in both dense and batched linear algebra operations.
 
    * - `hipBLASLt `__
-     - :version-ref:`hipBLASLt rocm_version`
+     - 1.0.0
+     - 0.12.0
      - hipBLASLt is an extension of the hipBLAS library, providing additional features like epilogues fused
        into the matrix multiplication kernel or use of integer tensor cores.
      - By setting the environment variable ``ROCBLAS_USE_HIPBLASLT``, you can dispatch hipBLASLt kernels
        where possible (see the sketch at the end of this page).
 
    * - `rocWMMA `__
-     - :version-ref:`rocWMMA rocm_version`
+     - 2.0.0
+     - 1.7.0
      - Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix
        multiplication (GEMM) and accumulation operations with mixed precision support.
      - Can be used to enhance the flash attention performance on AMD compute, by enabling
-       the flag during compile time.
\ No newline at end of file
+       the corresponding flag at compile time.
+
+Previous versions
+================================================================================
+See :doc:`rocm-install-on-linux:install/3rd-party/previous-versions/llama-cpp-history` to find documentation for previous releases
+of the ``ROCm/llama.cpp`` Docker image.
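+
+As a quick sanity check of the hipBLASLt dispatch path described above, the
+following sketch sets the environment variable and runs a rocBLAS-backed GEMM
+through PyTorch on ROCm. This is an illustration only: llama.cpp itself is a
+C++ binary, and PyTorch is used here simply as a convenient rocBLAS client.
+The variable must be set before the first rocBLAS call.
+
+.. code-block:: python
+
+   import os
+
+   # Opt in to hipBLASLt dispatch before any library initializes rocBLAS.
+   os.environ["ROCBLAS_USE_HIPBLASLT"] = "1"
+
+   import torch  # PyTorch on ROCm exposes AMD GPUs through the "cuda" device
+
+   # A half-precision GEMM; rocBLAS can route eligible kernels to hipBLASLt.
+   a = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
+   b = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
+   c = a @ b
+   torch.cuda.synchronize()
+   print(c.shape)  # torch.Size([4096, 4096])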
\ No newline at end of file diff --git a/docs/compatibility/ml-compatibility/megablocks-compatibility.rst b/docs/compatibility/ml-compatibility/megablocks-compatibility.rst index 234dc82fc..50c2c3821 100644 --- a/docs/compatibility/ml-compatibility/megablocks-compatibility.rst +++ b/docs/compatibility/ml-compatibility/megablocks-compatibility.rst @@ -28,7 +28,7 @@ Supported devices ================================================================================ - **Officially Supported**: AMD Instinct MI300X -- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210X +- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210 Supported models and features ================================================================================ diff --git a/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst b/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst index e8f1b4195..1550a82d1 100644 --- a/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst +++ b/docs/compatibility/ml-compatibility/stanford-megatron-lm-compatibility.rst @@ -27,7 +27,7 @@ Supported Devices ================================================================================ - **Officially Supported**: AMD Instinct MI300X -- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210X +- **Partially Supported** (functionality or performance limitations): AMD Instinct MI250X, MI210 Supported models and features diff --git a/docs/conf.py b/docs/conf.py index 78d50d502..760e3326c 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -110,6 +110,7 @@ article_pages = [ {"file": "compatibility/ml-compatibility/taichi-compatibility", "os": ["linux"]}, {"file": "compatibility/ml-compatibility/ray-compatibility", "os": ["linux"]}, {"file": "compatibility/ml-compatibility/llama-cpp-compatibility", "os": ["linux"]}, + {"file": "compatibility/ml-compatibility/flashinfer-compatibility", "os": ["linux"]}, {"file": "how-to/deep-learning-rocm", "os": ["linux"]}, {"file": "how-to/rocm-for-ai/index", "os": ["linux"]}, diff --git a/docs/how-to/deep-learning-rocm.rst b/docs/how-to/deep-learning-rocm.rst index accb2e546..fb21328f8 100644 --- a/docs/how-to/deep-learning-rocm.rst +++ b/docs/how-to/deep-learning-rocm.rst @@ -128,10 +128,22 @@ The table below summarizes information about ROCm-enabled deep learning framewor - - `Docker image `__ + - `ROCm Base Docker image `__ - .. raw:: html + * - `FlashInfer `__ + - .. raw:: html + + + - + - `Docker image `__ + - `ROCm Base Docker image `__ + - .. raw:: html + + + Learn how to use your ROCm deep learning environment for training, fine-tuning, inference, and performance optimization through the following guides. diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 92f0534f9..bfaef7ffe 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -49,6 +49,8 @@ subtrees: title: Ray compatibility - file: compatibility/ml-compatibility/llama-cpp-compatibility.rst title: llama.cpp compatibility + - file: compatibility/ml-compatibility/flashinfer-compatibility.rst + title: FlashInfer compatibility - file: how-to/build-rocm.rst title: Build ROCm from source