Docs: Update JAX compatibility page

Update vLLM benchmarking guide (#4347 )
* update vllm-benchmark fix hlist overflow update standalone benchmarking options update list of models fix typo and model name unnecessary duplicate info update formatting update vllm benchmark guide - remove Llama 2 FP8 - add Jais 13B - update commands update docker pull tag update MAD available models remove extra mad models not relevant to vllm update PyTorch version add changelog add model names to .wordlist.txt * Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * fix typo * update link * fix link text * change changelog to previous versions * fix typo * remove "for" --------- Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-09 22:58:17 -05:00 · 2025-02-13 16:58:05 +01:00 · 2025-02-05 17:18:35 -05:00 · 2025-02-05 16:40:31 -05:00 · 2025-02-05 14:46:16 -05:00
5 changed files with 224 additions and 284 deletions
--- a/.github/workflows/issue_retrieval.yml
+++ b/.github/workflows/issue_retrieval.yml
@@ -2,7 +2,7 @@ name: Issue retrieval

 on:
  issues:
-    types: [opened]
+    types: [opened, edited]

 jobs:
  auto-retrieve:
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -74,6 +74,7 @@ Conda
 ConnectX
 CuPy
 Dashboarding
+DBRX
 DDR
 DF
 DGEMM
@@ -92,6 +93,7 @@ DataFrame
 DataLoader
 DataParallel
 Debian
+DeepSeek
 DeepSpeed
 Dependabot
 Deprecations
@@ -129,6 +131,7 @@ GDS
 GEMM
 GEMMs
 GFortran
+Gemma
 GiB
 GIM
 GL
@@ -674,6 +677,7 @@ namespace
 namespaces
 nanoGPT
 num
+numpy
 numref
 ocl
 opencl
--- a/docs/about/license.md
+++ b/docs/about/license.md
@@ -62,7 +62,7 @@ additional licenses. Please review individual repositories for more information.
 | [rocJPEG](https://github.com/ROCm/rocJPEG/) | [MIT](https://github.com/ROCm/rocJPEG/blob/develop/LICENSE) |
 | [ROCK-Kernel-Driver](https://github.com/ROCm/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/ROCm/ROCK-Kernel-Driver/blob/master/COPYING) |
 | [rocminfo](https://github.com/ROCm/rocminfo/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocminfo/blob/amd-staging/License.txt) |
-| [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
+| [ROCm Bandwidth Test](https://github.com/ROCm/rocm_bandwidth_test/) | [MIT](https://github.com/ROCm/rocm_bandwidth_test/blob/master/LICENSE.txt) |
 | [ROCm CMake](https://github.com/ROCm/rocm-cmake/) | [MIT](https://github.com/ROCm/rocm-cmake/blob/develop/LICENSE) |
 | [ROCm Communication Collectives Library (RCCL)](https://github.com/ROCm/rccl/) | [Custom](https://github.com/ROCm/rccl/blob/develop/LICENSE.txt) |
 | [ROCm-Core](https://github.com/ROCm/rocm-core) | [MIT](https://github.com/ROCm/rocm-core/blob/master/copyright) |
--- a/docs/compatibility/ml-compatibility/jax-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/jax-compatibility.rst
@@ -55,7 +55,7 @@ Docker image compatibility

 AMD validates and publishes ready-made `ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax>`_
 with ROCm backends on Docker Hub. The following Docker image tags and
-associated inventories are validated for
+associated inventories represent the latest JAX version from the official Docker Hub and are validated for
 `ROCm 6.3.1 <https://repo.radeon.com/rocm/apt/6.3.1/>`_. Click the |docker-icon|
 icon to view the image on Docker Hub.

@@ -83,7 +83,8 @@ icon to view the image on Docker Hub.

 AMD publishes `Community ROCm JAX Docker images <https://hub.docker.com/r/rocm/jax-community>`_
 with ROCm backends on Docker Hub. The following Docker image tags and
-associated inventories are tested for `ROCm 6.2.4 <https://repo.radeon.com/rocm/apt/6.2.4/>`_.
+associated inventories represent the latest JAX version from the official Docker Hub and are
+tested for `ROCm 6.3.1 <https://repo.radeon.com/rocm/apt/6.3.1/>`_.

 .. list-table:: JAX community Docker image components
    :header-rows: 1
@@ -94,25 +95,25 @@ associated inventories are tested for `ROCm 6.2.4 <https://repo.radeon.com/rocm/
      - Python
    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.12.7/images/sha256-a6032d89c07573b84c44e42c637bf9752b1b7cd2a222d39344e603d8f4c63beb?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.1-jax0.5.0-py3.12.8/images/sha256-897d7471a954d9df7f79cd28b87ec515fbb94189fc3cf13e3a1588aa6b2a5fee?context_explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>

-      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+      - `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
      - Ubuntu 22.04
-      - `3.12.7 <https://www.python.org/downloads/release/python-3127/>`_
+      - `3.12.8 <https://www.python.org/downloads/release/python-3128/>`_
    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.11.10/images/sha256-d462f7e445545fba2f3b92234a21beaa52fe6c5f550faabcfdcd1bf53486d991?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.1-jax0.5.0-py3.11.11/images/sha256-f59243f324ee8da8dd54cd81b3649a860b2b454eaac8e4ce41d5c5f40e42b0e8?context_explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>

-      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+      - `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
      - Ubuntu 22.04
      - `3.11.10 <https://www.python.org/downloads/release/python-31110/>`_
    * - .. raw:: html

-           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.2.4-jax0.4.35-py3.10.15/images/sha256-6f2d4d0f529378d9572f0e8cfdcbc101d1e1d335bd626bb3336fff87814e9d60?context=explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>
+           <a href="https://hub.docker.com/layers/rocm/jax-community/rocm6.3.1-jax0.5.0-py3.10.16/images/sha256-6f12e5f6a3b5d033d2b1a43938b6804978d999978e68e402228d02984a69fb9d?context_explore"><i class="fab fa-docker fa-lg"></i> rocm/jax-community</a>

-      - `0.4.35 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.4.35>`_
+      - `0.5.0 <https://github.com/ROCm/jax/releases/tag/rocm-jax-v0.5.0>`_
      - Ubuntu 22.04
-      - `3.10.15 <https://www.python.org/downloads/release/python-31015/>`_
+      - `3.10.16 <https://www.python.org/downloads/release/python-31016/>`_

 Critical ROCm libraries for JAX
 ================================================================================
@@ -210,7 +211,7 @@ performance, and feature set available to developers.
 Supported and unsupported features
 ===============================================================================

-The following table maps GPU-accelerated JAX modules to their supported
+The following table maps the public JAX API modules to their supported
 ROCm and JAX versions.

 .. list-table::
@@ -247,21 +248,11 @@ ROCm and JAX versions.
        devices.
      - 0.3.20
      - 5.1.0
-    * - ``jax.dlpack``
-      - For exchanging tensor data between JAX and other libraries that support the
-        DLPack standard.
-      - 0.1.57
-      - 5.0.0
    * - ``jax.distributed``
      - Enables the scaling of computations across multiple devices on a single
        machine or across multiple machines.
      - 0.1.74
      - 5.0.0
-    * - ``jax.dtypes``
-      - Provides utilities for working with and managing data types in JAX
-        arrays and computations.
-      - 0.1.66
-      - 5.0.0
    * - ``jax.image``
      - Contains image manipulation functions like resize, scale and translation.
      - 0.1.57
@@ -275,27 +266,10 @@ ROCm and JAX versions.
        array.
      - 0.1.57
      - 5.0.0
-    * - ``jax.profiler``
-      - Contains JAX’s tracing and time profiling features.
-      - 0.1.57
-      - 5.0.0
    * - ``jax.stages``
      - Contains interfaces to stages of the compiled execution process.
      - 0.3.4
      - 5.0.0
-    * - ``jax.tree``
-      - Provides utilities for working with tree-like container data structures.
-      - 0.4.26
-      - 5.6.0
-    * - ``jax.tree_util``
-      - Provides utilities for working with nested data structures, or
-        ``pytrees``.
-      - 0.1.65
-      - 5.0.0
-    * - ``jax.typing``
-      - Provides JAX-specific static type annotations.
-      - 0.3.18
-      - 5.1.0
    * - ``jax.extend``
      - Provides modules for access to JAX internal machinery module. The
        ``jax.extend`` module defines a library view of some of JAX’s internal
@@ -322,10 +296,10 @@ ROCm and JAX versions.
      - jax_triton 0.2.0
      - 6.2.4

-jax.scipy module
+jax.lax module
 -------------------------------------------------------------------------------

-A SciPy-like API for scientific computing.
+A module for primitives operations.

 .. list-table::
    :header-rows: 1
@@ -333,129 +307,14 @@ A SciPy-like API for scientific computing.
    * - Module
      - Since JAX
      - Since ROCm
-    * - ``jax.scipy.cluster``
-      - 0.3.11
-      - 5.1.0
-    * - ``jax.scipy.fft``
-      - 0.1.71
+    * - ``jax.lax.linalg``
+      - 0.3.2
      - 5.0.0
-    * - ``jax.scipy.integrate``
-      - 0.4.15
-      - 5.5.0
-    * - ``jax.scipy.interpolate``
-      - 0.1.76
-      - 5.0.0
-    * - ``jax.scipy.linalg``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.ndimage``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.optimize``
-      - 0.1.57
-      - 5.0.0
-    * - ``jax.scipy.signal``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.spatial.transform``
-      - 0.4.12
-      - 5.4.0
-    * - ``jax.scipy.sparse.linalg``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.special``
-      - 0.1.56
-      - 5.0.0
-    * - ``jax.scipy.stats``
-      - 0.1.56
-      - 5.0.0
-
-jax.scipy.stats module
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-.. list-table::
-   :header-rows: 1
-
-   * - Module
-     - Since JAX
-     - Since ROCm
-   * - ``jax.scipy.stats.bernouli``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.beta``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.betabinom``
-     - 0.1.61
-     - 5.0.0
-   * - ``jax.scipy.stats.binom``
-     - 0.4.14
-     - 5.4.0
-   * - ``jax.scipy.stats.cauchy``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.chi2``
-     - 0.1.61
-     - 5.0.0
-   * - ``jax.scipy.stats.dirichlet``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.expon``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.gamma``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.gennorm``
-     - 0.3.15
-     - 5.2.0
-   * - ``jax.scipy.stats.geom``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.laplace``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.logistic``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.multinomial``
-     - 0.3.18
-     - 5.1.0
-   * - ``jax.scipy.stats.multivariate_normal``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.nbinom``
-     - 0.1.72
-     - 5.0.0
-   * - ``jax.scipy.stats.norm``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.pareto``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.poisson``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.t``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.truncnorm``
-     - 0.4.0
-     - 5.3.0
-   * - ``jax.scipy.stats.uniform``
-     - 0.1.56
-     - 5.0.0
-   * - ``jax.scipy.stats.vonmises``
-     - 0.4.2
-     - 5.3.0
-   * - ``jax.scipy.stats.wrapcauchy``
-     - 0.4.20
-     - 5.6.0

 jax.extend module
 -------------------------------------------------------------------------------

-Modules for JAX extensions.
+A module for primitives operations.

 .. list-table::
    :header-rows: 1
@@ -463,18 +322,30 @@ Modules for JAX extensions.
    * - Module
      - Since JAX
      - Since ROCm
-    * - ``jax.extend.ffi``
-      - 0.4.30
-      - 6.0.0
-    * - ``jax.extend.linear_util``
-      - 0.4.17
-      - 5.6.0
-    * - ``jax.extend.mlir``
-      - 0.4.26
-      - 5.6.0
-    * - ``jax.extend.random``
+    * - ``jax.extend.core``
      - 0.4.15
      - 5.5.0
+    * - ``jax.extend.core.primitives``
+      - 0.4.32
+      - 5.5.0
+
+jax.numpy module
+-------------------------------------------------------------------------------
+
+A module for primitives operations.
+
+.. list-table::
+    :header-rows: 1
+
+    * - Module
+      - Since JAX
+      - Since ROCm
+    * - ``jax.numpy.fft``
+      - 0.3.20
+      - 5.1.0
+    * - ``jax.numpy.linalg``
+      - 0.3.20
+      - 5.1.0

 jax.experimental module
 -------------------------------------------------------------------------------
@@ -490,9 +361,6 @@ Experimental modules and APIs.
    * - ``jax.experimental.checkify``
      - 0.1.75
      - 5.0.0
-    * - ``jax.experimental.compilation_cache.compilation_cache``
-      - 0.1.68
-      - 5.0.0
    * - ``jax.experimental.custom_partitioning``
      - 0.4.0
      - 5.3.0
@@ -514,8 +382,11 @@ Experimental modules and APIs.
    * - ``jax.experimental.pjit``
      - 0.1.61
      - 5.0.0
-    * - ``jax.experimental.serialize_executable``
-      - 0.4.0
+    * - ``jax.experimental.roofline``
+      - 0.4.36
+      - 5.3.0
+    * - ``jax.experimental.rnn``
+      - 0.4.3
      - 5.3.0
    * - ``jax.experimental.shard_map``
      - 0.4.3
@@ -572,9 +443,6 @@ Experimental support for sparse matrix operations.
    * - ``jax.experimental.sparse.linalg``
      - 0.3.15
      - 5.2.0
-    * - ``jax.experimental.sparse.sparsify``
-      - 0.3.25
-      - ❌

 .. list-table::
    :header-rows: 1
@@ -622,9 +490,6 @@ ROCm.
    * - XLA int4 support
      - 4-bit integer (int4) precision in the XLA compiler.
      - 0.4.0
-    * - ``jax.experimental.sparsify``
-      - Converts a dense matrix to a sparse matrix representation.
-      - Experimental

 Use cases and recommendations
 ================================================================================
--- a/docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst
+++ b/docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst
@@ -10,49 +10,22 @@ LLM inference performance validation on AMD Instinct MI300X
 .. _vllm-benchmark-unified-docker:

 The `ROCm vLLM Docker <https://hub.docker.com/r/rocm/vllm/tags>`_ image offers
-a prebuilt, optimized environment designed for validating large language model
-(LLM) inference performance on the AMD Instinct™ MI300X accelerator. This
-ROCm vLLM Docker image integrates vLLM and PyTorch tailored specifically for the
-MI300X accelerator and includes the following components:
+a prebuilt, optimized environment for validating large language model (LLM)
+inference performance on the AMD Instinct™ MI300X accelerator. This ROCm vLLM
+Docker image integrates vLLM and PyTorch tailored specifically for the MI300X
+accelerator and includes the following components:

-* `ROCm 6.2.1 <https://github.com/ROCm/ROCm>`_
+* `ROCm 6.3.1 <https://github.com/ROCm/ROCm>`_

-* `vLLM 0.6.4 <https://docs.vllm.ai/en/latest>`_
+* `vLLM 0.6.6 <https://docs.vllm.ai/en/latest>`_

-* `PyTorch 2.5.0 <https://github.com/pytorch/pytorch>`_
-
-* Tuning files (in CSV format)
+* `PyTorch 2.7.0 (2.7.0a0+git3a58512) <https://github.com/pytorch/pytorch>`_

 With this Docker image, you can quickly validate the expected inference
-performance numbers on the MI300X accelerator. This topic also provides tips on
-optimizing performance with popular AI models.
-
-.. hlist::
-   :columns: 6
-
-   * Llama 3.1 8B
-
-   * Llama 3.1 70B
-
-   * Llama 3.1 405B
-
-   * Llama 2 7B
-
-   * Llama 2 70B
-
-   * Mixtral 8x7B
-
-   * Mixtral 8x22B
-
-   * Mixtral 7B
-
-   * Qwen2 7B
-
-   * Qwen2 72B
-
-   * JAIS 13B
-
-   * JAIS 30B
+performance numbers for the MI300X accelerator. This topic also provides tips on
+optimizing performance with popular AI models. For more information, see the lists of
+:ref:`available models for MAD-integrated benchmarking <vllm-benchmark-mad-models>`
+and :ref:`standalone benchmarking <vllm-benchmark-standalone-options>`.

 .. _vllm-benchmark-vllm:

@@ -91,9 +64,9 @@ MI300X accelerator with the prebuilt vLLM Docker image.

   .. code-block:: shell

-      docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
+      docker pull rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6

-Once setup is complete, you can choose between two options to reproduce the
+Once the setup is complete, choose between two options to reproduce the
 benchmark results:

 -  :ref:`MAD-integrated benchmarking <vllm-benchmark-mad>`
@@ -130,45 +103,89 @@ Although the following models are preconfigured to collect latency and
 throughput performance data, you can also change the benchmarking parameters.
 Refer to the :ref:`Standalone benchmarking <vllm-benchmark-standalone>` section.

+.. _vllm-benchmark-mad-models:
+
 Available models
 ----------------

-.. hlist::
-   :columns: 3
+.. list-table::
+   :header-rows: 1
+   :widths: 2, 3

-   * ``pyt_vllm_llama-3.1-8b``
+   * - Model name
+     - Tag

-   * ``pyt_vllm_llama-3.1-70b``
+   * - `Llama 3.1 8B <https://huggingface.co/meta-llama/Llama-3.1-8B>`_
+     - ``pyt_vllm_llama-3.1-8b``

-   * ``pyt_vllm_llama-3.1-405b``
+   * - `Llama 3.1 70B <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_
+     - ``pyt_vllm_llama-3.1-70b``

-   * ``pyt_vllm_llama-2-7b``
+   * - `Llama 3.1 405B <https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct>`_
+     - ``pyt_vllm_llama-3.1-405b``

-   * ``pyt_vllm_llama-2-70b``
+   * - `Llama 3.2 11B Vision <https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct>`_
+     - ``pyt_vllm_llama-3.2-11b-vision-instruct``

-   * ``pyt_vllm_mixtral-8x7b``
+   * - `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-7b-chat-hf>`_
+     - ``pyt_vllm_llama-2-7b``

-   * ``pyt_vllm_mixtral-8x22b``
+   * - `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70b-chat-hf>`_
+     - ``pyt_vllm_llama-2-70b``

-   * ``pyt_vllm_mistral-7b``
+   * - `Mixtral MoE 8x7B <https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1>`_
+     - ``pyt_vllm_mixtral-8x7b``

-   * ``pyt_vllm_qwen2-7b``
+   * - `Mixtral MoE 8x22B <https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1>`_
+     - ``pyt_vllm_mixtral-8x22b``

-   * ``pyt_vllm_qwen2-72b``
+   * - `Mistral 7B <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3>`_
+     - ``pyt_vllm_mistral-7b``

-   * ``pyt_vllm_jais-13b``
+   * - `Qwen2 7B <https://huggingface.co/Qwen/Qwen2-7B-Instruct>`_
+     - ``pyt_vllm_qwen2-7b``

-   * ``pyt_vllm_jais-30b``
+   * - `Qwen2 72B <https://huggingface.co/Qwen/Qwen2-72B-Instruct>`_
+     - ``pyt_vllm_qwen2-72b``

-   * ``pyt_vllm_llama-3.1-8b_fp8``
+   * - `JAIS 13B <https://huggingface.co/core42/jais-13b-chat>`_
+     - ``pyt_vllm_jais-13b``

-   * ``pyt_vllm_llama-3.1-70b_fp8``
+   * - `JAIS 30B <https://huggingface.co/core42/jais-30b-chat-v3>`_
+     - ``pyt_vllm_jais-30b``

-   * ``pyt_vllm_llama-3.1-405b_fp8``
+   * - `DBRX Instruct <https://huggingface.co/databricks/dbrx-instruct>`_
+     - ``pyt_vllm_dbrx-instruct``

-   * ``pyt_vllm_mixtral-8x7b_fp8``
+   * - `Gemma 2 27B <https://huggingface.co/google/gemma-2-27b>`_
+     - ``pyt_vllm_gemma-2-27b``

-   * ``pyt_vllm_mixtral-8x22b_fp8``
+   * - `C4AI Command R+ 08-2024 <https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024>`_
+     - ``pyt_vllm_c4ai-command-r-plus-08-2024``
+
+   * - `DeepSeek MoE 16B <https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat>`_
+     - ``pyt_vllm_deepseek-moe-16b-chat``
+
+   * - `Llama 3.1 70B FP8 <https://huggingface.co/amd/Llama-3.1-70B-Instruct-FP8-KV>`_
+     - ``pyt_vllm_llama-3.1-70b_fp8``
+
+   * - `Llama 3.1 405B FP8 <https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV>`_
+     - ``pyt_vllm_llama-3.1-405b_fp8``
+
+   * - `Mixtral MoE 8x7B FP8 <https://huggingface.co/amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV>`_
+     - ``pyt_vllm_mixtral-8x7b_fp8``
+
+   * - `Mixtral MoE 8x22B FP8 <https://huggingface.co/amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV>`_
+     - ``pyt_vllm_mixtral-8x22b_fp8``
+
+   * - `Mistral 7B FP8 <https://huggingface.co/amd/Mistral-7B-v0.1-FP8-KV>`_
+     - ``pyt_vllm_mistral-7b_fp8``
+
+   * - `DBRX Instruct FP8 <https://huggingface.co/amd/dbrx-instruct-FP8-KV>`_
+     - ``pyt_vllm_dbrx_fp8``
+
+   * - `C4AI Command R+ 08-2024 FP8 <https://huggingface.co/amd/c4ai-command-r-plus-FP8-KV>`_
+     - ``pyt_vllm_command-r-plus_fp8``

 .. _vllm-benchmark-standalone:

@@ -181,8 +198,8 @@ snippet.

 .. code-block::

-   docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
-   docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 128G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name vllm_v0.6.4 rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
+   docker pull rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
+   docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name vllm_v0.6.6 rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6

 In the Docker container, clone the ROCm MAD repository and navigate to the
 benchmark scripts directory at ``~/MAD/scripts/vllm``.
@@ -224,8 +241,8 @@ See the :ref:`examples <vllm-benchmark-run-benchmark>` for more information.

 .. _vllm-benchmark-standalone-options:

-Options
-------
+Options and available models
+----------------------------

 .. list-table::
   :header-rows: 1
@@ -248,72 +265,100 @@ Options
     - Measure both throughput and latency

   * - ``$model_repo``
-     - ``meta-llama/Meta-Llama-3.1-8B-Instruct``
-     - Llama 3.1 8B
+     - ``meta-llama/Llama-3.1-8B-Instruct``
+     - `Llama 3.1 8B <https://huggingface.co/meta-llama/Llama-3.1-8B>`_

   * - (``float16``)
-     - ``meta-llama/Meta-Llama-3.1-70B-Instruct``
-     - Llama 3.1 70B
+     - ``meta-llama/Llama-3.1-70B-Instruct``
+     - `Llama 3.1 70B <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_

   * -
-     - ``meta-llama/Meta-Llama-3.1-405B-Instruct``
-     - Llama 3.1 405B
+     - ``meta-llama/Llama-3.1-405B-Instruct``
+     - `Llama 3.1 405B <https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct>`_
+
+   * -
+     - ``meta-llama/Llama-3.2-11B-Vision-Instruct``
+     - `Llama 3.2 11B Vision <https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct>`_

   * -
     - ``meta-llama/Llama-2-7b-chat-hf``
-     - Llama 2 7B
+     - `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-7b-chat-hf>`_

   * -
     - ``meta-llama/Llama-2-70b-chat-hf``
-     - Llama 2 70B
+     - `Llama 2 7B <https://huggingface.co/meta-llama/Llama-2-70b-chat-hf>`_

   * -
     - ``mistralai/Mixtral-8x7B-Instruct-v0.1``
-     - Mixtral 8x7B
+     - `Mixtral MoE 8x7B <https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1>`_

   * -
     - ``mistralai/Mixtral-8x22B-Instruct-v0.1``
-     - Mixtral 8x22B
+     - `Mixtral MoE 8x22B <https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1>`_

   * -
     - ``mistralai/Mistral-7B-Instruct-v0.3``
-     - Mixtral 7B
+     - `Mistral 7B <https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3>`_

   * -
     - ``Qwen/Qwen2-7B-Instruct``
-     - Qwen2 7B
+     - `Qwen2 7B <https://huggingface.co/Qwen/Qwen2-7B-Instruct>`_

   * -
     - ``Qwen/Qwen2-72B-Instruct``
-     - Qwen2 72B
+     - `Qwen2 72B <https://huggingface.co/Qwen/Qwen2-72B-Instruct>`_

   * -
     - ``core42/jais-13b-chat``
-     - JAIS 13B
+     - `JAIS 13B <https://huggingface.co/core42/jais-13b-chat>`_

   * -
     - ``core42/jais-30b-chat-v3``
-     - JAIS 30B
-
-   * - ``$model_repo``
-     - ``amd/Meta-Llama-3.1-8B-Instruct-FP8-KV``
-     - Llama 3.1 8B
-
-   * - (``float8``)
-     - ``amd/Meta-Llama-3.1-70B-Instruct-FP8-KV``
-     - Llama 3.1 70B
+     - `JAIS 30B <https://huggingface.co/core42/jais-30b-chat-v3>`_

   * -
-     - ``amd/Meta-Llama-3.1-405B-Instruct-FP8-KV``
-     - Llama 3.1 405B
+     - ``databricks/dbrx-instruct``
+     - `DBRX Instruct <https://huggingface.co/databricks/dbrx-instruct>`_
+
+   * -
+     - ``google/gemma-2-27b``
+     - `Gemma 2 27B <https://huggingface.co/google/gemma-2-27b>`_
+
+   * -
+     - ``CohereForAI/c4ai-command-r-plus-08-2024``
+     - `C4AI Command R+ 08-2024 <https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024>`_
+
+   * -
+     - ``deepseek-ai/deepseek-moe-16b-chat``
+     - `DeepSeek MoE 16B <https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat>`_
+
+   * - ``$model_repo``
+     - ``amd/Llama-3.1-70B-Instruct-FP8-KV``
+     - `Llama 3.1 70B FP8 <https://huggingface.co/amd/Llama-3.1-70B-Instruct-FP8-KV>`_
+
+   * - (``float8``)
+     - ``amd/Llama-3.1-405B-Instruct-FP8-KV``
+     - `Llama 3.1 405B FP8 <https://huggingface.co/amd/Llama-3.1-405B-Instruct-FP8-KV>`_

   * -
     - ``amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV``
-     - Mixtral 8x7B
+     - `Mixtral MoE 8x7B FP8 <https://huggingface.co/amd/Mixtral-8x7B-Instruct-v0.1-FP8-KV>`_

   * -
     - ``amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV``
-     - Mixtral 8x22B
+     - `Mixtral MoE 8x22B FP8 <https://huggingface.co/amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV>`_
+
+   * -
+     - ``amd/Mistral-7B-v0.1-FP8-KV``
+     - `Mistral 7B FP8 <https://huggingface.co/amd/Mistral-7B-v0.1-FP8-KV>`_
+
+   * -
+     - ``amd/dbrx-instruct-FP8-KV``
+     - `DBRX Instruct FP8 <https://huggingface.co/amd/dbrx-instruct-FP8-KV>`_
+
+   * -
+     - ``amd/c4ai-command-r-plus-FP8-KV``
+     - `C4AI Command R+ 08-2024 FP8 <https://huggingface.co/amd/c4ai-command-r-plus-FP8-KV>`_

   * - ``$num_gpu``
     - 1 or 8
@@ -335,34 +380,34 @@ options and their descriptions.
 Example 1: latency benchmark
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-Use this command to benchmark the latency of the Llama 3.1 8B model on one GPU with the ``float16`` and ``float8`` data types.
+Use this command to benchmark the latency of the Llama 3.1 70B model on eight GPUs with the ``float16`` and ``float8`` data types.

 .. code-block::

-   ./vllm_benchmark_report.sh -s latency -m meta-llama/Meta-Llama-3.1-8B-Instruct -g 1 -d float16
-   ./vllm_benchmark_report.sh -s latency -m amd/Meta-Llama-3.1-8B-Instruct-FP8-KV -g 1 -d float8
+   ./vllm_benchmark_report.sh -s latency -m meta-llama/Llama-3.1-70B-Instruct -g 8 -d float16
+   ./vllm_benchmark_report.sh -s latency -m amd/Llama-3.1-70B-Instruct-FP8-KV -g 8 -d float8

 Find the latency reports at:

- ``./reports_float16/summary/Meta-Llama-3.1-8B-Instruct_latency_report.csv``
+- ``./reports_float16/summary/Llama-3.1-70B-Instruct_latency_report.csv``

- ``./reports_float8/summary/Meta-Llama-3.1-8B-Instruct-FP8-KV_latency_report.csv``
+- ``./reports_float8/summary/Llama-3.1-70B-Instruct-FP8-KV_latency_report.csv``

 Example 2: throughput benchmark
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Use this command to benchmark the throughput of the Llama 3.1 8B model on one GPU with the ``float16`` and ``float8`` data types.
+Use this command to benchmark the throughput of the Llama 3.1 70B model on eight GPUs with the ``float16`` and ``float8`` data types.

 .. code-block:: shell

-   ./vllm_benchmark_report.sh -s throughput -m meta-llama/Meta-Llama-3.1-8B-Instruct -g 1 -d float16
-   ./vllm_benchmark_report.sh -s throughput -m amd/Meta-Llama-3.1-8B-Instruct-FP8-KV -g 1 -d float8
+   ./vllm_benchmark_report.sh -s throughput -m meta-llama/Llama-3.1-70B-Instruct -g 8 -d float16
+   ./vllm_benchmark_report.sh -s throughput -m amd/Llama-3.1-70B-Instruct-FP8-KV -g 8 -d float8

 Find the throughput reports at:

- ``./reports_float16/summary/Meta-Llama-3.1-8B-Instruct_throughput_report.csv``
+- ``./reports_float16/summary/Llama-3.1-70B-Instruct_throughput_report.csv``

- ``./reports_float8/summary/Meta-Llama-3.1-8B-Instruct-FP8-KV_throughput_report.csv``
+- ``./reports_float8/summary/Llama-3.1-70B-Instruct-FP8-KV_throughput_report.csv``

 .. raw:: html

@@ -394,7 +439,7 @@ Further reading
  MI300X accelerators, see :doc:`../../system-optimization/mi300x`.

 - To learn how to run LLM models from Hugging Face or your own model, see
-  :doc:`Using ROCm for AI <../index>`.
+  :doc:`Running models from Hugging Face <hugging-face-models>`.

 - To learn how to optimize inference on LLMs, see
  :doc:`Inference optimization <../inference-optimization/index>`.
@@ -402,6 +447,32 @@ Further reading
 - To learn how to fine-tune LLMs, see
  :doc:`Fine-tuning LLMs <../fine-tuning/index>`.

- To compare with the previous version of the ROCm vLLM Docker image for performance validation, refer to
-  `LLM inference performance validation on AMD Instinct MI300X (ROCm 6.2.0) <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_.
+Previous versions
+=================

+This table lists previous versions of the ROCm vLLM Docker image for inference
+performance validation. For detailed information about available models for
+benchmarking, see the version-specific documentation.
+
+.. list-table::
+   :header-rows: 1
+   :stub-columns: 1
+
+   * - ROCm version
+     - vLLM version
+     - PyTorch version
+     - Resources
+
+   * - 6.2.1
+     - 0.6.4
+     - 2.5.0
+     - 
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4/images/sha256-ccbb74cc9e7adecb8f7bdab9555f7ac6fc73adb580836c2a35ca96ff471890d8>`_
+
+   * - 6.2.0
+     - 0.4.3
+     - 2.4.0
+     -
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50/images/sha256-9e4dd4788a794c3d346d7d0ba452ae5e92d39b8dfac438b2af8efdc7f15d22c0>`_
Author	SHA1	Message	Date
Adel Johar	a93ac383c2	Docs: Update JAX compatibility page	2025-02-13 16:58:05 +01:00
Peter Park	2751a17cf0	Update vLLM benchmarking guide (#4347 ) * update vllm-benchmark fix hlist overflow update standalone benchmarking options update list of models fix typo and model name unnecessary duplicate info update formatting update vllm benchmark guide - remove Llama 2 FP8 - add Jais 13B - update commands update docker pull tag update MAD available models remove extra mad models not relevant to vllm update PyTorch version add changelog add model names to .wordlist.txt * Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * Update docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * fix typo * update link * fix link text * change changelog to previous versions * fix typo * remove "for" --------- Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>	2025-02-05 17:18:35 -05:00
Peter Park	9b0ae86b1b	Fix ROCm Bandwidth Test license type Fix ROCm Bandwidth Test license type	2025-02-05 16:40:31 -05:00
harkgill-amd	16f7cb4c04	Update issue workflow to trigger on edit (#4346 )	2025-02-05 14:46:16 -05:00