mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
Update vLLM performance Docker docs (#4491)
* add links to performance results words * change "performance validation" to "performance testing" * update vLLM docker 3/11 * add previous versions add previous versions * fix llama 3.1 8b model repo name * words
This commit is contained in:
@@ -1,11 +1,12 @@
|
||||
vllm_benchmark:
|
||||
unified_docker:
|
||||
latest:
|
||||
pull_tag: rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9
|
||||
pull_tag: rocm/vllm:instinct_main
|
||||
docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.7.3_20250311/images/sha256-de0a2649b735f45b7ecab8813eb7b19778ae1f40591ca1196b07bc29c42ed4a3
|
||||
rocm_version: 6.3.1
|
||||
vllm_version: 0.6.6
|
||||
pytorch_version: 2.7.0 (2.7.0a0+git3a58512)
|
||||
vllm_version: 0.7.3
|
||||
pytorch_version: 2.7.0 (dev nightly)
|
||||
hipblaslt_version: 0.13
|
||||
model_groups:
|
||||
- group: Llama
|
||||
tag: llama
|
||||
@@ -40,6 +41,11 @@ vllm_benchmark:
|
||||
model_repo: meta-llama/Llama-2-70b-chat-hf
|
||||
url: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
|
||||
precision: float16
|
||||
- model: Llama 3.1 8B FP8
|
||||
mad_tag: pyt_vllm_llama-3.1-8b_fp8
|
||||
model_repo: amd/Llama-3.1-8B-Instruct-FP8-KV
|
||||
url: https://huggingface.co/amd/Llama-3.1-8B-Instruct-FP8-KV
|
||||
precision: float8
|
||||
- model: Llama 3.1 70B FP8
|
||||
mad_tag: pyt_vllm_llama-3.1-70b_fp8
|
||||
model_repo: amd/Llama-3.1-70B-Instruct-FP8-KV
|
||||
|
||||
@@ -47,7 +47,7 @@ Validating vLLM performance
|
||||
ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
|
||||
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
|
||||
format. For more information, see the guide to
|
||||
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
|
||||
`LLM inference performance testing with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
|
||||
on the ROCm GitHub repository.
|
||||
|
||||
.. _rocm-for-ai-serve-hugging-face-tgi:
|
||||
|
||||
@@ -140,8 +140,8 @@ Installing vLLM
|
||||
See :ref:`mi300x-vllm-optimization` for performance optimization tips.
|
||||
|
||||
ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
|
||||
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
|
||||
format. For more information, see :doc:`vllm-benchmark`.
|
||||
on the MI300X accelerator. The Docker image includes ROCm, vLLM, and PyTorch.
|
||||
For more information, see :doc:`vllm-benchmark`.
|
||||
|
||||
.. _fine-tuning-llms-tgi:
|
||||
|
||||
|
||||
@@ -3,9 +3,9 @@
|
||||
ROCm vLLM Docker image.
|
||||
:keywords: model, MAD, automation, dashboarding, validate
|
||||
|
||||
***********************************************************
|
||||
LLM inference performance validation on AMD Instinct MI300X
|
||||
***********************************************************
|
||||
********************************************************
|
||||
LLM inference performance testing on AMD Instinct MI300X
|
||||
********************************************************
|
||||
|
||||
.. _vllm-benchmark-unified-docker:
|
||||
|
||||
@@ -16,9 +16,9 @@ LLM inference performance validation on AMD Instinct MI300X
|
||||
|
||||
The `ROCm vLLM Docker <{{ unified_docker.docker_hub_url }}>`_ image offers
|
||||
a prebuilt, optimized environment for validating large language model (LLM)
|
||||
inference performance on the AMD Instinct™ MI300X accelerator. This ROCm vLLM
|
||||
Docker image integrates vLLM and PyTorch tailored specifically for the MI300X
|
||||
accelerator and includes the following components:
|
||||
inference performance on AMD Instinct™ MI300X series accelerator. This ROCm vLLM
|
||||
Docker image integrates vLLM and PyTorch tailored specifically for MI300X series
|
||||
accelerators and includes the following components:
|
||||
|
||||
* `ROCm {{ unified_docker.rocm_version }} <https://github.com/ROCm/ROCm>`_
|
||||
|
||||
@@ -26,9 +26,11 @@ LLM inference performance validation on AMD Instinct MI300X
|
||||
|
||||
* `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/pytorch/pytorch>`_
|
||||
|
||||
With this Docker image, you can quickly validate the expected inference
|
||||
performance numbers for the MI300X accelerator. This topic also provides tips on
|
||||
optimizing performance with popular AI models.
|
||||
* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
|
||||
|
||||
With this Docker image, you can quickly test the :ref:`expected
|
||||
inference performance numbers <vllm-benchmark-performance-measurements>` for
|
||||
MI300X series accelerators.
|
||||
|
||||
.. _vllm-benchmark-available-models:
|
||||
|
||||
@@ -79,7 +81,6 @@ LLM inference performance validation on AMD Instinct MI300X
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
|
||||
|
||||
.. note::
|
||||
|
||||
vLLM is a toolkit and library for LLM inference and serving. AMD implements
|
||||
@@ -87,6 +88,29 @@ LLM inference performance validation on AMD Instinct MI300X
|
||||
See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
|
||||
more information.
|
||||
|
||||
.. _vllm-benchmark-performance-measurements:
|
||||
|
||||
Performance measurements
|
||||
========================
|
||||
|
||||
To evaluate performance, the
|
||||
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
|
||||
page provides reference throughput and latency measurements for inferencing
|
||||
popular AI models.
|
||||
|
||||
.. note::
|
||||
|
||||
The performance data presented in
|
||||
`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
|
||||
should not be interpreted as the peak performance achievable by AMD
|
||||
Instinct MI325X and MI300X accelerators or ROCm software.
|
||||
|
||||
Advanced features and known issues
|
||||
==================================
|
||||
|
||||
For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
|
||||
see the developer's guide at `<https://github.com/ROCm/vllm/blob/main/docs/dev-docker/README.md>`__.
|
||||
|
||||
Getting started
|
||||
===============
|
||||
|
||||
@@ -162,13 +186,13 @@ LLM inference performance validation on AMD Instinct MI300X
|
||||
.. tab-item:: Standalone benchmarking
|
||||
|
||||
Run the vLLM benchmark tool independently by starting the
|
||||
`Docker container <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9>`_
|
||||
`Docker container <{{ unified_docker.docker_hub_url }}>`_
|
||||
as shown in the following snippet.
|
||||
|
||||
.. code-block::
|
||||
|
||||
docker pull rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
|
||||
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name vllm_v0.6.6 rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
|
||||
docker pull {{ unified_docker.pull_tag }}
|
||||
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name test {{ unified_docker.pull_tag }}
|
||||
|
||||
In the Docker container, clone the ROCm MAD repository and navigate to the
|
||||
benchmark scripts directory at ``~/MAD/scripts/vllm``.
|
||||
@@ -290,3 +314,40 @@ Further reading
|
||||
|
||||
- To learn how to fine-tune LLMs, see
|
||||
:doc:`Fine-tuning LLMs <../fine-tuning/index>`.
|
||||
|
||||
Previous versions
|
||||
=================
|
||||
|
||||
This table lists previous versions of the ROCm vLLM inference Docker image for
|
||||
inference performance testing. For detailed information about available models
|
||||
for benchmarking, see the version-specific documentation.
|
||||
|
||||
.. list-table::
|
||||
:header-rows: 1
|
||||
:stub-columns: 1
|
||||
|
||||
* - ROCm version
|
||||
- vLLM version
|
||||
- PyTorch version
|
||||
- Resources
|
||||
|
||||
* - 6.3.1
|
||||
- 0.6.6
|
||||
- 2.7.0
|
||||
-
|
||||
* `Documentation <https://rocm.docs.amd.com/en/docs-6.3.2/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html>`_
|
||||
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9>`_
|
||||
|
||||
* - 6.2.1
|
||||
- 0.6.4
|
||||
- 2.5.0
|
||||
-
|
||||
* `Documentation <https://rocm.docs.amd.com/en/docs-6.3.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_
|
||||
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4/images/sha256-ccbb74cc9e7adecb8f7bdab9555f7ac6fc73adb580836c2a35ca96ff471890d8>`_
|
||||
|
||||
* - 6.2.0
|
||||
- 0.4.3
|
||||
- 2.4.0
|
||||
-
|
||||
* `Documentation <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_
|
||||
* `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50/images/sha256-9e4dd4788a794c3d346d7d0ba452ae5e92d39b8dfac438b2af8efdc7f15d22c0>`_
|
||||
|
||||
@@ -527,7 +527,7 @@ Previous versions
|
||||
=================
|
||||
|
||||
This table lists previous versions of the ROCm Megatron-LM Docker image for training
|
||||
performance validation. For detailed information about available models for
|
||||
performance testing. For detailed information about available models for
|
||||
benchmarking, see the version-specific documentation.
|
||||
|
||||
.. list-table::
|
||||
|
||||
@@ -73,7 +73,7 @@ subtrees:
|
||||
- file: how-to/rocm-for-ai/inference/llm-inference-frameworks.rst
|
||||
title: LLM inference frameworks
|
||||
- file: how-to/rocm-for-ai/inference/vllm-benchmark.rst
|
||||
title: Performance validation
|
||||
title: Performance testing
|
||||
- file: how-to/rocm-for-ai/inference/deploy-your-model.rst
|
||||
title: Deploy your model
|
||||
|
||||
|
||||
Reference in New Issue
Block a user