From 23a67a3abfafc315f951d21fbfbf843a5cd4ee05 Mon Sep 17 00:00:00 2001
From: Jeffrey Novotny
Date: Wed, 4 Sep 2024 17:07:46 -0400
Subject: [PATCH] =?UTF-8?q?Add=20introduction=20and=20links=20to=20the=20n?=
 =?UTF-8?q?ew=20guide=20to=20the=20vLLM=20optimized=20Doc=E2=80=A6=20(#363?=
 =?UTF-8?q?7)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub

* Update target link for the Docker vLLM guide

* Change target URL

* Change link target URL again
---
 .../llm-inference-frameworks.rst              |  6 ++++++
 docs/how-to/rocm-for-ai/deploy-your-model.rst |  9 +++++++++
 docs/how-to/tuning-guides/mi300x/workload.rst | 12 ++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
index 1ce0d8044..3ee672353 100644
--- a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
@@ -137,6 +137,12 @@ Installing vLLM
 
 Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
 
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _fine-tuning-llms-tgi:
 
 Hugging Face TGI
diff --git a/docs/how-to/rocm-for-ai/deploy-your-model.rst b/docs/how-to/rocm-for-ai/deploy-your-model.rst
index fd9fe8584..0435e83ee 100644
--- a/docs/how-to/rocm-for-ai/deploy-your-model.rst
+++ b/docs/how-to/rocm-for-ai/deploy-your-model.rst
@@ -41,6 +41,15 @@ vLLM walkthrough
 Refer to this developer blog for guidance on serving with vLLM
 `Inferencing and serving with vLLM on AMD GPUs — ROCm Blogs `_
 
+Validating vLLM performance
+---------------------------
+
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _rocm-for-ai-serve-hugging-face-tgi:
 
 Serving using Hugging Face TGI
diff --git a/docs/how-to/tuning-guides/mi300x/workload.rst b/docs/how-to/tuning-guides/mi300x/workload.rst
index f831a7951..6857eae1b 100644
--- a/docs/how-to/tuning-guides/mi300x/workload.rst
+++ b/docs/how-to/tuning-guides/mi300x/workload.rst
@@ -150,6 +150,12 @@ the workload to validate improvements and ensure that the changes have had
 the desired effect. Continuous iteration helps refine the performance gains
 and address any new bottlenecks that may emerge.
 
+ROCm provides a prebuilt optimized Docker image that has everything required to implement
+the tips in this section. It includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _mi300x-profiling-tools:
 
 Profiling tools
@@ -372,6 +378,12 @@
 
 Refer to `vLLM documentation `_
 
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 Maximize throughput
 -------------------
 
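For readers who want to try the image these additions describe, the sketch below shows what pulling and running a prebuilt ROCm vLLM container typically looks like on an MI300X host. The image name and tag (rocm/vllm:latest) and the smoke-test model are illustrative placeholders, not coordinates confirmed by this patch; the linked guide gives the real ones. The device, group, and shared-memory flags are the standard ones ROCm containers need to reach the GPUs.

    # Pull the prebuilt image. Name and tag are placeholders; the guide
    # linked in the patch gives the actual registry coordinates.
    docker pull rocm/vllm:latest

    # Standard ROCm container flags: /dev/kfd and /dev/dri expose the GPUs,
    # the video group grants device access, and a large shared-memory
    # segment keeps PyTorch/vLLM worker communication from failing.
    docker run -it --rm \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --shm-size 16G \
        --security-opt seccomp=unconfined \
        rocm/vllm:latest

    # Inside the container, a one-line vLLM smoke test (the model is only
    # an example; any small Hugging Face model works):
    python -c "from vllm import LLM; out = LLM(model='facebook/opt-125m').generate('Hello'); print(out[0].outputs[0].text)"

The full validation workflow, including how the bundled tuning files are used, is what the linked guide documents.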