From 23a67a3abfafc315f951d21fbfbf843a5cd4ee05 Mon Sep 17 00:00:00 2001
From: Jeffrey Novotny
Date: Wed, 4 Sep 2024 17:07:46 -0400
Subject: [PATCH] =?UTF-8?q?Add=20introduction=20and=20links=20to=20the=20n?=
 =?UTF-8?q?ew=20guide=20to=20the=20vLLM=20optimized=20Doc=E2=80=A6=20(#363?=
 =?UTF-8?q?7)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub

* Update target link for the Docker vLLM guide

* Change target URL

* Change link target URL again
---
 .../llm-inference-frameworks.rst              |  6 ++++++
 docs/how-to/rocm-for-ai/deploy-your-model.rst |  9 +++++++++
 docs/how-to/tuning-guides/mi300x/workload.rst | 12 ++++++++++++
 3 files changed, 27 insertions(+)

diff --git a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
index 1ce0d8044..3ee672353 100644
--- a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
@@ -137,6 +137,12 @@ Installing vLLM
 
 Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
 
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _fine-tuning-llms-tgi:
 
 Hugging Face TGI
diff --git a/docs/how-to/rocm-for-ai/deploy-your-model.rst b/docs/how-to/rocm-for-ai/deploy-your-model.rst
index fd9fe8584..0435e83ee 100644
--- a/docs/how-to/rocm-for-ai/deploy-your-model.rst
+++ b/docs/how-to/rocm-for-ai/deploy-your-model.rst
@@ -41,6 +41,15 @@ vLLM walkthrough
 Refer to this developer blog for guidance on serving with vLLM
 `Inferencing and serving with vLLM on AMD GPUs — ROCm Blogs `_
 
+Validating vLLM performance
+---------------------------
+
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _rocm-for-ai-serve-hugging-face-tgi:
 
 Serving using Hugging Face TGI
diff --git a/docs/how-to/tuning-guides/mi300x/workload.rst b/docs/how-to/tuning-guides/mi300x/workload.rst
index f831a7951..6857eae1b 100644
--- a/docs/how-to/tuning-guides/mi300x/workload.rst
+++ b/docs/how-to/tuning-guides/mi300x/workload.rst
@@ -150,6 +150,12 @@ the workload to validate improvements and ensure that the changes have had
 the desired effect. Continuous iteration helps refine the performance gains
 and address any new bottlenecks that may emerge.
 
+ROCm provides a prebuilt optimized Docker image that has everything required to implement
+the tips in this section. It includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _mi300x-profiling-tools:
 
 Profiling tools
@@ -372,6 +378,12 @@
 
 Refer to `vLLM documentation `_
 
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 Maximize throughput
 -------------------
 
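For readers who want to try the image these additions describe, the sketch below shows what pulling and running a prebuilt ROCm vLLM container typically looks like on an MI300X host. The image name and tag (rocm/vllm:latest) and the smoke-test model are illustrative placeholders, not coordinates confirmed by this patch; the linked guide gives the real ones. The device, group, and shared-memory flags are the standard ones ROCm containers need to reach the GPUs.

    # Pull the prebuilt image. Name and tag are placeholders; the guide
    # linked in the patch gives the actual registry coordinates.
    docker pull rocm/vllm:latest

    # Standard ROCm container flags: /dev/kfd and /dev/dri expose the GPUs,
    # the video group grants device access, and a large shared-memory
    # segment keeps PyTorch/vLLM worker communication from failing.
    docker run -it --rm \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --shm-size 16G \
        --security-opt seccomp=unconfined \
        rocm/vllm:latest

    # Inside the container, a one-line vLLM smoke test (the model is only
    # an example; any small Hugging Face model works):
    python -c "from vllm import LLM; out = LLM(model='facebook/opt-125m').generate('Hello'); print(out[0].outputs[0].text)"

The full validation workflow, including how the bundled tuning files are used, is what the linked guide documents.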