Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub (#3637)
* Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub
* Update target link for the Docker vLLM guide
* Change target URL
* Change link target URL again
@@ -137,6 +137,12 @@ Installing vLLM
Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.
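
To make the validation concrete, here is a minimal sketch of timing offline generation with
vLLM's Python API inside that container. The model name and prompt batch are placeholder
assumptions, not values prescribed by the validation guide.

.. code-block:: python

   # Minimal vLLM throughput sanity check (sketch; model and prompts are assumptions).
   import time

   from vllm import LLM, SamplingParams

   llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumed example model
   params = SamplingParams(max_tokens=128)
   prompts = ["Explain mixed-precision training."] * 8  # assumed prompt batch

   start = time.perf_counter()
   outputs = llm.generate(prompts, params)
   elapsed = time.perf_counter() - start

   generated = sum(len(o.outputs[0].token_ids) for o in outputs)
   print(f"{generated} generated tokens in {elapsed:.1f} s "
         f"({generated / elapsed:.1f} tok/s)")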

.. _fine-tuning-llms-tgi:

Hugging Face TGI

@@ -41,6 +41,15 @@ vLLM walkthrough
Refer to this developer blog for guidance on serving with vLLM: `Inferencing and serving with vLLM on AMD GPUs — ROCm
Blogs <https://rocm.blogs.amd.com/artificial-intelligence/vllm/README.html>`_
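
For a sense of what the client side of serving looks like, the following is a minimal
sketch that queries a vLLM OpenAI-compatible server. It assumes a server is already
running locally on port 8000, and the model name is a placeholder that must match
whatever the server was launched with.

.. code-block:: python

   # Sketch: query a locally running vLLM OpenAI-compatible server.
   # Assumes the server is already up on port 8000; model name is a placeholder.
   import requests

   resp = requests.post(
       "http://localhost:8000/v1/completions",
       json={
           "model": "meta-llama/Llama-2-7b-chat-hf",  # must match the served model
           "prompt": "What does ROCm provide?",
           "max_tokens": 64,
       },
   )
   print(resp.json()["choices"][0]["text"])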

Validating vLLM performance
---------------------------

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.
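
Before timing anything, it can help to confirm the container's stack is wired up.
A small sketch, assuming the image's bundled Python environment:

.. code-block:: python

   # Sanity-check the stack inside the container (sketch).
   import torch
   import vllm

   print("vLLM:", vllm.__version__)
   print("PyTorch:", torch.__version__)
   print("HIP runtime:", torch.version.hip)          # set on ROCm builds of PyTorch
   print("GPU visible:", torch.cuda.is_available())  # torch.cuda maps to HIP on ROCm
   print("Device 0:", torch.cuda.get_device_name(0))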

.. _rocm-for-ai-serve-hugging-face-tgi:

Serving using Hugging Face TGI

@@ -150,6 +150,12 @@ the workload to validate improvements and ensure that the changes have had the
desired effect. Continuous iteration helps refine the performance gains and
address any new bottlenecks that may emerge.
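
One way to keep that loop honest is to re-profile after every change. Below is a minimal
sketch using ``torch.profiler``, which works on ROCm builds of PyTorch; the matrix
multiply is a hypothetical stand-in for the real workload.

.. code-block:: python

   # Re-profile after each tuning change to confirm its effect (sketch).
   # On ROCm builds of PyTorch, CUDA profiler activities map to HIP kernels.
   import torch
   from torch.profiler import ProfilerActivity, profile

   def inference_step():
       # Hypothetical stand-in for the real inference workload.
       a = torch.randn(4096, 4096, device="cuda")
       b = torch.randn(4096, 4096, device="cuda")
       return a @ b

   with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
       inference_step()
       torch.cuda.synchronize()

   print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))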

ROCm provides a prebuilt optimized Docker image that has everything required to implement
the tips in this section. It includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.

.. _mi300x-profiling-tools:

Profiling tools

@@ -372,6 +378,12 @@ Refer to `vLLM documentation <https://docs.vllm.ai/en/latest/models/performance.
for additional performance tips. :ref:`fine-tuning-llms-vllm` describes vLLM
usage with ROCm.
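
To illustrate the kind of knobs those tips involve, here is a hedged sketch of
constructing a vLLM engine with a few performance-related arguments; the values are
illustrative placeholders, not tuned recommendations.

.. code-block:: python

   # Sketch: common vLLM engine arguments that affect performance.
   # Values below are illustrative placeholders, not tuned recommendations.
   from vllm import LLM

   llm = LLM(
       model="meta-llama/Llama-2-7b-chat-hf",  # assumed example model
       tensor_parallel_size=1,       # shard across GPUs when > 1
       gpu_memory_utilization=0.9,   # fraction of GPU memory for weights + KV cache
       max_num_seqs=256,             # cap on concurrently scheduled sequences
   )

Raising ``gpu_memory_utilization`` leaves more room for the KV cache, which generally
helps batch-heavy serving at the cost of headroom for other processes on the device.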

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.

Maximize throughput
-------------------