diff --git a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
index 1ce0d8044..3ee672353 100644
--- a/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
+++ b/docs/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.rst
@@ -137,6 +137,12 @@ Installing vLLM
 
 Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.
 
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _fine-tuning-llms-tgi:
 
 Hugging Face TGI
diff --git a/docs/how-to/rocm-for-ai/deploy-your-model.rst b/docs/how-to/rocm-for-ai/deploy-your-model.rst
index fd9fe8584..0435e83ee 100644
--- a/docs/how-to/rocm-for-ai/deploy-your-model.rst
+++ b/docs/how-to/rocm-for-ai/deploy-your-model.rst
@@ -41,6 +41,15 @@ vLLM walkthrough
 Refer to this developer blog for guidance on serving with vLLM
 `Inferencing and serving with vLLM on AMD GPUs — ROCm Blogs `_
 
+Validating vLLM performance
+---------------------------
+
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _rocm-for-ai-serve-hugging-face-tgi:
 
 Serving using Hugging Face TGI
diff --git a/docs/how-to/tuning-guides/mi300x/workload.rst b/docs/how-to/tuning-guides/mi300x/workload.rst
index f831a7951..6857eae1b 100644
--- a/docs/how-to/tuning-guides/mi300x/workload.rst
+++ b/docs/how-to/tuning-guides/mi300x/workload.rst
@@ -150,6 +150,12 @@ the workload to validate improvements and ensure that the changes have had
 the desired effect. Continuous iteration helps refine the performance gains
 and address any new bottlenecks that may emerge.
 
+ROCm provides a prebuilt optimized Docker image that has everything required to implement
+the tips in this section. It includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 .. _mi300x-profiling-tools:
 
 Profiling tools
@@ -372,6 +378,12 @@
 Refer to `vLLM documentation `_
 
+ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
+format. For more information, see the guide to
+`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator `_
+on the ROCm GitHub repository.
+
 Maximize throughput
 -------------------
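
The added paragraphs describe the prebuilt image but defer the launch steps to the linked guide. As a rough sketch only, assuming a placeholder image name and tag (``rocm/vllm:latest`` is illustrative, not taken from this patch or the guide), starting such a container on an MI300X host typically means exposing the GPU device nodes and relaxing a few container defaults:

.. code-block:: shell

   # Placeholder image name and tag; substitute the image given in the
   # linked validation guide.
   docker pull rocm/vllm:latest

   # Device mappings and permissions commonly required by ROCm containers:
   # /dev/kfd is the compute interface and /dev/dri holds the render nodes;
   # the video group grants GPU access, and a large shared-memory segment
   # helps PyTorch/vLLM worker processes.
   docker run -it --network=host \
       --device=/dev/kfd --device=/dev/dri \
       --group-add video \
       --ipc=host --shm-size 16G \
       --security-opt seccomp=unconfined \
       rocm/vllm:latest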
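On the "tuning files in the CSV format": one common way such CSVs are consumed on ROCm is PyTorch's TunableOp feature, which replays pretuned GEMM solutions recorded in a CSV file. Whether this particular image wires its tuning files through TunableOp is an assumption here; the linked guide is authoritative. The standard environment variables look like this:

.. code-block:: shell

   # Replay pretuned GEMM solutions from a CSV rather than tuning at runtime.
   export PYTORCH_TUNABLEOP_ENABLED=1    # turn TunableOp on
   export PYTORCH_TUNABLEOP_TUNING=0     # read recorded results, do not re-tune
   export PYTORCH_TUNABLEOP_FILENAME=/path/to/tuned_gemm.csv   # hypothetical path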