diff --git a/docs/how-to/rocm-for-ai/inference-optimization/workload.rst b/docs/how-to/rocm-for-ai/inference-optimization/workload.rst
index 7cd2c7fc6..9e5fe4697 100644
--- a/docs/how-to/rocm-for-ai/inference-optimization/workload.rst
+++ b/docs/how-to/rocm-for-ai/inference-optimization/workload.rst
@@ -99,12 +99,14 @@ execution.
 
 .. seealso::
 
-   See :doc:`vllm-optimization`.
+   See :doc:`vllm-optimization` to learn more about vLLM performance
+   optimization techniques.
 
 .. _mi300x-auto-tune:
 
 Auto-tunable configurations
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Auto-tunable configurations can significantly streamline performance
 optimization by automatically adjusting parameters based on workload
 characteristics. For example:
@@ -325,6 +327,22 @@ hardware counters are also included.
 
    ROCm Systems Profiler timeline trace example.
 
+vLLM performance optimization
+=============================
+
+vLLM is a high-throughput and memory-efficient inference and serving engine
+for large language models that has gained traction in the AI community for
+its performance and ease of use. See :doc:`vllm-optimization`, where you'll
+learn how to:
+
+* Enable AITER (AI Tensor Engine for ROCm) to speed up LLM inference.
+* Configure environment variables for optimal HIP, RCCL, and Quick Reduce performance.
+* Select the right attention backend for your workload (AITER MHA/MLA vs. Triton).
+* Choose parallelism strategies (tensor, pipeline, data, expert) for multi-GPU deployments.
+* Apply quantization (``FP8``/``FP4``) to reduce memory usage by 2-4× with minimal accuracy loss.
+* Tune engine arguments (batch size, memory utilization, graph modes) for your use case.
+* Benchmark and scale across single-node and multi-node configurations.
+
 .. _mi300x-tunableop:
 
 PyTorch TunableOp
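
To make the bullet list in the new section concrete, the following is a
minimal Python sketch of the offline vLLM API with several of the listed
techniques applied together. The model name is a placeholder, and the
``VLLM_ROCM_USE_AITER`` environment variable and the specific engine
arguments shown are assumptions to verify against the vLLM documentation::

    import os

    # Enable AITER kernels on ROCm before vLLM is imported
    # (assumed environment variable; confirm in the vLLM docs).
    os.environ["VLLM_ROCM_USE_AITER"] = "1"

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        tensor_parallel_size=1,       # tensor parallelism across GPUs
        quantization="fp8",           # FP8 quantization to cut memory use
        gpu_memory_utilization=0.90,  # fraction of GPU memory vLLM may claim
        max_num_seqs=256,             # cap on concurrently batched sequences
    )

    outputs = llm.generate(
        ["What is ROCm?"],
        SamplingParams(temperature=0.7, max_tokens=128),
    )
    print(outputs[0].outputs[0].text)

A server deployment would pass the equivalent settings as flags to
``vllm serve``, for example ``--tensor-parallel-size`` and ``--quantization``.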