Updates to the vLLM optimization guide for MI300X/MI355X (#5554)

* Expand vLLM optimization guide for MI300X/MI355X with comprehensive AITER coverage, attention backend selection, environment variables (HIP/RCCL/Quick Reduce), parallelism strategies, quantization (FP8/FP4), engine tuning, CUDA graph modes, and multi-node scaling.

Co-authored-by: PinSiang <pinsiang.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: pinsiangamd <pinsiang.tan@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
peterjunpark
2025-10-22 12:54:25 -04:00
committed by GitHub
parent 6f8cf36279
commit cb8d21a0df
4 changed files with 1208 additions and 428 deletions

@@ -134,6 +134,8 @@ subtrees:
   title: Profile and debug
 - file: how-to/rocm-for-ai/inference-optimization/workload.rst
   title: Workload optimization
+- file: how-to/rocm-for-ai/inference-optimization/vllm-optimization.rst
+  title: vLLM V1 performance optimization
 - url: https://rocm.docs.amd.com/projects/ai-developer-hub/en/latest/
   title: AI tutorials