Updates to the vLLM optimization guide for MI300X/MI355X (#5554)

* Expand vLLM optimization guide for MI300X/MI355X with comprehensive AITER coverage, attention backend selection, environment variables (HIP/RCCL/Quick Reduce), parallelism strategies, quantization (FP8/FP4), engine tuning, CUDA graph modes, and multi-node scaling.

Co-authored-by: PinSiang <pinsiang.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: pinsiangamd <pinsiang.tan@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2026-01-08 06:13:59 -05:00 · 2025-10-22 12:54:25 -04:00
parent 6f8cf36279
commit cb8d21a0df
4 changed files with 1208 additions and 428 deletions
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -27,6 +27,7 @@ ASICs
 ASan
 ASAN
 ASm
+Async
 ATI
 atomicRMW
 AddressSanitizer
@@ -133,6 +134,7 @@ ELMo
 ENDPGM
 EPYC
 ESXi
+EP
 EoS
 etcd
 fas
@@ -184,6 +186,7 @@ GPR
 GPT
 GPU
 GPU's
+GPUDirect
 GPUs
 GraphBolt
 GraphSage
@@ -302,6 +305,7 @@ Makefiles
 Matplotlib
 Matrox
 MaxText
+MBT
 Megablocks
 Megatrends
 Megatron
@@ -311,6 +315,7 @@ Meta's
 Miniconda
 MirroredStrategy
 Mixtral
+MLA
 MosaicML
 MoEs
 Mooncake
@@ -353,6 +358,7 @@ OFED
 OMM
 OMP
 OMPI
+OOM
 OMPT
 OMPX
 ONNX
@@ -398,6 +404,7 @@ Profiler's
 PyPi
 Pytest
 PyTorch
+QPS
 Qcycles
 Qwen
 RAII
@@ -673,6 +680,7 @@ denoised
 denoises
 denormalize
 dequantization
+dequantized
 dequantizes
 deserializers
 detections
@@ -788,6 +796,7 @@ linalg
 linearized
 linter
 linux
+llm
 llvm
 lm
 localscratch
@@ -838,6 +847,7 @@ passthrough
 pe
 perfcounter
 performant
+piecewise
 perl
 pragma
 pre
@@ -984,6 +994,7 @@ tokenizer
 tokenizes
 toolchain
 toolchains
+topk
 toolset
 toolsets
 torchtitan
@@ -1011,6 +1022,7 @@ USM
 UTCL
 UTIL
 utils
+UX
 vL
 variational
 vdi