Updates to the vLLM optimization guide for MI300X/MI355X (#5554)

* Expand vLLM optimization guide for MI300X/MI355X with comprehensive AITER coverage, attention backend selection, environment variables (HIP/RCCL/Quick Reduce), parallelism strategies, quantization (FP8/FP4), engine tuning, CUDA graph modes, and multi-node scaling.

Co-authored-by: PinSiang <pinsiang.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: pinsiangamd <pinsiang.tan@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Author: peterjunpark
Date: 2025-10-22 12:54:25 -04:00
Committed by GitHub
parent 6f8cf36279
commit cb8d21a0df
4 changed files with 1208 additions and 428 deletions

@@ -27,6 +27,7 @@ ASICs
ASan
ASAN
ASm
Async
ATI
atomicRMW
AddressSanitizer
@@ -133,6 +134,7 @@ ELMo
ENDPGM
EPYC
ESXi
EP
EoS
etcd
fas
@@ -184,6 +186,7 @@ GPR
GPT
GPU
GPU's
GPUDirect
GPUs
GraphBolt
GraphSage
@@ -302,6 +305,7 @@ Makefiles
Matplotlib
Matrox
MaxText
MBT
Megablocks
Megatrends
Megatron
@@ -311,6 +315,7 @@ Meta's
Miniconda
MirroredStrategy
Mixtral
MLA
MosaicML
MoEs
Mooncake
@@ -353,6 +358,7 @@ OFED
OMM
OMP
OMPI
OOM
OMPT
OMPX
ONNX
@@ -398,6 +404,7 @@ Profiler's
PyPi
Pytest
PyTorch
QPS
Qcycles
Qwen
RAII
@@ -673,6 +680,7 @@ denoised
denoises
denormalize
dequantization
dequantized
dequantizes
deserializers
detections
@@ -788,6 +796,7 @@ linalg
linearized
linter
linux
llm
llvm
lm
localscratch
@@ -838,6 +847,7 @@ passthrough
pe
perfcounter
performant
piecewise
perl
pragma
pre
@@ -984,6 +994,7 @@ tokenizer
tokenizes
toolchain
toolchains
topk
toolset
toolsets
torchtitan
@@ -1011,6 +1022,7 @@ USM
UTCL
UTIL
utils
UX
vL
variational
vdi