mirror of
https://github.com/ROCm/ROCm.git
synced 2026-02-13 16:05:07 -05:00
* Expand vLLM optimization guide for MI300X/MI355X with comprehensive AITER coverage. attention backend selection, environment variables (HIP/RCCL/Quick Reduce), parallelism strategies, quantization (FP8/FP4), engine tuning, CUDA graph modes, and multi-node scaling.
Co-authored-by: PinSiang <pinsiang.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
Co-authored-by: pinsiangamd <pinsiang.tan@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
(cherry picked from commit cb8d21a0df)