Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub (#3637)
* Add introduction and links to the new guide to the vLLM optimized Docker image on AMD Infinity Hub
* Update target link for the Docker vLLM guide
* Change target URL
* Change link target URL again
@@ -137,6 +137,12 @@ Installing vLLM
Refer to :ref:`mi300x-vllm-optimization` for performance optimization tips.

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.
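
To make the validation concrete, here is a minimal sketch of timing offline generation with
vLLM's Python API inside that container. The model name and prompt batch are placeholder
assumptions, not values prescribed by the validation guide.

.. code-block:: python

   # Minimal vLLM throughput sanity check (sketch; model and prompts are assumptions).
   import time

   from vllm import LLM, SamplingParams

   llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")  # assumed example model
   params = SamplingParams(max_tokens=128)
   prompts = ["Explain mixed-precision training."] * 8  # assumed prompt batch

   start = time.perf_counter()
   outputs = llm.generate(prompts, params)
   elapsed = time.perf_counter() - start

   generated = sum(len(o.outputs[0].token_ids) for o in outputs)
   print(f"{generated} generated tokens in {elapsed:.1f} s "
         f"({generated / elapsed:.1f} tok/s)")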

.. _fine-tuning-llms-tgi:

Hugging Face TGI

@@ -41,6 +41,15 @@ vLLM walkthrough
Refer to this developer blog for guidance on serving with vLLM: `Inferencing and serving with vLLM on AMD GPUs — ROCm
Blogs <https://rocm.blogs.amd.com/artificial-intelligence/vllm/README.html>`_
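
For a sense of what the client side of serving looks like, the following is a minimal
sketch that queries a vLLM OpenAI-compatible server. It assumes a server is already
running locally on port 8000, and the model name is a placeholder that must match
whatever the server was launched with.

.. code-block:: python

   # Sketch: query a locally running vLLM OpenAI-compatible server.
   # Assumes the server is already up on port 8000; model name is a placeholder.
   import requests

   resp = requests.post(
       "http://localhost:8000/v1/completions",
       json={
           "model": "meta-llama/Llama-2-7b-chat-hf",  # must match the served model
           "prompt": "What does ROCm provide?",
           "max_tokens": 64,
       },
   )
   print(resp.json()["choices"][0]["text"])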

Validating vLLM performance
---------------------------

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.
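
Before timing anything, it can help to confirm the container's stack is wired up.
A small sketch, assuming the image's bundled Python environment:

.. code-block:: python

   # Sanity-check the stack inside the container (sketch).
   import torch
   import vllm

   print("vLLM:", vllm.__version__)
   print("PyTorch:", torch.__version__)
   print("HIP runtime:", torch.version.hip)          # set on ROCm builds of PyTorch
   print("GPU visible:", torch.cuda.is_available())  # torch.cuda maps to HIP on ROCm
   print("Device 0:", torch.cuda.get_device_name(0))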

.. _rocm-for-ai-serve-hugging-face-tgi:

Serving using Hugging Face TGI

@@ -150,6 +150,12 @@ the workload to validate improvements and ensure that the changes have had the
desired effect. Continuous iteration helps refine the performance gains and
address any new bottlenecks that may emerge.
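
One way to keep that loop honest is to re-profile after every change. Below is a minimal
sketch using ``torch.profiler``, which works on ROCm builds of PyTorch; the matrix
multiply is a hypothetical stand-in for the real workload.

.. code-block:: python

   # Re-profile after each tuning change to confirm its effect (sketch).
   # On ROCm builds of PyTorch, CUDA profiler activities map to HIP kernels.
   import torch
   from torch.profiler import ProfilerActivity, profile

   def inference_step():
       # Hypothetical stand-in for the real inference workload.
       a = torch.randn(4096, 4096, device="cuda")
       b = torch.randn(4096, 4096, device="cuda")
       return a @ b

   with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
       inference_step()
       torch.cuda.synchronize()

   print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))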

ROCm provides a prebuilt optimized Docker image that has everything required to implement
the tips in this section. It includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.

.. _mi300x-profiling-tools:

Profiling tools

@@ -372,6 +378,12 @@ Refer to `vLLM documentation <https://docs.vllm.ai/en/latest/models/performance.
for additional performance tips. :ref:`fine-tuning-llms-vllm` describes vLLM
usage with ROCm.
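
To illustrate the kind of knobs those tips involve, here is a hedged sketch of
constructing a vLLM engine with a few performance-related arguments; the values are
illustrative placeholders, not tuned recommendations.

.. code-block:: python

   # Sketch: common vLLM engine arguments that affect performance.
   # Values below are illustrative placeholders, not tuned recommendations.
   from vllm import LLM

   llm = LLM(
       model="meta-llama/Llama-2-7b-chat-hf",  # assumed example model
       tensor_parallel_size=1,       # shard across GPUs when > 1
       gpu_memory_utilization=0.9,   # fraction of GPU memory for weights + KV cache
       max_num_seqs=256,             # cap on concurrently scheduled sequences
   )

Raising ``gpu_memory_utilization`` leaves more room for the KV cache, which generally
helps batch-heavy serving at the cost of headroom for other processes on the device.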

ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
format. For more information, see the guide to
`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
on the ROCm GitHub repository.

Maximize throughput
-------------------