From bb7af3351a133a46551e508b67b86fd54ca057cd Mon Sep 17 00:00:00 2001
From: Peter Park
Date: Thu, 8 May 2025 09:24:51 -0400
Subject: [PATCH] Fix incorrect throughput benchmark command in
 inference/vllm-benchmark.rst (#4723)

* update inference index to include pyt inference

* fix incorrect command in throughput benchmark

* wording
---
 docs/how-to/rocm-for-ai/inference/index.rst          | 4 +++-
 docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/how-to/rocm-for-ai/inference/index.rst b/docs/how-to/rocm-for-ai/inference/index.rst
index 298014b6a..779c32381 100644
--- a/docs/how-to/rocm-for-ai/inference/index.rst
+++ b/docs/how-to/rocm-for-ai/inference/index.rst
@@ -20,6 +20,8 @@ training, fine-tuning, and inference. It leverages popular machine learning fram
 
 - :doc:`LLM inference frameworks `
 
-- :doc:`Performance testing `
+- :doc:`vLLM inference performance testing `
+
+- :doc:`PyTorch inference performance testing `
 
 - :doc:`Deploying your model `

diff --git a/docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst b/docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst
index df6454aa9..8d530778f 100644
--- a/docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst
+++ b/docs/how-to/rocm-for-ai/inference/vllm-benchmark.rst
@@ -276,7 +276,7 @@ vLLM inference performance testing
 
    * Latency benchmark
 
-     Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with the ``{{model.precision}}`` data type.
+     Use this command to benchmark the latency of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
 
      .. code-block::
 
@@ -286,11 +286,11 @@ vLLM inference performance testing
 
    * Throughput benchmark
 
-     Use this command to throughput the latency of the {{model.model}} model on eight GPUs with the ``{{model.precision}}`` data type.
+     Use this command to benchmark the throughput of the {{model.model}} model on eight GPUs with ``{{model.precision}}`` precision.
 
      .. code-block:: shell
 
-        ./vllm_benchmark_report.sh -s latency -m {{model.model_repo}} -g 8 -d {{model.precision}}
+        ./vllm_benchmark_report.sh -s throughput -m {{model.model_repo}} -g 8 -d {{model.precision}}
 
      Find the throughput report at ``./reports_{{model.precision}}_vllm_rocm{{unified_docker.rocm_version}}/summary/{{model.model_repo.split('/', 1)[1] if '/' in model.model_repo else model.model_repo}}_throughput_report.csv``.
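
For reference, the fix this patch makes can be sketched as a small dry-run wrapper that prints the latency and throughput invocations without executing the benchmark script. This is a minimal sketch: `MODEL_REPO` and `PRECISION` here are hypothetical placeholder values, not values from the patch's templates.

```shell
#!/bin/sh
# Dry run: print the two benchmark commands rather than executing them.
# MODEL_REPO and PRECISION are hypothetical placeholders for this sketch.
MODEL_REPO="org/model"
PRECISION="float16"
NUM_GPUS=8

for scenario in latency throughput; do
    # The bug fixed by this patch: the throughput section previously passed
    # "-s latency"; the scenario flag must match the benchmark being run.
    echo "./vllm_benchmark_report.sh -s $scenario -m $MODEL_REPO -g $NUM_GPUS -d $PRECISION"
done
```

Running the loop makes the corrected pairing visible: each scenario name appears once as the `-s` argument, so the throughput report is produced by `-s throughput`, not by a second latency run.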