mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-27 03:01:52 -04:00
Merge remote-tracking branch 'upstream/develop' into idevelop
@@ -1,11 +1,12 @@
 vllm_benchmark:
   unified_docker:
     latest:
-      pull_tag: rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
-      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9
+      pull_tag: rocm/vllm:instinct_main
+      docker_hub_url: https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_instinct_vllm0.7.3_20250311/images/sha256-de0a2649b735f45b7ecab8813eb7b19778ae1f40591ca1196b07bc29c42ed4a3
       rocm_version: 6.3.1
-      vllm_version: 0.6.6
-      pytorch_version: 2.7.0 (2.7.0a0+git3a58512)
+      vllm_version: 0.7.3
+      pytorch_version: 2.7.0 (dev nightly)
+      hipblaslt_version: 0.13
   model_groups:
     - group: Llama
       tag: llama
@@ -40,6 +41,11 @@ vllm_benchmark:
       model_repo: meta-llama/Llama-2-70b-chat-hf
       url: https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
       precision: float16
+    - model: Llama 3.1 8B FP8
+      mad_tag: pyt_vllm_llama-3.1-8b_fp8
+      model_repo: amd/Llama-3.1-8B-Instruct-FP8-KV
+      url: https://huggingface.co/amd/Llama-3.1-8B-Instruct-FP8-KV
+      precision: float8
     - model: Llama 3.1 70B FP8
       mad_tag: pyt_vllm_llama-3.1-70b_fp8
       model_repo: amd/Llama-3.1-70B-Instruct-FP8-KV
@@ -1705,12 +1705,12 @@ efficiency and throughput of various computational kernels.

 Occupancy related to VGPRs usage on an Instinct MI300X accelerator

-For example, according to the table, the available VGPR is 512 per Execution
-Unit (EU), and VGPU is allocated at the unit of 16. If the current VGPR usage
-is 170, the actual requested VGPR will be 176, so the occupancy is only 2
-waves per EU since :math:`176 \times 3 > 512`. So, if you set
-``waves_per_eu`` to 3, the LLVM backend tries to bring VGPR usage down so
-that it might fit 3 waves per EU.
+For example, according to the table, each Execution Unit (EU) has 512 available
+VGPRs, which are allocated in blocks of 16. If the current VGPR usage is 170,
+it will be rounded up to 176 due to the allocation granularity. In this case,
+the occupancy is limited to 2 waves per EU because :math:`176 \times 3 > 512`.
+So, if you set ``waves_per_eu`` to 3, the LLVM backend will attempt to reduce
+VGPR usage so that it might fit 3 waves per EU.

 ``BLOCK_M``, ``BLOCK_N``, ``BLOCK_K``
   Tile sizes to be tuned to balance the memory-to-computation ratio. The goal
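To sanity-check the rounding and occupancy arithmetic in the revised paragraph, the calculation can be scripted directly. The following sketch assumes only the 512-VGPR budget per EU and the 16-register allocation granularity quoted in the text; it does not query the hardware.

.. code-block:: shell

   # Illustrative only: reproduce the occupancy arithmetic from the text.
   vgpr_used=170      # VGPRs the kernel currently uses
   granule=16         # VGPRs are allocated in blocks of 16
   vgpr_per_eu=512    # available VGPRs per Execution Unit

   # Round usage up to the next multiple of the granule: 170 -> 176.
   vgpr_alloc=$(( (vgpr_used + granule - 1) / granule * granule ))

   # Integer division gives the waves that fit: 512 / 176 = 2.
   waves=$(( vgpr_per_eu / vgpr_alloc ))

   echo "allocated VGPRs: ${vgpr_alloc}, occupancy: ${waves} waves per EU"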
@@ -47,7 +47,7 @@ Validating vLLM performance
 ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
 on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in the CSV
 format. For more information, see the guide to
-`LLM inference performance validation with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
+`LLM inference performance testing with vLLM on the AMD Instinct™ MI300X accelerator <https://github.com/ROCm/MAD/blob/develop/benchmark/vllm/README.md>`_
 on the ROCm GitHub repository.

 .. _rocm-for-ai-serve-hugging-face-tgi:
@@ -20,6 +20,6 @@ training, fine-tuning, and inference. It leverages popular machine learning fram

 - :doc:`LLM inference frameworks <llm-inference-frameworks>`

-- :doc:`Performance validation <vllm-benchmark>`
+- :doc:`Performance testing <vllm-benchmark>`

 - :doc:`Deploying your model <deploy-your-model>`
@@ -140,8 +140,8 @@ Installing vLLM

 See :ref:`mi300x-vllm-optimization` for performance optimization tips.

 ROCm provides a prebuilt optimized Docker image for validating the performance of LLM inference with vLLM
-on the MI300X accelerator. The Docker image includes ROCm, vLLM, PyTorch, and tuning files in CSV
-format. For more information, see :doc:`vllm-benchmark`.
+on the MI300X accelerator. The Docker image includes ROCm, vLLM, and PyTorch.
+For more information, see :doc:`vllm-benchmark`.

 .. _fine-tuning-llms-tgi:
@@ -3,9 +3,9 @@
     ROCm vLLM Docker image.
   :keywords: model, MAD, automation, dashboarding, validate

-***********************************************************
-LLM inference performance validation on AMD Instinct MI300X
-***********************************************************
+********************************************************
+LLM inference performance testing on AMD Instinct MI300X
+********************************************************

 .. _vllm-benchmark-unified-docker:
@@ -16,9 +16,9 @@ LLM inference performance validation on AMD Instinct MI300X

 The `ROCm vLLM Docker <{{ unified_docker.docker_hub_url }}>`_ image offers
 a prebuilt, optimized environment for validating large language model (LLM)
-inference performance on the AMD Instinct™ MI300X accelerator. This ROCm vLLM
-Docker image integrates vLLM and PyTorch tailored specifically for the MI300X
-accelerator and includes the following components:
+inference performance on AMD Instinct™ MI300X series accelerators. This ROCm vLLM
+Docker image integrates vLLM and PyTorch tailored specifically for MI300X series
+accelerators and includes the following components:

 * `ROCm {{ unified_docker.rocm_version }} <https://github.com/ROCm/ROCm>`_
@@ -26,9 +26,11 @@ LLM inference performance validation on AMD Instinct MI300X

 * `PyTorch {{ unified_docker.pytorch_version }} <https://github.com/pytorch/pytorch>`_

-With this Docker image, you can quickly validate the expected inference
-performance numbers for the MI300X accelerator. This topic also provides tips on
-optimizing performance with popular AI models.
+* `hipBLASLt {{ unified_docker.hipblaslt_version }} <https://github.com/ROCm/hipBLASLt>`_
+
+With this Docker image, you can quickly test the :ref:`expected
+inference performance numbers <vllm-benchmark-performance-measurements>` for
+MI300X series accelerators.

 .. _vllm-benchmark-available-models:
@@ -79,7 +81,6 @@ LLM inference performance validation on AMD Instinct MI300X
     {% endfor %}
 {% endfor %}

-
 .. note::

    vLLM is a toolkit and library for LLM inference and serving. AMD implements
@@ -87,6 +88,29 @@ LLM inference performance validation on AMD Instinct MI300X
    See :ref:`fine-tuning-llms-vllm` and :ref:`mi300x-vllm-optimization` for
    more information.

+.. _vllm-benchmark-performance-measurements:
+
+Performance measurements
+========================
+
+To evaluate performance, the
+`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
+page provides reference throughput and latency measurements for inferencing
+popular AI models.
+
+.. note::
+
+   The performance data presented in
+   `Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html>`_
+   should not be interpreted as the peak performance achievable by AMD
+   Instinct MI325X and MI300X accelerators or ROCm software.
+
+Advanced features and known issues
+==================================
+
+For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
+see the developer's guide at `<https://github.com/ROCm/vllm/blob/main/docs/dev-docker/README.md>`__.
+
 Getting started
 ===============
@@ -162,13 +186,13 @@ LLM inference performance validation on AMD Instinct MI300X
    .. tab-item:: Standalone benchmarking

       Run the vLLM benchmark tool independently by starting the
-      `Docker container <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9>`_
+      `Docker container <{{ unified_docker.docker_hub_url }}>`_
       as shown in the following snippet.

       .. code-block::

-         docker pull rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
-         docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name vllm_v0.6.6 rocm/vllm:rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6
+         docker pull {{ unified_docker.pull_tag }}
+         docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --shm-size 16G --security-opt seccomp=unconfined --security-opt apparmor=unconfined --cap-add=SYS_PTRACE -v $(pwd):/workspace --env HUGGINGFACE_HUB_CACHE=/workspace --name test {{ unified_docker.pull_tag }}

       In the Docker container, clone the ROCm MAD repository and navigate to the
       benchmark scripts directory at ``~/MAD/scripts/vllm``.
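The clone-and-navigate step described above corresponds to the following commands; this sketch assumes the repository is cloned into the home directory, so it lands at ``~/MAD``.

.. code-block:: shell

   # Clone the ROCm MAD repository and enter the vLLM benchmark
   # scripts directory referenced above.
   git clone https://github.com/ROCm/MAD
   cd ~/MAD/scripts/vllm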
@@ -290,3 +314,40 @@ Further reading

 - To learn how to fine-tune LLMs, see
   :doc:`Fine-tuning LLMs <../fine-tuning/index>`.
+
+Previous versions
+=================
+
+This table lists previous versions of the ROCm vLLM inference Docker image for
+inference performance testing. For detailed information about available models
+for benchmarking, see the version-specific documentation.
+
+.. list-table::
+   :header-rows: 1
+   :stub-columns: 1
+
+   * - ROCm version
+     - vLLM version
+     - PyTorch version
+     - Resources
+
+   * - 6.3.1
+     - 0.6.6
+     - 2.7.0
+     -
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.2/how-to/rocm-for-ai/inference/vllm-benchmark.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.3.1_mi300_ubuntu22.04_py3.12_vllm_0.6.6/images/sha256-9a12ef62bbbeb5a4c30a01f702c8e025061f575aa129f291a49fbd02d6b4d6c9>`_
+
+   * - 6.2.1
+     - 0.6.4
+     - 2.5.0
+     -
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4/images/sha256-ccbb74cc9e7adecb8f7bdab9555f7ac6fc73adb580836c2a35ca96ff471890d8>`_
+
+   * - 6.2.0
+     - 0.4.3
+     - 2.4.0
+     -
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.2.0/how-to/performance-validation/mi300x/vllm-benchmark.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/vllm/rocm6.2_mi300_ubuntu22.04_py3.9_vllm_7c5fd50/images/sha256-9e4dd4788a794c3d346d7d0ba452ae5e92d39b8dfac438b2af8efdc7f15d22c0>`_
@@ -1,5 +1,3 @@
-:orphan:
-
 .. meta::
   :description: How to train a model using Megatron-LM for ROCm.
   :keywords: ROCm, AI, LLM, train, Megatron-LM, megatron, Llama, tutorial, docker, torch

@@ -527,7 +525,7 @@ Previous versions
 =================

 This table lists previous versions of the ROCm Megatron-LM Docker image for training
-performance validation. For detailed information about available models for
+performance testing. For detailed information about available models for
 benchmarking, see the version-specific documentation.

 .. list-table::
@@ -1,5 +1,3 @@
-:orphan:
-
 .. meta::
   :description: How to train a model using PyTorch for ROCm.
   :keywords: ROCm, AI, LLM, train, PyTorch, torch, Llama, flux, tutorial, docker

@@ -11,7 +9,7 @@ Training a model with PyTorch for ROCm
 PyTorch is an open-source machine learning framework that is widely used for
 model training with GPU-optimized components for transformer-based models.

-The PyTorch for ROCm training Docker (``rocm/pytorch-training:v25.3``) image
+The PyTorch for ROCm training Docker (``rocm/pytorch-training:v25.4``) image
 provides a prebuilt optimized environment for fine-tuning and pretraining a
 model on AMD Instinct MI325X and MI300X accelerators. It includes the following
 software components to accelerate training workloads:
@@ -39,12 +37,14 @@ software components to accelerate training workloads:
 Supported models
 ================

-The following models are pre-optimized for performance on the AMD Instinct MI300X accelerator.
+The following models are pre-optimized for performance on the AMD Instinct MI325X and MI300X accelerators.

 * Llama 3.1 8B

 * Llama 3.1 70B

 * Llama 2 70B

+* FLUX.1-dev
+
 .. note::
@@ -54,28 +54,30 @@ The following models are pre-optimized for performance on the AMD Instinct MI300
    Some models, such as Llama 3, require an external license agreement through
    a third party (for example, Meta).

+.. _amd-pytorch-training-performance-measurements:
+
+Performance measurements
+========================
+
+To evaluate performance, the
+`Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`_
+page provides reference throughput and latency measurements for training
+popular AI models.
+
+.. note::
+
+   The performance data presented in
+   `Performance results with AMD ROCm software <https://www.amd.com/en/developer/resources/rocm-hub/dev-ai/performance-results.html#tabs-a8deaeb413-item-21cea50186-tab>`_
+   should not be interpreted as the peak performance achievable by AMD
+   Instinct MI325X and MI300X accelerators or ROCm software.
+
 System validation
 =================

-If you have already validated your system settings, skip this step. Otherwise,
-complete the :ref:`system validation and optimization steps <train-a-model-system-validation>`
-to set up your system before starting training.
-
-Disable NUMA auto-balancing
----------------------------
-
-Generally, application performance can benefit from disabling NUMA auto-balancing. However,
-it might be detrimental to performance with certain types of workloads.
-
-Run the command ``cat /proc/sys/kernel/numa_balancing`` to check your current NUMA (Non-Uniform
-Memory Access) settings. Output ``0`` indicates this setting is disabled. If there is no output or
-the output is ``1``, run the following command to disable NUMA auto-balancing.
-
-.. code-block:: shell
-
-   sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
-
-See :ref:`mi300x-disable-numa` for more information.
+If you have already validated your system settings, including NUMA
+auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
+and optimization steps <train-a-model-system-validation>` to set up your system
+before starting training.
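The NUMA auto-balancing commands removed above remain useful when validating a system manually. The check and the disable step can be combined into one guarded command, using only the commands quoted in the removed text.

.. code-block:: shell

   # Disable NUMA auto-balancing only if it is not already disabled
   # (matches the "no output or output is 1" condition from the removed text).
   if [ "$(cat /proc/sys/kernel/numa_balancing)" != "0" ]; then
       sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
   fi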

 Environment setup
 =================
@@ -91,13 +93,13 @@ Download the Docker image

    .. code-block:: shell

-      docker pull rocm/pytorch-training:v25.3
+      docker pull rocm/pytorch-training:v25.4

 2. Run the Docker container.

    .. code-block:: shell

-      docker run -it --device /dev/dri --device /dev/kfd --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 64G --name training_env rocm/pytorch-training:v25.3
+      docker run -it --device /dev/dri --device /dev/kfd --network host --ipc host --group-add video --cap-add SYS_PTRACE --security-opt seccomp=unconfined --privileged -v $HOME:$HOME -v $HOME/.ssh:/root/.ssh --shm-size 64G --name training_env rocm/pytorch-training:v25.4

 3. Use these commands if you exit the ``training_env`` container and need to return to it.

@@ -106,20 +108,26 @@ Download the Docker image
       docker start training_env
       docker exec -it training_env bash

-4. In the Docker container, clone the `<https://github.com/ROCm/MAD>`__ repository and navigate to the benchmark scripts directory.
+4. In the Docker container, clone the `<https://github.com/ROCm/MAD>`__
+   repository and navigate to the benchmark scripts directory
+   ``/workspace/MAD/scripts/pytorch_train``.

    .. code-block:: shell

       git clone https://github.com/ROCm/MAD
-      cd MAD/scripts/pytorch-train
+      cd MAD/scripts/pytorch_train

 Prepare training datasets and dependencies
 ------------------------------------------

-The following benchmarking examples may require downloading models and datasets
+The following benchmarking examples require downloading models and datasets
 from Hugging Face. To ensure successful access to gated repos, set your
 ``HF_TOKEN``.

 .. code-block:: shell

    export HF_TOKEN=$your_personal_hugging_face_access_token

 Run the setup script to install libraries and datasets needed for benchmarking.

 .. code-block:: shell
@@ -229,10 +237,12 @@ Along with the following datasets:

 * `WikiText <https://huggingface.co/datasets/Salesforce/wikitext>`_

 * `UltraChat 200k <https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k>`_

-Start training on AMD Instinct accelerators
-===========================================
+* `bghira/pseudo-camera-10k <https://huggingface.co/datasets/bghira/pseudo-camera-10k>`_
+
+Getting started
+===============

 The prebuilt PyTorch with ROCm training environment allows users to quickly validate
 system performance, conduct training benchmarks, and achieve superior
@@ -242,7 +252,7 @@ can expect the container to perform in the model configurations described in
 the following section, but other configurations are not validated by AMD.

 Use the following instructions to set up the environment, configure the script
-to train models, and reproduce the benchmark results on MI300X series
+to train models, and reproduce the benchmark results on MI325X and MI300X
 accelerators with the AMD PyTorch training Docker image.

 Once your environment is set up, use the following commands and examples to start benchmarking.
@@ -279,32 +289,59 @@ Options and available models
       - ``finetune_lora``
       - Benchmark LoRA fine-tuning (Llama 3.1 70B with BF16)

+    * -
+      - ``HF_finetune_lora``
+      - Benchmark LoRA fine-tuning with Hugging Face PEFT (Llama 2 70B with BF16)
+
     * - ``$datatype``
-      - FP8 or BF16
+      - ``FP8`` or ``BF16``
       - Only Llama 3.1 8B supports FP8 precision.

     * - ``$model_repo``
-      - Llama-3.1-8B
+      - ``Llama-3.1-8B``
       - `Llama 3.1 8B <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>`_

     * -
-      - Llama-3.1-70B
+      - ``Llama-3.1-70B``
       - `Llama 3.1 70B <https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct>`_

     * -
-      - Flux
+      - ``Llama-2-70B``
+      - `Llama 2 70B <https://huggingface.co/meta-llama/Llama-2-70B>`_
+
+    * -
+      - ``Flux``
       - `FLUX.1 [dev] <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_

+    * - ``$sequence_length``
+      - Sequence length for the language model.
+      - Between 2048 and 8192. 8192 by default.
+
+.. note::
+
+   Occasionally, downloading the Flux dataset might fail. In the event of this
+   error, manually download it from Hugging Face at
+   `black-forest-labs/FLUX.1-dev <https://huggingface.co/black-forest-labs/FLUX.1-dev>`_
+   and save it to ``/workspace/FluxBenchmark``. This ensures that the test script can access
+   the required dataset.
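One way to perform the manual download mentioned in this note is with the Hugging Face CLI. This is a sketch, not part of the commit: it assumes ``huggingface_hub`` with its CLI extra is available in the container, that ``HF_TOKEN`` is set, and that your account has accepted the gated-repository license.

.. code-block:: shell

   # Hypothetical manual-download step: fetch the gated FLUX.1-dev repository
   # into the path the test script expects.
   pip install -U "huggingface_hub[cli]"
   huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir /workspace/FluxBenchmark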
 Fine-tuning
 -----------

-To start the fine-tuning benchmark, use the following command. It will run the benchmarking example of Llama 2 70B
+To start the fine-tuning benchmark, use the following command. It will run the benchmarking example of Llama 3.1 70B
 with the WikiText dataset using the AMD fork of `torchtune <https://github.com/AMD-AIG-AIMA/torchtune>`_.

 .. code-block:: shell

    ./pytorch_benchmark_report.sh -t {finetune_fw, finetune_lora} -p BF16 -m Llama-3.1-70B

+Use the following command to run the benchmarking example of Llama 2 70B with the UltraChat 200k dataset using
+`Hugging Face PEFT <https://huggingface.co/docs/peft/en/index>`_.
+
+.. code-block:: shell
+
+   ./pytorch_benchmark_report.sh -t HF_finetune_lora -p BF16 -m Llama-2-70B
+
 Benchmarking examples
 ---------------------
@@ -339,3 +376,32 @@ Here are some examples of how to use the command.

    .. code-block:: shell

       ./pytorch_benchmark_report.sh -t finetune_lora -p BF16 -m Llama-3.1-70B
+
+* Example 6: Hugging Face PEFT LoRA fine-tuning with Llama 2 70B
+
+   .. code-block:: shell
+
+      ./pytorch_benchmark_report.sh -t HF_finetune_lora -p BF16 -m Llama-2-70B
+
+Previous versions
+=================
+
+This table lists previous versions of the ROCm PyTorch training Docker image for
+training performance testing. For detailed information about available models for
+benchmarking, see the version-specific documentation.
+
+.. list-table::
+   :header-rows: 1
+   :stub-columns: 1
+
+   * - Image version
+     - ROCm version
+     - PyTorch version
+     - Resources
+
+   * - v25.3
+     - 6.3.0
+     - 2.7.0a0+git637433
+     -
+       * `Documentation <https://rocm.docs.amd.com/en/docs-6.3.2/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html>`_
+       * `Docker Hub <https://hub.docker.com/layers/rocm/pytorch-training/v25.3/images/sha256-0ffdde1b590fd2787b1c7adf5686875b100980b0f314090901387c44253e709b>`_
@@ -18,7 +18,7 @@
 | [6.2.2](https://rocm.docs.amd.com/en/docs-6.2.2/) | September 27, 2024 |
 | [6.2.1](https://rocm.docs.amd.com/en/docs-6.2.1/) | September 20, 2024 |
 | [6.2.0](https://rocm.docs.amd.com/en/docs-6.2.0/) | August 2, 2024 |
-| [6.1.5](https://rocm.docs.amd.com/en/docs-6.1.2/) | March 13, 2025 |
+| [6.1.5](https://rocm.docs.amd.com/en/docs-6.1.5/) | March 13, 2025 |
 | [6.1.2](https://rocm.docs.amd.com/en/docs-6.1.2/) | June 4, 2024 |
 | [6.1.1](https://rocm.docs.amd.com/en/docs-6.1.1/) | May 8, 2024 |
 | [6.1.0](https://rocm.docs.amd.com/en/docs-6.1.0/) | Apr 16, 2024 |
@@ -73,7 +73,7 @@ subtrees:
         - file: how-to/rocm-for-ai/inference/llm-inference-frameworks.rst
           title: LLM inference frameworks
         - file: how-to/rocm-for-ai/inference/vllm-benchmark.rst
-          title: Performance validation
+          title: Performance testing
         - file: how-to/rocm-for-ai/inference/deploy-your-model.rst
           title: Deploy your model
@@ -1,4 +1,4 @@
-rocm-docs-core==1.17.1
+rocm-docs-core==1.18.1
 sphinx-reredirects
 sphinx-sitemap
 sphinxcontrib.datatemplates==0.11.0
@@ -190,7 +190,7 @@ requests==2.32.3
     # via
     #   pygithub
     #   sphinx
-rocm-docs-core==1.17.1
+rocm-docs-core==1.18.1
     # via -r requirements.in
 rpds-py==0.22.3
     # via