Add Stable Diffusion XL to PyT training benchmark doc and fix paths in SGLang Disagg Inference doc (#5282)

* add sdxl to pytorch-training
* fix sphinx warnings fix links
* fix paths in cmds and links in sglang disagg
* fix col width
* update release highlights
* fix quickfix
2026-01-10 15:18:11 -05:00 · 2025-09-16 16:49:33 -04:00
parent 5a5e4dbb6e
commit 26f708da87
6 changed files with 50 additions and 31 deletions
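
The path fixes in this commit replace the `sglang_dissag` spelling with `sglang_disagg` in commands and links. As a minimal sketch, assuming you want to check a piece of documentation text for leftover occurrences of the old spelling, the following Python helper lists the lines that still contain it. The function name and sample text are hypothetical and are not part of this commit or the MAD repository.

```python
import re

# Hypothetical helper (not part of the commit): matches the old "dissag" spelling.
MISSPELLING = re.compile(r"dissag")


def find_misspelled_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still contain the old spelling."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if MISSPELLING.search(line)
    ]


# Illustrative sample only: one corrected line and one line with the old spelling.
sample = "\n".join([
    "cd MAD/scripts/sglang_disagg/",
    "export DOCKER_IMAGE_NAME=sglang_dissag_pd_image",
])

for number, line in find_misspelled_lines(sample):
    print(f"line {number}: {line}")
```

Running a check like this over the rendered link text as well as the URLs can help, since a link's visible label and its target are edited separately.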
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -335,8 +335,9 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
  benchmarking guides have been updated with expanded model coverage and 
  optimized Docker environments. Highlights include:

-  * The [Training a model with Primus and Megatron](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-megatron.html) benchmarking guide
-    now leverages the unified AMD Primus framework with the Megatron backend. See [Primus: A Lightweight, Unified Training Framework for Large Models on AMD
+  * The [Training a model with Primus and Megatron](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-megatron.html) 
+    and [Training a model with Primus and PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-pytorch.html) benchmarking guides
+    now leverage the unified AMD Primus framework with the Megatron and torchtitan backends. See [Primus: A Lightweight, Unified Training Framework for Large Models on AMD
    GPUs](https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html) for an introduction to Primus.

  * The [Training a model with PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) benchmarking guide
@@ -345,6 +346,9 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
  * The [Training a model with JAX MaxText](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) benchmarking guide
    now supports [MAD](https://github.com/ROCm/MAD)-integrated benchmarking. The MaxText training environment now uses JAX 0.6.0 or 0.5.0. FP8 quantized training is supported with JAX 0.5.0.

+  * The [SGLang distributed inference](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.html?model=llama-3.1-8b-instruct) guide
+    provides a recipe to get started with disaggregated prefill/decode inference.
+
  * The [vLLM inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/vllm.html) documentation
    now features clearer serving and throughput benchmarking commands -- for improved transparency of model benchmarking configurations. The vLLM inference
    environment now uses vLLM 0.10.1 and includes improved default configurations.
@@ -408,17 +412,16 @@ ROCm documentation continues to be updated to provide clearer and more comprehen

 ## User space, driver, and firmware dependent changes

-The software for AMD Datacenter GPU products requires maintaining a hardware
-and software stack with interdependencies between the GPU and baseboard
-firmware, AMD GPU drivers, and the ROCm user space software.
-
+Running GPU software on AMD data center GPUs requires maintaining a coordinated
+hardware and software stack. This stack has interdependencies between the GPU
+and baseboard firmware, AMD GPU drivers, and the ROCm user-space software.
 As of the ROCm 7.0.0 release, these interdependencies are publicly documented.
-Note that while AMD publishes drivers and ROCm user space, your server or
+While AMD publishes drivers and ROCm user space components, your server or
 infrastructure provider publishes the GPU and baseboard firmware by bundling
 AMD’s firmware releases via AMD’s Platform Level Data Model (PLDM) bundle,
 which includes Integrated Firmware Image (IFWI).

-GPU and baseboard firmware versioning might differ across GPU families. With the
+GPU and baseboard firmware versioning might differ across GPU families. Note that with the
 ROCm 7.0.0 release, the AMD GPU driver (amdgpu) is now versioned separately
 from ROCm. See [AMD GPU Driver/ROCm packaging separation](#amd-gpu-driver-rocm-packaging-separation).

--- a/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst
+++ b/docs/compatibility/ml-compatibility/tensorflow-compatibility.rst
@@ -72,7 +72,7 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.18-dev/images/sha256-96754ce2d30f729e19b497279915b5212ba33d5e408e7e5dd3f2304d87e3441e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
      - 24.04
      - `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
      - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
@@ -81,7 +81,7 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.18-dev/images/sha256-fa741508d383858e86985a9efac85174529127408102558ae2e3a4ac894eea1e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
      - 22.04
      - `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
      - `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
@@ -90,7 +90,7 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.17-dev/images/sha256-3a0aef09f2a8833c2b64b85874dd9449ffc2ad257351857338ff5b706c03a418"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
      - 24.04
      - `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
      - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
@@ -99,7 +99,7 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.17-dev/images/sha256-bc7341a41ebe7ab261aa100732874507c452421ef733e408ac4f05ed453b0bc5"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
      - 22.04
      - `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
      - `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
@@ -108,7 +108,7 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.16-dev/images/sha256-4841a8df7c340dab79bf9362dad687797649a00d594e0832eb83ea6880a40d3b"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
      - 24.04
      - `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
      - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
@@ -117,7 +117,7 @@ the |docker-icon| icon to view the image on Docker Hub.

           <a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.16-dev/images/sha256-883fa95aba960c58a3e46fceaa18f03ede2c7df89b8e9fd603ab2d47e0852897"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>

-      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.16.2-cp310-cp310-manylinux_2_28_x86_64.whl>`__
+      - `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
      - 22.04
      - `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
      - `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
--- a/docs/data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml
+++ b/docs/data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml
@@ -150,6 +150,15 @@ model_groups:
      url: https://huggingface.co/Qwen/Qwen2-7B
      precision: BF16
      training_modes: [finetune_fw, finetune_lora]
+  - group: Stable Diffusion
+    tag: sd
+    models:
+    - model: Stable Diffusion XL
+      mad_tag: pyt_huggingface_stable_diffusion_xl_2k_lora_finetuning
+      model_repo: SDXL
+      url: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
+      precision: BF16
+      training_modes: [finetune_lora]
  - group: Flux
    tag: flux
    models:
--- a/docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst
+++ b/docs/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.rst
@@ -122,8 +122,8 @@ drivers.
   git clone https://github.com/ROCm/MAD.git
   cd MAD/docker
   docker build \
-       -t sglang_dissag_pd_image \
-       -f sglang_dissag_inference.ubuntu.amd.Dockerfile .
+       -t sglang_disagg_pd_image \
+       -f sglang_disagg_inference.ubuntu.amd.Dockerfile .

 Benchmarking
 ============
@@ -132,16 +132,16 @@ The `<https://github.com/ROCm/MAD/tree/develop/scripts/sglang_dissag>`__
 repository contains scripts to launch SGLang inference with prefill/decode
 disaggregation via Mooncake for supported models.

-* `scripts/sglang_dissag/run_xPyD_models.slurm <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/run_xPyD_models.slurm>`__
+* `scripts/sglang_dissag/run_xPyD_models.slurm <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/run_xPyD_models.slurm>`__
  -- the main Slurm batch script to launch Docker containers on all nodes using ``sbatch`` or ``salloc``.

-* `scripts/sglang_dissag/sglang_disagg_server.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/sglang_disagg_server.sh>`__
+* `scripts/sglang_dissag/sglang_disagg_server.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/sglang_disagg_server.sh>`__
  -- the entrypoint script that runs inside each container to start the correct service -- proxy, prefill, or decode.

-* `scripts/sglang_dissag/benchmark_xPyD.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/benchmark_xPyD.sh>`__
+* `scripts/sglang_dissag/benchmark_xPyD.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/benchmark_xPyD.sh>`__
  -- the benchmark script to run the GSM8K accuracy benchmark and the SGLang benchmarking tool for performance measurement.

-* `scripts/sglang_dissag/benchmark_parser.py <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/benchmark_parser.py>`__
+* `scripts/sglang_dissag/benchmark_parser.py <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/benchmark_parser.py>`__
  -- the log parser script to be run on the concurrency benchmark log file to generate tabulated data.

 Launch the service
@@ -163,10 +163,10 @@ allocated nodes.
         # Clone the MAD repo if you haven't already and
         # navigate to the scripts directory
         git clone https://github.com/ROCm/MAD.git
-         cd MAD/scripts/sglang_dissag/
+         cd MAD/scripts/sglang_disagg/

         # Slurm sbatch run command
-         export DOCKER_IMAGE_NAME=sglang_dissag_pd_image
+         export DOCKER_IMAGE_NAME=sglang_disagg_pd_image
         export xP=<num_prefill_nodes>
         export yD=<num_decode_nodes>
         export MODEL_NAME={{ model.model_repo }}
--- a/docs/how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.rst
+++ b/docs/how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.rst
@@ -406,8 +406,6 @@ benchmark results:
 Further reading
 ===============

- See the ROCm/maxtext benchmarking README at `<https://github.com/ROCm/maxtext/blob/main/benchmarks/gpu-rocm/readme.md>`__.
-
 - To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.

 - To learn more about system settings and management practices to configure your system for
--- a/docs/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.rst
+++ b/docs/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.rst
@@ -56,7 +56,7 @@ vary by model -- select one to get started.
            <div class="col-2 me-1 px-2 model-param-head">Model</div>
            <div class="row col-10 pe-0">
      {% for model_group in model_groups %}
-               <div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
+               <div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
      {% endfor %}
            </div>
         </div>
@@ -150,9 +150,6 @@ doesn’t test configurations and run conditions outside those described.
 Run training
 ============

-Run training
-============
-
 .. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml

   {% set unified_docker = data.dockers[0] %}
@@ -164,6 +161,12 @@ Run training

      .. tab-item:: MAD-integrated benchmarking

+   {% for model_group in model_groups %}
+      {% for model in model_group.models %}
+
+         The following run command is tailored to {{ model.model }}.
+         See :ref:`amd-pytorch-training-model-support` to switch to another available model.
+
         1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
            directory and install the required packages on the host machine.

@@ -173,9 +176,6 @@ Run training
               cd MAD
               pip install -r requirements.txt

-   {% for model_group in model_groups %}
-      {% for model in model_group.models %}
-
         .. container:: model-doc {{ model.mad_tag }}

            2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
@@ -199,6 +199,15 @@ Run training

      .. tab-item:: Standalone benchmarking

+   {% for model_group in model_groups %}
+      {% for model in model_group.models %}
+
+         The following commands are tailored to {{ model.model }}.
+         See :ref:`amd-pytorch-training-model-support` to switch to another available model.
+
+      {% endfor %}
+   {% endfor %}
+
         .. rubric:: Download the Docker image and required packages

         1. Use the following command to pull the Docker image from Docker Hub.