Add Stable Diffusion XL to PyT training benchmark doc and fix paths in SGLang Disagg Inference doc (#5282)

* add sdxl to pytorch-training

* fix sphinx warnings

fix links

* fix paths in cmds and links in sglang disagg

* fix col width

* update release highlights

* fix

quickfix
This commit is contained in:
Peter Park
2025-09-16 16:49:33 -04:00
committed by GitHub
parent 5a5e4dbb6e
commit 26f708da87
6 changed files with 50 additions and 31 deletions

View File

@@ -335,8 +335,9 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
benchmarking guides have been updated with expanded model coverage and
optimized Docker environments. Highlights include:
* The [Training a model with Primus and Megatron](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-megatron.html) benchmarking guide
now leverages the unified AMD Primus framework with the Megatron backend. See [Primus: A Lightweight, Unified Training Framework for Large Models on AMD
* The [Training a model with Primus and Megatron](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-megatron.html)
and [Training a model with Primus and PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-pytorch.html) benchmarking guides
now leverage the unified AMD Primus framework with the Megatron and torchtitan backends. See [Primus: A Lightweight, Unified Training Framework for Large Models on AMD
GPUs](https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html) for an introduction to Primus.
* The [Training a model with PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) benchmarking guide
@@ -345,6 +346,9 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
* The [Training a model with JAX MaxText](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) benchmarking guide
now supports [MAD](https://github.com/ROCm/MAD)-integrated benchmarking. The MaxText training environment now uses JAX 0.6.0 or 0.5.0. FP8 quantized training is supported with JAX 0.5.0.
* The [SGLang distributed inference](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.html?model=llama-3.1-8b-instruct) guide
provides a recipe to get started with disaggregated prefill/decode inference.
* The [vLLM inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/vllm.html) documentation
now features clearer serving and throughput benchmarking commands -- for improved transparency of model benchmarking configurations. The vLLM inference
environment now uses vLLM 0.10.1 and includes improved default configurations.
@@ -408,17 +412,16 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
## User space, driver, and firmware dependent changes
The software for AMD Datacenter GPU products requires maintaining a hardware
and software stack with interdependencies between the GPU and baseboard
firmware, AMD GPU drivers, and the ROCm user space software.
Running GPU software on AMD data center GPUs requires maintaining a coordinated
hardware and software stack. This stack has interdependencies between the GPU
and baseboard firmware, AMD GPU drivers, and the ROCm user-space software.
As of the ROCm 7.0.0 release, these interdependencies are publicly documented.
Note that while AMD publishes drivers and ROCm user space, your server or
While AMD publishes drivers and ROCm user space components, your server or
infrastructure provider publishes the GPU and baseboard firmware by bundling
AMDs firmware releases via AMDs Platform Level Data Model (PLDM) bundle,
which includes Integrated Firmware Image (IFWI).
GPU and baseboard firmware versioning might differ across GPU families. With the
GPU and baseboard firmware versioning might differ across GPU families. Note that with the
ROCm 7.0.0 release, the AMD GPU driver (amdgpu) is now versioned separately
from ROCm. See [AMD GPU Driver/ROCm packaging separation](#amd-gpu-driver-rocm-packaging-separation).

View File

@@ -72,7 +72,7 @@ the |docker-icon| icon to view the image on Docker Hub.
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.18-dev/images/sha256-96754ce2d30f729e19b497279915b5212ba33d5e408e7e5dd3f2304d87e3441e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
- 24.04
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
@@ -81,7 +81,7 @@ the |docker-icon| icon to view the image on Docker Hub.
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.18-dev/images/sha256-fa741508d383858e86985a9efac85174529127408102558ae2e3a4ac894eea1e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
- 22.04
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
@@ -90,7 +90,7 @@ the |docker-icon| icon to view the image on Docker Hub.
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.17-dev/images/sha256-3a0aef09f2a8833c2b64b85874dd9449ffc2ad257351857338ff5b706c03a418"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
- 24.04
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
@@ -99,7 +99,7 @@ the |docker-icon| icon to view the image on Docker Hub.
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.17-dev/images/sha256-bc7341a41ebe7ab261aa100732874507c452421ef733e408ac4f05ed453b0bc5"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
- 22.04
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
@@ -108,7 +108,7 @@ the |docker-icon| icon to view the image on Docker Hub.
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.16-dev/images/sha256-4841a8df7c340dab79bf9362dad687797649a00d594e0832eb83ea6880a40d3b"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
- 24.04
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
@@ -117,7 +117,7 @@ the |docker-icon| icon to view the image on Docker Hub.
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.16-dev/images/sha256-883fa95aba960c58a3e46fceaa18f03ede2c7df89b8e9fd603ab2d47e0852897"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.16.2-cp310-cp310-manylinux_2_28_x86_64.whl>`__
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
- 22.04
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__

View File

@@ -150,6 +150,15 @@ model_groups:
url: https://huggingface.co/Qwen/Qwen2-7B
precision: BF16
training_modes: [finetune_fw, finetune_lora]
- group: Stable Diffusion
tag: sd
models:
- model: Stable Diffusion XL
mad_tag: pyt_huggingface_stable_diffusion_xl_2k_lora_finetuning
model_repo: SDXL
url: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
precision: BF16
training_modes: [finetune_lora]
- group: Flux
tag: flux
models:

View File

@@ -122,8 +122,8 @@ drivers.
git clone https://github.com/ROCm/MAD.git
cd MAD/docker
docker build \
-t sglang_dissag_pd_image \
-f sglang_dissag_inference.ubuntu.amd.Dockerfile .
-t sglang_disagg_pd_image \
-f sglang_disagg_inference.ubuntu.amd.Dockerfile .
Benchmarking
============
@@ -132,16 +132,16 @@ The `<https://github.com/ROCm/MAD/tree/develop/scripts/sglang_dissag>`__
repository contains scripts to launch SGLang inference with prefill/decode
disaggregation via Mooncake for supported models.
* `scripts/sglang_dissag/run_xPyD_models.slurm <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/run_xPyD_models.slurm>`__
* `scripts/sglang_dissag/run_xPyD_models.slurm <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/run_xPyD_models.slurm>`__
-- the main Slurm batch script to launch Docker containers on all nodes using ``sbatch`` or ``salloc``.
* `scripts/sglang_dissag/sglang_disagg_server.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/sglang_disagg_server.sh>`__
* `scripts/sglang_dissag/sglang_disagg_server.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/sglang_disagg_server.sh>`__
-- the entrypoint script that runs inside each container to start the correct service -- proxy, prefill, or decode.
* `scripts/sglang_dissag/benchmark_xPyD.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/benchmark_xPyD.sh>`__
* `scripts/sglang_dissag/benchmark_xPyD.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/benchmark_xPyD.sh>`__
-- the benchmark script to run the GSM8K accuracy benchmark and the SGLang benchmarking tool for performance measurement.
* `scripts/sglang_dissag/benchmark_parser.py <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/benchmark_parser.py>`__
* `scripts/sglang_dissag/benchmark_parser.py <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/benchmark_parser.py>`__
-- the log parser script to be run on the concurrency benchmark log file to generate tabulated data.
Launch the service
@@ -163,10 +163,10 @@ allocated nodes.
# Clone the MAD repo if you haven't already and
# navigate to the scripts directory
git clone https://github.com/ROCm/MAD.git
cd MAD/scripts/sglang_dissag/
cd MAD/scripts/sglang_disagg/
# Slurm sbatch run command
export DOCKER_IMAGE_NAME=sglang_dissag_pd_image
export DOCKER_IMAGE_NAME=sglang_disagg_pd_image
export xP=<num_prefill_nodes>
export yD=<num_decode_nodes>
export MODEL_NAME={{ model.model_repo }}

View File

@@ -406,8 +406,6 @@ benchmark results:
Further reading
===============
- See the ROCm/maxtext benchmarking README at `<https://github.com/ROCm/maxtext/blob/main/benchmarks/gpu-rocm/readme.md>`__.
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
- To learn more about system settings and management practices to configure your system for

View File

@@ -56,7 +56,7 @@ vary by model -- select one to get started.
<div class="col-2 me-1 px-2 model-param-head">Model</div>
<div class="row col-10 pe-0">
{% for model_group in model_groups %}
<div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
<div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
{% endfor %}
</div>
</div>
@@ -150,9 +150,6 @@ doesnt test configurations and run conditions outside those described.
Run training
============
Run training
============
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml
{% set unified_docker = data.dockers[0] %}
@@ -164,6 +161,12 @@ Run training
.. tab-item:: MAD-integrated benchmarking
{% for model_group in model_groups %}
{% for model in model_group.models %}
The following run command is tailored to {{ model.model }}.
See :ref:`amd-pytorch-training-model-support` to switch to another available model.
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
directory and install the required packages on the host machine.
@@ -173,9 +176,6 @@ Run training
cd MAD
pip install -r requirements.txt
{% for model_group in model_groups %}
{% for model in model_group.models %}
.. container:: model-doc {{ model.mad_tag }}
2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
@@ -199,6 +199,15 @@ Run training
.. tab-item:: Standalone benchmarking
{% for model_group in model_groups %}
{% for model in model_group.models %}
The following commands are tailored to {{ model.model }}.
See :ref:`amd-pytorch-training-model-support` to switch to another available model.
{% endfor %}
{% endfor %}
.. rubric:: Download the Docker image and required packages
1. Use the following command to pull the Docker image from Docker Hub.