mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-10 15:18:11 -05:00
Add Stable Diffusion XL to PyT training benchmark doc and fix paths in SGLang Disagg Inference doc (#5282)
* add sdxl to pytorch-training * fix sphinx warnings fix links * fix paths in cmds and links in sglang disagg * fix col width * update release highlights * fix quickfix
This commit is contained in:
19
RELEASE.md
19
RELEASE.md
@@ -335,8 +335,9 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
|
||||
benchmarking guides have been updated with expanded model coverage and
|
||||
optimized Docker environments. Highlights include:
|
||||
|
||||
* The [Training a model with Primus and Megatron](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-megatron.html) benchmarking guide
|
||||
now leverages the unified AMD Primus framework with the Megatron backend. See [Primus: A Lightweight, Unified Training Framework for Large Models on AMD
|
||||
* The [Training a model with Primus and Megatron](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-megatron.html)
|
||||
and [Training a model with Primus and PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/primus-pytorch.html) benchmarking guides
|
||||
now leverage the unified AMD Primus framework with the Megatron and torchtitan backends. See [Primus: A Lightweight, Unified Training Framework for Large Models on AMD
|
||||
GPUs](https://rocm.blogs.amd.com/software-tools-optimization/primus/README.html) for an introduction to Primus.
|
||||
|
||||
* The [Training a model with PyTorch](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) benchmarking guide
|
||||
@@ -345,6 +346,9 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
|
||||
* The [Training a model with JAX MaxText](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/pytorch-training.html) benchmarking guide
|
||||
now supports [MAD](https://github.com/ROCm/MAD)-integrated benchmarking. The MaxText training environment now uses JAX 0.6.0 or 0.5.0. FP8 quantized training is supported with JAX 0.5.0.
|
||||
|
||||
* The [SGLang distributed inference](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/sglang-distributed.html?model=llama-3.1-8b-instruct) guide
|
||||
provides a recipe to get started with disaggregated prefill/decode inference.
|
||||
|
||||
* The [vLLM inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/benchmark-docker/vllm.html) documentation
|
||||
now features clearer serving and throughput benchmarking commands -- for improved transparency of model benchmarking configurations. The vLLM inference
|
||||
environment now uses vLLM 0.10.1 and includes improved default configurations.
|
||||
@@ -408,17 +412,16 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
|
||||
|
||||
## User space, driver, and firmware dependent changes
|
||||
|
||||
The software for AMD Datacenter GPU products requires maintaining a hardware
|
||||
and software stack with interdependencies between the GPU and baseboard
|
||||
firmware, AMD GPU drivers, and the ROCm user space software.
|
||||
|
||||
Running GPU software on AMD data center GPUs requires maintaining a coordinated
|
||||
hardware and software stack. This stack has interdependencies between the GPU
|
||||
and baseboard firmware, AMD GPU drivers, and the ROCm user-space software.
|
||||
As of the ROCm 7.0.0 release, these interdependencies are publicly documented.
|
||||
Note that while AMD publishes drivers and ROCm user space, your server or
|
||||
While AMD publishes drivers and ROCm user space components, your server or
|
||||
infrastructure provider publishes the GPU and baseboard firmware by bundling
|
||||
AMD’s firmware releases via AMD’s Platform Level Data Model (PLDM) bundle,
|
||||
which includes Integrated Firmware Image (IFWI).
|
||||
|
||||
GPU and baseboard firmware versioning might differ across GPU families. With the
|
||||
GPU and baseboard firmware versioning might differ across GPU families. Note that with the
|
||||
ROCm 7.0.0 release, the AMD GPU driver (amdgpu) is now versioned separately
|
||||
from ROCm. See [AMD GPU Driver/ROCm packaging separation](#amd-gpu-driver-rocm-packaging-separation).
|
||||
|
||||
|
||||
@@ -72,7 +72,7 @@ the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.18-dev/images/sha256-96754ce2d30f729e19b497279915b5212ba33d5e408e7e5dd3f2304d87e3441e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
|
||||
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 24.04
|
||||
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
@@ -81,7 +81,7 @@ the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.18-dev/images/sha256-fa741508d383858e86985a9efac85174529127408102558ae2e3a4ac894eea1e"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.18.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
|
||||
- `tensorflow-rocm 2.18.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 22.04
|
||||
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `TensorBoard 2.18.0 <https://github.com/tensorflow/tensorboard/tree/2.18.0>`__
|
||||
@@ -90,7 +90,7 @@ the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.17-dev/images/sha256-3a0aef09f2a8833c2b64b85874dd9449ffc2ad257351857338ff5b706c03a418"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.17.1-cp312-cp312-manylinux_2_28_x86_64.whl>`__
|
||||
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 24.04
|
||||
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
|
||||
@@ -99,7 +99,7 @@ the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.17-dev/images/sha256-bc7341a41ebe7ab261aa100732874507c452421ef733e408ac4f05ed453b0bc5"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.17.1-cp310-cp310-manylinux_2_28_x86_64.whl>`__
|
||||
- `tensorflow-rocm 2.17.1 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 22.04
|
||||
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `TensorBoard 2.17.1 <https://github.com/tensorflow/tensorboard/tree/2.17.1>`__
|
||||
@@ -108,7 +108,7 @@ the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.12-tf2.16-dev/images/sha256-4841a8df7c340dab79bf9362dad687797649a00d594e0832eb83ea6880a40d3b"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.16.2-cp312-cp312-manylinux_2_28_x86_64.whl>`__
|
||||
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 24.04
|
||||
- `Python 3.12 <https://www.python.org/downloads/release/python-31210/>`__
|
||||
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
|
||||
@@ -117,7 +117,7 @@ the |docker-icon| icon to view the image on Docker Hub.
|
||||
|
||||
<a href="https://hub.docker.com/layers/rocm/tensorflow/rocm6.4.2-py3.10-tf2.16-dev/images/sha256-883fa95aba960c58a3e46fceaa18f03ede2c7df89b8e9fd603ab2d47e0852897"><i class="fab fa-docker fa-lg"></i> rocm/tensorflow</a>
|
||||
|
||||
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/tensorflow_rocm-2.16.2-cp310-cp310-manylinux_2_28_x86_64.whl>`__
|
||||
- `tensorflow-rocm 2.16.2 <https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.2/>`__
|
||||
- 22.04
|
||||
- `Python 3.10 <https://www.python.org/downloads/release/python-31017/>`__
|
||||
- `TensorBoard 2.16.2 <https://github.com/tensorflow/tensorboard/tree/2.16.2>`__
|
||||
|
||||
@@ -150,6 +150,15 @@ model_groups:
|
||||
url: https://huggingface.co/Qwen/Qwen2-7B
|
||||
precision: BF16
|
||||
training_modes: [finetune_fw, finetune_lora]
|
||||
- group: Stable Diffusion
|
||||
tag: sd
|
||||
models:
|
||||
- model: Stable Diffusion XL
|
||||
mad_tag: pyt_huggingface_stable_diffusion_xl_2k_lora_finetuning
|
||||
model_repo: SDXL
|
||||
url: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
|
||||
precision: BF16
|
||||
training_modes: [finetune_lora]
|
||||
- group: Flux
|
||||
tag: flux
|
||||
models:
|
||||
|
||||
@@ -122,8 +122,8 @@ drivers.
|
||||
git clone https://github.com/ROCm/MAD.git
|
||||
cd MAD/docker
|
||||
docker build \
|
||||
-t sglang_dissag_pd_image \
|
||||
-f sglang_dissag_inference.ubuntu.amd.Dockerfile .
|
||||
-t sglang_disagg_pd_image \
|
||||
-f sglang_disagg_inference.ubuntu.amd.Dockerfile .
|
||||
|
||||
Benchmarking
|
||||
============
|
||||
@@ -132,16 +132,16 @@ The `<https://github.com/ROCm/MAD/tree/develop/scripts/sglang_dissag>`__
|
||||
repository contains scripts to launch SGLang inference with prefill/decode
|
||||
disaggregation via Mooncake for supported models.
|
||||
|
||||
* `scripts/sglang_dissag/run_xPyD_models.slurm <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/run_xPyD_models.slurm>`__
|
||||
* `scripts/sglang_dissag/run_xPyD_models.slurm <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/run_xPyD_models.slurm>`__
|
||||
-- the main Slurm batch script to launch Docker containers on all nodes using ``sbatch`` or ``salloc``.
|
||||
|
||||
* `scripts/sglang_dissag/sglang_disagg_server.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/sglang_disagg_server.sh>`__
|
||||
* `scripts/sglang_dissag/sglang_disagg_server.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/sglang_disagg_server.sh>`__
|
||||
-- the entrypoint script that runs inside each container to start the correct service -- proxy, prefill, or decode.
|
||||
|
||||
* `scripts/sglang_dissag/benchmark_xPyD.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/benchmark_xPyD.sh>`__
|
||||
* `scripts/sglang_dissag/benchmark_xPyD.sh <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/benchmark_xPyD.sh>`__
|
||||
-- the benchmark script to run the GSM8K accuracy benchmark and the SGLang benchmarking tool for performance measurement.
|
||||
|
||||
* `scripts/sglang_dissag/benchmark_parser.py <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_dissag/benchmark_parser.py>`__
|
||||
* `scripts/sglang_dissag/benchmark_parser.py <https://github.com/ROCm/MAD/blob/develop/scripts/sglang_disagg/benchmark_parser.py>`__
|
||||
-- the log parser script to be run on the concurrency benchmark log file to generate tabulated data.
|
||||
|
||||
Launch the service
|
||||
@@ -163,10 +163,10 @@ allocated nodes.
|
||||
# Clone the MAD repo if you haven't already and
|
||||
# navigate to the scripts directory
|
||||
git clone https://github.com/ROCm/MAD.git
|
||||
cd MAD/scripts/sglang_dissag/
|
||||
cd MAD/scripts/sglang_disagg/
|
||||
|
||||
# Slurm sbatch run command
|
||||
export DOCKER_IMAGE_NAME=sglang_dissag_pd_image
|
||||
export DOCKER_IMAGE_NAME=sglang_disagg_pd_image
|
||||
export xP=<num_prefill_nodes>
|
||||
export yD=<num_decode_nodes>
|
||||
export MODEL_NAME={{ model.model_repo }}
|
||||
|
||||
@@ -406,8 +406,6 @@ benchmark results:
|
||||
Further reading
|
||||
===============
|
||||
|
||||
- See the ROCm/maxtext benchmarking README at `<https://github.com/ROCm/maxtext/blob/main/benchmarks/gpu-rocm/readme.md>`__.
|
||||
|
||||
- To learn more about MAD and the ``madengine`` CLI, see the `MAD usage guide <https://github.com/ROCm/MAD?tab=readme-ov-file#usage-guide>`__.
|
||||
|
||||
- To learn more about system settings and management practices to configure your system for
|
||||
|
||||
@@ -56,7 +56,7 @@ vary by model -- select one to get started.
|
||||
<div class="col-2 me-1 px-2 model-param-head">Model</div>
|
||||
<div class="row col-10 pe-0">
|
||||
{% for model_group in model_groups %}
|
||||
<div class="col-3 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
|
||||
<div class="col-4 px-2 model-param" data-param-k="model-group" data-param-v="{{ model_group.tag }}" tabindex="0">{{ model_group.group }}</div>
|
||||
{% endfor %}
|
||||
</div>
|
||||
</div>
|
||||
@@ -150,9 +150,6 @@ doesn’t test configurations and run conditions outside those described.
|
||||
Run training
|
||||
============
|
||||
|
||||
Run training
|
||||
============
|
||||
|
||||
.. datatemplate:yaml:: /data/how-to/rocm-for-ai/training/pytorch-training-benchmark-models.yaml
|
||||
|
||||
{% set unified_docker = data.dockers[0] %}
|
||||
@@ -164,6 +161,12 @@ Run training
|
||||
|
||||
.. tab-item:: MAD-integrated benchmarking
|
||||
|
||||
{% for model_group in model_groups %}
|
||||
{% for model in model_group.models %}
|
||||
|
||||
The following run command is tailored to {{ model.model }}.
|
||||
See :ref:`amd-pytorch-training-model-support` to switch to another available model.
|
||||
|
||||
1. Clone the ROCm Model Automation and Dashboarding (`<https://github.com/ROCm/MAD>`__) repository to a local
|
||||
directory and install the required packages on the host machine.
|
||||
|
||||
@@ -173,9 +176,6 @@ Run training
|
||||
cd MAD
|
||||
pip install -r requirements.txt
|
||||
|
||||
{% for model_group in model_groups %}
|
||||
{% for model in model_group.models %}
|
||||
|
||||
.. container:: model-doc {{ model.mad_tag }}
|
||||
|
||||
2. For example, use this command to run the performance benchmark test on the {{ model.model }} model
|
||||
@@ -199,6 +199,15 @@ Run training
|
||||
|
||||
.. tab-item:: Standalone benchmarking
|
||||
|
||||
{% for model_group in model_groups %}
|
||||
{% for model in model_group.models %}
|
||||
|
||||
The following commands are tailored to {{ model.model }}.
|
||||
See :ref:`amd-pytorch-training-model-support` to switch to another available model.
|
||||
|
||||
{% endfor %}
|
||||
{% endfor %}
|
||||
|
||||
.. rubric:: Download the Docker image and required packages
|
||||
|
||||
1. Use the following command to pull the Docker image from Docker Hub.
|
||||
|
||||
Reference in New Issue
Block a user