Compare commits

..

17 Commits

Author SHA1 Message Date
Alex Xu
9e8116f85c bump rocm-docs-core version to 1.26.0 2025-10-14 11:38:29 -04:00
Pratik Basyal
b326232685 ROCm Software Stack image for 6.4.0 updated (#5112) (#5114) 2025-07-29 09:45:03 -04:00
Pratik Basyal
b6fa01a9ae HIP deprecation notice blog link updated (#5029) 2025-07-10 10:15:05 -07:00
Pratik Basyal
7c920f8070 ROCm for HPC table update for 6.4.0 (#5015)
* 6.4.0 updates synced

* Minor change
2025-07-09 13:39:18 -04:00
yugang-amd
2a7a9a5540 Fix broken link for AMDGPU installer (#4988) 2025-07-02 10:05:28 -04:00
Pratik Basyal
1b826ed8cc KMD UMD support footnote update ROCm 640 (#4973)
* KMD UMD support footnote update ROCm 640

* Historical footnote
2025-06-26 15:00:02 -04:00
yugang-amd
8a19d34f00 remove broken xref (#4941) 2025-06-18 10:16:19 -04:00
Peter Park
00089f5dec [docs/6.4.0] Link to specific ROCm/vLLM readme in inference/vllm-benchmark.rst (#4921)
update url
2025-06-13 13:49:39 -04:00
Pratik Basyal
d90c653c08 KMD UMD version updated for 6.4.0 (#4880) 2025-06-04 06:53:29 -04:00
randyh62
cbf6793ed9 Add reference to HIP 7.0 is coming blog for upcoming changes (#4861) 2025-05-30 15:36:44 -07:00
yugang-amd
6cf88c3f3e Update SGPR for RDNA3 and RDNA2 series (#4814) 2025-05-27 15:13:01 -04:00
yugang-amd
91971e94cf Merge pull request #4777 from yugang-amd/rocshmem-xref-2
update rocSHMEM xrefs
2025-05-22 15:13:59 -04:00
yugang-amd
ffae30017b update rocSHMEM xrefs 2025-05-22 13:35:45 -04:00
randyh62
1cf941f3b5 Update RELEASE.md (#4746)
* Update RELEASE.md

Add one item to Optimized and two items to Upcoming Changes for HIP

* Update RELEASE.md
2025-05-15 15:41:45 -07:00
Peter Park
cd5bb03205 docs: Add system health check doc under ROCm for AI (#4736) (#4737)
(cherry picked from commit 0a77e7b3a5)

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
2025-05-13 16:09:36 -04:00
Peter Park
7380c89985 docs: Add system health check doc under ROCm for AI (#4736)
* add initial draft

* add to toc and install page

* update wording

* improve documentation structure

* restructure and expand content

* add to training section

* add to conf.py article_pages

* Update docs/how-to/rocm-for-ai/includes/system-health-benchmarks.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* Update docs/how-to/rocm-for-ai/includes/system-health-benchmarks.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* update wordlist.txt

* Update docs/how-to/rocm-for-ai/includes/system-health-benchmarks.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* inference --> AI workloads

* update toc

* update article_pages in conf.py

* Update system validation notes in training docs

* fix links in prerequisite-system-validation

* wording

* add note

* consistency

* remove extra files

* fix links

* add links to training index page

---------

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
(cherry picked from commit 0a77e7b3a5)
2025-05-13 15:55:36 -04:00
Istvan Kiss
165ea54e12 Jax and PyTorch compatibility page update 6.4 (#4732)
* JAX compatibility page update (#4727)

* Fix compatibility list (#4731)

* Pytorch compatibility page update

* Fix unsupported section structure on JAX  (#4733)
2025-05-13 18:24:19 +02:00
24 changed files with 340 additions and 180 deletions

View File

@@ -34,6 +34,7 @@ Autocast
BARs
BLAS
BMC
BabelStream
Blit
Blockwise
Bluefield
@@ -138,6 +139,7 @@ GDR
GDS
GEMM
GEMMs
GFLOPS
GFortran
GFXIP
Gemma
@@ -641,6 +643,7 @@ hipSPARSELt
hipTensor
hipamd
hipblas
hipcc
hipcub
hipfft
hipfort

View File

@@ -260,7 +260,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
<td><a href="https://github.com/ROCm/rccl"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://github.com/ROCm/rocSHMEM">rocSHMEM</a></td>
<td><a href="https://rocm.docs.amd.com/projects/rocSHMEM/en/docs-6.4.0/index.html">rocSHMEM</a></td>
<td>2.0.0</td>
<td><a href="https://github.com/ROCm/rocSHMEM"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
@@ -777,7 +777,7 @@ and in-depth descriptions.
#### Optimized
* `hipGraphLaunch` parallelism is improved for complex data-parallel graphs.
* Make the round-robin queue selection in command scheduling. For multi-streams execution, HSA queue from null stream lock is freed and won't occupy the queue ID after the kernel in the stream is finished.
* Round-robin queue mechanism is updated for command scheduling. For multi-streams execution, HSA queue from null stream lock is freed and won't occupy the queue ID after the kernel in the stream is finished.
* The HIP runtime doesn't free bitcode object before code generation. It adds a cache, which allows compiled code objects to be reused instead of recompiling. This improves performance on multi-GPU systems.
* Runtime now uses unified copy approach:
@@ -786,6 +786,11 @@ and in-depth descriptions.
- The default environment variable `GPU_FORCE_BLIT_COPY_SIZE` is set to `16`, which limits the kernel copy to sizes less than 16 KB, while copies larger than that would be handled by `SDMA` engine.
- Blit code is refactored, and ASAN instrumentation is cleaned up.
* HIP runtime uses signals without interrupts:
- In active wait mode, uses signals without interrupts by default.
- Only when a callback is required, switches to the interrupts.
#### Resolved issues
* Out-of-memory error on Microsoft Windows. When the user calls `hipMalloc` for device memory allocation while specifying a size larger than the available device memory, the HIP runtime fixes the error in the API implementation, allocating the available device memory plus system memory (shared virtual memory).
@@ -796,13 +801,15 @@ and in-depth descriptions.
The following lists the backward incompatible changes planned for upcoming major ROCm releases.
* Signature changes in APIs to correspond with NVIDIA CUDA APIs,
* Signature changes in APIs to match corresponding CUDA APIs,
- `hiprtcCreateProgram`
- `hiprtcCompileProgram`
- `hipCtxGetApiVersion`
* Behavior of `hipPointerGetAttributes` is changed to match corresponding CUDA API in version 11 and later releases.
* Behavior of `hipFree` is changed to match corresponding CUDA API `cudaFree`.
* HIP vector constructor changes for `hipComplex`.
* Return error/value code updates in the following hip APIs to match the corresponding CUDA APIs,
- `hipModuleLaunchKernel`
@@ -1763,4 +1770,5 @@ There are a number of upcoming changes planned for HIP runtime API in an upcomin
that are not backward compatible with prior releases. Most of these changes increase
alignment between HIP and CUDA APIs or behavior. Some of the upcoming changes are to
clean up header files, remove namespace collision, and have a clear separation between
`hipRTC` and HIP runtime. For more information refer to [HIP Upcoming changes](#hip-6-4-0).
`hipRTC` and HIP runtime. For more information, see [HIP Upcoming changes](#hip-6-4-0)
or [HIP 7.0 Is Coming: What You Need to Know to Stay Ahead](https://rocm.blogs.amd.com/ecosystems-and-partners/transition-to-hip-7.0-blog/README.html).

View File

@@ -37,7 +37,7 @@ ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2
CUB,2.5.0,2.3.2,2.3.2,2.3.2,2.3.2,2.2.0,2.2.0,2.2.0,2.2.0,2.1.0,2.1.0,2.1.0,2.1.0,2.0.1,2.0.1
,,,,,,,,,,,,,,,
KMD & USER SPACE [#kfd_support-past-60]_,.. _kfd-userspace-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
KMD versions,"6.4.x, 6.3.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
:doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x, 5.7.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x","6.2.x, 6.1.x, 6.0.x, 5.7.x, 5.6.x"
,,,,,,,,,,,,,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0,1.1.0
@@ -52,7 +52,7 @@ ROCm Version,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0, 6.1.5, 6.1.2
,,,,,,,,,,,,,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
:doc:`RCCL <rccl:index>`,2.22.3,2.21.5,2.21.5,2.21.5,2.21.5,2.20.5,2.20.5,2.20.5,2.20.5,2.18.6,2.18.6,2.18.6,2.18.6,2.18.3,2.18.3
`rocSHMEM <https://github.com/ROCm/rocSHMEM>`_,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
:doc:`rocSHMEM <rocSHMEM:index>`,2.0.0,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
,,,,,,,,,,,,,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0,1.12.0

View File

@@ -62,7 +62,7 @@ compatibility and system requirements.
CUB,2.5.0,2.3.2,2.2.0
,,,
KMD & USER SPACE [#kfd_support]_,.. _kfd-userspace-support-compatibility-matrix:,,
KMD versions,"6.4.x, 6.3.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x"
:doc:`KMD versions <rocm-install-on-linux:reference/user-kernel-space-compat-matrix>`,"6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x","6.4.x, 6.3.x, 6.2.x, 6.1.x, 6.0.x"
,,,
ML & COMPUTER VISION,.. _mllibs-support-compatibility-matrix:,,
:doc:`Composable Kernel <composable_kernel:index>`,1.1.0,1.1.0,1.1.0
@@ -77,7 +77,7 @@ compatibility and system requirements.
,,,
COMMUNICATION,.. _commlibs-support-compatibility-matrix:,,
:doc:`RCCL <rccl:index>`,2.22.3,2.21.5,2.20.5
`rocSHMEM <https://github.com/ROCm/rocSHMEM>`_ ,2.0.0,N/A,N/A
:doc:`rocSHMEM <rocSHMEM:index>`,2.0.0,N/A,N/A
,,,
MATH LIBS,.. _mathlibs-support-compatibility-matrix:,,
`half <https://github.com/ROCm/half>`_ ,1.12.0,1.12.0,1.12.0
@@ -151,7 +151,7 @@ compatibility and system requirements.
.. [#mi300x] Oracle Linux and Azure Linux are supported only on AMD Instinct MI300X.
.. [#single-node] Debian 12 is supported only on AMD Instinct MI300X for single-node functionality.
.. [#mi300_620] **For ROCm 6.2.0** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
.. [#kfd_support] Starting from ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart (assuming hardware support is available in both). For earlier ROCm releases, the compatibility is provided for +/- 2 releases. These are the compatibility combinations that are currently supported.
.. [#kfd_support] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.
.. _OS-kernel-versions:
@@ -229,5 +229,5 @@ Expand for full historical view of:
.. [#mi300_610-past-60] **For ROCm 6.1.0** - MI300A (gfx942) is supported on Ubuntu 22.04.4, RHEL 9.4, RHEL 9.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.4.
.. [#mi300_602-past-60] **For ROCm 6.0.2** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#mi300_600-past-60] **For ROCm 6.0.0** - MI300A (gfx942) is supported on Ubuntu 22.04.3, RHEL 8.9, and SLES 15 SP5. MI300X (gfx942) is only supported on Ubuntu 22.04.3.
.. [#kfd_support-past-60] Starting from ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart (assuming hardware support is available in both). For earlier ROCm releases, the compatibility is provided for +/- 2 releases. These are the compatibility combinations that are currently supported.
.. [#kfd_support-past-60] As of ROCm 6.4.0, forward and backward compatibility between the AMD Kernel-mode GPU Driver (KMD) and its user space software is provided up to a year apart. For earlier ROCm releases, the compatibility is provided for +/- 2 releases. The tested user space versions on this page were accurate as of the time of initial ROCm release. For the most up-to-date information, see the latest version of this information at `User and kernel-space support matrix <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/user-kernel-space-compat-matrix.html>`_.
.. [#ROCT-rocr-past-60] Starting from ROCm 6.3.0, the ROCT Thunk Interface is included as part of the ROCr runtime package.

View File

@@ -51,6 +51,8 @@ article_pages = [
{"file": "how-to/deep-learning-rocm", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/install", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/system-health-check", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/training/train-a-model", "os": ["linux"]},
@@ -67,7 +69,6 @@ article_pages = [
{"file": "how-to/rocm-for-ai/fine-tuning/multi-gpu-fine-tuning-and-inference", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/index", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/install", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/hugging-face-models", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/llm-inference-frameworks", "os": ["linux"]},
{"file": "how-to/rocm-for-ai/inference/vllm-benchmark", "os": ["linux"]},

Binary file not shown (size before: 1.2 MiB; after: 1.1 MiB).

View File

@@ -62,47 +62,52 @@ PyTorch inference performance testing
{% endfor %}
{% endfor %}
Getting started
===============
System validation
=================
Use the following procedures to reproduce the benchmark results on an
MI300X series accelerator with the prebuilt PyTorch Docker image.
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
.. _pytorch-benchmark-get-started:
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
1. Disable NUMA auto-balancing.
.. code-block:: shell
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see :ref:`AMD Instinct MI300X system optimization <mi300x-disable-numa>`.
# disable automatic NUMA balancing
sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0
.. code-block:: shell
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
# disable automatic NUMA balancing
sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0
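The NUMA check above can be wrapped in a small function so that a benchmark script can test the setting before it starts. This is a minimal sketch, not part of the documented procedure: the function name `numa_balancing_status` and its optional path argument are hypothetical, and treating every non-zero value as "enabled" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: report whether automatic NUMA balancing is disabled.
# The optional first argument overrides the sysctl path, which is handy for dry runs.
numa_balancing_status() {
    numa_file="${1:-/proc/sys/kernel/numa_balancing}"
    if [ ! -r "$numa_file" ]; then
        # The file is missing or unreadable, so the state cannot be determined.
        echo "unknown"
        return 2
    fi
    value="$(cat "$numa_file")"
    if [ "$value" = "0" ]; then
        echo "disabled"
        return 0
    fi
    # Assumption: any non-zero value means balancing is active in some mode.
    echo "enabled"
    return 1
}

# Example: warn before launching a benchmark.
# numa_balancing_status >/dev/null || echo "Disable NUMA balancing first" >&2
```

Because the function returns a non-zero status unless balancing is disabled, it can gate the rest of a script with `||` or `if`.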
Pull the Docker image
=====================
.. container:: model-doc pyt_chai1_inference
2. Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue/images/sha256-b736a4239ab38a9d0e448af6d4adca83b117debed00bfbe33846f99c4540f79b>`_ from Docker Hub.
Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue/images/sha256-b736a4239ab38a9d0e448af6d4adca83b117debed00bfbe33846f99c4540f79b>`_ from Docker Hub.
.. code-block:: shell
.. code-block:: shell
docker pull rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue
docker pull rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0_triton_llvm_reg_issue
.. note::
.. note::
The Chai-1 benchmark uses a specifically selected Docker image using ROCm 6.2.3 and PyTorch 2.3.0 to address an accuracy issue.
The Chai-1 benchmark uses a specifically selected Docker image using ROCm 6.2.3 and PyTorch 2.3.0 to address an accuracy issue.
.. container:: model-doc pyt_clip_inference
2. Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-05b55983e5154f46e7441897d0908d79877370adca4d1fff4899d9539d6c4969>`_ from Docker Hub.
Use the following command to pull the `ROCm PyTorch Docker image <https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-05b55983e5154f46e7441897d0908d79877370adca4d1fff4899d9539d6c4969>`_ from Docker Hub.
.. code-block:: shell
.. code-block:: shell
docker pull rocm/pytorch:latest
docker pull rocm/pytorch:latest
.. _pytorch-benchmark-get-started:
Benchmarking
============

View File

@@ -109,37 +109,39 @@ vLLM inference performance testing
==================================
For information on experimental features and known issues related to ROCm optimization efforts on vLLM,
see the developer's guide at `<https://github.com/ROCm/vllm/blob/main/docs/dev-docker/README.md>`__.
see the developer's guide at `<https://github.com/ROCm/vllm/tree/7a9f58aae0e7215a5f3dccde60e35072c41656c2/docs/dev-docker>`__.
Getting started
===============
System validation
=================
Use the following procedures to reproduce the benchmark results on an
MI300X accelerator with the prebuilt vLLM Docker image.
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
.. _vllm-benchmark-get-started:
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see the :ref:`system validation steps <rocm-for-ai-system-optimization>`.
1. Disable NUMA auto-balancing.
.. code-block:: shell
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU
might hang until the periodic balancing is finalized. For more information,
see :ref:`AMD Instinct MI300X system optimization <mi300x-disable-numa>`.
# disable automatic NUMA balancing
sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0
.. code-block:: shell
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
# disable automatic NUMA balancing
sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0
Pull the Docker image
=====================
2. Download the `ROCm vLLM Docker image <{{ unified_docker.docker_hub_url }}>`_.
Download the `ROCm vLLM Docker image <{{ unified_docker.docker_hub_url }}>`_.
Use the following command to pull the Docker image from Docker Hub.
Use the following command to pull the Docker image from Docker Hub.
.. code-block:: shell
.. code-block:: shell
docker pull {{ unified_docker.pull_tag }}
docker pull {{ unified_docker.pull_tag }}
Benchmarking
============

View File

@@ -28,9 +28,9 @@ ROCm supports multiple :doc:`installation methods <rocm-install-on-linux:install
* :doc:`Using your Linux distribution's package manager <rocm-install-on-linux:install/install-methods/package-manager-index>`
* :doc:`Using the AMDGPU installer <rocm-install-on-linux:install/amdgpu-install>`
* :doc:`Using the AMDGPU installer <rocm-install-on-linux:install/install-methods/amdgpu-installer-index>`
* :ref:`Multi-version installation <rocm-install-on-linux:installation-types>`.
* :ref:`Multi-version installation <rocm-install-on-linux:installation-types>`
.. grid:: 1
@@ -59,4 +59,8 @@ images with the framework pre-installed.
* :doc:`JAX for ROCm <rocm-install-on-linux:install/3rd-party/jax-install>`
The sections that follow in :doc:`Training a model <../training/train-a-model>` are geared for a ROCm with PyTorch installation.
Next steps
==========
After installing ROCm and your desired ML libraries -- and before running AI workloads -- conduct system health benchmarks
to test the optimal performance of your AMD hardware. See :doc:`system-health-check` to get started.

View File

@@ -0,0 +1,104 @@
.. meta::
:description: System health checks with RVS, RCCL tests, BabelStream, and TransferBench to validate AMD hardware performance running AI workloads.
:keywords: gpu, accelerator, system, health, validation, bench, perf, performance, rvs, rccl, babel, mi300x, mi325x, flops, bandwidth, rbt, training, inference
.. _rocm-for-ai-system-health-bench:
************************
System health benchmarks
************************
Before running AI workloads, it is important to validate that your AMD hardware is configured correctly and is performing optimally. This topic outlines several system health benchmarks you can use to test key aspects like GPU compute capabilities (FLOPS), memory bandwidth, and interconnect performance. Many of these tests are part of the ROCm Validation Suite (RVS).
ROCm Validation Suite (RVS) tests
=================================
RVS provides a collection of tests, benchmarks, and qualification tools, each
targeting a specific subsystem of the system under test. It includes tests for
GPU stress and memory bandwidth.
.. _healthcheck-install-rvs:
Install ROCm Validation Suite
-----------------------------
To get started, install RVS. For example, on an Ubuntu system with ROCm already
installed, run the following command:
.. code-block:: shell
sudo apt update
sudo apt install rocm-validation-suite
See the `ROCm Validation Suite installation instructions <https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/latest/install/installation.html>`_,
and `System validation tests <https://instinct.docs.amd.com/projects/system-acceptance/en/latest/mi300x/system-validation.html#system-validation-tests>`_
in the Instinct documentation for more detailed instructions.
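After installation, it can help to confirm that the `rvs` executable is actually present before running any tests. The following is a rough sketch under stated assumptions: the helper name `find_rvs` is hypothetical, the directories in the usage comment are guesses at common install locations rather than guaranteed paths, and the `-g` option (GPU listing) should be confirmed against the RVS documentation for your version.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first rvs executable found
# in the directories given as arguments; return non-zero if none is found.
find_rvs() {
    for dir in "$@"; do
        if [ -x "$dir/rvs" ]; then
            echo "$dir/rvs"
            return 0
        fi
    done
    return 1
}

# Example (directories are assumptions, adjust for your installation):
# rvs_bin="$(find_rvs /opt/rocm/bin /usr/bin)" && "$rvs_bin" -g
```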
Benchmark, stress, and qualification tests
------------------------------------------
The GPU stress test runs various GEMM computations as workloads to stress the GPU FLOPS performance and check whether it
meets the configured target GFLOPS.
Run the benchmark, stress, and qualification tests included with RVS. See the `Benchmark, stress, qualification
<https://instinct.docs.amd.com/projects/system-acceptance/en/latest/mi300x/system-validation.html#benchmark-stress-qualification>`_
section of the Instinct documentation for usage instructions.
BabelStream test
----------------
BabelStream is a synthetic GPU benchmark based on the STREAM benchmark for
CPUs, measuring memory transfer rates to and from global device memory.
BabelStream tests are included with the RVS package as part of the `BABEL module
<https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/latest/conceptual/rvs-modules.html#babel-benchmark-test-babel-module>`_.
For more information, see `Performance benchmarking
<https://instinct.docs.amd.com/projects/system-acceptance/en/latest/mi300x/performance-bench.html#babelstream-benchmarking-results>`_
in the Instinct documentation.
RCCL tests
==========
The ROCm Communication Collectives Library (RCCL) enables efficient multi-GPU
communication. The `<https://github.com/ROCm/rccl-tests>`__ suite benchmarks
the performance and verifies the correctness of these collective operations.
This helps ensure optimal scaling for multi-accelerator tasks.
1. To get started, build RCCL-tests using the official instructions in the README at
`<https://github.com/ROCm/rccl-tests?tab=readme-ov-file#build>`__ or use the
following commands:
.. code-block:: shell
git clone https://github.com/ROCm/rccl-tests.git
cd rccl-tests
make
2. Run the suggested RCCL tests -- see `RCCL benchmarking
<https://instinct.docs.amd.com/projects/system-acceptance/en/latest/mi300x/performance-bench.html#rccl-benchmarking-results>`_
in the Instinct performance benchmarking documentation for instructions.
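When comparing runs, it is often enough to pull the summary figure out of the test output. The sketch below assumes the output contains a summary line of the form `# Avg bus bandwidth    : <number>`, as printed by the upstream test binaries. The helper name `avg_bus_bandwidth` is hypothetical, and the binary path and flags in the usage comment assume the default `make` output directory and an 8-GPU node.

```shell
#!/bin/sh
# Hypothetical helper: read rccl-tests output on stdin and print the
# average bus bandwidth value from the summary line.
avg_bus_bandwidth() {
    awk -F: '/Avg bus bandwidth/ {
        # Keep only the first whitespace-separated token after the colon,
        # so a trailing status word does not end up in the result.
        split($2, parts, " ")
        print parts[1]
        exit
    }'
}

# Example (assumes an 8-GPU node and the default build directory):
# ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8 | avg_bus_bandwidth
```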
TransferBench test
==================
TransferBench is a standalone utility for benchmarking simultaneous data
transfer performance between various devices in the system, including
CPU-to-GPU and GPU-to-GPU (peer-to-peer). This helps identify potential
bottlenecks in data movement between the host system and the GPUs, or between
GPUs, which can impact end-to-end latency.
.. _healthcheck-install-transferbench:
1. To get started, use the instructions in the `TransferBench documentation
<https://rocm.docs.amd.com/projects/TransferBench/en/latest/install/install.html#install-transferbench>`_
or use the following commands:
.. code:: shell
git clone https://github.com/ROCm/TransferBench.git
cd TransferBench
CC=hipcc make
2. Run the suggested TransferBench tests -- see `TransferBench benchmarking
<https://instinct.docs.amd.com/projects/system-acceptance/en/latest/mi300x/performance-bench.html#transferbench-benchmarking-results>`_
in the Instinct performance benchmarking documentation for instructions.
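The TransferBench build above depends on `git`, `make`, and `hipcc` being on the `PATH`. A pre-build check like the following can surface a missing tool early. This is only a sketch, and the helper name `missing_tools` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any tools from the argument list
# that cannot be found on PATH (space-separated, empty if all are present).
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space left by the concatenation above.
    echo "${missing# }"
}

# Example: only build when every prerequisite is available.
# [ -z "$(missing_tools git make hipcc)" ] && CC=hipcc make
```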

View File

@@ -79,11 +79,18 @@ across different input sequences. Support for packed input format is planned for
System validation
=================
If you have already validated your system settings, including NUMA
auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
and optimization steps <train-a-model-system-validation>` to set up your system
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
Environment setup
=================
@@ -175,8 +182,8 @@ with RDMA, skip ahead to :ref:`amd-maxtext-download-docker`.
.. _amd-maxtext-download-docker:
Download the Docker image
-------------------------
Pull the Docker image
---------------------
1. Use the following command to pull the Docker image from Docker Hub.

View File

@@ -103,11 +103,18 @@ popular AI models.
System validation
=================
If you have already validated your system settings, including NUMA
auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
and optimization steps <train-a-model-system-validation>` to set up your system
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
.. _mi300x-amd-megatron-lm-training:
Environment setup

View File

@@ -34,11 +34,18 @@ for MPT-30B with access to detailed logs and performance metrics.
System validation
=================
If you have already validated your system settings, including NUMA
auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
and optimization steps <train-a-model-system-validation>` to set up your system
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
Getting started
===============

View File

@@ -77,11 +77,18 @@ popular AI models.
System validation
=================
If you have already validated your system settings, including NUMA
auto-balancing, skip this step. Otherwise, complete the :ref:`system validation
and optimization steps <train-a-model-system-validation>` to set up your system
Before running AI workloads, it's important to validate that your AMD hardware is configured
correctly and performing optimally.
If you have already validated your system settings, including aspects like NUMA auto-balancing, you
can skip this step. Otherwise, complete the procedures in the :ref:`System validation and
optimization <rocm-for-ai-system-optimization>` guide to properly configure your system settings
before starting training.
To test for optimal performance, consult the recommended :ref:`System health benchmarks
<rocm-for-ai-system-health-bench>`. This suite of tests will help you verify and fine-tune your
system's configuration.
This Docker image is optimized for specific model configurations outlined
below. Performance can vary for other training workloads, as AMD
doesn't validate configurations and run conditions outside those described.

@@ -21,8 +21,12 @@ In this guide, you'll learn about:
- Training a model
- :doc:`Train a model with Megatron-LM <benchmark-docker/megatron-lm>`
- :doc:`With Megatron-LM <benchmark-docker/megatron-lm>`
- :doc:`Train a model with PyTorch <benchmark-docker/pytorch-training>`
- :doc:`With PyTorch <benchmark-docker/pytorch-training>`
- :doc:`With JAX MaxText <benchmark-docker/jax-maxtext>`
- :doc:`With LLM Foundry <benchmark-docker/mpt-llm-foundry>`
- :doc:`Scaling model training <scale-model-training>`

@@ -5,12 +5,13 @@
:keywords: ROCm, AI, LLM, train, megatron, Llama, tutorial, docker, torch, pytorch, jax
.. _train-a-model-system-validation:
.. _rocm-for-ai-system-optimization:
**********************************************
Prerequisite system validation before training
**********************************************
**********************************************************
Prerequisite system validation before running AI workloads
**********************************************************
Complete the following system validation and optimization steps to set up your system before starting training.
Complete the following system validation and optimization steps to set up your system before starting training and inference.
Disable NUMA auto-balancing
---------------------------
@@ -26,7 +27,8 @@ the output is ``1``, run the following command to disable NUMA auto-balancing.
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
See :ref:`mi300x-disable-numa` for more information.
See `Disable NUMA auto-balancing <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#disable-numa-auto-balancing>`_
in the Instinct documentation for more information.
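As a rough illustration of the check described above, the following Python sketch reads the same ``/proc/sys/kernel/numa_balancing`` file that the ``echo 0`` command writes to. The function name ``numa_balancing_enabled`` is hypothetical and not part of ROCm or any AMD tool. Treating every non-zero value as "enabled" is an assumption made for this sketch.

```python
from pathlib import Path

# Kernel switch written by the `echo 0 > ...` command shown above.
NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Path = NUMA_BALANCING_PATH) -> bool:
    """Return True if the file at `path` holds anything other than "0".

    Hypothetical helper for illustration; any non-zero value is assumed
    to mean that NUMA auto-balancing is active.
    """
    return path.read_text().strip() != "0"


if __name__ == "__main__":
    if not NUMA_BALANCING_PATH.exists():
        print("numa_balancing file not found on this system")
    elif numa_balancing_enabled():
        print("NUMA auto-balancing is enabled; disable it before running workloads")
    else:
        print("NUMA auto-balancing is disabled")
```

Passing a different ``path`` makes the helper usable against a test file, so the logic can be exercised without root access or a NUMA-capable host.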
Hardware verification with ROCm
-------------------------------
@@ -42,7 +44,8 @@ Run the command:
rocm-smi --setperfdeterminism 1900
See :ref:`mi300x-hardware-verification-with-rocm` for more information.
See `Hardware verification for ROCm <https://instinct.docs.amd.com/projects/amdgpu-docs/en/latest/system-optimization/mi300x.html#hardware-verification-with-rocm>`_
in the Instinct documentation for more information.
RCCL Bandwidth Test for multi-node setups
-----------------------------------------

@@ -76,14 +76,6 @@ Ubuntu versions.
single node workstations, multi and many-core nodes, clusters of nodes via
QMP, and classic vector computers.
* -
- `Grid <https://github.com/amd/InfinityHub-CI/tree/main/grid/>`_
- Grid is a library for lattice QCD calculations that employs a high-level data parallel
approach while using a number of techniques to target multiple types of parallelism.
The library currently supports MPI, OpenMP and short vector parallelism. The SIMD
instruction sets covered include SSE, AVX, AVX2, FMA4, IMCI and AVX512. Recent
releases expanded this support to include GPU offloading.
* -
- `MILC <https://github.com/amd/InfinityHub-CI/tree/main/milc/>`_
- The MILC Code is a set of research codes developed by MIMD Lattice Computation
@@ -237,12 +229,18 @@ Ubuntu versions.
of these applications.
* - Tools and libraries
- `ROCm with GPU-aware MPI container <https://github.com/amd/InfinityHub-CI/tree/main/base-gpu-mpi-rocm-docker>`_
- `AMD ROCm with OpenMPI container <https://github.com/amd/InfinityHub-CI/blob/main/base-gpu-mpi-rocm-docker>`_
- Base container for GPU-aware MPI with ROCm for HPC applications. This
project provides a boilerplate for building and running a Docker
container with ROCm supporting GPU-aware MPI implementations using
OpenMPI or UCX.
* -
- `AMD ROCm with MPICH container <https://github.com/amd/InfinityHub-CI/blob/main/base-mpich-rocm-docker>`_
- Base container for GPU-aware MPI with ROCm for HPC applications. This
project provides a boilerplate for building and running a Docker
container with ROCm supporting GPU-aware MPI implementations using MPICH.
* -
- `Kokkos <https://github.com/amd/InfinityHub-CI/tree/main/kokkos>`_
- Kokkos is a programming model in C++ for writing performance portable

@@ -38,5 +38,5 @@ The variable parsing stops when a syntax error occurs. The erroneous set and the
These environment variables only affect ROCm software, not graphics applications.
Not all CU configurations are valid on all devices. For example, on devices where two CUs can be combined into a WGP (for kernels running in WGP mode), it's not valid to disable only a single CU in a WGP. For more information about what to expect when disabling CUs, see the `Exploring AMD GPU Scheduling Details by Experimenting With “Worst Practices” <https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ paper.
Not all CU configurations are valid on all devices. For example, on devices where two CUs can be combined into a WGP (for kernels running in WGP mode), it's not valid to disable only a single CU in a WGP.

@@ -45,7 +45,7 @@
(communication-libraries)=
* {doc}`RCCL <rccl:index>`
* [rocSHMEM](https://github.com/ROCm/rocSHMEM)
* {doc}`rocSHMEM <rocSHMEM:index>`
:::
:::{grid-item-card} Math

@@ -296,7 +296,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -314,7 +314,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -332,7 +332,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -350,7 +350,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -368,7 +368,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -386,7 +386,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -404,7 +404,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -422,7 +422,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -440,7 +440,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -519,7 +519,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -537,7 +537,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -555,7 +555,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -573,7 +573,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -591,7 +591,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 768
- 16
- 32
- 11
- 0
*
@@ -609,7 +609,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 11
- 0
*
@@ -627,7 +627,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -645,7 +645,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -663,7 +663,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -681,7 +681,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -699,7 +699,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -717,7 +717,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -735,7 +735,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -753,7 +753,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -771,7 +771,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*
@@ -789,7 +789,7 @@ For more information about ROCm hardware compatibility, see the ROCm `Compatibil
- 16
- 32
- 512
- 16
- 32
- 10
- 3
*

@@ -36,6 +36,10 @@ subtrees:
title: Use ROCm for AI
subtrees:
- entries:
- file: how-to/rocm-for-ai/install.rst
title: Installation
- file: how-to/rocm-for-ai/system-health-check.rst
title: System health benchmarks
- file: how-to/rocm-for-ai/training/index.rst
title: Training
subtrees:
@@ -70,8 +74,6 @@ subtrees:
title: Inference
subtrees:
- entries:
- file: how-to/rocm-for-ai/inference/install.rst
title: Installation
- file: how-to/rocm-for-ai/inference/hugging-face-models.rst
title: Run models from Hugging Face
- file: how-to/rocm-for-ai/inference/llm-inference-frameworks.rst

@@ -1,4 +1,4 @@
rocm-docs-core==1.18.2
rocm-docs-core==1.26.0
sphinx-reredirects
sphinx-sitemap
sphinxcontrib.datatemplates==0.11.0

@@ -8,11 +8,9 @@ accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==1.0.0
# via sphinx
appnope==0.1.4
# via ipykernel
asttokens==3.0.0
# via stack-data
attrs==25.3.0
attrs==25.4.0
# via
# jsonschema
# jupyter-cache
@@ -21,62 +19,62 @@ babel==2.17.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.13.3
beautifulsoup4==4.14.2
# via pydata-sphinx-theme
breathe==4.36.0
# via rocm-docs-core
certifi==2025.1.31
certifi==2025.10.5
# via requests
cffi==1.17.1
cffi==2.0.0
# via
# cryptography
# pynacl
charset-normalizer==3.4.1
charset-normalizer==3.4.4
# via requests
click==8.1.8
click==8.3.0
# via
# jupyter-cache
# sphinx-external-toc
comm==0.2.2
comm==0.2.3
# via ipykernel
cryptography==44.0.2
cryptography==46.0.2
# via pyjwt
debugpy==1.8.14
debugpy==1.8.17
# via ipykernel
decorator==5.2.1
# via ipython
defusedxml==0.7.1
# via sphinxcontrib-datatemplates
deprecated==1.2.18
# via pygithub
docutils==0.21.2
# via
# myst-parser
# pydata-sphinx-theme
# sphinx
exceptiongroup==1.2.2
exceptiongroup==1.3.0
# via ipython
executing==2.2.0
executing==2.2.1
# via stack-data
fastjsonschema==2.21.1
fastjsonschema==2.21.2
# via
# nbformat
# rocm-docs-core
gitdb==4.0.12
# via gitpython
gitpython==3.1.44
gitpython==3.1.45
# via rocm-docs-core
idna==3.10
greenlet==3.2.4
# via sqlalchemy
idna==3.11
# via requests
imagesize==1.4.1
# via sphinx
importlib-metadata==8.6.1
importlib-metadata==8.7.0
# via
# jupyter-cache
# myst-nb
ipykernel==6.29.5
ipykernel==7.0.0
# via myst-nb
ipython==8.35.0
ipython==8.37.0
# via
# ipykernel
# myst-nb
@@ -86,9 +84,9 @@ jinja2==3.1.6
# via
# myst-parser
# sphinx
jsonschema==4.23.0
jsonschema==4.25.1
# via nbformat
jsonschema-specifications==2024.10.1
jsonschema-specifications==2025.9.1
# via jsonschema
jupyter-cache==1.0.1
# via myst-nb
@@ -96,7 +94,7 @@ jupyter-client==8.6.3
# via
# ipykernel
# nbclient
jupyter-core==5.7.2
jupyter-core==5.8.1
# via
# ipykernel
# jupyter-client
@@ -106,17 +104,17 @@ markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==3.0.2
markupsafe==3.0.3
# via jinja2
matplotlib-inline==0.1.7
# via
# ipykernel
# ipython
mdit-py-plugins==0.4.2
mdit-py-plugins==0.5.0
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-nb==1.2.0
myst-nb==1.3.0
# via rocm-docs-core
myst-parser==4.0.1
# via myst-nb
@@ -131,34 +129,33 @@ nbformat==5.10.4
# nbclient
nest-asyncio==1.6.0
# via ipykernel
packaging==24.2
packaging==25.0
# via
# ipykernel
# pydata-sphinx-theme
# sphinx
parso==0.8.4
parso==0.8.5
# via jedi
pexpect==4.9.0
# via ipython
platformdirs==4.3.7
platformdirs==4.5.0
# via jupyter-core
prompt-toolkit==3.0.50
prompt-toolkit==3.0.52
# via ipython
psutil==7.0.0
psutil==7.1.0
# via ipykernel
ptyprocess==0.7.0
# via pexpect
pure-eval==0.2.3
# via stack-data
pycparser==2.22
pycparser==2.23
# via cffi
pydata-sphinx-theme==0.15.4
pydata-sphinx-theme==0.16.1
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==2.6.1
pygithub==2.8.1
# via rocm-docs-core
pygments==2.19.1
pygments==2.19.2
# via
# accessible-pygments
# ipython
@@ -166,11 +163,11 @@ pygments==2.19.1
# sphinx
pyjwt[crypto]==2.10.1
# via pygithub
pynacl==1.5.0
pynacl==1.6.0
# via pygithub
python-dateutil==2.9.0.post0
# via jupyter-client
pyyaml==6.0.2
pyyaml==6.0.3
# via
# jupyter-cache
# myst-nb
@@ -178,21 +175,21 @@ pyyaml==6.0.2
# rocm-docs-core
# sphinx-external-toc
# sphinxcontrib-datatemplates
pyzmq==26.4.0
pyzmq==27.1.0
# via
# ipykernel
# jupyter-client
referencing==0.36.2
referencing==0.37.0
# via
# jsonschema
# jsonschema-specifications
requests==2.32.3
requests==2.32.5
# via
# pygithub
# sphinx
rocm-docs-core==1.18.2
rocm-docs-core==1.26.0
# via -r requirements.in
rpds-py==0.24.0
rpds-py==0.27.1
# via
# jsonschema
# referencing
@@ -200,9 +197,9 @@ six==1.17.0
# via python-dateutil
smmap==5.0.2
# via gitdb
snowballstemmer==2.2.0
snowballstemmer==3.0.1
# via sphinx
soupsieve==2.6
soupsieve==2.8
# via beautifulsoup4
sphinx==8.1.3
# via
@@ -215,12 +212,12 @@ sphinx==8.1.3
# sphinx-copybutton
# sphinx-design
# sphinx-external-toc
# sphinx-last-updated-by-git
# sphinx-notfound-page
# sphinx-reredirects
# sphinx-sitemap
# sphinxcontrib-datatemplates
# sphinxcontrib-runcmd
sphinx-book-theme==1.1.4
sphinx-book-theme==1.1.3
# via rocm-docs-core
sphinx-copybutton==0.5.2
# via rocm-docs-core
@@ -228,11 +225,13 @@ sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-last-updated-by-git==0.3.8
# via sphinx-sitemap
sphinx-notfound-page==1.1.0
# via rocm-docs-core
sphinx-reredirects==0.1.6
# via -r requirements.in
sphinx-sitemap==2.6.0
sphinx-sitemap==2.9.0
# via -r requirements.in
sphinxcontrib-applehelp==2.0.0
# via sphinx
@@ -250,21 +249,20 @@ sphinxcontrib-runcmd==0.2.0
# via sphinxcontrib-datatemplates
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
sqlalchemy==2.0.40
sqlalchemy==2.0.44
# via jupyter-cache
stack-data==0.6.3
# via ipython
tabulate==0.9.0
# via jupyter-cache
tomli==2.2.1
tomli==2.3.0
# via sphinx
tornado==6.4.2
tornado==6.5.2
# via
# ipykernel
# jupyter-client
traitlets==5.14.3
# via
# comm
# ipykernel
# ipython
# jupyter-client
@@ -272,22 +270,22 @@ traitlets==5.14.3
# matplotlib-inline
# nbclient
# nbformat
typing-extensions==4.13.2
typing-extensions==4.15.0
# via
# beautifulsoup4
# cryptography
# exceptiongroup
# ipython
# myst-nb
# pydata-sphinx-theme
# pygithub
# referencing
# sqlalchemy
urllib3==2.4.0
urllib3==2.5.0
# via
# pygithub
# requests
wcwidth==0.2.13
wcwidth==0.2.14
# via prompt-toolkit
wrapt==1.17.2
# via deprecated
zipp==3.21.0
zipp==3.23.0
# via importlib-metadata

@@ -52,7 +52,7 @@ Communication
:header: "Component", "Description"
":doc:`RCCL <rccl:index>`", "Standalone library that provides multi-GPU and multi-node collective communication primitives"
"`rocSHMEM <https://github.com/ROCm/rocSHMEM>`_", "Runtime that provides GPU-centric networking through an OpenSHMEM-like interface. This intra-kernel networking library simplifies application code complexity and enables more fine-grained communication/computation overlap than traditional host-driven networking"
":doc:`rocSHMEM <rocSHMEM:index>`", "An intra-kernel networking library that provides GPU-centric networking through an OpenSHMEM-like interface"
Math
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^