Compare commits

10 Commits

Author         SHA1        Message                                      Date
Matt Williams  09f2775011  Apply suggestion from @mattwill-amd          2025-12-10 09:32:53 -05:00
Matt Williams  2d0fa4ec45  Apply suggestion from @mattwill-amd          2025-12-09 17:32:30 -05:00
Matt Williams  f0ddb21aa8  Apply suggestion from @mattwill-amd          2025-12-09 17:32:16 -05:00
Matt Williams  381003e41d  Apply suggestion from @mattwill-amd          2025-12-04 12:37:08 -05:00
Matt Williams  ce72a9204d  Update what-is-rocm.rst                      2025-12-03 14:18:09 -05:00
Matt Williams  ae7aaa3f4a  Update what-is-rocm.rst                      2025-12-03 14:09:38 -05:00
Matt Williams  f277667142  Apply suggestion from @mattwill-amd          2025-11-28 09:59:53 -05:00
Matt Williams  38478094a6  Apply suggestion from @mattwill-amd          2025-11-27 13:37:26 -05:00
Matt Williams  86ee6bb826  Apply suggestion from @mattwill-amd          2025-11-27 09:59:09 -05:00
Matt Williams  0cdee4f155  Adding ROCm-Optiq note to What is ROCm page  2025-11-27 09:52:20 -05:00
                           Adding a note for a link to the Optiq docs
7 changed files with 10 additions and 61 deletions

View File

@@ -36,7 +36,6 @@ Andrej
Arb
Autocast
autograd
Backported
BARs
BatchNorm
BLAS
@@ -203,11 +202,9 @@ GenAI
GenZ
GitHub
Gitpod
hardcoded
HBM
HCA
HGX
HLO
HIPCC
hipDataType
HIPExtension
@@ -332,7 +329,6 @@ MoEs
Mooncake
Mpops
Multicore
multihost
Multithreaded
mx
MXFP
@@ -1024,7 +1020,6 @@ uncacheable
uncorrectable
underoptimized
unhandled
unfused
uninstallation
unmapped
unsqueeze

View File

@@ -100,13 +100,12 @@ firmware, AMD GPU drivers, and the ROCm user space software.
01.25.16.03<br>
01.25.15.04
</td>
<td>
<td rowspan="2" style="vertical-align: middle;">
30.20.1<br>
30.20.0<br>
30.10.2<br>
30.10.1<br>
30.10
</td>
30.10</td>
<td rowspan="3" style="vertical-align: middle;">8.6.0.K</td>
</tr>
<tr>
@@ -115,13 +114,6 @@ firmware, AMD GPU drivers, and the ROCm user space software.
01.25.16.03<br>
01.25.15.04
</td>
<td>
30.20.1<br>
30.20.0<br>
30.10.2<br>
30.10.1<br>
30.10
</td>
</tr>
<tr>
<td>MI325X<a href="#footnote1"><sup>[1]</sup></a></td>
@@ -839,7 +831,7 @@ issues related to individual components, review the [Detailed component changes]
### RCCL performance degradation on AMD Instinct MI300X GPU with AMD Pollara AI NIC
If you're using RCCL on AMD Instinct MI300X GPUs with AMD Pollara AI NIC, you might observe performance degradation for specific collectives and message sizes. The affected collectives are `Scatter`, `AllToAll`, and `AlltoAllv`. It's recommended to avoid using RCCL packaged with ROCm 7.1.1. As a workaround, use the {fab}`github`[RCCL `develop` branch](https://github.com/ROCm/rccl/tree/develop), which contains the fix and will be included in a future ROCm release. See [GitHub issue #5717](https://github.com/ROCm/ROCm/issues/5717).
If you're using RCCL on AMD Instinct MI300X GPUs with AMD Pollara AI NIC, you might observe performance degradation for specific collectives and message sizes. The affected collectives are `Scatter`, `AllToAll`, and `AlltoAllv`. It's recommended to avoid using RCCL packaged with ROCm 7.1.1. As a workaround, use the {fab}`github`[RCCL `develop` branch](https://github.com/ROCm/rccl/tree/develop), which contains the fix and will be included in a future ROCm release.
### Segmentation fault in training models using TensorFlow 2.20.0 Docker images
@@ -847,7 +839,7 @@ Training models `tf2_tfm_resnet50_fp16_train` and `tf2_tfm_resnet50_fp32_train`
might fail with a segmentation fault when run on the TensorFlow 2.20.0 Docker
image with ROCm 7.1.1. As a workaround, use TensorFlow 2.19.x Docker image for
training the models in ROCm 7.1.1. This issue will be fixed in a future ROCm
release. See [GitHub issue #5718](https://github.com/ROCm/ROCm/issues/5718).
release.
### AMD SMI CLI triggers repeated kernel errors on GPUs with partitioning support
@@ -866,19 +858,11 @@ amdgpu 0000:15:00.0: amdgpu: renderD153 partition 1 not valid!
These repeated kernel logs can clutter the system logs and may cause
unnecessary concern about GPU health. However, this is a non-functional issue
and does not affect AMD SMI functionality or GPU performance. This issue will
be fixed in a future ROCm release. See [GitHub issue #5720](https://github.com/ROCm/ROCm/issues/5720).
be fixed in a future ROCm release.
### Excessive bad page logs in AMD GPU Driver (amdgpu)
Due to partial data corruption in the Electrically Erasable Programmable Read-Only Memory (EEPROM) and limited error handling in the AMD GPU Driver (amdgpu), excessive log output might occur when querying the reliability, availability, and serviceability (RAS) bad pages. This issue will be fixed in a future AMD GPU Driver (amdgpu) and ROCm release. See [GitHub issue #5719](https://github.com/ROCm/ROCm/issues/5719).
### Incorrect results in gemm_ex operations for rocBLAS and hipBLAS
Some `gemm_ex` operations with 8-bit input data types (`int8`, `float8`, `bfloat8`) for specific matrix dimensions (K = 1 and number of workgroups > 1) might yield incorrect results. The issue results from incorrect tailloop code that fails to consider workgroup index when calculating valid element size. The issue will be fixed in a future ROCm release. See [GitHub issue #5722](https://github.com/ROCm/ROCm/issues/5722).
### hipBLASLt performance variation for a particular FP8 GEMM operation on AMD Instinct MI325X GPUs
If you're using hipBLASLt on AMD Instinct MI325X GPUs for large FP8 GEMM operations (such as 9728x8192x65536), you might observe a noticeable performance variation. The issue is currently under investigation and will be fixed in a future ROCm release. See [GitHub issue #5734](https://github.com/ROCm/ROCm/issues/5734).
Due to partial data corruption of Electrically Erasable Programmable Read-Only Memory (EEPROM) and limited error handling in the AMD GPU Driver(amdgpu), excessive log output might result when querying the reliability, availability, and serviceability (RAS) bad pages. This issue will be fixed in a future AMD GPU Driver(amdgpu) and ROCm release.
## ROCm resolved issues

View File

@@ -269,33 +269,6 @@ For a complete and up-to-date list of JAX public modules (for example, ``jax.num
JAX API modules are maintained by the JAX project and are subject to change.
Refer to the official JAX documentation for the most up-to-date information.
Key features and enhancements for ROCm 7.1
===============================================================================
- Enabled compilation of multihost HLO runner Python bindings.
- Backported multihost HLO runner bindings and some related changes to
:code:`FunctionalHloRunner`.
- Added :code:`requirements_lock_3_12` to enable building for Python 3.12.
- Removed hardcoded NHWC convolution layout for ``fp16`` precision to address the performance drops for ``fp16`` precision on gfx12xx GPUs.
- ROCprofiler-SDK integration:
- Integrated ROCprofiler-SDK (v3) to XLA to improve profiling of GPU events,
support both time-based and step-based profiling.
- Added unit tests for :code:`rocm_collector` and :code:`rocm_tracer`.
- Added Triton unsupported conversion from ``f8E4M3FNUZ`` to ``fp16`` with
rounding mode.
- Introduced :code:`CudnnFusedConvDecomposer` to revert fused convolutions
when :code:`ConvAlgorithmPicker` fails to find a fused algorithm, and removed
unfused fallback paths from :code:`RocmFusedConvRunner`.
Key features and enhancements for ROCm 7.0
===============================================================================

View File

@@ -249,6 +249,3 @@ html_context = {
"granularity_type" : [('Coarse-grained', 'coarse-grained'), ('Fine-grained', 'fine-grained')],
"scope_type" : [('Device', 'device'), ('System', 'system')]
}
# Disable figure and table numbering
numfig = False

View File

@@ -1,4 +1,4 @@
rocm-docs-core==1.30.1
rocm-docs-core==1.29.0
sphinx-reredirects
sphinx-sitemap
sphinxcontrib.datatemplates==0.11.0

View File

@@ -187,7 +187,7 @@ requests==2.32.5
# via
# pygithub
# sphinx
rocm-docs-core==1.30.1
rocm-docs-core==1.29.0
# via -r requirements.in
rpds-py==0.29.0
# via

View File

@@ -123,8 +123,8 @@ Performance
.. note::
`ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`.
Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.
- `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`. Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.
- ROCm Optiq (Beta) provides deep insights into system-level performance for applications running on the ROCm stack. It serves as the GUI to visualize traces collected by ROCm profiling tools, specifically `ROCm Systems Profiler <https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/index.html>`_.
Development
^^^^^^^^^^^