Mirror of https://github.com/ROCm/ROCm.git (synced 2026-01-10 15:18:11 -05:00)
Compare commits: update_jax...mattwill-a (10 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 09f2775011 | | |
| | 2d0fa4ec45 | | |
| | f0ddb21aa8 | | |
| | 381003e41d | | |
| | ce72a9204d | | |
| | ae7aaa3f4a | | |
| | f277667142 | | |
| | 38478094a6 | | |
| | 86ee6bb826 | | |
| | 0cdee4f155 | | |

@@ -36,7 +36,6 @@ Andrej
Arb
Autocast
autograd
Backported
BARs
BatchNorm
BLAS
@@ -203,11 +202,9 @@ GenAI
GenZ
GitHub
Gitpod
hardcoded
HBM
HCA
HGX
HLO
HIPCC
hipDataType
HIPExtension
@@ -332,7 +329,6 @@ MoEs
Mooncake
Mpops
Multicore
multihost
Multithreaded
mx
MXFP
@@ -1024,7 +1020,6 @@ uncacheable
uncorrectable
underoptimized
unhandled
unfused
uninstallation
unmapped
unsqueeze

28
RELEASE.md
@@ -100,13 +100,12 @@ firmware, AMD GPU drivers, and the ROCm user space software.
01.25.16.03<br>
01.25.15.04
</td>
<td>
<td rowspan="2" style="vertical-align: middle;">
30.20.1<br>
30.20.0<br>
30.10.2<br>
30.10.1<br>
30.10
</td>
30.10</td>
<td rowspan="3" style="vertical-align: middle;">8.6.0.K</td>
</tr>
<tr>
@@ -115,13 +114,6 @@ firmware, AMD GPU drivers, and the ROCm user space software.
01.25.16.03<br>
01.25.15.04
</td>
<td>
30.20.1<br>
30.20.0<br>
30.10.2<br>
30.10.1<br>
30.10
</td>
</tr>
<tr>
<td>MI325X<a href="#footnote1"><sup>[1]</sup></a></td>
@@ -839,7 +831,7 @@ issues related to individual components, review the [Detailed component changes]

### RCCL performance degradation on AMD Instinct MI300X GPU with AMD Pollara AI NIC

If you’re using RCCL on AMD Instinct MI300X GPUs with AMD Pollara AI NIC, you might observe performance degradation for specific collectives and message sizes. The affected collectives are `Scatter`, `AllToAll`, and `AlltoAllv`. It's recommended to avoid using RCCL packaged with ROCm 7.1.1. As a workaround, use the {fab}`github`[RCCL `develop` branch](https://github.com/ROCm/rccl/tree/develop), which contains the fix and will be included in a future ROCm release. See [GitHub issue #5717](https://github.com/ROCm/ROCm/issues/5717).
If you’re using RCCL on AMD Instinct MI300X GPUs with AMD Pollara AI NIC, you might observe performance degradation for specific collectives and message sizes. The affected collectives are `Scatter`, `AllToAll`, and `AlltoAllv`. It's recommended to avoid using RCCL packaged with ROCm 7.1.1. As a workaround, use the {fab}`github`[RCCL `develop` branch](https://github.com/ROCm/rccl/tree/develop), which contains the fix and will be included in a future ROCm release.

### Segmentation fault in training models using TensorFlow 2.20.0 Docker images

@@ -847,7 +839,7 @@ Training models `tf2_tfm_resnet50_fp16_train` and `tf2_tfm_resnet50_fp32_train`
might fail with a segmentation fault when run on the TensorFlow 2.20.0 Docker
image with ROCm 7.1.1. As a workaround, use TensorFlow 2.19.x Docker image for
training the models in ROCm 7.1.1. This issue will be fixed in a future ROCm
release. See [GitHub issue #5718](https://github.com/ROCm/ROCm/issues/5718).
release.

### AMD SMI CLI triggers repeated kernel errors on GPUs with partitioning support

@@ -866,19 +858,11 @@ amdgpu 0000:15:00.0: amdgpu: renderD153 partition 1 not valid!
These repeated kernel logs can clutter the system logs and may cause
unnecessary concern about GPU health. However, this is a non-functional issue
and does not affect AMD SMI functionality or GPU performance. This issue will
be fixed in a future ROCm release. See [GitHub issue #5720](https://github.com/ROCm/ROCm/issues/5720).
be fixed in a future ROCm release.

### Excessive bad page logs in AMD GPU Driver (amdgpu)

Due to partial data corruption in the Electrically Erasable Programmable Read-Only Memory (EEPROM) and limited error handling in the AMD GPU Driver (amdgpu), excessive log output might occur when querying the reliability, availability, and serviceability (RAS) bad pages. This issue will be fixed in a future AMD GPU Driver (amdgpu) and ROCm release. See [GitHub issue #5719](https://github.com/ROCm/ROCm/issues/5719).

### Incorrect results in gemm_ex operations for rocBLAS and hipBLAS

Some `gemm_ex` operations with 8-bit input data types (`int8`, `float8`, `bfloat8`) for specific matrix dimensions (K = 1 and number of workgroups > 1) might yield incorrect results. The issue results from incorrect tailloop code that fails to consider workgroup index when calculating valid element size. The issue will be fixed in a future ROCm release. See [GitHub issue #5722](https://github.com/ROCm/ROCm/issues/5722).

### hipBLASLt performance variation for a particular FP8 GEMM operation on AMD Instinct MI325X GPUs

If you’re using hipBLASLt on AMD Instinct MI325X GPUs for large FP8 GEMM operations (such as 9728x8192x65536), you might observe a noticeable performance variation. The issue is currently under investigation and will be fixed in a future ROCm release. See [GitHub issue #5734](https://github.com/ROCm/ROCm/issues/5734).
Due to partial data corruption of Electrically Erasable Programmable Read-Only Memory (EEPROM) and limited error handling in the AMD GPU Driver(amdgpu), excessive log output might result when querying the reliability, availability, and serviceability (RAS) bad pages. This issue will be fixed in a future AMD GPU Driver(amdgpu) and ROCm release.

## ROCm resolved issues

@@ -269,33 +269,6 @@ For a complete and up-to-date list of JAX public modules (for example, ``jax.num
JAX API modules are maintained by the JAX project and is subject to change.
Refer to the official Jax documentation for the most up-to-date information.

Key features and enhancements for ROCm 7.1
===============================================================================

- Enabled compilation of multihost HLO runner Python bindings.

- Backported multihost HLO runner bindings and some related changes to
:code:`FunctionalHloRunner`.

- Added :code:`requirements_lock_3_12` to enable building for Python 3.12.

- Removed hardcoded NHWC convolution layout for ``fp16`` precision to address the performance drops for ``fp16`` precision on gfx12xx GPUs.


- ROCprofiler-SDK integration:

- Integrated ROCprofiler-SDK (v3) to XLA to improve profiling of GPU events,
support both time-based and step-based profiling.

- Added unit tests for :code:`rocm_collector` and :code:`rocm_tracer`.

- Added Triton unsupported conversion from ``f8E4M3FNUZ`` to ``fp16`` with
rounding mode.

- Introduced :code:`CudnnFusedConvDecomposer` to revert fused convolutions
when :code:`ConvAlgorithmPicker` fails to find a fused algorithm, and removed
unfused fallback paths from :code:`RocmFusedConvRunner`.

Key features and enhancements for ROCm 7.0
===============================================================================

@@ -249,6 +249,3 @@ html_context = {
"granularity_type" : [('Coarse-grained', 'coarse-grained'), ('Fine-grained', 'fine-grained')],
"scope_type" : [('Device', 'device'), ('System', 'system')]
}

# Disable figure and table numbering
numfig = False

@@ -1,4 +1,4 @@
rocm-docs-core==1.30.1
rocm-docs-core==1.29.0
sphinx-reredirects
sphinx-sitemap
sphinxcontrib.datatemplates==0.11.0

@@ -187,7 +187,7 @@ requests==2.32.5
# via
# pygithub
# sphinx
rocm-docs-core==1.30.1
rocm-docs-core==1.29.0
# via -r requirements.in
rpds-py==0.29.0
# via

@@ -123,8 +123,8 @@ Performance

.. note::

`ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`.
Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.
- `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is a tool for visualizing and analyzing GPU thread trace data collected using :doc:`rocprofv3 <rocprofiler-sdk:index>`. Note that `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_ is in an early access state. Running production workloads is not recommended.
- ROCm Optiq (Beta) provides deep insights into system-level performance for applications running on the ROCm stack. It serves as the GUI to visualize traces collected by ROCm profiling tools, specifically `ROCm Systems Profiler <https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/index.html>`_.

Development
^^^^^^^^^^^