From f21cfe1171045939f5c3f617db11555dbccb9163 Mon Sep 17 00:00:00 2001
From: Pratik Basyal
Date: Wed, 15 Oct 2025 09:58:23 -0400
Subject: [PATCH] GitHub issue added to 702 known issues (#5520)

* GitHub issue added to 702 known issues
* Added missing RCCL changelog
---
 CHANGELOG.md | 4 ++++
 RELEASE.md   | 6 +++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 16d4032fb..6c068afb0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -912,11 +912,15 @@ HIP runtime has the following functional improvements which improves runtime per
 * Compatibility with NCCL 2.25.1.
 * Compatibility with NCCL 2.26.6.
 
+#### Optimized
+* Improved the performance of the `FP8` Sum operation by upcasting to `FP16`.
+
 #### Resolved issues
 
 * Resolved an issue when using more than 64 channels when multiple collectives are used in the same `ncclGroup()` call.
 * Fixed unit test failures in tests ending with the `ManagedMem` and `ManagedMemGraph` suffixes.
 * Fixed a suboptimal algorithmic switching point for AllReduce on the AMD Instinct MI300X.
+* Fixed broken functionality within the LL protocol on gfx950 by disabling inlining of LLGenericOp kernels.
 * Fixed the known issue "When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault" with a design change to use `comm` instead of `rank` for `mscclStatus`. The global map for `comm` to `mscclStatus` is still not thread safe but should be explicitly handled by mutexes for read-write operations. This is tested for correctness, but there is a plan to use a thread-safe map data structure in an upcoming release.
 
 ### **rocAL** (2.3.0)
diff --git a/RELEASE.md b/RELEASE.md
index 943ba35c3..06d22a836 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -699,7 +699,7 @@ The problem occurs when attempting to debug a program that contains code that ru
 
 The ROCR Debug Agent might also become unresponsive when attempting to capture data from a program that is experiencing queue errors, memory faults, or other triggering events.
 
-For a detailed workaround, see the [Installation troubleshooting](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/install-faq.html#issue-10-rocm-debugging-tools-might-become-unresponsive-in-selinux-enabled-distributions) documentation. This issue will be fixed in a future ROCm release.
+For a detailed workaround, see the [Installation troubleshooting](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/install-faq.html#issue-10-rocm-debugging-tools-might-become-unresponsive-in-selinux-enabled-distributions) documentation. This issue will be fixed in a future ROCm release. See [GitHub issue #5498](https://github.com/ROCm/ROCm/issues/5498).
 
 ### MIGraphX Python API will fail when running on Python 3.13
 
@@ -708,11 +708,11 @@ Applications using the MIGraphX Python API will fail when running on Python 3.13
 ```
 ls -l /opt/rocm-7.0.0/lib/libmigraphx_py_*.so
 ```
-The issue will be resolved in a future ROCm release.
+The issue will be resolved in a future ROCm release. See [GitHub issue #5500](https://github.com/ROCm/ROCm/issues/5500).
 
 ### Applications using OpenCV might fail due to package incompatibility between the OS
 
-OpenCV packages built on Ubuntu 24.04 are incompatible with Debian 13 due to a version conflict. As a result, applications, tests, and samples that use OpenCV might fail. To avoid the version conflict, rebuild OpenCV with the version corresponding to Debian 13, then rebuild MIVisionX on top of it. As a workaround, rebuild OpenCV from source, followed by the application that uses OpenCV. This issue will be fixed in a future ROCm release.
+OpenCV packages built on Ubuntu 24.04 are incompatible with Debian 13 due to a version conflict. As a result, applications, tests, and samples that use OpenCV might fail. To avoid the version conflict, rebuild OpenCV with the version corresponding to Debian 13, then rebuild MIVisionX on top of it. As a workaround, rebuild OpenCV from source, followed by the application that uses OpenCV. This issue will be fixed in a future ROCm release. See [GitHub issue #5501](https://github.com/ROCm/ROCm/issues/5501).
 
 ## ROCm upcoming changes