Update RCCL known issue wording (#3775)

* add MAD page

* fix wording in RCCL known issue

* Revert "add MAD page"

This reverts commit c81d0f3b0a.
Peter Park
2024-09-20 20:04:15 -04:00
committed by GitHub
parent 1e0d3da98c
commit d301e792d6


@@ -470,7 +470,7 @@ See [issue #3767](https://github.com/ROCm/ROCm/issues/3767) on GitHub.
 #### Known issues
-On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, GPUDirect RDMA is disabled and impacts multi-node RCCL performance.
+On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, Direct Memory Access (DMA) transfers between the GPU and NIC are disabled and impacts multi-node RCCL performance.
 This issue was reproduced with RCCL 2.20.5 (ROCm 6.2.0 and 6.2.1) on systems with Broadcom Thor-2 NICs and affects other systems with RoCE networks using Linux 6.8.0 or newer.
 Older RCCL versions are also impacted.