From 56d258592dae5ea6d344a22d42dddc1964de70b2 Mon Sep 17 00:00:00 2001
From: Peter Park
Date: Wed, 21 May 2025 11:15:44 -0400
Subject: [PATCH] Finalize 6.4.1 release notes (#408)

* update URLs for production

* update historical changelog

* remove deep learning compat section from doc highlights

* update changelog.md

* Update CHANGELOG.md

Co-authored-by: yugang-amd

* Update CHANGELOG.md

Co-authored-by: yugang-amd

---------

Co-authored-by: yugang-amd
---
 CHANGELOG.md |  39 ++++++++++++++----
 RELEASE.md   | 111 +++++++++++++++++++++++++--------------------------
 2 files changed, 87 insertions(+), 63 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d5d2c252c..c1ce6a5cb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -16,11 +16,24 @@ for a complete overview of this release.
 * Dumping CPER entries from RAS tool `amdsmi_get_gpu_cper_entries()` to Python and C APIs.
   - Dumping CPER entries consist of `amdsmi_cper_hdr_t`.
   - Dumping CPER entries is also enabled in the CLI interface through `sudo amd-smi ras --cper`.
+* `amdsmi_get_gpu_busy_percent` to the C API.
 
-#### Resolved
+#### Changed
+
+* Modified VRAM display for `amd-smi monitor -v`.
+
+#### Optimized
+
+* Improved load times for CLI commands when the GPU has multiple partitions.
+
+#### Resolved issues
 
 * Fixed partition enumeration in `amd-smi list -e`, `amdsmi_get_gpu_enumeration_info()`, `amdsmi_enumeration_info_t`, `drm_card`, and `drm_render` fields.
 
+#### Known issues
+
+* When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.
+
 ```{note}
 See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
 ```
@@ -29,20 +42,22 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
 
 #### Added
 
-* New debug mask, to print precise code object information for logging.
+* New log mask enumeration `LOG_COMGR` enables logging precise code object information.
 
 #### Changed
 
-* Calling the code object has changed. HIP runtime now uses device bitcode before SPIR-V.
+* HIP runtime uses device bitcode before SPIRV.
+* The implementation of preventing `hipLaunchKernel` latency degradation with number of idle streams is reverted or disabled by default.
 
 #### Optimized
 
-* Improved kernel logging using the demangling shader names.
+* Improved kernel logging includes de-mangling shader names.
+* Refined implementation in HIP APIs `hipEventRecords` and `hipStreamWaitEvent` for performance improvement.
 
 #### Resolved issues
 
-* Stale state during the graph capture. The return error was fixed, and HIP runtime now always uses the latest dependent nodes during `hipEventRecord` capture.
-* Issue of `hipEventRecords` failing to call the `hip::getStream` runtime function.
+* Stale state during the graph capture. The return error was fixed, HIP runtime now always uses the latest dependent nodes during `hipEventRecord` capture.
+* Segmentation fault during kernel execution. HIP runtime now allows maximum stack size as per ISA on the GPU device.
 
 ### **hipBLASLt** (0.12.1)
 
@@ -61,6 +76,16 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
 * Fixed an issue where early termination, in rare circumstances, could cause the application to stop responding by adding synchronization before destroying a proxy thread.
 * Fixed the accuracy issue for the MSCCLPP `allreduce7` kernel in graph mode.
 
+#### Known issues
+
+* When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault. The recommended workaround is to disable MSCCL with `export RCCL_MSCCL_ENABLE=0`.
+  This issue will be fixed in a future ROCm release.
+
+* Within the RCCL-UnitTests test suite, failures occur in tests ending with the
+  `.ManagedMem` and `.ManagedMemGraph` suffixes. These failures only affect the
+  test results and do not affect the RCCL component itself. This issue will be
+  resolved in a future ROCm release.
+
 ### **rocALUTION** (3.2.3)
 
 #### Added
@@ -100,7 +125,7 @@ See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/rele
 
 #### Added
 
-* How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/amd-staging/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
+* How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
 
 #### Resolved issues
 
diff --git a/RELEASE.md b/RELEASE.md
index 921080f5e..fd79a0e59 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -48,7 +48,7 @@ The ROCm Data Science toolkit (or ROCm-DS) is an open-source software collection
 
 ### ROCm Offline Installer Creator updates
 
-The ROCm Offline Installer Creator 6.4.1 now allows you to use the SPACEBAR or ENTER keys for menu item selection in the GUI. It also adds support for Debian 12 and fixes an issue for “full” mode RHEL offline installer creation, where GDM packages were uninstalled during offline installation. See [ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux-internal/en/latest/install/rocm-offline-installer.html) for more information.
+The ROCm Offline Installer Creator 6.4.1 now allows you to use the SPACEBAR or ENTER keys for menu item selection in the GUI. It also adds support for Debian 12 and fixes an issue for “full” mode RHEL offline installer creation, where GDM packages were uninstalled during offline installation. See [ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/rocm-offline-installer.html) for more information.
 
 ### ROCm Runfile Installer updates
 
@@ -59,7 +59,7 @@ The ROCm Runfile Installer 6.4.1 adds the following improvements:
 - The Runfile Installer can be used to uninstall any Runfile-based installation of the driver.
 - In the CLI interface, the `postrocm` argument can now be run separately from the `rocm` argument. In cases where `postrocm` was missed from the initial ROCm install, `postrocm` can now be run on the same target folder. For example, if you installed ROCm 6.4.1 using `install.run target=/myrocm rocm`, you can run the post-installation separately using the command `install.run target=/myrocm/rocm-6.4.1 postrocm`.
 
-For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux-internal/en/latest/install/rocm-runfile-installer.html).
+For more information, see [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/rocm-runfile-installer.html).
 
 ### ROCm documentation updates
 
@@ -71,7 +71,6 @@ ROCm documentation continues to be updated to provide clearer and more comprehen
 * The [Training a model with JAX MaxText](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/training/benchmark-docker/jax-maxtext.html) performance testing guide has been updated to feature the latest [ROCm/jax-training](https://hub.docker.com/layers/rocm/jax-training/maxtext-v25.5/images/sha256-4e0516358a227cae8f552fb866ec07e2edcf244756f02e7b40212abfbab5217b) Docker image (a preconfigured training environment with ROCm, JAX, and [MaxText](https://github.com/AI-Hypercomputer/maxtext)). Support for [Llama 3.3 70B](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) has been added.
 * The [vLLM inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/vllm-benchmark.html?model=pyt_vllm_qwq-32b) guide has been updated to feature the latest [ROCm/vLLM](https://hub.docker.com/layers/rocm/vllm/instinct_main/images/sha256-ad9062dea3483d59dedb17c67f7c49f30eebd6eb37c3fac0a171fb19696cc845) Docker image (a preconfigured environment for inference with ROCm and [vLLM](https://docs.vllm.ai/en/latest/)). Support for the [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) model has been added.
 * The [PyTorch inference performance testing](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference/pytorch-inference-benchmark.html?model=pyt_clip_inference) guide has been added, featuring the [ROCm/PyTorch](https://hub.docker.com/layers/rocm/pytorch/latest/images/sha256-ab1d350b818b90123cfda31363019d11c0d41a8f12a19e3cb2cb40cf0261137d) Docker image (a preconfigured inference environment with ROCm and PyTorch) with initial support for the [CLIP](https://huggingface.co/laion/CLIP-ViT-B-32-laion2B-s34B-b79K) and [Chai-1](https://huggingface.co/chaidiscovery/chai-1) models.
-* The deep learning frameworks compatibility pages have been updated with new information and are reorganized, making them easier to review. For more information, see [PyTorch compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/pytorch-compatibility.html), [TensorFlow compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/tensorflow-compatibility.html), and [JAX compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/jax-compatibility.html).
 
 ## Operating system and hardware support changes
 
@@ -113,47 +112,47 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Libraries
  Machine learning and computer vision
- Composable Kernel
+ Composable Kernel
  1.1.0
- MIGraphX
+ MIGraphX
  2.12.0
- MIOpen
+ MIOpen
  3.4.0
- MIVisionX
+ MIVisionX
  3.2.0
- rocAL
+ rocAL
  2.2.0
- rocDecode
+ rocDecode
  0.10.0
- rocJPEG
+ rocJPEG
  0.8.0
- rocPyDecode
+ rocPyDecode
  0.3.1
- RPP
+ RPP
  1.9.10
@@ -162,7 +161,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Communication
- RCCL
+ RCCL
  2.22.3 ⇒ 2.22.3
@@ -176,82 +175,82 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Math
- hipBLAS
+ hipBLAS
  2.4.0
- hipBLASLt
+ hipBLASLt
  0.12.0 ⇒ 0.12.1
- hipFFT
+ hipFFT
  1.0.18
- hipfort
+ hipfort
  0.6.0
- hipRAND
+ hipRAND
  2.12.0
- hipSOLVER
+ hipSOLVER
  2.4.0
- hipSPARSE
+ hipSPARSE
  3.2.0
- hipSPARSELt
+ hipSPARSELt
  0.2.3
- rocALUTION
+ rocALUTION
  3.2.2 ⇒ 3.2.3
- rocBLAS
+ rocBLAS
  4.4.0
- rocFFT
+ rocFFT
  1.0.32
- rocRAND
+ rocRAND
  3.3.0
- rocSOLVER
+ rocSOLVER
  3.28.0
- rocSPARSE
+ rocSPARSE
  3.4.0
- rocWMMA
+ rocWMMA
  1.7.0
- Tensile
+ Tensile
  4.43.0
@@ -260,22 +259,22 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Primitives
- hipCUB
+ hipCUB
  3.4.0
- hipTensor
+ hipTensor
  1.5.0
- rocPRIM
+ rocPRIM
  3.4.0
- rocThrust
+ rocThrust
  3.3.0
@@ -284,27 +283,27 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Tools
  System management
- AMD SMI
+ AMD SMI
  25.3.0 ⇒ 25.4.2
- ROCm Data Center Tool
+ ROCm Data Center Tool
  0.3.0 ⇒ 0.3.0
- rocminfo
+ rocminfo
  1.0.0
- ROCm SMI
+ ROCm SMI
  7.5.0 ⇒ 7.5.0
- ROCmValidationSuite
+ ROCmValidationSuite
  1.1.0
@@ -313,38 +312,38 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Performance
- ROCm Bandwidth
+ ROCm Bandwidth
  Test
  1.4.0
- ROCm Compute Profiler
+ ROCm Compute Profiler
  3.1.0
- ROCm Systems Profiler
+ ROCm Systems Profiler
  1.0.0 ⇒ 1.0.1
- ROCProfiler
+ ROCProfiler
  2.0.0
- ROCprofiler-SDK
+ ROCprofiler-SDK
  0.6.0
- ROCTracer
+ ROCTracer
  4.1.0
@@ -354,32 +353,32 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Development
- HIPIFY
+ HIPIFY
  19.0.0
- ROCdbgapi
+ ROCdbgapi
  0.77.2
- ROCm CMake
+ ROCm CMake
  0.14.0
- ROCm Debugger (ROCgdb)
+ ROCm Debugger (ROCgdb)
  15.2
- ROCr Debug Agent
+ ROCr Debug Agent
  2.0.4
  Compilers
- HIPCC
+ HIPCC
  1.1.1
- llvm-project
+ llvm-project
  19.0.0
@@ -404,12 +403,12 @@ Click {fab}`github` to go to the component's source code on GitHub.
  Runtimes
- HIP
+ HIP
  6.4.0 ⇒ 6.4.1
- ROCr Runtime
+ ROCr Runtime
  1.15.0 ⇒ 1.15.0
@@ -451,7 +450,7 @@ For a historical overview of ROCm component updates, see the {doc}`ROCm consolid
 * When using the `--follow` flag with `amd-smi ras --cper`, CPER entries are not streamed continuously as intended. This will be fixed in an upcoming ROCm release.
 
 ```{note}
-See the full [AMD SMI changelog](https://github.com/AMD-ROCm-Internal/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
+See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
 ```
 
 ### **HIP** (6.4.1)
@@ -534,14 +533,14 @@ See the full [AMD SMI changelog](https://github.com/AMD-ROCm-Internal/amdsmi/blo
 - Fixed partition enumeration. It now refers to the correct DRM Render and Card paths.
 
 ```{note}
-See the full [ROCm SMI changelog](https://github.com/AMD-ROCm-Internal/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
+See the full [ROCm SMI changelog](https://github.com/ROCm/rocm_smi_lib/blob/release/rocm-rel-6.4/CHANGELOG.md) for details, examples, and in-depth descriptions.
 ```
 
 ### **ROCm Systems Profiler** (1.0.1)
 
 #### Added
 
-* How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/amd-staging/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
+* How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
 
 #### Resolved issues