# ROCm 6.4.2 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
If you’re using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html)
documentation to verify compatibility and system requirements.
## Release highlights
The following are notable new features and improvements in ROCm 6.4.2. For changes to individual components, see Detailed component changes.
### ROCm Compute Profiler supports new data types in roofline plots
The `--roofline-data-type` option in ROCm Compute Profiler now supports FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 data types. Support for each data type depends on the GPU architecture. For more information, see Roofline options.
### ROCm Compute Profiler now uses AMD SMI
ROCm Compute Profiler now uses AMD SMI instead of ROCm SMI. The AMD System Management Interface Library (AMD SMI) is the successor to ROCm SMI. It is a unified system management interface tool that provides a user-space interface for applications to monitor and control GPU applications, and gives users the ability to query information about drivers and GPUs on the system. For more information, see ROCm/amdsmi and the AMD SMI documentation.
### ROCm Compute Profiler adds FP8 metrics support
ROCm Compute Profiler has added FP8 metrics support for AMD Instinct MI300 series accelerators. For more information, see Profile mode.
### rocSOLVER performance enhancement
rocSOLVER has been enhanced with improved performance for eigensolvers and singular value decomposition (SVD).
### ROCm installation instruction update
[Draft] The ROCm installation instructions have been updated to provide detailed instructions for each operating system distribution, and the use of the AMDGPU installer for ROCm installation is deprecated. For more information, see ROCm on Linux detailed installation overview.
### ROCm Offline Installer Creator updates
[Placeholder for ROCm Offline Installer Creator 6.4.2 updates]. See ROCm Offline Installer Creator for more information.
### ROCm Runfile Installer updates
[Placeholder for ROCm Runfile Installer 6.4.2 updates]. For more information, see ROCm Runfile Installer.
### ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

- Tutorials for AI developers have been expanded with four new tutorials. These tutorials are Jupyter notebook-based, easy-to-follow documents. They are ideal for AI developers who want to learn about specific topics, including inference, fine-tuning, training, and GPU development and optimization. For more information about the changes, see Changelog for the AI Developer Hub.
- Documentation for the new ROCprof Compute Viewer was released in May 2025. This tool is used to visualize and analyze GPU thread trace data collected using `rocprofv3`. Note that ROCprof Compute Viewer is in an early access state. Running production workloads is not recommended.
## Operating system and hardware support changes
ROCm 6.4.2 marks the end of support (EoS) for RHEL 9.5.
Hardware support remains unchanged in this release.
See the Compatibility matrix for more information about operating system and hardware compatibility.
## ROCm components
The following table lists the versions of ROCm components for ROCm 6.4.2, including any version
changes from 6.4.1 to 6.4.2. Click the component's updated version to go to a list of its changes.
Click {fab}github to go to the component's source code on GitHub.

| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.12.0 |
| | | MIOpen | 3.4.0 |
| | | MIVisionX | 3.2.0 |
| | | rocAL | 2.2.0 |
| | | rocDecode | 0.10.0 |
| | | rocJPEG | 0.8.0 |
| | | rocPyDecode | 0.3.1 |
| | | RPP | 1.9.10 |
| | Communication | RCCL | 2.22.3 ⇒ 2.22.3 |
| | | rocSHMEM | 2.0.0 ⇒ 2.0.1 |
| | Math | hipBLAS | 2.4.0 |
| | | hipBLASLt | 0.12.1 ⇒ 0.12.1 |
| | | hipFFT | 1.0.18 |
| | | hipfort | 0.6.0 |
| | | hipRAND | 2.12.0 |
| | | hipSOLVER | 2.4.0 |
| | | hipSPARSE | 3.2.0 |
| | | hipSPARSELt | 0.2.3 |
| | | rocALUTION | 3.2.3 |
| | | rocBLAS | 4.4.0 |
| | | rocFFT | 1.0.32 |
| | | rocRAND | 3.3.0 |
| | | rocSOLVER | 3.28.0 ⇒ 3.28.2 |
| | | rocSPARSE | 3.4.0 |
| | | rocWMMA | 1.7.0 |
| | | Tensile | 4.43.0 |
| | Primitives | hipCUB | 3.4.0 |
| | | hipTensor | 1.5.0 |
| | | rocPRIM | 3.4.0 ⇒ 3.4.1 |
| | | rocThrust | 3.3.0 |
| Tools | System management | AMD SMI | 25.4.2 |
| | | ROCm Data Center Tool | 0.3.0 |
| | | rocminfo | 1.0.0 |
| | | ROCm SMI | 7.5.0 |
| | | ROCm Validation Suite | 1.1.0 ⇒ 1.1.0 |
| | Performance | ROCm Bandwidth Test | 1.4.0 |
| | | ROCm Compute Profiler | 3.1.0 ⇒ 3.2.0 |
| | | ROCm Systems Profiler | 1.0.1 ⇒ 1.0.2 |
| | | ROCProfiler | 2.0.0 |
| | | ROCprofiler-SDK | 0.6.0 |
| | | ROCTracer | 4.1.0 |
| | Development | HIPIFY | 19.0.0 |
| | | ROCdbgapi | 0.77.2 |
| | | ROCm CMake | 0.14.0 |
| | | ROCm Debugger (ROCgdb) | 15.2 |
| | | ROCr Debug Agent | 2.0.4 |
| Compilers | | HIPCC | 1.1.1 |
| | | llvm-project | 19.0.0 |
| Runtimes | | HIP | 6.4.1 ⇒ 6.4.2 |
| | | ROCr Runtime | 1.15.0 |

## Detailed component changes
The following sections describe key changes to ROCm components.
For a historical overview of ROCm component updates, see the {doc}`ROCm consolidated changelog </release/changelog>`.
### HIP (6.4.2)
#### Added
- Support for the pointer attribute `HIP_POINTER_ATTRIBUTE_CONTEXT`.
#### Optimized
- Improved the implementation of `hipEventSynchronize`. The HIP runtime now makes internal callbacks non-blocking to improve performance.
#### Resolved issues
- Dependency on `libgcc-s1` during `rocm-dev` installation on Debian Buster. The HIP runtime removed this Debian package dependency and uses `libgcc1` instead for this distribution.
- Build issue for `COMGR` dynamic loading on Fedora and other distributions. The HIP runtime no longer links against `libamd_comgr.so`.
- Failure in the API `hipStreamDestroy` when the stream type is `hipStreamLegacy`. The API now returns the error code `hipErrorInvalidResourceHandle` in this case.
- Kernel launch errors, such as `shared object initialization failed`, `invalid device function`, or `kernel execution failure`. The HIP runtime now loads `COMGR` properly, taking into account both the file name and the mapped image.
- Memory access fault in some applications. The HIP runtime fixed the offset accumulation in memory addresses.
### hipBLASLt (0.12.1)
#### Added
- Support for gfx1151.
### RCCL (2.22.3)
#### Added
- Support for the LL128 protocol on gfx942.
### ROCm Compute Profiler (3.2.0)
#### Added
- 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
- Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 (dependent on the GPU architecture).
- Data type selection option `--roofline-data-type` / `-R` for roofline profiling. The default data type is FP32.
#### Changed
- Changed the dependency from `rocm-smi` to `amd-smi`.
#### Resolved issues
- Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.
### ROCm Systems Profiler (1.0.2)
#### Optimized
- Improved readability of OpenMP target offload traces by showing them on a single Perfetto track.
#### Resolved issues
- Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
### ROCm Validation Suite (1.1.0)
#### Added
- NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.
### rocPRIM (3.4.1)
#### Upcoming changes
- Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
- Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
  - `rocprim::device_warp_size()`. This has been replaced by `rocprim::arch::wavefront::min_size()` and `rocprim::arch::wavefront::max_size()` for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use `rocprim::arch::wavefront::size()`.
  - `rocprim::warp_size()`
  - `ROCPRIM_WAVEFRONT_SIZE`
### rocSHMEM (2.0.1)
#### Resolved issues
- Resolved incorrect output for `rocshmem_ctx_my_pe` and `rocshmem_ctx_n_pes`.
- Resolved multi-team errors by providing team-specific buffers in `rocshmem_ctx_wg_team_sync`.
- Resolved the missing implementation of `rocshmem_g` for the IPC conduit.
### rocSOLVER (3.28.2)
#### Added
- Hybrid computation support for existing routines, such as STERF.
- SVD for general matrices based on Cuppen's Divide and Conquer algorithm:
  - GESDD (with batched and strided_batched versions)
#### Optimized
- Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
- Improved the performance of STEDC and divide and conquer eigensolvers.
- Improved the performance of SYTRD, the initial step of the eigensolvers that start with the tridiagonalization of the input matrix.
## ROCm known issues
ROCm known issues are noted on {fab}github GitHub. For known
issues related to individual components, review the Detailed component changes.
## ROCm resolved issues
The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.
### AMD SMI CLI: CPER entries not dumped continuously when using follow flag
An issue where CPER entries were not streamed continuously as intended when using the `--follow` flag with `amd-smi ras --cper` has been resolved. See issue #4768 on GitHub.
## ROCm upcoming changes
The following changes to the ROCm software stack are anticipated for future releases.
### ROCm SMI deprecation
ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.

It's strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.
### ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation
Development and support for ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the `rocprofv3` tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta and will be production-ready in a future ROCm release.

It's anticipated that ROCTracer, ROCProfiler, `rocprof`, and `rocprofv2` will reach end-of-life in a future release, aligning with Q1 of 2026.
### AMDGPU wavefront size compiler macro deprecation
Access to the wavefront size as a compile-time constant via the `__AMDGCN_WAVEFRONT_SIZE`
and `__AMDGCN_WAVEFRONT_SIZE__` macros or the `constexpr` `warpSize` variable is deprecated
and will be disabled in a future release.

- The `__AMDGCN_WAVEFRONT_SIZE__` macro and `__AMDGCN_WAVEFRONT_SIZE` alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
- `warpSize` will only be available as a non-`constexpr` variable. Where required, the wavefront size should be queried via the `warpSize` variable in device code, or via `hipGetDeviceProperties` in host code. Neither of these will result in a compile-time constant.
- For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of `__AMDGCN_WAVEFRONT_SIZE`, `__AMDGCN_WAVEFRONT_SIZE__`, or `warpSize` can be replaced with a user-defined macro or `constexpr` variable with the wavefront size(s) for the target hardware. For example:

```cpp
// GFX9-based GPUs (such as AMD Instinct accelerators) use 64-wide wavefronts.
#if defined(__GFX9__)
#define MY_MACRO_FOR_WAVEFRONT_SIZE 64
// Other targets in this example are assumed to use 32-wide wavefronts.
#else
#define MY_MACRO_FOR_WAVEFRONT_SIZE 32
#endif
```
### HIPCC Perl scripts deprecation
The HIPCC Perl scripts (`hipcc.pl` and `hipconfig.pl`) will be removed in an upcoming release.
### Changes to ROCm Object Tooling
ROCm Object Tooling tools `roc-obj-ls`, `roc-obj-extract`, and `roc-obj` are
deprecated in ROCm 6.4 and will be removed in a future release. Functionality
has been added to the `llvm-objdump --offloading` tool option to extract all
clang-offload-bundles found within the objects or executables passed as input
into individual code objects. The `llvm-objdump --offloading` tool option also
supports the `--arch-name` option; when it is given, only code objects for the
specified target architecture are extracted. See llvm-objdump
for more information.
### HIP runtime API changes
A number of changes planned for the HIP runtime API in an upcoming major release
are not backward compatible with prior releases. Most of these changes increase
alignment between HIP and CUDA APIs or behavior. Some of the upcoming changes
clean up header files, remove namespace collisions, and establish a clear
separation between hipRTC and the HIP runtime.