ROCm/RELEASE.md
Pratik Basyal 5d594feeac Initial changes to 6.4.2 release notes, compatibility matrix, and changelog. (#427)
* Initial 6.4.2 changes to release notes

* 6.4.2 initial changes applied

* edited entry for rocPRIM

* Add HIP content for 6.4.2

* Release notes for 6.4.2 updated

* Conf.py updated

* Review and fixed issue updated

* RCCL changelog update

* Compatibility matrix updated

* Pointed to internal linux docs

* Histrorical changelog added

* Quick change on component table

* Changelog for Systems profiler updated

* Consolidated changelog synced

* ROCm validation suite changelog added

* Leo's feedback incoporated

* Highlights added

* Manifest added for comparision

* Manifest 642 RC1 added

* RC2 manifest added

* KMD/UMD footnote updated

* Minor changes

* Consolidated changelog synced

---------

Co-authored-by: spolifroni-amd <sandra.polifroni@amd.com>
Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
2025-06-12 13:40:02 -04:00

ROCm 6.4.2 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

If you're using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the [Use ROCm on Radeon GPUs](https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility/native_linux/native_linux_compatibility.html)
documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 6.4.2. For changes to individual components, see Detailed component changes.

ROCm Compute Profiler supports new data types in roofline plots

The --roofline-data-type option in ROCm Compute Profiler now supports the FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 data types. The data types available depend on the GPU architecture. For more information, see Roofline options.

ROCm Compute Profiler now uses AMD SMI

ROCm Compute Profiler now uses AMD SMI instead of ROCm SMI. The AMD System Management Interface Library (AMD SMI) is a successor to ROCm SMI. It is a unified system management interface tool that provides a user-space interface for applications to monitor and control GPU applications and gives users the ability to query information about drivers and GPUs on the system. For more information, see ROCm/amdsmi and the AMD SMI documentation.

ROCm Compute Profiler adds FP8 metrics support

ROCm Compute Profiler has added FP8 metrics support for AMD Instinct MI300 series accelerators. For more information, see Profile mode.

rocSOLVER performance enhancement

rocSOLVER has been enhanced with improved performance for eigensolvers and singular value decomposition (SVD).

ROCm installation instruction update

[Draft] The ROCm installation instructions have been updated to provide detailed instructions for each operating system distribution and to deprecate the use of the AMDGPU installer for ROCm installation. For more information, see ROCm on Linux detailed installation overview.

ROCm Offline Installer Creator updates

[Placeholder for ROCm Offline Installer Creator 6.4.2 updates]. See ROCm Offline Installer Creator for more information.

ROCm Runfile Installer updates

[Placeholder for ROCm Runfile Installer 6.4.2 updates]. For more information, see ROCm Runfile Installer.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

  • Tutorials for AI developers have been expanded with four new tutorials. These tutorials are Jupyter notebook-based, easy-to-follow documents. They are ideal for AI developers who want to learn about specific topics, including inference, fine-tuning, training, and GPU development and optimization. For more information about the changes, see Changelog for the AI Developer Hub.

  • Documentation for the new ROCprof Compute Viewer was released in May 2025. This tool is used to visualize and analyze GPU thread trace data collected using rocprofv3. Note that ROCprof Compute Viewer is in an early access state. Running production workloads is not recommended.

Operating system and hardware support changes

ROCm 6.4.2 marks the end of support (EoS) for RHEL 9.5.

Hardware support remains unchanged in this release.

See the Compatibility matrix for more information about operating system and hardware compatibility.

ROCm components

The following table lists the versions of ROCm components for ROCm 6.4.2, including any version changes from 6.4.1 to 6.4.2. Click the component's updated version to go to a list of its changes. Click {fab}github to go to the component's source code on GitHub.

| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.12.0 |
| | | MIOpen | 3.4.0 |
| | | MIVisionX | 3.2.0 |
| | | rocAL | 2.2.0 |
| | | rocDecode | 0.10.0 |
| | | rocJPEG | 0.8.0 |
| | | rocPyDecode | 0.3.1 |
| | | RPP | 1.9.10 |
| | Communication | RCCL | 2.22.3 ⇒ 2.22.3 |
| | | rocSHMEM | 2.0.0 ⇒ 2.0.1 |
| | Math | hipBLAS | 2.4.0 |
| | | hipBLASLt | 0.12.1 ⇒ 0.12.1 |
| | | hipFFT | 1.0.18 |
| | | hipfort | 0.6.0 |
| | | hipRAND | 2.12.0 |
| | | hipSOLVER | 2.4.0 |
| | | hipSPARSE | 3.2.0 |
| | | hipSPARSELt | 0.2.3 |
| | | rocALUTION | 3.2.3 |
| | | rocBLAS | 4.4.0 |
| | | rocFFT | 1.0.32 |
| | | rocRAND | 3.3.0 |
| | | rocSOLVER | 3.28.0 ⇒ 3.28.2 |
| | | rocSPARSE | 3.4.0 |
| | | rocWMMA | 1.7.0 |
| | | Tensile | 4.43.0 |
| | Primitives | hipCUB | 3.4.0 |
| | | hipTensor | 1.5.0 |
| | | rocPRIM | 3.4.0 ⇒ 3.4.1 |
| | | rocThrust | 3.3.0 |
| Tools | System management | AMD SMI | 25.4.2 |
| | | ROCm Data Center Tool | 0.3.0 |
| | | rocminfo | 1.0.0 |
| | | ROCm SMI | 7.5.0 |
| | | ROCm Validation Suite | 1.1.0 ⇒ 1.1.0 |
| | Performance | ROCm Bandwidth Test | 1.4.0 |
| | | ROCm Compute Profiler | 3.1.0 ⇒ 3.2.0 |
| | | ROCm Systems Profiler | 1.0.1 ⇒ 1.0.2 |
| | | ROCProfiler | 2.0.0 |
| | | ROCprofiler-SDK | 0.6.0 |
| | | ROCTracer | 4.1.0 |
| | Development | HIPIFY | 19.0.0 |
| | | ROCdbgapi | 0.77.2 |
| | | ROCm CMake | 0.14.0 |
| | | ROCm Debugger (ROCgdb) | 15.2 |
| | | ROCr Debug Agent | 2.0.4 |
| Compilers | | HIPCC | 1.1.1 |
| | | llvm-project | 19.0.0 |
| Runtimes | | HIP | 6.4.1 ⇒ 6.4.2 |
| | | ROCr Runtime | 1.15.0 |

Detailed component changes

The following sections describe key changes to ROCm components.

For a historical overview of ROCm component updates, see the {doc}`ROCm consolidated changelog </release/changelog>`.

HIP (6.4.2)

Added

  • Support for the pointer attribute HIP_POINTER_ATTRIBUTE_CONTEXT.

Optimized

  • Improved the implementation of hipEventSynchronize. The HIP runtime now makes internal callbacks non-blocking to improve performance.

Resolved issues

  • A dependency on libgcc-s1 during rocm-dev installation on Debian Buster. The HIP runtime has removed this Debian package dependency and uses libgcc1 instead for this distribution.
  • A build issue with COMGR dynamic loading on Fedora and other distributions. The HIP runtime no longer links against libamd_comgr.so.
  • A failure in the hipStreamDestroy API when the stream type is hipStreamLegacy. The API now returns the error code hipErrorInvalidResourceHandle in this condition.
  • Kernel launch errors, such as shared object initialization failed, invalid device function, or kernel execution failure. The HIP runtime now loads COMGR properly, considering the file with its name and mapped image.
  • A memory access fault in some applications. The HIP runtime has fixed offset accumulation in memory addresses.

hipBLASLt (0.12.1)

Added

  • Support for gfx1151.

RCCL (2.22.3)

Added

  • Added support for the LL128 protocol on gfx942.

ROCm Compute Profiler (3.2.0)

Added

  • 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
  • Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
  • Data type selection option --roofline-data-type / -R for roofline profiling. The default data type is FP32.

Changed

  • Changed the dependency from rocm-smi to amd-smi.

Resolved issues

  • Fixed a crash related to Agent ID caused by the new format of the rocprofv3 output CSV file.

ROCm Systems Profiler (1.0.2)

Optimized

  • Improved the readability of OpenMP target offload traces by showing them on a single Perfetto track.

Resolved issues

  • Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from merge-multiprocess-output.sh to rocprof-sys-merge-output.sh.

ROCm Validation Suite (1.1.0)

Added

  • NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.

rocPRIM (3.4.1)

Upcoming changes

  • Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
  • Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
    • rocprim::device_warp_size(). This has been replaced by rocprim::arch::wavefront::min_size() and rocprim::arch::wavefront::max_size() for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use rocprim::arch::wavefront::size().
    • rocprim::warp_size()
    • ROCPRIM_WAVEFRONT_SIZE
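
The split described above, compile-time bounds for sizing memory and a run-time value for the actual wavefront size, can be sketched in plain C++. The functions in the `sketch` namespace are hypothetical stand-ins and not the rocPRIM API: `min_size()` and `max_size()` model the compile-time bounds, `size()` models the run-time query, and the values 32 and 64 are assumptions chosen only for illustration.

```cpp
#include <array>
#include <cstddef>

namespace sketch {
// Hypothetical compile-time bounds on the wavefront size (assumed values).
constexpr std::size_t min_size() { return 32; }
constexpr std::size_t max_size() { return 64; }

// Hypothetical run-time query. A real implementation would ask the device;
// here the caller simply states whether the device runs 64-lane wavefronts.
inline std::size_t size(bool wave64) { return wave64 ? max_size() : min_size(); }
}  // namespace sketch

// Storage is sized with the compile-time upper bound, so it is large enough
// for whichever wavefront size is reported at run time.
using LaneBuffer = std::array<int, sketch::max_size()>;

// Sums only the lanes that are active for the run-time wavefront size.
inline int sum_active_lanes(const LaneBuffer& buffer, std::size_t wavefront_size) {
    int total = 0;
    for (std::size_t lane = 0; lane < wavefront_size; ++lane) {
        total += buffer[lane];
    }
    return total;
}
```

Sizing the buffer with the upper bound wastes some space on narrower hardware, but it keeps the allocation a compile-time constant while the loop bound stays a run-time value.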

rocSHMEM (2.0.1)

Resolved issues

  • Resolved incorrect output for rocshmem_ctx_my_pe and rocshmem_ctx_n_pes.
  • Resolved multi-team errors by providing team specific buffers in rocshmem_ctx_wg_team_sync.
  • Resolved missing implementation of rocshmem_g for IPC conduit.

rocSOLVER (3.28.2)

Added

  • Hybrid computation support for existing routines, such as STERF.
  • SVD for general matrices based on Cuppen's Divide and Conquer algorithm:
    • GESDD (with batched and strided_batched versions)

Optimized

  • Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
  • Improved the performance of STEDC and divide and conquer Eigensolvers.
  • Improved the performance of SYTRD, the initial step of the Eigensolvers that start with the tridiagonalization of the input matrix.

ROCm known issues

ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.

ROCm resolved issues

The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.

AMD SMI CLI: CPER entries not dumped continuously when using follow flag

An issue where CPER entries were not streamed continuously as intended when using the --follow flag with amd-smi ras --cper has been resolved. See issue #4768 on GitHub.

ROCm upcoming changes

The following changes to the ROCm software stack are anticipated for future releases.

ROCm SMI deprecation

ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.

It's strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation

Development and support for ROCTracer, ROCProfiler, rocprof, and rocprofv2 are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the rocprofv3 tool to ensure continued support and access to new features. ROCprofiler-SDK is currently in beta and will be production-ready in a future ROCm release.

It's anticipated that ROCTracer, ROCProfiler, rocprof, and rocprofv2 will reach end-of-life in a future release, around Q1 of 2026.

AMDGPU wavefront size compiler macro deprecation

Access to the wavefront size as a compile-time constant via the __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros or the constexpr warpSize variable is deprecated and will be disabled in a future release.

  • The __AMDGCN_WAVEFRONT_SIZE__ macro and __AMDGCN_WAVEFRONT_SIZE alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
  • warpSize will only be available as a non-constexpr variable. Where required, the wavefront size should be queried via the warpSize variable in device code, or via hipGetDeviceProperties in host code. Neither of these will result in a compile-time constant.
  • For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of __AMDGCN_WAVEFRONT_SIZE, __AMDGCN_WAVEFRONT_SIZE__, or warpSize can be replaced with a user-defined macro or constexpr variable with the wavefront size(s) for the target hardware. For example:
   #if defined(__GFX9__)
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 64
   #else
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 32
   #endif
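
The constexpr-variable alternative mentioned above can be sketched along the same lines. This is a minimal illustration, not a recommended implementation: the names `kWavefrontSize`, `wavefronts_per_block`, and `lane_of` are hypothetical, and the 64 and 32 values simply mirror the macro example, so they should be checked against the actual target hardware.

```cpp
// Hypothetical user-defined constant that mirrors the macro example above.
// Assumption: 64 lanes on GFX9 targets and 32 lanes otherwise.
#if defined(__GFX9__)
constexpr unsigned kWavefrontSize = 64;
#else
constexpr unsigned kWavefrontSize = 32;
#endif

// Sanity check for this sketch: both assumed values are powers of two.
static_assert((kWavefrontSize & (kWavefrontSize - 1)) == 0,
              "this sketch assumes a power-of-two wavefront size");

// Number of wavefronts needed to cover a block of threads, rounded up.
constexpr unsigned wavefronts_per_block(unsigned block_size) {
    return (block_size + kWavefrontSize - 1) / kWavefrontSize;
}

// Index of a thread within its wavefront.
constexpr unsigned lane_of(unsigned thread_index) {
    return thread_index % kWavefrontSize;
}

// Because the value is a compile-time constant, it can size static storage,
// which a run-time warpSize query cannot do.
constexpr unsigned kBlockSize = 256;
static int per_wavefront_scratch[wavefronts_per_block(kBlockSize)] = {};
```

Unlike the macro, the constexpr variable is typed and scoped, so it can be placed in a namespace and used in `static_assert` checks without leaking into other translation units.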

HIPCC Perl scripts deprecation

The HIPCC Perl scripts (hipcc.pl and hipconfig.pl) will be removed in an upcoming release.

Changes to ROCm Object Tooling

ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj are deprecated in ROCm 6.4, and will be removed in a future release. Functionality has been added to the llvm-objdump --offloading tool option to extract all clang-offload-bundles into individual code objects found within the objects or executables passed as input. The llvm-objdump --offloading tool option also supports the --arch-name option, and only extracts code objects found with the specified target architecture. See llvm-objdump for more information.

HIP runtime API changes

A number of changes that are not backward compatible with prior releases are planned for the HIP runtime API in an upcoming major release. Most of these changes increase alignment between HIP and CUDA APIs or behavior. Others clean up header files, remove namespace collisions, and establish a clear separation between hipRTC and the HIP runtime.