mirror of https://github.com/ROCm/ROCm.git synced 2026-01-08 22:28:06 -05:00

Pratik Basyal 157d86b780 7.1.0 Release Notes Update (#591 )

* Initial changelog added

* Changelog updated

* 7.1.0 draft changes

* Highlight changes

* Add release highlights

* formatting

* Order updated

* Highlights added

* Highlight update

* Changelog updated

* RCCL change

* RCCL changelog entry added

* Changelog updates added

* heading level fixed

* Updates added

* Leo's and Jeff's review feedback incorporated

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Release notes feedback

* Updated highlights

* Minor changes

* TOC for internal updated

---------

Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

2025-10-24 12:41:23 -04:00

ROCm 7.1.0 release notes

The release notes provide a summary of notable changes since the previous ROCm release.

Release highlights
Supported hardware, operating system, and virtualization changes
User space, driver, and firmware dependent changes
ROCm components versioning
Detailed component changes
ROCm known issues
ROCm upcoming changes

If you’re using AMD Radeon GPUs or Ryzen APUs in a workstation setting with a display connected, see the [Use ROCm on Radeon and Ryzen](https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/index.html)
documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 7.1.0. For changes to individual components, see Detailed component changes.

Supported hardware, operating system, and virtualization changes

Hardware support remains unchanged in this release. For more information about supported AMD hardware, see Supported GPUs (Linux).

Debian 13 support has been extended for AMD Instinct MI355X and MI350X GPUs. For more information about supported operating systems, see Supported operating systems and ROCm installation for Linux.

Virtualization support

ROCm 7.1.0 adds Guest OS support for RHEL 10.0 in KVM SR-IOV for AMD Instinct MI355X and MI350X GPUs.

For more information, see Virtualization Support.

User space, driver, and firmware dependent changes

The software for AMD Datacenter GPU products requires maintaining a hardware and software stack with interdependencies between the GPU and baseboard firmware, AMD GPU drivers, and the ROCm user space software.

ROCm Version	GPU	PLDM Bundle (Firmware)	AMD GPU Driver (amdgpu)	AMD GPU Virtualization Driver (GIM)
ROCm 7.1.0	MI355X	01.25.15.04 01.25.13.09	30.20.0 30.10.2 30.10.1 30.10	8.5.0.K
	MI350X	01.25.15.04 01.25.13.09	30.20.0 30.10.2 30.10.1 30.10
	MI325X	01.25.05.01 01.25.04.02	30.20.0 30.10.2 30.10.1 30.10 6.4.z where z (0-3) 6.3.y where y (1-3)
	MI300X	01.25.05.04 (or later)^[1] 01.25.03.12	30.20.0 30.10.2 30.10.1 30.10 6.4.z where z (0–3) 6.3.y where y (0–3) 6.2.x where x (1–4)	8.5.0.K
	MI300A	BKC 26 BKC 25		Not Applicable
	MI250X	IFWI 47
	MI250	MU3 w/ IFWI 73
	MI210	MU3 w/ IFWI 73		8.5.0.K
	MI100	VBIOS D3430401-037		Not Applicable

[1]: PLDM bundle 01.25.05.04 will be available by November 2025.

HIP runtime compatibility improvements

In ROCm 7.1.0, the HIP runtime adds the following functionality, corresponding to equivalent CUDA functionality.

New HIP APIs added for:
- Memory management: hipMemsetD2D8, hipMemsetD2D8Async, hipMemsetD2D16, hipMemsetD2D16Async, hipMemsetD2D32, hipMemsetD2D32Async, hipMemcpyBatchAsync, hipMemcpy3DBatchAsync, hipMemcpy3DPeer, hipMemcpy3DPeerAsync, hipMemPrefetchAsync_v2, and hipMemAdvise_v2.
- Module Management: hipModuleGetFunctionCount and hipModuleLoadFatBinary
- Stream Management: hipStreamSetAttribute, hipStreamGetAttribute, and hipStreamGetId
- Device Management: hipSetValidDevices
- Driver Entry Point Access: hipGetDriverEntryPoint
New HIP flag hipMemLocationTypeHost enables handling virtual memory management in host memory location, in addition to device memory.
hipHostRegisterIoMemory is now supported in hipHostRegister. It's used to register I/O memory with the HIP runtime so the GPU can access it.
HIP runtime now supports nested tile partitioning within cooperative groups, matching CUDA functionality.

For detailed enhancements and updates refer to the HIP Changelog.

hipSPARSELt: SpMM performance improvements

hipSPARSELt introduces significant performance enhancements for structured sparsity matrix multiplication (SpMM) on AMD Instinct MI300X GPUs:

New feature support -- Enabled multiple buffer single kernel execution for SpMM, improving efficiency in Split-K method scenarios.
Kernel optimization -- Added multiple high-performance kernels optimized for FP16 and BF16 data types, enhancing heuristic-based execution.
Tuning efficiency -- Improved the tuning process for SpMM kernels, resulting in better runtime adaptability and performance.

RPP: New hue and saturation augmentations

RPP adds support for hue and saturation augmentations in the ROCm Performance Primitives (RPP) library. These enhancements are available for both HIP and HOST backends and support multiple data types — U8, F16, F32, and I8 — with layout toggle variants for NCHW and NHWC.

rocAL: Enhancements for vision transformer model training

ROCm 7.1.0 introduces new capabilities in rocAL to support training of Vision Transformer (ViT) models:

Added support for CropResize augmentation and the CIFAR10 dataloader, commonly used in ViT training workflows.
These updates enable seamless integration of rocAL into open-source PyTorch Vision Transformer models.

This enhancement improves preprocessing efficiency and simplifies the setup of data pipelines for ViT-based deep learning applications.

hipBLASLt: Kernel optimizations and model support enhancements

hipBLASLt introduces several performance and model compatibility improvements for AMD Instinct MI350 Series GPUs:

TF32 kernel optimization for MI355X to enhance training and inference efficiency.
FP32 kernel optimization for MI350X, improving precision-based workloads.
Meta Model Optimization for MI350X, enabling better performance across transformer-based models.
Llama 2 70B model support fix: Removed incorrect kernel to ensure accurate and stable execution.
For AMD Instinct MI350X GPUs specifically, added multiple high-performance kernels optimized for FP16 and BF16 data types, enhancing heuristic-based execution.

TensileLite: Enhanced SpMM kernel tuning efficiency

Optimized the tuning workflow for the SpMM kernel, resulting in improved performance and streamlined configuration.

RCCL: AMD Instinct MI350 Series enhancements

Optimized performance for select collective operations.
Enhanced single-node performance on AMD Instinct MI350 GPUs.
Achieved higher throughput with increased XGMI speed.
Verified compatibility with NCCL 2.27.7.
Improved efficiency for the All Gather collective.

ROCm Compute Profiler updates

ROCm Compute Profiler has the following enhancements:

Single‑Pass Counter Collection feature has been added. It allows profiling kernels in a single pass using a predefined metric set, reducing profiling overhead and session time.
Dynamic Attach/Detach feature has been added. It allows starting or stopping profiling on a running application without restarting, enabling flexible analysis for long‑running jobs.
Enhanced TUI Experience feature has been added. It allows interactive exploration of metrics with descriptions and provides high-level compute and memory throughput panels for quick insights.

ROCm Systems Profiler updates

ROCm Systems Profiler has the following enhancements:

Validated support for virtualized Hyper-V environment, JAX AI framework, and PyTorch AI framework.
Transitioned to using AMD SMI by default, instead of ROCm SMI.
Integrated with ROCm Profiling Data (rocpd), enabling profiling results to be stored in a SQLite3 database. This provides a structured and efficient foundation for in-depth analysis and post-processing.
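
Because rocpd results are stored in a SQLite3 database, they can be inspected with standard SQLite tooling. The following is a minimal sketch that lists the tables in any SQLite file without assuming anything about the rocpd schema. The helper name `list_tables` and the placeholder table created for the demo are hypothetical and are not part of rocpd.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the sorted names of all tables in a SQLite database file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demo on a throwaway database. The table name below is a placeholder,
# not a real rocpd table (the rocpd schema is not described in these notes).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.db")
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE placeholder_events (id INTEGER)")
        conn.commit()
    demo_tables = list_tables(demo_path)

print(demo_tables)
```

Pointing `list_tables` at a database produced by a profiling run would show which tables are available before writing any schema-specific queries.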

Device-side assertion support and atomic metadata control in Clang

ROCm 7.1.0 introduces two key compiler enhancements:

Device-compatible assertions: A __device__ version of std::__glibcxx_assert_fail() has been added to enable the use of std::array and other libstdc++ features in device code. This resolves previous compilation failures caused by non-constexpr host assertions being invoked from device contexts.
Clang atomic metadata attribute: The new [[clang::atomic]] statement attribute allows fine-grained control over how atomic operations are lowered in LLVM IR. Users can specify memory types (for example, remote_memory, fine_grained_memory) and floating-point behavior (ignore_denormal_mode) to optimize performance without compromising correctness. These attributes can override global compiler flags on a per-block basis, improving atomic operation efficiency on architectures like AMDGPU.

Deep learning and AI framework updates

ROCm provides a comprehensive ecosystem for deep learning development. For more information, see Deep learning frameworks for ROCm and the Compatibility matrix for the complete list of Deep learning and AI framework versions tested for compatibility with ROCm.

Updated framework support

ROCm 7.1.0 introduces several newly supported versions of Deep learning and AI frameworks:

TensorFlow

ROCm 7.1.0 enables support for TensorFlow 2.20.0.

ROCm Offline Installer Creator updates

The ROCm Offline Installer Creator 7.1.0 includes the following features and improvements:

Added support for creating an offline installer for RHEL 8.10, 9.4, 9.6, and 10.0 where the kernel version of the target OS differs from the host OS creating the installer.
Fixed an issue in the Debian 13 Docker that prevented users from creating a driver install package using the default Docker kernel driver.

See ROCm Offline Installer Creator for more information.

ROCm Runfile Installer updates

The ROCm Runfile Installer 7.1.0 fixes warnings that occurred with rocm-examples testing.

For more information, see ROCm Runfile Installer.

End of Support for ROCm Execution Provider (ROCm-EP)

ROCm 7.1.0 marks the End of Support (EOS) for ROCm Execution Provider (ROCm-EP). ROCm 7.0.2 was the last official AMD-supported distribution of ROCm-EP. Refer to this Pull Request for more information. Migrate your applications to use the MIGraphX Execution Provider.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

Tutorials for AI developers have been expanded with the following two new tutorials:
- Pretraining tutorial: Speculative decoding draft model with SpecForge
- GPU development and optimization tutorial: Quark MXFP4 quantization for vLLM
For more information about the changes, see Changelog for the AI Developer Hub.
ROCm components support a wide range of environment variables that can be used for testing, logging, debugging, experimental features, and more. The rocBLAS and RCCL components have been updated with new environment variable content.
The HIP documentation introduces a new tutorial that shows you how to transform your GPU applications from repeated direction to choreographed performance with HIP graphs. HIP graphs model dependencies between operations as nodes and edges on a diagram. Each node in the graph represents an operation, and each edge represents a dependency between two nodes. For more information, see HIP graphs and HIP Graph API Tutorial.
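
The node-and-edge model described above can be illustrated without any GPU code. The following sketch uses Python's standard `graphlib` module to derive one valid execution order from a dependency graph. The operation names are hypothetical, and the sketch only models the dependency idea; it does not use the HIP graph API.

```python
from graphlib import TopologicalSorter

# Hypothetical operations. Each key maps to the set of operations it depends on,
# so each (dependency -> key) pair is one edge in the graph.
dependencies = {
    "copy_input_to_device": set(),
    "kernel_a": {"copy_input_to_device"},
    "kernel_b": {"copy_input_to_device"},
    "copy_result_to_host": {"kernel_a", "kernel_b"},
}


def launch_order(deps):
    """Return one execution order in which every node runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = launch_order(dependencies)
print(order)
```

Here `kernel_a` and `kernel_b` have no edge between them, so either may come first. This independence is what allows a graph runtime to schedule such nodes concurrently.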

ROCm components

The following table lists the versions of ROCm components for ROCm 7.1.0, including any version changes from 7.0.2 to 7.1.0. Click the component's updated version to go to a list of its changes.

Click {fab}`github` to go to the component's source code on GitHub.

Category	Group	Name	Version
Libraries	Machine learning and computer vision	Composable Kernel	1.1.0 ⇒ 1.1.0
		MIGraphX	2.13.0 ⇒ 2.14.0
		MIOpen	3.5.0 ⇒ 3.5.1
		MIVisionX	3.3.0 ⇒ 3.4.0
		rocAL	2.3.0 ⇒ 2.4.0
		rocDecode	1.0.0 ⇒ 1.4.0
		rocJPEG	1.1.0 ⇒ 1.2.0
		rocPyDecode	0.6.0 ⇒ 0.7.0
		RPP	2.0.0 ⇒ 2.1.0
	Communication	RCCL	2.26.6 ⇒ 2.27.7
	Communication	rocSHMEM	3.0.0
	Math	hipBLAS	3.0.2 ⇒ 3.1.0
		hipBLASLt	1.0.0 ⇒ 1.1.0
		hipFFT	1.0.20 ⇒ 1.0.21
		hipfort	0.7.0 ⇒ 0.7.1
		hipRAND	3.0.0 ⇒ 3.1.0
		hipSOLVER	3.0.0 ⇒ 3.1.0
		hipSPARSE	4.0.1 ⇒ 4.1.0
		hipSPARSELt	0.2.4 ⇒ 0.2.5
		rocALUTION	4.0.0 ⇒ 4.0.1
		rocBLAS	5.0.2 ⇒ 5.1.0
		rocFFT	1.0.34 ⇒ 1.0.35
		rocRAND	4.0.0 ⇒ 4.1.0
		rocSOLVER	3.30.1 ⇒ 3.31.0
		rocSPARSE	4.0.3 ⇒ 4.1.0
		rocWMMA	2.0.0
		Tensile	4.44.0
	Primitives	hipCUB	4.0.0 ⇒ 4.1.0
		hipTensor	2.0.0
		rocPRIM	4.0.1 ⇒ 4.1.0
		rocThrust	4.0.0 ⇒ 4.1.0
Tools	System management	AMD SMI	26.0.2 ⇒ 26.1.0
		ROCm Data Center Tool	1.1.0 ⇒ 1.2.0
		rocminfo	1.0.0
		ROCm SMI	7.8.0
		ROCm Validation Suite	1.2.0
	Performance	ROCm Bandwidth Test	2.6.0
		ROCm Compute Profiler	3.2.3 ⇒ 3.3.0
		ROCm Systems Profiler	1.1.1 ⇒ 1.2.0
		ROCProfiler	2.0.0 ⇒ 2.0.0
		ROCprofiler-SDK	1.0.0
		ROCTracer	4.1.0
	Development	HIPIFY	20.0.0
		ROCdbgapi	0.77.4
		ROCm CMake	0.14.0
		ROCm Debugger (ROCgdb)	16.3
		ROCr Debug Agent	2.1.0
Compilers		HIPCC	1.1.1
Compilers		llvm-project	20.0.0
Runtimes		HIP	7.0.2 ⇒ 7.1.0
Runtimes		ROCr Runtime	1.18.0

Detailed component changes

The following sections describe key changes to ROCm components.

For a historical overview of ROCm component updates, see the {doc}`ROCm consolidated changelog </release/changelog>`.

AMD SMI (26.1.0)

Added

GPU LINK PORT STATUS table to amd-smi xgmi command. The amd-smi xgmi -s or amd-smi xgmi --source-status will now show the GPU LINK PORT STATUS table.
amdsmi_get_gpu_revision() to Python API. This function retrieves the GPU revision ID. Available in amdsmi_interface.py as amdsmi_get_gpu_revision().
Gpuboard and baseboard temperatures to amd-smi metric command.
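
As a rough illustration of the new Python call, the sketch below queries the revision of every visible GPU. It assumes the `amdsmi` Python package is installed and that `amdsmi_get_gpu_revision()` takes a processor handle, like other per-GPU calls in that package; the return type is not specified in these notes. The wrapper name `read_gpu_revisions` is hypothetical, and it returns `None` when the package or a GPU is unavailable.

```python
def read_gpu_revisions():
    """Return a list with one revision value per GPU, or None if AMD SMI is unusable."""
    try:
        import amdsmi  # Python bindings shipped with AMD SMI, if installed.
    except Exception:
        return None
    try:
        amdsmi.amdsmi_init()
    except Exception:
        return None
    try:
        handles = amdsmi.amdsmi_get_processor_handles()
        # Assumption: the new function accepts a processor handle.
        return [amdsmi.amdsmi_get_gpu_revision(handle) for handle in handles]
    except Exception:
        return None
    finally:
        try:
            amdsmi.amdsmi_shut_down()
        except Exception:
            pass


revisions = read_gpu_revisions()
print(revisions)
```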

Changed

Struct amdsmi_topology_nearest_t member processor_list. Member size changed, processor_list[AMDSMI_MAX_DEVICES * AMDSMI_MAX_NUM_XCP].
amd-smi reset --profile behavior so that it won't also reset the performance level.
- The performance level can still be reset using amd-smi reset --perf-determinism.
Setting power cap is now available in Linux Guest. You can now use amd-smi set --power-cap as usual in Linux Guest systems too.
Changed amd-smi static --vbios to amd-smi static --ifwi.
- VBIOS naming is replaced with IFWI (Integrated Firmware Image) for improved clarity and consistency.
- AMD Instinct MI300 Series GPUs (and later) now use a new version format with enhanced build information.
- Legacy command amd-smi static --vbios remains functional for backward compatibility, but displays updated IFWI heading.
- The Python, C, and Rust API for amdsmi_get_gpu_vbios_version() will now have a new field called boot_firmware, which will return the legacy vbios version number that is also known as the Unified BootLoader (UBL) version.

Optimized

Optimized the way amd-smi process validates which processes are running on a GPU.

Resolved Issues

Fixed a CPER record count mismatch issue when using amd-smi ras --cper --file-limit. Updated the deletion calculation to use files_to_delete = len(folder_files) - file_limit for exact file count management.
Fixed event monitoring segfaults that caused RDC to crash. Added mutex locking around access to the device event notification file pointer.
Fixed an issue where using amd-smi ras --folder <folder_name> was forcing the created folder's name to be lowercase. This fix also makes all string input options case-insensitive.
Fixed certain output in amd-smi monitor when GPUs are partitioned. Commands such as amd-smi monitor -Vqt, amd-smi monitor -g 0 -Vqt -w 1, and amd-smi monitor -Vqt --file /tmp/test1 now display normally in partitioned GPU scenarios.
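
The file-limit calculation quoted above, files_to_delete = len(folder_files) - file_limit, can be sketched as follows. Only that expression comes from the release note. The function names, the clamp to zero, and the oldest-first ordering are assumptions made for illustration and may not match the actual amd-smi implementation.

```python
def files_to_delete_count(folder_files, file_limit):
    """Number of files to remove so that at most file_limit files remain."""
    # Assumption: never return a negative count when the folder is under the limit.
    return max(0, len(folder_files) - file_limit)


def select_files_to_delete(folder_files, file_limit):
    """Pick the files to remove, assuming folder_files is sorted oldest first."""
    count = files_to_delete_count(folder_files, file_limit)
    return folder_files[:count]


cper_files = ["cper_1", "cper_2", "cper_3", "cper_4", "cper_5"]  # hypothetical names
to_delete = select_files_to_delete(cper_files, file_limit=3)
print(to_delete)
```

With five files and a limit of three, exactly two files are selected, which leaves exactly three behind.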

See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-7.1/CHANGELOG.md) for details, examples, and in-depth descriptions.

Composable Kernel (1.1.0)

Added

Support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd).
Support for elementwise kernel.

Upcoming changes

Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.

HIP (7.1.0)

Added

New HIP APIs
- hipModuleGetFunctionCount returns the number of functions within a module
- hipMemsetD2D8 used for setting 2D memory range with specified 8-bit values
- hipMemsetD2D8Async used for setting 2D memory range with specified 8-bit values asynchronously
- hipMemsetD2D16 used for setting 2D memory range with specified 16-bit values
- hipMemsetD2D16Async used for setting 2D memory range with specified 16-bit values asynchronously
- hipMemsetD2D32 used for setting 2D memory range with specified 32-bit values
- hipMemsetD2D32Async used for setting 2D memory range with specified 32-bit values asynchronously
- hipStreamSetAttribute sets attributes such as synchronization policy for a given stream
- hipStreamGetAttribute returns attributes such as priority for a given stream
- hipModuleLoadFatBinary loads fatbin binary to a module
- hipMemcpyBatchAsync performs a batch of 1D or 2D memory copies asynchronously
- hipMemcpy3DBatchAsync performs a batch of 3D memory copies asynchronously
- hipMemcpy3DPeer copies memory between devices
- hipMemcpy3DPeerAsync copies memory between devices asynchronously
- hipMemPrefetchAsync_v2 prefetches memory to the specified location
- hipMemAdvise_v2 advises about the usage of a given memory range
- hipGetDriverEntryPoint gets the function pointer of a HIP API
- hipSetValidDevices sets a default list of devices that can be used by HIP
- hipStreamGetId queries the ID of a stream
New HIP flags
- hipMemLocationTypeHost enables handling virtual memory management in host memory location, in addition to device memory.
- hipHostRegisterIoMemory is supported in hipHostRegister. It's used to register I/O memory with HIP runtime so the GPU can access it.
Support for nested tile partitioning within cooperative groups, matching NVIDIA CUDA functionality.
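
The 2D memset entries above write a rectangular region inside a pitched allocation. The sketch below models that idea on a host-side `bytearray` for the 8-bit case. It is not the HIP API: the function name `memset_2d_8` is hypothetical, and the parameter layout (pitch, value, width, height) is an assumption modeled on the CUDA driver API's 2D memset calls.

```python
def memset_2d_8(buffer, pitch, value, width, height):
    """Set `width` bytes in each of `height` rows, where rows start `pitch` bytes apart."""
    if width > pitch:
        raise ValueError("width must not exceed pitch")
    if height > 0 and (height - 1) * pitch + width > len(buffer):
        raise ValueError("region does not fit in buffer")
    fill = bytes([value & 0xFF]) * width
    for row in range(height):
        start = row * pitch
        # Bytes between `width` and `pitch` in each row are padding and stay untouched.
        buffer[start:start + width] = fill
    return buffer


demo = memset_2d_8(bytearray(12), pitch=4, value=0xAB, width=3, height=3)
print(demo.hex())
```

The 16-bit and 32-bit variants follow the same pattern with wider elements, and the Async variants differ only in being enqueued on a stream.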

Resolved issues

A segmentation fault occurred in the application when capturing the same HIP graph from multiple streams with cross-stream dependencies. HIP runtime fixed an issue where a forked stream joined to a parent stream that was not originally created with the API hipStreamBeginCapture.
Enqueuing a command on a legacy stream during stream capture behaved differently on the AMD ROCm platform than on NVIDIA CUDA. HIP runtime now returns an error in this specific situation to match CUDA behavior.
A memory access fault occurred in the rocm-examples test suite. When Heterogeneous Memory Management (HMM) is not supported in the driver, hipMallocManaged will only allocate system memory in HIP runtime.

Optimized

Improved HIP module loading latency.
Optimized kernel metadata retrieval during module post load.
Optimized doorbell ring in HIP runtime, which improves performance in the following ways:
- Efficient packet batching for HIP graph launch
- Dynamic packet copying based on a defined maximum threshold or power-of-2 staggered copy pattern
- If timestamps are not collected for a signal for reuse, it creates a new signal. This can potentially increase the signal footprint if the handler doesn't run fast enough

hipBLAS (3.1.0)

Added

--clients-only build option to only build clients against a prebuilt library.
gfx1103, gfx1150, gfx1151, gfx1200, and gfx1201 support to clients.
FORTRAN enabled for the Microsoft Windows build and tests.
Additional reference library fallback options added.

Changed

Improved the build time for clients by removing clients_common.cpp from the hipblas-test build.

hipBLASLt (1.1.0)

Added

Fused Clamp GEMM for HIPBLASLT_EPILOGUE_CLAMP_EXT and HIPBLASLT_EPILOGUE_CLAMP_BIAS_EXT. This feature requires the minimum (HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT) and maximum (HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT) to be set.
Support for ReLU/Clamp activation functions with auxiliary output for the f16 and bf16 data types for gfx942 to capture intermediate results. This feature is enabled for HIPBLASLT_EPILOGUE_RELU_AUX, HIPBLASLT_EPILOGUE_RELU_AUX_BIAS, HIPBLASLT_EPILOGUE_CLAMP_AUX_EXT, and HIPBLASLT_EPILOGUE_CLAMP_AUX_BIAS_EXT.
Support for HIPBLAS_COMPUTE_32F_FAST_16BF for FP32 data type for gfx950 only.
CPP extension APIs setMaxWorkspaceBytes and getMaxWorkspaceBytes.
Feature to print logs (using HIPBLASLT_LOG_MASK=32) for Grouped GEMM.
Support for swizzleA by using the hipblaslt-ext cpp API.
Support for hipBLASLt extop for gfx11xx and gfx12xx.
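
To make the fused Clamp GEMM entry above concrete, the following reference sketch computes what such an epilogue produces conceptually: each element of the matrix product, plus an optional bias, is limited to a minimum and maximum. This is plain Python for illustration and does not call hipBLASLt. The function names and the per-row bias layout are assumptions.

```python
def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def gemm_clamp_reference(a, b, lo, hi, bias=None):
    """Compute clamp(A @ B + bias, lo, hi) on nested lists.

    Assumption: bias, when given, holds one value per output row.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            if bias is not None:
                acc += bias[i]
            out_row.append(clamp(acc, lo, hi))
        out.append(out_row)
    return out


a = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]
clamped = gemm_clamp_reference(a, identity, lo=1.5, hi=3.5)
print(clamped)
```

In this sketch `lo` and `hi` play the role of the minimum and maximum that the release note says must be set through the two ACT_ARG attributes.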

Changed

hipblasLtMatmul() now returns an error when the workspace size is insufficient, rather than causing a segmentation fault.

Resolved issues

Fixed incorrect results when using ldd and ldc dimension parameters with some solutions.

hipCUB (4.1.0)

Added

Exposed Thread-level reduction API hipcub::ThreadReduce.
::hipcub::extents, with limited parity to C++23's std::extents. Only static extents are supported; dynamic extents are not. Helper structs have been created to perform computations on ::hipcub::extents only when the backend is rocPRIM. For the CUDA backend, similar functionality exists.
projects/hipcub/hipcub/include/hipcub/backend/rocprim/util_mdspan.hpp to support ::hipcub::extents.
::hipcub::ForEachInExtents API.
hipcub::DeviceTransform::Transform and hipcub::DeviceTransform::TransformStableArgumentAddresses.
hipCUB and its dependency rocPRIM have been moved into the new rocm-libraries monorepo repository. This repository contains a number of ROCm libraries that are frequently used together.
- The repository migration requires a few changes to the way that hipCUB fetches library dependencies.
- CMake build option ROCPRIM_FETCH_METHOD may be set to one of the following:
  - PACKAGE - (default) searches for a preinstalled packaged version of the dependency. If it is not found, the build will fall back using option DOWNLOAD, below.
  - DOWNLOAD - downloads the dependency from the rocm-libraries repository. If git >= 2.25 is present, this option uses a sparse checkout that avoids downloading more than it needs to. If not, the whole monorepo is downloaded (this may take some time).
  - MONOREPO - this option is intended to be used if you are building hipCUB from within a copy of the rocm-libraries repository that you have cloned (and therefore already contains rocPRIM). When selected, the build will try to find the dependency in the local repository tree. If it cannot be found, the build will attempt to use git to perform a sparse-checkout of rocPRIM. If that also fails, it will fall back to using the DOWNLOAD option described above.
A new CMake option -DUSE_SYSTEM_LIB to allow tests to be built from installed hipCUB provided by the system.
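
The fallback order described for ROCPRIM_FETCH_METHOD can be summarized as a small decision function. This is only a model of the behavior as described in these notes, not the actual CMake logic. The function name, its boolean inputs, and the "SPARSE_CHECKOUT" label are hypothetical.

```python
def resolve_rocprim_source(method, package_installed=False,
                           found_in_local_tree=False, sparse_checkout_ok=False):
    """Return where rocPRIM would come from under the described fallback rules."""
    if method == "PACKAGE":
        # Default: use a preinstalled package, otherwise fall back to DOWNLOAD.
        return "PACKAGE" if package_installed else "DOWNLOAD"
    if method == "DOWNLOAD":
        return "DOWNLOAD"
    if method == "MONOREPO":
        if found_in_local_tree:
            return "MONOREPO"
        if sparse_checkout_ok:
            return "SPARSE_CHECKOUT"
        return "DOWNLOAD"
    raise ValueError(f"unknown ROCPRIM_FETCH_METHOD: {method}")


print(resolve_rocprim_source("PACKAGE", package_installed=False))
print(resolve_rocprim_source("MONOREPO", found_in_local_tree=True))
```

Every path ends in DOWNLOAD as the last resort, so a build only fails on the dependency when the download itself fails.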

Removed

Removed TexRefInputIterator, which was removed from CUB after CCCL's 2.6.0 release. This API should have already been removed, but somehow it remained and was not tested.
Deprecated hipcub::ConstantInputIterator, use rocprim::constant_iterator or rocthrust::constant_iterator instead.
Deprecated hipcub::CountingInputIterator, use rocprim::counting_iterator or rocthrust::counting_iterator instead.
Deprecated hipcub::DiscardOutputIterator, use rocprim::discard_iterator or rocthrust::discard_iterator instead.
Deprecated hipcub::TransformInputIterator, use rocprim::transform_iterator or rocthrust::transform_iterator instead.
Deprecated hipcub::AliasTemporaries, which is considered to be an internal API. Moved to the detail namespace.
Deprecated almost all functions in projects/hipcub/hipcub/include/hipcub/backend/rocprim/util_ptx.hpp.
Deprecated hipCUB macros: HIPCUB_MAX, HIPCUB_MIN, HIPCUB_QUOTIENT_FLOOR, HIPCUB_QUOTIENT_CEILING, HIPCUB_ROUND_UP_NEAREST and HIPCUB_ROUND_DOWN_NEAREST.

Changed

Changed include headers to avoid relative includes that have slipped in.
Changed CUDA_STANDARD for tests in test/hipcub, because C++17 APIs such as std::exclusive_scan are used in some tests. test/extra still uses CUDA_STANDARD 14.
Changed CCCL_MINIMUM_VERSION to 2.8.2 to align with CUB.
Changed cmake_minimum_required from 3.16 to 3.18, in order to support CUDA_STANDARD 17 as a valid value.
Added support for large num_items in DeviceScan, DevicePartition, and Reduce::{ArgMin, ArgMax}.
Added tests for large num_items.
The previous dependency-related build option DEPENDENCIES_FORCE_DOWNLOAD has been renamed EXTERNAL_DEPS_FORCE_DOWNLOAD to differentiate it from the new rocPRIM dependency option described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark and Google Test) to be downloaded rather than searching for installed packages. This option defaults to OFF.

Known Issues

The __half template specializations of Simd operators are currently disabled due to possible build issues with PyTorch.

hipFFT (1.0.21)

Added

Improved test coverage of multi-stream plans, user-specified work areas, and default stride calculation.
Experimental introduction of hipFFTW library, interfacing rocFFT on AMD platforms using the same symbols as FFTW3 (with partial support).

hipfort (0.7.1)

Added

Support for building with CMake 4.0.

Resolved Issues

Fixed a potential integer overflow issue in hipMalloc interfaces.

hipRAND (3.1.0)

Resolved Issues

Updated error handling for several hipRAND unit tests to accommodate the new hipGetLastError behaviour that was introduced in ROCm 7.0.0. As of ROCm 7.0.0, the internal error state is cleared on each call to hipGetLastError rather than on every HIP API call.
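
The difference between the two clearing rules is easiest to see in a toy model. The sketch below is a simplified simulation in plain Python, not HIP code. The class name and the string error codes are stand-ins chosen for illustration.

```python
SUCCESS = "hipSuccess"


class ErrorStateModel:
    """Toy model of a runtime's internal 'last error' state."""

    def __init__(self, clear_on_every_call):
        # True models the behavior before ROCm 7.0.0, False the behavior since.
        self.clear_on_every_call = clear_on_every_call
        self.last_error = SUCCESS

    def api_call(self, result):
        """Simulate any API call that finishes with `result`."""
        if result != SUCCESS:
            self.last_error = result
        elif self.clear_on_every_call:
            self.last_error = SUCCESS
        return result

    def get_last_error(self):
        """Return the stored error and clear it."""
        error, self.last_error = self.last_error, SUCCESS
        return error


def error_after_failed_then_ok(model):
    model.api_call("hipErrorInvalidValue")  # a failing call
    model.api_call(SUCCESS)                 # followed by a successful call
    return model.get_last_error()


old_result = error_after_failed_then_ok(ErrorStateModel(clear_on_every_call=True))
new_result = error_after_failed_then_ok(ErrorStateModel(clear_on_every_call=False))
print(old_result, new_result)
```

Under the newer rule an earlier failure stays visible until it is read, which is why tests that check the error state after several calls needed updating.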

hipSOLVER (3.1.0)

Added

Extended test suites for hipsolverDn compatibility functions.

Changed

Changed code coverage to use llvm-cov instead of gcov.

hipSPARSE (4.1.0)

Added

Brain half float mixed precision to hipsparseAxpby where X and Y use bfloat16 and result and the compute type use float.
Brain half float mixed precision to hipsparseSpVV where X and Y use bfloat16 and result and the compute type use float.
Brain half float mixed precision to hipsparseSpMV where A and X use bfloat16 and Y and the compute type use float.
Brain half float mixed precision to hipsparseSpMM where A and B use bfloat16 and C and the compute type use float.
Brain half float mixed precision to hipsparseSDDMM where A and B use bfloat16 and C and the compute type use float.
Brain half float mixed precision to hipsparseSDDMM where A and B and C use bfloat16 and the compute type use float.
Half float mixed precision to hipsparseSDDMM where A and B and C use float16 and the compute type use float.
Brain half float uniform precision to hipsparseScatter and hipsparseGather routines.
Documentation for installing and building hipSPARSE on Microsoft Windows.
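
The mixed-precision entries above share one pattern: inputs are stored as bfloat16 while arithmetic and results use a wider float type. The sketch below illustrates that pattern for an axpby-style update. It is not hipSPARSE code: the helper names are hypothetical, NaN handling is omitted, and Python's double-precision floats stand in for the float compute type.

```python
import struct


def to_bfloat16(x):
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of a float32; add a rounding bias before truncating.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def axpby_mixed(alpha, x, beta, y):
    """Return alpha * x + beta * y with bfloat16 inputs and wider-precision results."""
    x_bf16 = [to_bfloat16(v) for v in x]
    y_bf16 = [to_bfloat16(v) for v in y]
    return [alpha * xv + beta * yv for xv, yv in zip(x_bf16, y_bf16)]


result = axpby_mixed(2.0, [1.0, 2.0], 1.0, [0.5, 0.25])
print(result)
print(to_bfloat16(1.00390625))  # 1 + 2**-8 lies halfway between two bfloat16 values
```

Keeping the accumulation in the wider type means rounding error comes only from the stored inputs and not from each intermediate operation.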

hipSPARSELt (0.2.5)

Changed

Changed the behavior of the Relu activation.

Optimized

Provided more kernels for the FP16 and BF16 data types.

MIOpen (3.5.1)

Added

Added a new trust verify find mode.
Ported Op4dTensorLite kernel from OpenCL to HIP.
Implemented a generic HIP kernel for backward layer normalization.

Changed

Kernel DBs moved from Git LFS to DVC (Data Version Control).

Optimized

[Conv] Enabled Composable Kernel (CK) implicit gemms on gfx950.

Resolved Issues

[BatchNorm] Fixed a bug for the NHWC layout when a variant was not applicable.
Fixed a bug that caused a zero-size LDS array to be defined on Navi.

MIVisionX (3.4.0)

Added

VX_RPP - Update blur
HIP - HIP_CHECK for hipLaunchKernelGGL for gated launch

Changed

AMD Custom V1.1.0 - OpenMP updates
HALF - Fix half.hpp path updates

Resolved Issues

AMD Custom - dependency linking errors resolved
VX_RPP - Fix memory leak
Packaging - Remove Meta Package dependency for HIP

Known Issues

Installation on CentOS/RedHat/SLES requires the manual installation of the FFMPEG & OpenCV dev packages.

Upcoming Changes

VX_AMD_MEDIA - rocDecode support for hardware decode

RCCL (2.27.7)

Added

Added RCCL_P2P_BATCH_THRESHOLD to set the message size limit for batching P2P operations. This mainly affects small message performance for alltoall at a large scale but also applies to alltoallv.
Added RCCL_P2P_BATCH_ENABLE to enable batching P2P operations to receive performance gains for smaller messages up to 4MB for alltoall when the workload requires it. This is to avoid performance dips for larger messages.

Changed

The MSCCL++ feature is now disabled by default. The --disable-mscclpp build flag is replaced with --enable-mscclpp in the rccl/install.sh script.
Compatibility with NCCL 2.27.7.

Optimized

Improved small message performance for alltoall by enabling and optimizing batched P2P operations.

Known Issues

Symmetric memory kernels are currently disabled due to ongoing CUMEM enablement work.
When running this version of RCCL using ROCm versions earlier than 6.4.0, the user must set the environment flag HSA_NO_SCRATCH_RECLAIM=1.
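
For launch scripts that target several ROCm versions, the second known issue above can be handled with a small version check. The sketch below sets HSA_NO_SCRATCH_RECLAIM=1 only for ROCm versions earlier than 6.4.0. The helper names are hypothetical, and how the installed ROCm version is detected is left to the caller.

```python
def parse_version(text):
    """Turn 'major.minor.patch' into a 3-tuple of ints, padding missing parts with 0."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def rccl_environment(rocm_version, base_env=None):
    """Return a copy of base_env with the workaround flag added when it is needed."""
    env = dict(base_env or {})
    if parse_version(rocm_version) < (6, 4, 0):
        env["HSA_NO_SCRATCH_RECLAIM"] = "1"
    return env


print(rccl_environment("6.3.2"))
print(rccl_environment("7.1.0"))
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the RCCL workload.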

rocALUTION (4.0.1)

Added

Added support for gfx950.

Changed

Updated the default build standard to C++17 when compiling rocALUTION from source (previously C++14).

Optimized

Improved and expanded user documentation.

Resolved Issues

Fixed a bug in the GPU hashing algorithm that occurred when not compiling with -O2/-O3.
Fixed an issue with the SPAI preconditioner when using complex numbers.

rocBLAS (5.1.0)

Added

Sample for clients using OpenMP threads calling rocBLAS functions.
gfx1103, gfx1150, and gfx1151 enabled.

Changed

By default, the Tensile build is no longer based on tensile_tag.txt but uses the same commit from shared/tensile in the rocm-libraries repository. The rmake or install -t option can build from another local path with a different commit.

Optimized

Improved the performance of Level 2 gemv transposed (TransA != N) for the problem sizes where m is small and n is large on gfx90a and gfx942.

rocDecode (1.4.0)

Added

AV1 12-bit decode support on VA-API version 1.23.0 and later.
rocdecode-host V1.0.0 library for software decode
FFmpeg version support for 5.1 and 6.1
Find package - rocdecode-host

Resolved Issues

rocdecode-host - failure to build debuginfo packages without FFmpeg resolved.
Fixed a memory leak in rocDecodeNegativeTests.

Changed

HIP meta package changed - Use hip-dev/devel to bring required hip dev deps
rocdecode host - linking updates to rocdecode-host library

rocFFT (1.0.35)

Optimized

Implemented single-kernel plans for some 2D problem sizes, on devices with at least 160KiB of LDS.
Improved performance of unit-strided, complex-interleaved, forward/inverse FFTs for lengths: (64,64,128), (64,64,52), (60,60,60), (32,32,128), (32,32,64), (64,32,128).
Improved performance of 3D MPI pencil decompositions by using sub-communicators for global transpose operations.

ROCm Compute Profiler (3.3.0)

Added

Live attach/detach feature that allows coupling with a workload process, without controlling its start or end.
- Use '--attach-pid' to specify the target process ID.
- Use '--attach-duration-msec' to specify time duration.
rocpd choice for --format-rocprof-output option in profile mode.
--retain-rocpd-output option in profile mode to save large raw rocpd databases in workload directory.
Feature to show description of metrics during analysis
- Use --include-cols Description to show the Description column, which is excluded by default from the ROCm Compute Profiler CLI output.
--set filtering option in profile mode to enable single-pass counter collection for predefined subsets of metrics.
--list-sets filtering option in profile mode to list the sets available for single pass counter collection.
Missing counters based on register specification which enables missing metrics.
- Enable SQC_DCACHE_INFLIGHT_LEVEL counter and associated metrics.
- Enable TCP_TCP_LATENCY counter and associated counter for all GPUs except MI300.
Interactive metric descriptions in TUI analyze mode.
- You can now left-click on any metric cell to view detailed descriptions in the dedicated METRIC DESCRIPTION tab.
Support for analysis report output as a SQLite database using the --output-format db analysis mode option.
Compute Throughput panel to TUI's High Level Analysis category with the following metrics:
- VALU FLOPs
- VALU IOPs
- MFMA FLOPs (F8)
- MFMA FLOPs (BF16)
- MFMA FLOPs (F16)
- MFMA FLOPs (F32)
- MFMA FLOPs (F64)
- MFMA FLOPs (F6F4) (in gfx950)
- MFMA IOPs (Int8)
- SALU Utilization
- VALU Utilization
- MFMA Utilization
- VMEM Utilization
- Branch Utilization
- IPC
Memory Throughput panel to TUI's High Level Analysis category with the following metrics:
- vL1D Cache BW
- vL1D Cache Utilization
- Theoretical LDS Bandwidth
- LDS Utilization
- L2 Cache BW
- L2 Cache Utilization
- L2-Fabric Read BW
- L2-Fabric Write BW
- sL1D Cache BW
- L1I BW
- Address Processing Unit Busy
- Data-Return Busy
- L1I-L2 Bandwidth
- sL1D-L2 BW
Roofline support for Debian 12 and Azure Linux 3.0.

Changed

On the memory chart, long numbers are now displayed in scientific notation. This also resolves the overflow that occurred when displaying long numbers.
Added a notice that the default output format will change to rocpd in a future release.
- The notice is displayed when --format-rocprof-output rocpd is not used in profile mode.
When --format-rocprof-output rocpd is used, only pmc_perf.csv is written to the workload directory instead of multiple CSV files.
Improved analysis block-based filtering to accept metric ID-level filtering.
- This can be used to collect individual metrics from various sections of the analysis config.
CLI analysis mode baseline comparison now compares only the metrics common to all workloads and does not show the Metric ID.
- Removed metrics from analysis configuration files that are explicitly marked as empty or None.
Changed the basic (default) view of TUI from aggregated analysis data to individual kernel analysis data.
Updated the unit of the following bandwidth-related metrics from Bytes per Normalization Unit to Gbps:
- Theoretical Bandwidth (section 1202)
- L1I-L2 Bandwidth (section 1303)
- sL1D-L2 BW (section 1403)
- Cache BW (section 1603)
- L1-L2 BW (section 1603)
- Read BW (section 1702)
- Write and Atomic BW (section 1702)
- Bandwidth (section 1703)
- Atomic/Read/Write Bandwidth (section 1703)
- Atomic/Read/Write Bandwidth - (HBM/PCIe/Infinity Fabric) (section 1706)
Added Utilization to the metric name of the following bandwidth-related metrics whose unit is Percent:
- Theoretical Bandwidth Utilization (section 1201)
- L1I-L2 Bandwidth Utilization (section 1301)
- Bandwidth Utilization (section 1301)
- Bandwidth Utilization (section 1401)
- sL1D-L2 BW Utilization (section 1401)
- Bandwidth Utilization (section 1601)
Renamed the System Speed-of-Light panel to GPU Speed-of-Light in TUI. It includes the following metrics:
- Theoretical LDS Bandwidth
- vL1D Cache BW
- L2 Cache BW
- L2-Fabric Read BW
- L2-Fabric Write BW
- Kernel Time
- Kernel Time (Cycles)
- SIMD Utilization
- Clock Rate
Analysis output:
- Replace -o / --output analyze mode option with --output-format and --output-name
  - Add --output-format analysis mode option to select the output format of the analysis report.
  - Add --output-name analysis mode option to override the default file/folder name.
- Replace --save-dfs analyze mode option with --output-format csv
Command-line options:
- --list-metrics and --config-dir options moved to general command-line options.
- --list-metrics option cannot be used without an argument (GPU architecture).
- --list-metrics option does not show the number of L2 channels.
- --list-available-metrics profile mode option to display the metrics available for profiling on the current GPU.
- --list-available-metrics analyze mode option to display the metrics available for analysis.
- --block option cannot be used with the --list-metrics and --list-available-metrics options.
Default rocprof interface changed from rocprofv3 to rocprofiler-sdk
- Use ROCPROF=rocprofv3 to use rocprofv3 interface

Removed

Usage of rocm-smi in favor of amd-smi.
Hardware IP block-based filtering has been removed in favor of analysis report block-based filtering.
Removed aggregated analysis view from TUI analyze mode.

Optimized

Improved --time-unit option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
Improved logic to obtain rocprof supported counters which prevents unnecessary warnings.
Improved post-analysis runtime performance by caching and multi-processing.

Resolved issues

Fixed an issue of not detecting the memory clock when using amd-smi.
Fixed standalone GUI crashing.
Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 series accelerators.
Updated metric names for better alignment between analysis configuration and documentation.
Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100.
Fixed an issue where kernel filtering did not work in the roofline chart.

Known issues

MI300A/X L2-Fabric 64B read counter may display negative values - The rocprof-compute metric 17.6.1 (Read 64B) can report negative values due to incorrect calculation when TCC_BUBBLE_sum + TCC_EA0_RDREQ_32B_sum exceeds TCC_EA0_RDREQ_sum.
- A workaround has been implemented using max(0, calculated_value) to prevent negative display values while the root cause is under investigation.
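
The clamp described above can be sketched as follows. This is a minimal illustration, not the profiler's implementation: the helper name `read_64b_requests` is hypothetical, and the subtraction is an assumption inferred from the condition stated in the note (the value goes negative when TCC_BUBBLE_sum + TCC_EA0_RDREQ_32B_sum exceeds TCC_EA0_RDREQ_sum).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper. The subtraction is an assumed form of the metric,
// inferred from the condition in the release note; only the max(0, ...)
// clamp is taken directly from the described workaround.
std::int64_t read_64b_requests(std::int64_t rdreq_sum,
                               std::int64_t bubble_sum,
                               std::int64_t rdreq_32b_sum) {
    const std::int64_t calculated_value = rdreq_sum - (bubble_sum + rdreq_32b_sum);
    // Clamp so that a negative intermediate result is displayed as zero.
    return std::max<std::int64_t>(0, calculated_value);
}
```

A clamp like this hides the symptom only. A displayed zero can still mean that the underlying counters were inconsistent.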

ROCm Systems Profiler (1.2.0)

Added

ROCPROFSYS_ROCM_GROUP_BY_QUEUE configuration setting to allow grouping of events by hardware queue, instead of the default grouping.
Support for rocpd database output with the ROCPROFSYS_USE_ROCPD configuration setting.
Support for profiling PyTorch workloads using the rocpd output database.
Support for tracing OpenMP API in Fortran applications.
A warning that is triggered if the profiler application fails because SELinux enforcement is enabled. The warning includes steps to disable SELinux enforcement.

Changed

Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP Stream rather than separately and by hardware queue.
Updated PAPI module to v7.2.0b2.
ROCprofiler-SDK is now used for tracing OMPT API calls.

rocPRIM (4.1.0)

Added

get_sreg_lanemask_lt, get_sreg_lanemask_le, get_sreg_lanemask_gt and get_sreg_lanemask_ge.
rocprim::transform_output_iterator and rocprim::make_transform_output_iterator.
Experimental support for SPIR-V, which uses the correct tuned config for some of the applicable algorithms.
A new CMake option, BUILD_OFFLOAD_COMPRESS. When rocPRIM is built with this option enabled, the --offload-compress switch is passed to the compiler. This causes the compiler to compress the binary that it generates. Compression can be useful when you are compiling for a large number of targets, since this often results in a large binary. Without compression, the generated binary may in some cases become so large that symbols are placed out of range, resulting in linking errors. The new BUILD_OFFLOAD_COMPRESS option is set to ON by default.
A new CMake option -DUSE_SYSTEM_LIB to allow tests to be built from ROCm libraries provided by the system.
rocprim::apply which applies a function to a rocprim::tuple.
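
The idea of applying a function to a tuple can be illustrated with the C++17 standard library. The sketch below uses `std::apply` and `std::tuple` as an analogy only. The release notes do not show the signature of `rocprim::apply`, so nothing here should be read as the rocPRIM API. The function names are hypothetical.

```cpp
#include <tuple>

// Hypothetical example function; not part of rocPRIM.
int weighted_sum(int a, int b, int c) {
    return a + 2 * b + 3 * c;
}

// Unpacks the tuple elements and passes them as positional arguments
// to weighted_sum. Uses std::apply (C++17), not rocprim::apply.
int apply_weighted_sum(const std::tuple<int, int, int>& values) {
    return std::apply(weighted_sum, values);
}
```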

Changed

Changed tests to support ptr-to-const output in /test/rocprim/test_device_batch_memcpy.cpp.

Optimized

Improved performance of many algorithms by updating their tuned configs.
- 891 specializations have been improved.
- 399 specializations have been added.

Upcoming Changes

Deprecated the -> operator for the zip_iterator.

Resolved Issues

Fixed device_select, device_merge, and device_merge_sort not allocating the correct amount of virtual shared memory on the host.
Fixed the -> operator for the transform_iterator, the texture_cache_iterator, and the arg_index_iterator, which now returns a proxy pointer.
- The arg_index_iterator also now only returns the internal iterator for the -> operator.

ROCProfiler (2.0.0)

Removed

rocprofv2 doesn't support gfx12. For gfx12, use the rocprofv3 tool.

rocPyDecode (0.7.0)

Added

rocPyJpegPerfSample - samples for JPEG decode

Changed

Package - rocjpeg set as required dependency.
rocDecode host - rocdecode host linking updates

Resolved issues

rocJPEG Bindings - bugfixes
Test package - find dependencies updated

rocRAND (4.1.0)

Resolved Issues

Updated error handling for several rocRAND unit tests to accommodate the new hipGetLastError behavior that was introduced in ROCm 7.0. As of ROCm 7.0, the internal error state is cleared on each call to hipGetLastError rather than on every HIP API call.
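
The behavior change described above can be modelled with a small toy. This sketch does not call HIP. `Status`, `ErrorState`, `api_call`, and `get_last_error` are hypothetical names that only mimic the described semantics: a successful call no longer resets the recorded error, and only the query clears it.

```cpp
// Toy model only: these types and functions are hypothetical and do not call HIP.
enum class Status { Success, InvalidValue };

struct ErrorState {
    Status last = Status::Success;

    // Models an API call. A failure is recorded. A success leaves any
    // previously recorded error untouched (the ROCm 7.0 behavior described above).
    Status api_call(bool ok) {
        if (!ok) {
            last = Status::InvalidValue;
        }
        return ok ? Status::Success : Status::InvalidValue;
    }

    // Models the error query: returns the recorded error and clears it.
    Status get_last_error() {
        const Status recorded = last;
        last = Status::Success;
        return recorded;
    }
};
```

Under this model, a test that queries the error state after several calls can observe a failure from an earlier call, which is why tests that assumed a clean state needed updating.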

Changed

Changed the USE_DEVICE_DISPATCH flag so that setting it to zero turns device dispatch off. Device dispatch should be turned off when building for SPIR-V.

rocSOLVER (3.30.0)

Added

Hybrid computation support for existing routines: STEQR

Optimized

Improved the performance of:

BDSQR and downstream functions such as GESVD.
STEQR and downstream functions such as SYEV/HEEV.
LARFT and downstream functions such as GEQR2 and GEQRF.

rocSPARSE (4.1.0)

Added

Brain half float mixed precision for the following routines:
- rocsparse_axpby where X and Y use bfloat16 and result and the compute type use float.
- rocsparse_spvv where X and Y use bfloat16 and result and the compute type use float.
- rocsparse_spmv where A and X use bfloat16 and Y and the compute type use float.
- rocsparse_spmm where A and B use bfloat16 and C and the compute type use float.
- rocsparse_sddmm where A and B use bfloat16 and C and the compute type use float.
- rocsparse_sddmm where A, B, and C use bfloat16 and the compute type uses float.
Half float mixed precision to rocsparse_sddmm where A, B, and C use float16 and the compute type uses float.
Brain half float uniform precision to rocsparse_scatter and rocsparse_gather routines.
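
The mixed-precision pattern in the list above (bfloat16 inputs, float result and compute type) can be sketched on the host without rocSPARSE. The helpers below are hypothetical and do not use the rocSPARSE API or its bfloat16 type. The conversion truncates the low 16 bits of a float, which is a simplifying assumption, as real conversions usually round.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: converts float to a bfloat16 bit pattern by keeping
// the upper 16 bits (truncation, no rounding).
std::uint16_t float_to_bf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widens a bfloat16 bit pattern back to float.
float bf16_to_float(std::uint16_t value) {
    const std::uint32_t bits = static_cast<std::uint32_t>(value) << 16;
    float result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Sparse-dense dot product: x is sparse (values plus indices), y is dense.
// Inputs are stored as bfloat16; accumulation and the result use float.
float sparse_dot_bf16(const std::vector<std::uint16_t>& x_values,
                      const std::vector<int>& x_indices,
                      const std::vector<std::uint16_t>& y) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < x_values.size(); ++i) {
        acc += bf16_to_float(x_values[i]) * bf16_to_float(y[x_indices[i]]);
    }
    return acc;
}
```

Storing inputs in bfloat16 halves their memory footprint compared with float, while accumulating in float limits the rounding error that would build up in a bfloat16 accumulator.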

Optimized

Improved the user documentation.

Upcoming Changes

Deprecate trace, debug, and bench logging using the environment variable ROCSPARSE_LAYER.

rocThrust (4.1.0)

Added

A new CMake option -DSQLITE_USE_SYSTEM_PACKAGE to allow SQLite to be provided by the system.
Introduced libhipcxx as a soft dependency. When libhipcxx can be included, rocThrust can use structs and methods defined in libhipcxx. This allows for more complete behavior parity with CCCL and mirrors CCCL Thrust's own dependency on libcudacxx.
A new CMake option -DUSE_SYSTEM_LIB to allow tests to be built from ROCm libraries provided by the system.

Known Issues

event test is failing on CI and local runs on MI300, MI250 and MI210.
rocThrust, as well as its dependencies rocPRIM and rocRAND, has been moved into the new rocm-libraries monorepo (https://github.com/ROCm/rocm-libraries). This repository contains a number of ROCm libraries that are frequently used together.
- The repository migration requires a few changes to the way that rocThrust's ROCm library dependencies are fetched.
- There are new cmake options for obtaining rocPRIM and (optionally, if BUILD_BENCHMARKS is enabled) rocRAND.
- cmake build options ROCPRIM_FETCH_METHOD and ROCRAND_FETCH_METHOD may be set to one of the following:
  - PACKAGE - (default) searches for a preinstalled packaged version of the dependency. If it is not found, the build falls back to the DOWNLOAD option, described below.
  - DOWNLOAD - downloads the dependency from the rocm-libraries repository. If git >= 2.25 is present, this option uses a sparse checkout that avoids downloading more than it needs to. If not, the whole monorepo is downloaded (this may take some time).
  - MONOREPO - this option is intended to be used if you are building rocThrust from within a copy of the rocm-libraries repository that you have cloned (and therefore already contains the dependencies rocPRIM and rocRAND). When selected, the build will try to find the dependency in the local repository tree. If it cannot be found, the build will attempt to add it to the local tree using a sparse-checkout. If that also fails, it will fall back to using the DOWNLOAD option.
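
The fallback order of the three fetch methods above can be modelled as a short decision function. This is a hypothetical model for illustration only. The real logic lives in rocThrust's CMake scripts, and the type names, field names, and step strings below are assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the fetch methods; not taken from the build scripts.
enum class FetchMethod { Package, Download, Monorepo };

struct Environment {
    bool package_installed;      // a preinstalled package can be found
    bool in_local_tree;          // dependency already present in the cloned monorepo
    bool sparse_checkout_works;  // dependency can be added with a sparse-checkout
};

// Returns the ordered steps the build would attempt until one succeeds.
std::vector<std::string> resolution_steps(FetchMethod method, const Environment& env) {
    std::vector<std::string> steps;
    if (method == FetchMethod::Package) {
        steps.push_back("find package");
        if (env.package_installed) return steps;
    } else if (method == FetchMethod::Monorepo) {
        steps.push_back("find in local tree");
        if (env.in_local_tree) return steps;
        steps.push_back("sparse-checkout into local tree");
        if (env.sparse_checkout_works) return steps;
    }
    // DOWNLOAD is both an explicit choice and the final fallback.
    steps.push_back("download from rocm-libraries");
    return steps;
}
```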

Changed

The previously hidden cmake build option FORCE_DEPENDENCIES_DOWNLOAD has been unhidden and renamed EXTERNAL_DEPS_FORCE_DOWNLOAD to differentiate it from the new rocPRIM and rocRAND dependency options described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark, Google Test, and SQLite) to be downloaded instead of searching for existing installed packages. This option defaults to OFF.

Removed

The previous dependency-related build options DOWNLOAD_ROCPRIM and DOWNLOAD_ROCRAND have been removed. Use ROCPRIM_FETCH_METHOD=DOWNLOAD and ROCRAND_FETCH_METHOD=DOWNLOAD instead.

RPP (2.1.0)

Added

Solarize augmentation for HOST and HIP
Hue and Saturation adjustment augmentations for HOST and HIP
Find RPP - cmake module
Posterize augmentation for HOST and HIP

Changed

HALF - Fix half.hpp path updates
Box filter - padding updates

Removed

Packaging - Remove Meta Package dependency for HIP
SLES 15 SP6 support

Resolved Issues

Test Suite - Fixes for accuracy
HIP Backend - Check return status warning fixes
Bugfix - HIP vector types init

ROCm known issues

ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.

ROCm upcoming changes

The following changes to the ROCm software stack are anticipated for future releases.

ROCm SMI deprecation

ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.

It's strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of the ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation

Development and support for ROCTracer, ROCProfiler, rocprof, and rocprofv2 are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the rocprofv3 tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta today and will be production-ready in a future ROCm release.

It's anticipated that ROCTracer, ROCProfiler, rocprof, and rocprofv2 will reach end-of-life in a future release, around Q1 of 2026.

AMDGPU wavefront size compiler macro deprecation

Access to the wavefront size as a compile-time constant via the __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros is deprecated and will be disabled in a future release. In ROCm 7.0.0, warpSize is only available as a non-constexpr variable. You're encouraged to update your code if needed to ensure future compatibility.

The __AMDGCN_WAVEFRONT_SIZE__ macro and __AMDGCN_WAVEFRONT_SIZE alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
warpSize is only available as a non-constexpr variable. Where required, the wavefront size should be queried via the warpSize variable in device code, or via hipGetDeviceProperties in host code. Neither of these will result in a compile-time constant. For more information, see warpSize.
For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of __AMDGCN_WAVEFRONT_SIZE, __AMDGCN_WAVEFRONT_SIZE__, or warpSize can be replaced with a user-defined macro or constexpr variable with the wavefront size(s) for the target hardware. For example:

   #if defined(__GFX9__)
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 64
   #else
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 32
   #endif
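
The same approach can be written with a constexpr variable instead of a macro. This is a minimal sketch: the names `kWavefrontSize` and `WavefrontScratch` are hypothetical, and the values mirror the macro example above.

```cpp
// Hypothetical names; mirrors the macro example with a constexpr variable.
#if defined(__GFX9__)
constexpr int kWavefrontSize = 64;
#else
constexpr int kWavefrontSize = 32;
#endif

// Compile-time use: a scratch array sized with one slot per lane.
struct WavefrontScratch {
    int lanes[kWavefrontSize];
};

static_assert(sizeof(WavefrontScratch) == kWavefrontSize * sizeof(int),
              "scratch array must hold one int per lane");
```

A constexpr variable is typed and scoped, so it can live in a namespace and be used as a template argument or array bound without the usual macro pitfalls.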

Changes to ROCm Object Tooling

ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj were deprecated in ROCm 6.4, and will be removed in a future release. Functionality has been added to the llvm-objdump --offloading tool option to extract all clang-offload-bundles into individual code objects found within the objects or executables passed as input. The llvm-objdump --offloading tool option also supports the --arch-name option, and only extracts code objects found with the specified target architecture. See llvm-objdump for more information.
