7.1.0 release notes and compatibility footnote update (#599)

* RDC changelog and highlight addition
* Compatibility updated
* Minor change
* Consolidated changelog synced
2026-01-09 14:48:06 -05:00 · 2025-10-25 09:47:17 -04:00
parent a2e2bd3277
commit c56d5b7495
4 changed files with 934 additions and 183 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,759 @@ This page is a historical overview of changes made to ROCm components. This
 consolidated changelog documents key modifications and improvements across
 different versions of the ROCm software stack and its components.

+## ROCm 7.1.0
+
+See the [ROCm 7.1.0 release notes](https://rocm-stg.amd.com/en/latest/about/release-notes.html#rocm-7-1-0-release-notes)
+for a complete overview of this release.
+
+### **AMD SMI** (26.1.0)
+
+#### Added
+
+* `GPU LINK PORT STATUS` table to `amd-smi xgmi` command. The `amd-smi xgmi -s` or `amd-smi xgmi --source-status` will now show the `GPU LINK PORT STATUS` table.  
+
+* `amdsmi_get_gpu_revision()` to Python API. This function retrieves the GPU revision ID. Available in `amdsmi_interface.py` as `amdsmi_get_gpu_revision()`.
+
+* Gpuboard and baseboard temperatures to `amd-smi metric` command.
+
+#### Changed
+
+* Struct `amdsmi_topology_nearest_t` member `processor_list`. The member size changed to `processor_list[AMDSMI_MAX_DEVICES * AMDSMI_MAX_NUM_XCP]`.
+
+* `amd-smi reset --profile` behavior so that it won't also reset the performance level.  
+  * The performance level can still be reset using `amd-smi reset --perf-determinism`.  
+
+* Setting power cap is now available in Linux Guest. You can now use `amd-smi set --power-cap` as usual in Linux Guest systems too.
+
+* Changed `amd-smi static --vbios` to `amd-smi static --ifwi`.  
+  * VBIOS naming is replaced with IFWI (Integrated Firmware Image) for improved clarity and consistency.
+  * AMD Instinct MI300 Series GPUs (and later) now use a new version format with enhanced build information.
+  * Legacy command `amd-smi static --vbios` remains functional for backward compatibility, but displays updated IFWI heading.
+  * The Python, C, and Rust API for `amdsmi_get_gpu_vbios_version()` will now have a new field called `boot_firmware`, which will return the legacy vbios version number that is also known as the Unified BootLoader (UBL) version.
+
+#### Optimized
+
+* Optimized the way `amd-smi process` validates which processes are running on a GPU.
+
+#### Resolved issues
+
+* Fixed a CPER record count mismatch issue when using `amd-smi ras --cper --file-limit`. Updated the deletion calculation to use `files_to_delete = len(folder_files) - file_limit` for exact file count management.
+
+* Fixed event monitoring segfaults that caused RDC to crash. Added mutex locking around access to the device event notification file pointer.
+
+* Fixed an issue where using `amd-smi ras --folder <folder_name>` was forcing the created folder's name to be lowercase. This fix also makes all string input options case-insensitive.
+
+* Fixed certain output in `amd-smi monitor` when GPUs are partitioned. Commands such as `amd-smi monitor -Vqt`, `amd-smi monitor -g 0 -Vqt -w 1`, and `amd-smi monitor -Vqt --file /tmp/test1` now display as normal in partitioned GPU scenarios.
+
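The CPER file-limit fix above comes down to one subtraction. The following is a minimal Python sketch of that calculation, not AMD SMI code: the helper name `prune_oldest` and the oldest-first deletion order are assumptions made for illustration. Only the expression `files_to_delete = len(folder_files) - file_limit` comes from the changelog entry.

```python
import os


def prune_oldest(folder, file_limit):
    """Delete the oldest files in `folder` until at most `file_limit` remain.

    Hypothetical helper. Oldest-first ordering (by modification time) is an
    assumption; the changelog only specifies how many files to delete.
    Returns the base names of the deleted files, oldest first.
    """
    folder_files = sorted(
        (entry.path for entry in os.scandir(folder) if entry.is_file()),
        key=os.path.getmtime,
    )
    # Exact count from the changelog entry. A non-positive result means the
    # folder is already within the limit, so nothing is deleted.
    files_to_delete = len(folder_files) - file_limit
    deleted = []
    for path in folder_files[:max(0, files_to_delete)]:
        os.remove(path)
        deleted.append(os.path.basename(path))
    return deleted
```

Clamping the slice bound with `max(0, ...)` matters here: a negative count used directly as a slice end would select files from the wrong end of the list.
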
+```{note}
+See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/rocm-rel-7.1/CHANGELOG.md) for details, examples, and in-depth descriptions.
+```
+
+### **Composable Kernel** (1.1.0)
+
+#### Added
+ 
+* Support for hdim as a multiple of 32 for FMHA (fwd/fwd_splitkv/bwd).
+* Support for elementwise kernel.
+ 
+#### Upcoming changes
+ 
+* Non-grouped convolutions are deprecated. Their functionality is supported by grouped convolution.
+
+### **HIP** (7.1.0)
+
+#### Added
+
+* New HIP APIs
+    - `hipModuleGetFunctionCount` returns the number of functions within a module
+    - `hipMemsetD2D8` used for setting 2D memory range with specified 8-bit values
+    - `hipMemsetD2D8Async` used for setting 2D memory range with specified 8-bit values asynchronously
+    - `hipMemsetD2D16` used for setting 2D memory range with specified 16-bit values
+    - `hipMemsetD2D16Async` used for setting 2D memory range with specified 16-bit values asynchronously
+    - `hipMemsetD2D32` used for setting 2D memory range with specified 32-bit values
+    - `hipMemsetD2D32Async` used for setting 2D memory range with specified 32-bit values asynchronously
+    - `hipStreamSetAttribute` sets attributes such as synchronization policy for a given stream
+    - `hipStreamGetAttribute` returns attributes such as priority for a given stream
+    - `hipModuleLoadFatBinary`  loads fatbin binary to a module
+    - `hipMemcpyBatchAsync` performs a batch of 1D or 2D memory copies asynchronously
+    - `hipMemcpy3DBatchAsync` performs a batch of 3D memory copies asynchronously
+    - `hipMemcpy3DPeer` copies memory between devices
+    - `hipMemcpy3DPeerAsync` copies memory between devices asynchronously
+    - `hipMemPrefetchAsync_v2`  prefetches memory to the specified location
+    - `hipMemAdvise_v2`         advises about the usage of a given memory range
+    - `hipGetDriverEntryPoint`  gets the function pointer of a HIP API
+    - `hipSetValidDevices`      sets a default list of devices that can be used by HIP
+    - `hipStreamGetId`          queries the ID of a stream
+* Support for the flag `hipMemLocationTypeHost`, which enables handling virtual memory management in host memory location, in addition to device memory.
+* Support for nested tile partitioning within cooperative groups, matching CUDA functionality.
+
+#### Optimized
+
+* Improved HIP module loading latency.
+* Optimized kernel metadata retrieval during module post load.
+* Optimized doorbell ring in HIP runtime, which improves performance in the following ways:
+    - Efficient packet batching for HIP graph launch.
+    - Dynamic packet copying based on a defined maximum threshold or power-of-2 staggered copy pattern.
+    - If timestamps are not collected for a signal for reuse, it creates a new signal. This can potentially increase the signal footprint if the handler doesn't run fast enough.
+
+#### Resolved issues
+
+* A segmentation fault occurred in the application when capturing the same HIP graph from multiple streams with cross-stream dependencies. The HIP runtime has fixed an issue where a forked stream joined a parent stream that was not originally created with the API `hipStreamBeginCapture`.
+* Enqueuing a command on a legacy stream during stream capture behaved differently on the AMD ROCm platform than on CUDA. The HIP runtime now returns an error in this specific situation to match CUDA behavior.
+* A memory access fault occurred in the rocm-examples test suite. When Heterogeneous Memory Management (HMM) is not supported in the driver, `hipMallocManaged` will only allocate system memory in HIP runtime.
+
+#### Known issues
+
+* SPIR-V-enabled applications may encounter a segmentation fault. The problem disappears when SPIR-V is disabled. The issue will be fixed in the next ROCm release.
+
+### **hipBLAS** (3.1.0)
+
+#### Added
+
+* `--clients-only` build option to only build clients against a prebuilt library.
+* gfx1103, gfx1150, gfx1151, gfx1200, and gfx1201 support to clients.
+* FORTRAN enabled for the Microsoft Windows build and tests.
+* Additional reference library fallback options added.
+
+#### Changed
+
+* Improved the build time for clients by removing `clients_common.cpp` from the hipblas-test build.
+
+### **hipBLASLt** (1.1.0)
+
+#### Added
+
+* Fused Clamp GEMM for ``HIPBLASLT_EPILOGUE_CLAMP_EXT`` and ``HIPBLASLT_EPILOGUE_CLAMP_BIAS_EXT``. This feature requires the minimum (``HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT``) and maximum (``HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT``) to be set.
+* Support for ReLU/Clamp activation functions with auxiliary output for the `FP16` and `BF16` data types for gfx942 to capture intermediate results. This feature is enabled for ``HIPBLASLT_EPILOGUE_RELU_AUX``, ``HIPBLASLT_EPILOGUE_RELU_AUX_BIAS``, ``HIPBLASLT_EPILOGUE_CLAMP_AUX_EXT``, and ``HIPBLASLT_EPILOGUE_CLAMP_AUX_BIAS_EXT``.
+* Support for `HIPBLAS_COMPUTE_32F_FAST_16BF` for FP32 data type for gfx950 only.
+* CPP extension APIs ``setMaxWorkspaceBytes`` and ``getMaxWorkspaceBytes``.
+* Feature to print logs (using ``HIPBLASLT_LOG_MASK=32``) for Grouped GEMM.
+* Support for swizzleA by using the hipblaslt-ext cpp API.
+* Support for hipBLASLt extop for gfx11xx and gfx12xx.
+
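The fused Clamp GEMM entry above can be read as "matrix multiply, optionally add a bias, then clamp every element to the configured minimum and maximum". The following is a plain-Python reference sketch of that arithmetic only. It does not call hipBLASLt, the function name `clamp_gemm_reference` is hypothetical, and treating the bias as one value per output row is an assumption made for illustration.

```python
def clamp_gemm_reference(a, b, lo, hi, bias=None):
    """Reference for D = clamp(A @ B (+ bias), lo, hi) on nested lists.

    `lo` and `hi` stand in for the minimum and maximum epilogue arguments.
    `bias`, if given, holds one value per output row (an assumption).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Plain dot product of row i of A with column j of B.
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            if bias is not None:
                acc += bias[i]
            # The clamp step: limit the result to the range [lo, hi].
            row.append(min(max(acc, lo), hi))
        out.append(row)
    return out
```

For example, with `a = [[1, 2], [3, 4]]` and an identity `b`, clamping to the range 2 to 3 pulls the 1 up to 2 and the 4 down to 3.
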
+#### Changed
+
+* ``hipblasLtMatmul()`` now returns an error when the workspace size is insufficient, rather than causing a segmentation fault.
+
+#### Resolved issues
+
+* Fixed incorrect results when using ldd and ldc dimension parameters with some solutions.
+
+### **hipCUB** (4.1.0)
+
+#### Added
+
+* Exposed Thread-level reduction API `hipcub::ThreadReduce`.
+* `::hipcub::extents`, with limited parity to C++23's `std::extents`. Only `static extents` is supported; `dynamic extents` is not. Helper structs have been created to perform computations on `::hipcub::extents` only when the backend is rocPRIM. For the CUDA backend, similar functionality exists.
+* `projects/hipcub/hipcub/include/hipcub/backend/rocprim/util_mdspan.hpp` to support `::hipcub::extents`.
+* `::hipcub::ForEachInExtents` API.
+* `hipcub::DeviceTransform::Transform` and `hipcub::DeviceTransform::TransformStableArgumentAddresses`.
+* hipCUB and its dependency rocPRIM have been moved into the new `rocm-libraries` [monorepo repository](https://github.com/ROCm/rocm-libraries). This repository contains a number of ROCm libraries that are frequently used together.
+  * The repository migration requires a few changes to the way that hipCUB fetches library dependencies.
+  * CMake build option `ROCPRIM_FETCH_METHOD` may be set to one of the following:
+    * `PACKAGE` - (default) searches for a preinstalled packaged version of the dependency. If it is not found, the build will fall back using option `DOWNLOAD`, below.
+    * `DOWNLOAD` - downloads the dependency from the rocm-libraries repository. If git >= 2.25 is present, this option uses a sparse checkout that avoids downloading more than it needs to. If not, the whole monorepo is downloaded (this may take some time).
+    * `MONOREPO` - this option is intended to be used if you are building hipCUB from within a copy of the rocm-libraries repository that you have cloned (and therefore already contains rocPRIM). When selected, the build will try to find the dependency in the local repository tree. If it cannot be found, the build will attempt to use git to perform a sparse-checkout of rocPRIM. If that also fails, it will fall back to using the `DOWNLOAD` option described above.
+
+* A new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from installed `hipCUB` provided by the system.
+
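The fallback order described for `ROCPRIM_FETCH_METHOD` above can be summarized as a small decision table. The following Python sketch only models that order; it is not the CMake logic itself, and the step labels and the functions `resolve_fetch` and `download_mode` are hypothetical names chosen for illustration.

```python
# Ordered attempts per fetch method, as described in the changelog entry.
FETCH_ORDER = {
    "PACKAGE": ["installed-package", "download"],
    "DOWNLOAD": ["download"],
    "MONOREPO": ["local-tree", "sparse-checkout", "download"],
}


def resolve_fetch(method, succeeds):
    """Return the first step for `method` that `succeeds(step)` accepts."""
    for step in FETCH_ORDER[method]:
        if succeeds(step):
            return step
    raise RuntimeError("no fetch step succeeded for " + method)


def download_mode(git_version):
    """Pick how DOWNLOAD fetches: sparse checkout needs git >= 2.25.

    `git_version` is a (major, minor) tuple, or None if git is unavailable.
    """
    if git_version is not None and git_version >= (2, 25):
        return "sparse-checkout"
    return "full-clone"
```

For instance, `resolve_fetch("PACKAGE", lambda step: step == "download")` models the case where no preinstalled package is found and the build falls back to downloading.
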
+#### Changed
+
+* Changed include headers to avoid relative includes that have slipped in.
+* Changed `CUDA_STANDARD` for tests in `test/hipcub`, because C++17 APIs such as `std::exclusive_scan` are used in some tests. `test/extra` still uses `CUDA_STANDARD 14`.
+* Changed `CCCL_MINIMUM_VERSION` to `2.8.2` to align with CUB.
+* Changed `cmake_minimum_required` from `3.16` to `3.18`, in order to support `CUDA_STANDARD 17` as a valid value.
+* Added support for large `num_items` in `DeviceScan`, `DevicePartition`, and `Reduce::{ArgMin, ArgMax}`.
+* Added tests for large num_items.
+* The previous dependency-related build option `DEPENDENCIES_FORCE_DOWNLOAD` has been renamed `EXTERNAL_DEPS_FORCE_DOWNLOAD` to differentiate it from the new rocPRIM dependency option described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark and Google Test) to be downloaded rather than searching for installed packages. This option defaults to `OFF`.
+
+#### Removed
+
+* Removed `TexRefInputIterator`, which was removed from CUB after CCCL's 2.6.0 release. This API should have already been removed, but somehow it remained and was not tested.
+* Deprecated `hipcub::ConstantInputIterator`, use `rocprim::constant_iterator` or `rocthrust::constant_iterator` instead.
+* Deprecated `hipcub::CountingInputIterator`, use `rocprim::counting_iterator` or `rocthrust::counting_iterator` instead.
+* Deprecated `hipcub::DiscardOutputIterator`, use `rocprim::discard_iterator` or `rocthrust::discard_iterator` instead.
+* Deprecated `hipcub::TransformInputIterator`, use `rocprim::transform_iterator` or `rocthrust::transform_iterator` instead.
+* Deprecated `hipcub::AliasTemporaries`, which is considered to be an internal API. Moved to the detail namespace.
+* Deprecated almost all functions in `projects/hipcub/hipcub/include/hipcub/backend/rocprim/util_ptx.hpp`.
+* Deprecated hipCUB macros: `HIPCUB_MAX`, `HIPCUB_MIN`, `HIPCUB_QUOTIENT_FLOOR`, `HIPCUB_QUOTIENT_CEILING`, `HIPCUB_ROUND_UP_NEAREST` and `HIPCUB_ROUND_DOWN_NEAREST`.
+
+#### Known issues
+
+* The `__half` template specializations of Simd operators are currently disabled due to possible build issues with PyTorch.
+
+### **hipFFT** (1.0.21)
+
+#### Added
+
+* Improved test coverage of multi-stream plans, user-specified work areas, and default stride calculation.
+* Experimental introduction of hipFFTW library, interfacing rocFFT on AMD platforms using the same symbols as FFTW3 (with partial support).
+
+### **hipfort** (0.7.1)
+
+#### Added
+
+* Support for building with CMake 4.0.
+
+#### Resolved issues
+
+* Fixed a potential integer overflow issue in `hipMalloc` interfaces.
+
+### **hipRAND** (3.1.0)
+
+#### Resolved issues
+
+* Updated error handling for several hipRAND unit tests to accommodate the new `hipGetLastError` behavior that was introduced in ROCm 7.0.0. As of ROCm 7.0.0, the internal error state is cleared on each call to `hipGetLastError` rather than on every HIP API call.
+
+### **hipSOLVER** (3.1.0)
+
+#### Added
+
+* Extended test suites for `hipsolverDn` compatibility functions.
+
+#### Changed
+
+* Changed code coverage to use `llvm-cov` instead of `gcov`.
+
+### **hipSPARSE** (4.1.0)
+
+#### Added
+
+* Brain half float mixed precision for the following routines:
+    * `hipsparseAxpby` where X and Y use bfloat16 and result and the compute type use float.
+    * `hipsparseSpVV` where X and Y use bfloat16 and result and the compute type use float.
+    * `hipsparseSpMV` where A and X use bfloat16 and Y and the compute type use float.
+    * `hipsparseSpMM` where A and B use bfloat16 and C and the compute type use float.
+    * `hipsparseSDDMM` where A and B use bfloat16 and C and the compute type use float.
+    * `hipsparseSDDMM` where A and B and C use bfloat16 and the compute type use float.
+* Half float mixed precision to `hipsparseSDDMM` where A and B and C use float16 and the compute type use float.
+* Brain half float uniform precision to `hipsparseScatter` and `hipsparseGather` routines.
+* Documentation for installing and building hipSPARSE on Microsoft Windows.
+
+### **hipSPARSELt** (0.2.5)
+
+#### Changed
+
+* Changed the behavior of the Relu activation.
+
+#### Optimized
+
+* Provided more kernels for the `FP16` and `BF16` data types.
+
+### **MIGraphX** (2.14.0)
+
+#### Added
+
+* Python 3.13 support.
+* PyTorch wheels to the Dockerfile.
+* Python API for returning serialized bytes.
+* `fixed_pad` operator for padding dynamic shapes to the maximum static shape.
+* Matcher to upcast base `Softmax` operations.
+* Support for the `convolution_backwards` operator through rocMLIR.
+* `LSE` output to attention fusion.
+* Flags to `EnableControlFlowGuard` due to BinSkim errors.
+* New environment variable documentation and reorganized structure.
+* `stash_type` attribute for `LayerNorm` and expanded test coverage.
+* Operator builders (phase 2).
+* `MIGRAPHX_GPU_HIP_FLAGS` to allow extra HIP compile flags.
+
+#### Changed
+
+* Updated C API to include `current()` caller information in error reporting.
+* Updated documentation dependencies:
+  * **rocm-docs-core** bumped from 1.21.1 → 1.25.0 across releases.
+  * **Doxygen** updated to 1.14.0.
+  * **urllib3** updated from 2.2.2 → 2.5.0.
+* Updated `src/CMakeLists.txt` to support `msgpack` 6.x (`msgpack-cxx`).
+* Updated model zoo test generator to fix test issues and add summary logging.
+* Updated `rocMLIR` and `ONNXRuntime` mainline references across commits.
+* Updated module sorting algorithm for improved reliability.
+* Restricted FP8 quantization to `dot` and `convolution` operators.
+* Moved ONNX Runtime launcher script into MIGraphX and updated build scripts.
+* Simplified ONNX `Resize` operator parser for correctness and maintainability.
+* Updated `any_ptr` assertion to avoid failure on default HIP stream.
+* Print kernel and module information on compile failure.
+
+#### Removed
+
+* Removed Perl dependency from SLES builds.
+* Removed redundant includes and unused internal dependencies.
+
+#### Optimized
+
+* Reduced nested visits in reference operators to improve compile time.
+* Avoided dynamic memory allocation during kernel launches.
+* Removed redundant NOP instructions for GFX11/12 platforms.
+* Improved `Graphviz` output (node color and layout updates).
+* Optimized interdependency checking during compilation.
+* Skip hipBLASLt solutions requiring workspace size larger than 128 MB for efficient memory utilization.
+
+#### Resolved issues
+
+* Error in `MIGRAPHX_GPU_COMPILE_PARALLEL` documentation (#4337).
+* rocMLIR `rewrite_reduce` issue (#4218).
+* Bug with `invert_permutation` on GPU (#4194).
+* Compile error when `MIOPEN` is disabled (missing `std` includes) (#4281).
+* ONNX `Resize` parsing when input and output shapes are identical (#4133, #4161).
+* Issue with MHA in attention refactor (#4152).
+* Synchronization issue from upstream ONNX Runtime (#4189).
+* Spelling error in “Contiguous” (#4287).
+* Tidy complaint about duplicate header (#4245).
+* `reshape`, `transpose`, and `broadcast` rewrites between pointwise and reduce operators (#3978).
+* Extraneous include file in HIPRTC-based compilation (#4130).
+* CI Perl dependency issue for SLES builds (#4254).
+* Compiler warnings for ROCm 7.0 of ``error: unknown warning option '-Wnrvo'`` (#4192).
+
+### **MIOpen** (3.5.1)
+
+#### Added
+
+* Added a new trust verify find mode.
+* Ported Op4dTensorLite kernel from OpenCL to HIP.
+* Implemented a generic HIP kernel for backward layer normalization.
+
+#### Changed
+
+* Kernel DBs moved from Git LFS to DVC (Data Version Control).
+
+#### Optimized
+
+* [Conv] Enabled Composable Kernel (CK) implicit gemms on gfx950.
+
+#### Resolved issues
+
+* [BatchNorm] Fixed a bug for the NHWC layout when a variant was not applicable.
+* Fixed a bug that caused a zero-size LDS array to be defined on Navi.
+
+### **MIVisionX** (3.4.0)
+
+#### Added
+
+* VX_RPP - Update blur
+* HIP - HIP_CHECK for hipLaunchKernelGGL for gated launch
+
+#### Changed
+
+* AMD Custom V1.1.0 - OpenMP updates
+* HALF - Fix half.hpp path updates
+
+#### Resolved issues
+
+* AMD Custom - dependency linking errors resolved
+* VX_RPP - Fix memory leak
+* Packaging - Remove Meta Package dependency for HIP
+
+#### Known issues
+
+* Installation on CentOS/RedHat/SLES requires the manual installation of the `FFMPEG` and `OpenCV` dev packages.
+
+#### Upcoming changes
+
+* VX_AMD_MEDIA - rocDecode support for hardware decode
+
+### **RCCL** (2.27.7)
+
+#### Added
+
+* Added `RCCL_P2P_BATCH_THRESHOLD` to set the message size limit for batching P2P operations. This mainly affects small message performance for alltoall at a large scale but also applies to alltoallv.
+* Added `RCCL_P2P_BATCH_ENABLE` to enable batching P2P operations to receive performance gains for smaller messages up to 4MB for alltoall when the workload requires it. This is to avoid performance dips for larger messages.
+
+#### Changed
+
+* The MSCCL++ feature is now disabled by default. The `--disable-mscclpp` build flag is replaced with `--enable-mscclpp` in the `rccl/install.sh` script.
+* Compatibility with NCCL 2.27.7.
+
+#### Optimized
+
+* Improved small message performance for `alltoall` by enabling and optimizing batched P2P operations.
+
+#### Resolved issues
+
+* Improve small message performance for alltoall by enabling and optimizing batched P2P operations.
+
+#### Known issues
+
+* Symmetric memory kernels are currently disabled due to ongoing CUMEM enablement work.
+* When running this version of RCCL using ROCm versions earlier than 6.4.0, the user must set the environment flag `HSA_NO_SCRATCH_RECLAIM=1`.
+
+### **ROCm Data Center Tool** (1.2.0)
+
+#### Added
+
+- CPU monitoring support with 30+ CPU field definitions through AMD SMI integration.
+- CPU partition format support (c0.0, c1.0) for monitoring AMD EPYC processors.
+- Mixed GPU/CPU monitoring in single `rdci dmon` command.
+
+#### Optimized
+
+- Improved profiler metrics path detection for counter definitions.
+
+#### Resolved issues
+
+- Group management issues with listing created/non-created groups.
+- ECC_UNCORRECT field behavior.
+
+### **rocAL** (2.4.0)
+
+#### Added
+* JAX iterator support in rocAL
+* rocJPEG - Fused Crop decoding support
+
+#### Changed
+* CropResize - updates and fixes
+* Packaging - Remove Meta Package dependency for HIP
+
+#### Resolved issues
+* OpenMP - dependency linking errors resolved.
+* Bugfix - memory leaks in rocAL.
+
+#### Known issues
+* Package installation on SLES requires manually installing `TurboJPEG`.
+* Package installation on CentOS, RedHat, and SLES requires manually installing the `FFMPEG Dev` package.
+
+### **rocALUTION** (4.0.1)
+
+#### Added
+
+* Added support for gfx950.
+
+#### Changed
+
+* Updated the default build standard to C++17 when compiling rocALUTION from source (previously C++14).
+
+#### Optimized
+
+* Improved and expanded user documentation.
+
+#### Resolved issues
+
+* Fixed a bug in the GPU hashing algorithm that occurred when not compiling with -O2/-O3.
+* Fixed an issue with the SPAI preconditioner when using complex numbers.
+
+### **rocBLAS** (5.1.0)
+
+#### Added
+
+* Sample for clients using OpenMP threads calling rocBLAS functions.
+* gfx1103, gfx1150, and gfx1151 enabled.
+
+#### Changed
+
+* By default, the Tensile build is no longer based on `tensile_tag.txt` but uses the same commit from shared/tensile in the rocm-libraries repository. The rmake or install `-t` option can build from another local path with a different commit.
+
+#### Optimized
+
+* Improved the performance of Level 2 gemv transposed (`TransA != N`) for the problem sizes where `m` is small and `n` is large on gfx90a and gfx942.
+
+### **rocDecode** (1.4.0)
+
+#### Added
+
+* AV1 12-bit decode support on VA-API version 1.23.0 and later.
+* rocdecode-host V1.0.0 library for software decode
+* FFmpeg version support for 5.1 and 6.1
+* Find package - rocdecode-host
+
+#### Resolved issues
+
+* rocdecode-host - failure to build debuginfo packages without FFmpeg resolved.
+* Fix a memory leak for rocDecodeNegativeTests
+
+#### Changed
+
+* HIP meta package changed - Use hip-dev/devel to bring required hip dev deps
+* rocdecode host - linking updates to rocdecode-host library
+
+### **rocFFT** (1.0.35)
+
+#### Optimized
+
+* Implemented single-kernel plans for some 2D problem sizes, on devices with at least 160KiB of LDS.
+* Improved performance of unit-strided, complex-interleaved, forward/inverse FFTs for lengths: (64,64,128), (64,64,52), (60,60,60), (32,32,128), (32,32,64), (64,32,128).
+* Improved performance of 3D MPI pencil decompositions by using sub-communicators for global transpose operations.
+
+### **rocJPEG** (1.2.0)
+
+#### Changed
+* HIP meta package has been changed. Use `hip-dev/devel` to bring required hip dev deps.
+
+#### Resolved issues
+* Fixed an issue where extra padding was incorrectly included when saving decoded JPEG images to files.
+* Resolved a memory leak in the jpegDecode application.
+
+### **ROCm Compute Profiler** (3.3.0)
+
+#### Added
+* Live attach/detach feature that allows coupling with a workload process, without controlling its start or end.
+  * Use `--attach-pid` to specify the target process ID.
+  * Use `--attach-duration-msec` to specify time duration.
+* `rocpd` choice for `--format-rocprof-output` option in profile mode.
+* `--retain-rocpd-output` option in profile mode to save large raw rocpd databases in workload directory.
+* Feature to show description of metrics during analysis.
+  * Use `--include-cols Description` to show the Description column, which is excluded by default from the
+  ROCm Compute Profiler CLI output.
+* `--set` filtering option in profile mode to enable single-pass counter collection for predefined subsets of metrics.
+* `--list-sets` filtering option in profile mode to list the sets available for single pass counter collection.
+* Missing counters based on register specification which enables missing metrics.
+  * Enabled `SQC_DCACHE_INFLIGHT_LEVEL` counter and associated metrics.
+  * Enabled `TCP_TCP_LATENCY` counter and associated counter for all GPUs except MI300.
+* Interactive metric descriptions in TUI analyze mode.
+  * You can now left click on any metric cell to view detailed descriptions in the dedicated `METRIC DESCRIPTION` tab.
+* Support for analysis report output as a sqlite database using ``--output-format db`` analysis mode option.
+* `Compute Throughput` panel to TUI's `High Level Analysis` category with the following metrics: VALU FLOPs, VALU IOPs, MFMA FLOPs (F8), MFMA FLOPs (BF16), MFMA FLOPs (F16), MFMA FLOPs (F32), MFMA FLOPs (F64), MFMA FLOPs (F6F4) (in gfx950), MFMA IOPs (Int8), SALU Utilization, VALU Utilization, MFMA Utilization, VMEM Utilization, Branch Utilization, IPC
+
+* `Memory Throughput` panel to TUI's `High Level Analysis` category with the following metrics: vL1D Cache BW, vL1D Cache Utilization, Theoretical LDS Bandwidth, LDS Utilization, L2 Cache BW, L2 Cache Utilization, L2-Fabric Read BW, L2-Fabric Write BW, sL1D Cache BW, L1I BW, Address Processing Unit Busy, Data-Return Busy, L1I-L2 Bandwidth, sL1D-L2 BW
+* Roofline support for Debian 12 and Azure Linux 3.0.
+* Notice for change in default output format to `rocpd` in a future release
+  * This is displayed when `--format-rocprof-output rocpd` is not used in profile mode
+
+#### Changed
+
+* In the memory chart, long strings of numbers are now displayed in scientific notation. This also fixes the overflow that occurred when displaying long numbers.
+* When `--format-rocprof-output rocpd` is used, only `pmc_perf.csv` will be written to the workload directory instead of multiple CSV files.
+* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show the Metric ID.
+  * Removed metrics from analysis configuration files which are explicitly marked as empty or None.
+* Changed the basic (default) view of TUI from aggregated analysis data to individual kernel analysis data.
+* Updated `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`:
+  * Theoretical Bandwidth (section 1202)
+  * L1I-L2 Bandwidth (section 1303)
+  * sL1D-L2 BW (section 1403)
+  * Cache BW (section 1603)
+  * L1-L2 BW (section 1603)
+  * Read BW (section 1702)
+  * Write and Atomic BW (section 1702)
+  * Bandwidth (section 1703)
+  * Atomic/Read/Write Bandwidth (section 1703)
+  * Atomic/Read/Write Bandwidth - (HBM/PCIe/Infinity Fabric) (section 1706)
+* Updated the metric name for the following `Bandwidth` related metrics whose `Unit` is `Percent` by adding `Utilization`:
+  * Theoretical Bandwidth Utilization (section 1201)
+  * L1I-L2 Bandwidth Utilization (section 1301)
+  * Bandwidth Utilization (section 1301)
+  * Bandwidth Utilization (section 1401)
+  * sL1D-L2 BW Utilization (section 1401)
+  * Bandwidth Utilization (section 1601)
+* Updated `System Speed-of-Light` panel to `GPU Speed-of-Light` in TUI for the following metrics:
+  * Theoretical LDS Bandwidth
+  * vL1D Cache BW
+  * L2 Cache BW
+  * L2-Fabric Read BW
+  * L2-Fabric Write BW
+  * Kernel Time
+  * Kernel Time (Cycles)
+  * SIMD Utilization
+  * Clock Rate
+* Analysis output:
+  * Replaced `-o / --output` analyze mode option with `--output-format` and `--output-name`.
+    * Use ``--output-format`` analysis mode option to select the output format of the analysis report.
+    * Use ``--output-name`` analysis mode option to override the default file/folder name.
+  * Replaced `--save-dfs` analyze mode option with `--output-format csv`.
+* Command-line options:
+  * `--list-metrics` and `--config-dir` options moved to general command-line options.
+  * `--list-metrics` option cannot be used without GPU architecture argument.
+  * `--list-metrics` option does not show the number of L2 channels.
+  * `--list-available-metrics` profile mode option to display the metrics available for profiling in current GPU.
+  * `--list-available-metrics` analyze mode option to display the metrics available for analysis.
+  * `--block` option cannot be used with `--list-metrics` and `--list-available-metrics` options.
+* Default `rocprof` interface changed from `rocprofv3` to `rocprofiler-sdk`
+  * Use `ROCPROF=rocprofv3` to use the rocprofv3 interface.
+* Updated metric names for better alignment between analysis configuration and documentation.
+
+#### Removed
+
+* Usage of `rocm-smi` in favor of `amd-smi`.
+* Hardware IP block-based filtering has been removed in favor of analysis report block-based filtering.
+* Aggregated analysis view from TUI analyze mode.
+
+#### Optimized
+
+* Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
+* Improved logic to obtain rocprof supported counters which prevents unnecessary warnings.
+* Improved post-analysis runtime performance by caching and multi-processing.
+* Improved analysis block based filtering to accept metric ID level filtering.
+  * This can be used to collect individual metrics from various sections of analysis config.
+
+#### Resolved issues
+
+* Fixed an issue of not detecting the memory clock when using `amd-smi`.
+* Fixed standalone GUI crashing.
+* Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 Series GPUs.
+* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100.
+* Fixed an issue of kernel filtering not working in the roofline chart.
+
+#### Known issues
+
+* MI300A/X L2-Fabric 64B read counter may display negative values - The rocprof-compute metric 17.6.1 (Read 64B) can report negative values due to incorrect calculation when TCC_BUBBLE_sum + TCC_EA0_RDREQ_32B_sum exceeds TCC_EA0_RDREQ_sum.
+  * A workaround has been implemented using max(0, calculated_value) to prevent negative display values while the root cause is under investigation.
+
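The workaround above amounts to clamping a derived counter at zero. The following is a minimal Python sketch, not ROCm Compute Profiler code: the function name is hypothetical, and the subtraction formula is an assumption inferred from the description (the value goes negative when `TCC_BUBBLE_sum + TCC_EA0_RDREQ_32B_sum` exceeds `TCC_EA0_RDREQ_sum`).

```python
def read_64b_requests(rdreq_sum, rdreq_32b_sum, bubble_sum):
    """Derive the 64B read count and clamp it so it never displays as negative.

    Arguments stand in for TCC_EA0_RDREQ_sum, TCC_EA0_RDREQ_32B_sum, and
    TCC_BUBBLE_sum. The formula below is assumed, not taken from the tool.
    """
    calculated_value = rdreq_sum - (bubble_sum + rdreq_32b_sum)
    # Workaround named in the changelog: hide negative results.
    return max(0, calculated_value)
```

The clamp only changes what is displayed. A zero produced this way can still hide an inconsistent set of counters, which is why the entry notes that the root cause remains under investigation.
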
+### **ROCm Systems Profiler** (1.2.0)
+
+#### Added
+
+- ``ROCPROFSYS_ROCM_GROUP_BY_QUEUE`` configuration setting to allow grouping of events by hardware queue, instead of the default grouping.
+- Support for `rocpd` database output with the `ROCPROFSYS_USE_ROCPD` configuration setting.
+- Support for profiling PyTorch workloads using the `rocpd` output database.
+- Support for tracing OpenMP API in Fortran applications.
+- An error warning that is triggered if the profiler application fails due to SELinux enforcement being enabled. The warning includes steps to disable SELinux enforcement.
+
+#### Changed
+
+- Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP Stream rather than separately and by hardware queue.
+- Updated PAPI module to v7.2.0b2.
+- ROCprofiler-SDK is now used for tracing OMPT API calls.
+
+### **rocPRIM** (4.1.0)
+
+#### Added
+
+* `get_sreg_lanemask_lt`, `get_sreg_lanemask_le`, `get_sreg_lanemask_gt` and `get_sreg_lanemask_ge`.
+* `rocprim::transform_output_iterator` and `rocprim::make_transform_output_iterator`.
+* Experimental support for SPIR-V, to use the correct tuned config for part of the applicable algorithms.
+* A new CMake option, `BUILD_OFFLOAD_COMPRESS`. When rocPRIM is built with this option enabled, the `--offload-compress` switch is passed to the compiler. This causes the compiler to compress the binary that it generates. Compression can be useful in cases where you are compiling for a large number of targets, since this often results in a large binary. Without compression, in some cases, the generated binary may become so large that symbols are placed out of range, resulting in linking errors. The new `BUILD_OFFLOAD_COMPRESS` option is set to `ON` by default.
+* A new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from `ROCm` libraries provided by the system.
+* `rocprim::apply` which applies a function to a `rocprim::tuple`.
+
+#### Changed
+
+* Changed tests to support `ptr-to-const` output in `/test/rocprim/test_device_batch_memcpy.cpp`.
+
+#### Optimized
+
+* Improved performance of many algorithms by updating their tuned configs.
+  * 891 specializations have been improved.
+  * 399 specializations have been added.
+
+#### Resolved issues
+
+* Fixed `device_select`, `device_merge`, and `device_merge_sort` not allocating the correct amount of virtual shared memory on the host.
+* Fixed the `->` operator for the `transform_iterator`, the `texture_cache_iterator`, and the `arg_index_iterator`, by now returning a proxy pointer.
+  * The `arg_index_iterator` also now only returns the internal iterator for the `->`.
+
+#### Upcoming changes
+
+* Deprecated the `->` operator for the `zip_iterator`.
+
+### **ROCProfiler** (2.0.0)
+
+#### Removed
+
+* `rocprofv2` doesn't support gfx12. For gfx12, use the `rocprofv3` tool.
+
+### **rocPyDecode** (0.7.0)
+
+#### Added
+* rocPyJpegPerfSample - samples for JPEG decode
+
+#### Changed
+* Package - rocjpeg set as required dependency.
+* rocDecode host - rocdecode host linking updates
+
+#### Resolved issues
+* rocJPEG Bindings - bugfixes
+* Test package - find dependencies updated
+
+### **rocRAND** (4.1.0)
+
+#### Changed
+
+* Changed the `USE_DEVICE_DISPATCH` flag so it can turn device dispatch off by setting it to zero. Device dispatch should be turned off when building for SPIR-V.
+
+#### Resolved issues
+
+* Updated error handling for several rocRAND unit tests to accommodate the new `hipGetLastError` behavior that was introduced in ROCm 7.0.
+As of ROCm 7.0, the internal error state is cleared on each call to `hipGetLastError` rather than on every HIP API call.
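The difference between the two clearing policies can be illustrated with a toy model. This is a sketch of the semantics as worded above, not the HIP API: the class, its method names, and the status constants are hypothetical stand-ins.

```python
HIP_SUCCESS = 0              # stand-in status codes, not the real enum values
HIP_ERROR_INVALID_VALUE = 1


class ErrorStateModel:
    """Toy model of a runtime's 'last error' slot (not the HIP API)."""

    def __init__(self, clear_on_every_call):
        self.clear_on_every_call = clear_on_every_call
        self.last_error = HIP_SUCCESS

    def api_call(self, status):
        # Stand-in for any runtime call that returns `status`.
        if status != HIP_SUCCESS:
            self.last_error = status
        elif self.clear_on_every_call:
            # Earlier policy: a later successful call wipes the stored error.
            self.last_error = HIP_SUCCESS
        return status

    def get_last_error(self):
        # Reading the error returns the stored value and resets the slot.
        error = self.last_error
        self.last_error = HIP_SUCCESS
        return error


for clear_on_every_call in (True, False):
    runtime = ErrorStateModel(clear_on_every_call)
    runtime.api_call(HIP_ERROR_INVALID_VALUE)  # a failing call
    runtime.api_call(HIP_SUCCESS)              # followed by a successful one
    print(clear_on_every_call, runtime.get_last_error())
```

Under the second policy the earlier failure is still reported after a successful call, so a test that relied on a stale error disappearing on its own would need to read the error state explicitly before continuing.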
+
+### **rocSOLVER** (3.30.0)
+
+#### Added
+
+* Hybrid computation support for existing routines: STEQR
+
+#### Optimized
+
+Improved the performance of:
+
+* BDSQR and downstream functions such as GESVD.
+* STEQR and downstream functions such as SYEV/HEEV.
+* LARFT and downstream functions such as GEQR2 and GEQRF.
+
+### **rocSPARSE** (4.1.0)
+
+#### Added
+
+* Brain half float mixed precision for the following routines:
+   * `rocsparse_axpby` where X and Y use bfloat16 and result and the compute type use float.
+   * `rocsparse_spvv` where X and Y use bfloat16 and result and the compute type use float.
+   * `rocsparse_spmv` where A and X use bfloat16 and Y and the compute type use float.
+   * `rocsparse_spmm` where A and B use bfloat16 and C and the compute type use float.
+   * `rocsparse_sddmm` where A and B use bfloat16 and C and the compute type use float.
+   * `rocsparse_sddmm` where A, B, and C use bfloat16 and the compute type uses float.
+* Half float mixed precision to `rocsparse_sddmm` where A, B, and C use float16 and the compute type uses float.
+* Brain half float uniform precision to `rocsparse_scatter` and `rocsparse_gather` routines.
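To make "inputs use bfloat16, result and compute type use float" concrete, the following pure-Python sketch emulates bfloat16 storage by rounding a float32 bit pattern to its upper 16 bits and then performs an axpby-style update (`y = alpha * x + beta * y`, with sparse `x` and dense `y`) in wider precision. It does not call rocSPARSE; the helper names are hypothetical, Python floats stand in for the float compute type, and the rounding helper handles ordinary finite values only.

```python
import struct


def to_bfloat16(value):
    """Round a finite number to the nearest bfloat16 value (as a Python float)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Round to nearest even on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def axpby_mixed(alpha, x_indices, x_values, beta, y):
    """Inputs are stored as bfloat16; arithmetic and result use wider floats."""
    result = [beta * to_bfloat16(v) for v in y]
    for index, value in zip(x_indices, x_values):
        result[index] += alpha * to_bfloat16(value)
    return result


# 1.00390625 (1 + 2**-8) is not representable in bfloat16 and rounds to 1.0.
print(axpby_mixed(2.0, [1], [1.00390625], 1.0, [0.5, 0.25, 4.0]))
```

The inputs lose precision when stored as bfloat16, but the accumulation itself is not further truncated, which is the usual motivation for a wider compute type.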
+
+#### Optimized
+
+* Improved the user documentation.
+
+#### Upcoming changes
+
+* Deprecate trace, debug, and bench logging using the environment variable `ROCSPARSE_LAYER`.
+
+### **rocThrust** (4.1.0)
+
+#### Added
+
+* A new CMake option `-DSQLITE_USE_SYSTEM_PACKAGE` to allow SQLite to be provided by the system.
+* Introduced `libhipcxx` as a soft dependency. When `libhipcxx` can be included, rocThrust can use structs and methods defined in `libhipcxx`. This allows for a more complete behavior parity with CCCL and mirrors CCCL Thrust's own dependency on `libcudacxx`.
+* Added a new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from `ROCm` libraries provided by the system.
+
+#### Changed
+
+* The previously hidden cmake build option `FORCE_DEPENDENCIES_DOWNLOAD` has been unhidden and renamed `EXTERNAL_DEPS_FORCE_DOWNLOAD` to differentiate it from the new rocPRIM and rocRAND dependency options described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark, Google Test, and SQLite) to be downloaded instead of searching for existing installed packages. This option defaults to `OFF`.
+
+#### Removed
+
+* The previous dependency-related build options `DOWNLOAD_ROCPRIM` and `DOWNLOAD_ROCRAND` have been removed. Use `ROCPRIM_FETCH_METHOD=DOWNLOAD` and `ROCRAND_FETCH_METHOD=DOWNLOAD` instead.
+
+#### Known issues
+
+* `event` test is failing on CI and local runs on MI300, MI250 and MI210.
+
+* rocThrust, as well as its dependencies rocPRIM and rocRAND have been moved into the new `rocm-libraries` monorepo repository (https://github.com/ROCm/rocm-libraries). This repository contains a number of ROCm libraries that are frequently used together.
+  * The repository migration requires a few changes to the way that rocThrust's ROCm library dependencies are fetched.
+  * There are new cmake options for obtaining rocPRIM and (optionally, if BUILD_BENCHMARKS is enabled) rocRAND.
+  * cmake build options `ROCPRIM_FETCH_METHOD` and `ROCRAND_FETCH_METHOD` may be set to one of the following:
+    * `PACKAGE` - (default) searches for a preinstalled packaged version of the dependency. If it is not found, the build will fall back using option `DOWNLOAD`, described below.
+    * `DOWNLOAD` - downloads the dependency from the rocm-libraries repository. If git >= 2.25 is present, this option uses a sparse checkout that avoids downloading more than it needs to. If not, the whole monorepo is downloaded (this may take some time).
+    * `MONOREPO` - this option is intended to be used if you are building rocThrust from within a copy of the rocm-libraries repository that you have cloned (and therefore already contains the dependencies rocPRIM and rocRAND). When selected, the build will try to find the dependency in the local repository tree. If it cannot be found, the build will attempt to add it to the local tree using a sparse-checkout. If that also fails, it will fall back to using the `DOWNLOAD` option.
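The fallback order described for `ROCPRIM_FETCH_METHOD` and `ROCRAND_FETCH_METHOD` can be summarized with a small model. This is a sketch of the documented behavior only, not the project's CMake logic: the function names, parameters, and step labels are hypothetical.

```python
def git_supports_sparse_checkout(version):
    """True when a 'major.minor[.patch]' git version is at least 2.25."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 25)


def resolve_fetch(method, package_found=False, in_local_tree=False,
                  sparse_checkout_ok=False):
    """Return the steps a build would try, in order, for a fetch method."""
    if method not in ("PACKAGE", "DOWNLOAD", "MONOREPO"):
        raise ValueError(f"unknown fetch method: {method}")
    steps = []
    if method == "PACKAGE":
        steps.append("find installed package")
        if package_found:
            return steps
    elif method == "MONOREPO":
        steps.append("look in local repository tree")
        if in_local_tree:
            return steps
        steps.append("sparse-checkout into local tree")
        if sparse_checkout_ok:
            return steps
    # Both PACKAGE and MONOREPO fall back to DOWNLOAD when they fail.
    steps.append("download from rocm-libraries")
    return steps


print(resolve_fetch("PACKAGE"))
print(resolve_fetch("MONOREPO", sparse_checkout_ok=True))
print(git_supports_sparse_checkout("2.24.0"))
```

In this model every method ends at `DOWNLOAD` in the worst case, which is why the notes warn that the whole monorepo may be downloaded when git is older than 2.25.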
+
+### **RPP** (2.1.0)
+
+#### Added
+
+* Solarize augmentation for HOST and HIP.
+* Hue and Saturation adjustment augmentations for HOST and HIP.
+* Find RPP - cmake module.
+* Posterize augmentation for HOST and HIP.
+
+#### Changed
+
+* HALF - Fix `half.hpp` path updates.
+* Box filter - padding updates.
+
+#### Removed
+
+* Packaging - Removed Meta Package dependency for HIP.
+* SLES 15 SP6 support.
+
+#### Resolved issues
+
+* Test Suite - Fixes for accuracy.
+* HIP Backend - Check return status warning fixes.
+* Bugfix - HIP vector types init.
+
 ## ROCm 7.0.2

 See the [ROCm 7.0.2 release notes](https://rocm.docs.amd.com/en/docs-7.0.2/about/release-notes.html#rocm-7-0-2-release-notes)
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -40,15 +40,23 @@ The following are notable new features and improvements in ROCm 7.1.0. For chang

 ### Supported hardware, operating system, and virtualization changes

-Hardware support remains unchanged in this release. For more information about supported AMD hardware, see [Supported GPUs (Linux)](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.0.2/reference/system-requirements.html#supported-gpus). 
+ROCm 7.1.0 extends the operating system support for the following AMD hardware:

-Debain 13 support has been extended for AMD Instinct MI355X and MI350X GPUs. For more information about supported operating systems, see [Supported operating systems](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.0.2/reference/system-requirements.html#supported-operating-systems) and [ROCm installation for Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.0.2/).
+* AMD Instinct MI355X and MI350X GPUs add support for Debian 13.
+* AMD Instinct MI325X GPUs add support for RHEL 10.0, SLES 15 SP7, Debian 13, Debian 12, Oracle Linux 10, and Oracle Linux 9.
+* AMD Instinct MI100 adds support for SLES 15 SP7.
+
+For more information about supported: 
+
+* AMD hardware, see [Supported GPUs (Linux)](https://rocm.docs.amd.com/projects/install-on-linux-internal/en/latest/reference/system-requirements.html#supported-gpus). 
+
+* Operating systems, see [Supported operating systems](https://rocm.docs.amd.com/projects/install-on-linux-internal/en/latest/reference/system-requirements.html#supported-operating-systems) and [ROCm installation for Linux](https://rocm.docs.amd.com/projects/install-on-linux-internal/en/latest/).

 #### Virtualization support

 ROCm 7.1.0 adds Guest OS support for RHEL 10.0 in KVM SR-IOV for AMD Instinct MI355X and MI350X GPUs.

-For more information, see  [Virtualization Support](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-7.0.2/reference/system-requirements.html#virtualization-support).
+For more information, see  [Virtualization Support](https://rocm.docs.amd.com/projects/install-on-linux-internal/en/latest/reference/system-requirements.html#virtualization-support).

 ### User space, driver, and firmware dependent changes

@@ -87,7 +95,7 @@ firmware, AMD GPU drivers, and the ROCm user space software.
          <td rowspan="9" style="vertical-align: middle;">ROCm 7.1.0</td>
          <td>MI355X</td>
          <td>
-              01.25.15.04<br>
+              01.25.15.04 (or later)<br>
              01.25.13.09
          </td>
          <td>30.20.0<br>
@@ -99,7 +107,7 @@ firmware, AMD GPU drivers, and the ROCm user space software.
      <tr>
          <td>MI350X</td>
          <td>
-              01.25.15.04<br>
+              01.25.15.04 (or later)<br>
              01.25.13.09
          </td>
          <td>30.20.0<br>
@@ -172,7 +180,7 @@ AMD Instinct MI300X is enabled to provide the capability to set power cap in 1VF

 #### Virtualization update for AMD Instinct MI350 Series GPUs

-* Enabled SPX/NPS1 support for multi-tenant (1VM, 2VM, 4VM, and 8VM). This feature depends on PLDM bundle 01.25.15.04
+* Enabled SPX/NPS1 support for multi-tenant (1VM, 2VM, 4VM, and 8VM). This feature depends on PLDM bundle 01.25.15.04.

 * Enabled CPX/NPS2 support (1VF/OAM). This feature depends on PLDM bundle 01.25.15.04.

@@ -182,7 +190,7 @@ AMD Instinct MI300X is enabled to provide the capability to set power cap in 1VF

 ### HIP runtime compatibility improvements

-In ROCm 7.1.0, new functionalities were added in HIP runtime including the following, in correspondence with CUDA.  
+In ROCm 7.1.0, new functionalities were added in HIP runtime including the following, in correspondence with NVIDIA CUDA.  

 * New HIP APIs added for: 

@@ -201,7 +209,7 @@ For detailed enhancements and updates refer to the [HIP Changelog](#hip-7-1-0).
 hipSPARSELt introduces significant performance enhancements for structured sparsity matrix multiplication (SpMM) on AMD Instinct MI300X GPUs:

 * New feature support -- Enabled multiple buffer single kernel execution for SpMM, improving efficiency in Split-K method scenarios.
-* Kernel optimization -- Added multiple high-performance kernels optimized for FP16 and BF16 data types, enhancing heuristic-based execution.
+* Kernel optimization -- Added multiple high-performance kernels optimized for `FP16` and `BF16` data types, enhancing heuristic-based execution.
 * Tuning efficiency -- Improved the tuning process for SpMM kernels, resulting in better runtime adaptability and performance.

 ### RPP: New hue and saturation augmentations
@@ -228,7 +236,7 @@ hipBLASlt introduces several performance and model compatibility improvements fo
 * FP32 kernel optimization for MI350X, improving precision-based workloads.
 * Meta Model Optimization for MI350X, enabling better performance across transformer-based models.
 * Llama 2 70B model support fix: Removed incorrect kernel to ensure accurate and stable execution.
-* For AMD Instinct MI350X GPUs specific, added multiple high-performance kernels optimized for FP16 and BF16 data types, enhancing heuristic-based execution.
+* For AMD Instinct MI350X GPUs specifically, added multiple high-performance kernels optimized for `FP16` and `BF16` data types, enhancing heuristic-based execution.

 ### TensileLite: Enhanced SpMM kernel tuning efficiency

@@ -281,10 +289,7 @@ ROCm 7.1.0 introduces two key compiler enhancements:
 ROCm provides a comprehensive ecosystem for deep learning development. For more information, see [Deep learning frameworks for ROCm](https://rocm.docs.amd.com/en/docs-7.0.2/how-to/deep-learning-rocm.html) and the [Compatibility
 matrix](../../docs/compatibility/compatibility-matrix.rst) for the complete list of Deep learning and AI framework versions tested for compatibility with ROCm.

-#### Updated framework support
-ROCm 7.1.0 introduces several newly supported versions of Deep learning and AI frameworks:
-
-##### TensorFlow
+#### TensorFlow
 ROCm 7.1.0 enables support for TensorFlow 2.20.0.

 ### ROCm Offline Installer Creator updates
@@ -731,7 +736,7 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
    - `hipMemcpyBatchAsync` performs a batch of 1D or 2D memory copies asynchronously
    - `hipMemcpy3DBatchAsync` performs a batch of 3D memory copies asynchronously
    - `hipMemcpy3DPeer` copies memory between devices
-    - `hipMemcpy3DPeerAsync`copied memory between devices asynchronously
+    - `hipMemcpy3DPeerAsync` copies memory between devices asynchronously
    - `hipMemsetD2D32Async` used for setting 2D memory range with specified 32-bit values
      asynchronously
    - `hipMemPrefetchAsync_v2`  prefetches memory to the specified location
@@ -740,22 +745,22 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
    - `hipSetValidDevices`      sets a default list of devices that can be used by HIP
    - `hipStreamGetId`          queries the ID of a stream
 * Support for the flag `hipMemLocationTypeHost`, which enables handling virtual memory management in host memory location, in addition to device memory.
-* Support for nested tile partitioning within cooperative groups, matching NVIDIA CUDA functionality.
-
-#### Resolved issues
-
-* A segmentation fault occurred in the application when capturing the same HIP graph from multiple streams with cross-stream dependencies.  HIP runtime fixed an issue where a forked stream joined to a parent stream that was not originally created with the API `hipStreamBeginCapture`.
-* Different behavior of en-queuing command on a legacy stream during stream capture on AMD ROCM platform, compared with NVIDIA CUDA. HIP runtime now returns an error in this specific situation to match CUDA behavior.
-* Failure of memory access fault occurred in rocm-examples test suite. When Heterogeneous Memory Management (HMM) is not supported in the driver, `hipMallocManaged` will only allocate system memory in HIP runtime.
+* Support for nested tile partitioning within cooperative groups, matching CUDA functionality.

 #### Optimized

 * Improved hip module loading latency.
 * Optimized kernel metadata retrieval during module post load.
-* Optimized doorbell ring in HIP runtime, advantages the following for performance improvement,
-    - Makes efficient packet batching for HIP graph launch
-    - Dynamic packet copying based on a defined maximum threshold or power-of-2 staggered copy pattern
-    - If timestamps are not collected for a signal for reuse, it creates a new signal. This can potentially increase the signal footprint if the handler doesn't run fast enough
+* Optimized the doorbell ring in the HIP runtime, with the following changes:
+    - Makes efficient packet batching for HIP graph launch.
+    - Dynamic packet copying based on a defined maximum threshold or power-of-2 staggered copy pattern.
+    - If timestamps are not collected for a signal for reuse, it creates a new signal. This can potentially increase the signal footprint if the handler doesn't run fast enough.
+
+#### Resolved issues
+
+* A segmentation fault occurred in the application when capturing the same HIP graph from multiple streams with cross-stream dependencies.  The HIP runtime has fixed an issue where a forked stream joined to a parent stream that was not originally created with the API `hipStreamBeginCapture`.
+* Enqueuing a command on a legacy stream during stream capture behaved differently on the AMD ROCm platform than on CUDA. The HIP runtime now returns an error in this specific situation to match CUDA behavior.
+* A memory access fault occurred in the rocm-examples test suite. When Heterogeneous Memory Management (HMM) is not supported in the driver, `hipMallocManaged` will only allocate system memory in the HIP runtime.

 #### Known issues

@@ -772,14 +777,14 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc

 #### Changed

-* Improve the build time for clients by removing `clients_common.cpp` from the hipblas-test build.
+* Improved the build time for clients by removing `clients_common.cpp` from the hipblas-test build.

 ### **hipBLASLt** (1.1.0)

 #### Added

 * Fused Clamp GEMM for ``HIPBLASLT_EPILOGUE_CLAMP_EXT`` and ``HIPBLASLT_EPILOGUE_CLAMP_BIAS_EXT``. This feature requires the minimum (``HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT``) and maximum (``HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT``) to be set.
-* Support for ReLU/Clamp activation functions with auxiliary output for the `f16` and `bf16` data types for gfx942 to capture intermediate results. This feature is enabled for ``HIPBLASLT_EPILOGUE_RELU_AUX``, ``HIPBLASLT_EPILOGUE_RELU_AUX_BIAS``, ``HIPBLASLT_EPILOGUE_CLAMP_AUX_EXT``, and ``HIPBLASLT_EPILOGUE_CLAMP_AUX_BIAS_EXT``.
+* Support for ReLU/Clamp activation functions with auxiliary output for the `FP16` and `BF16` data types for gfx942 to capture intermediate results. This feature is enabled for ``HIPBLASLT_EPILOGUE_RELU_AUX``, ``HIPBLASLT_EPILOGUE_RELU_AUX_BIAS``, ``HIPBLASLT_EPILOGUE_CLAMP_AUX_EXT``, and ``HIPBLASLT_EPILOGUE_CLAMP_AUX_BIAS_EXT``.
 * Support for `HIPBLAS_COMPUTE_32F_FAST_16BF` for FP32 data type for gfx950 only.
 * CPP extension APIs ``setMaxWorkspaceBytes`` and ``getMaxWorkspaceBytes``.
 * Feature to print logs (using ``HIPBLASLT_LOG_MASK=32``) for Grouped GEMM.
@@ -803,7 +808,6 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
 * `projects/hipcub/hipcub/include/hipcub/backend/rocprim/util_mdspan.hpp` to support `::hipcub::extents`.
 * `::hipcub::ForEachInExtents` API.
 * `hipcub::DeviceTransform::Transform` and `hipcub::DeviceTransform::TransformStableArgumentAddresses`.
-
 * hipCUB and its dependency rocPRIM have been moved into the new `rocm-libraries` [monorepo repository](https://github.com/ROCm/rocm-libraries). This repository contains a number of ROCm libraries that are frequently used together.
  * The repository migration requires a few changes to the way that hipCUB fetches library dependencies.
  * CMake build option `ROCPRIM_FETCH_METHOD` may be set to one of the following:
@@ -813,6 +817,16 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc

 * A new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from installed `hipCUB` provided by the system.

+#### Changed
+
+* Changed include headers to avoid relative includes that have slipped in.
+* Changed `CUDA_STANDARD` for tests in `test/hipcub`, because C++17 APIs such as `std::exclusive_scan` are used in some tests. `test/extra` still uses `CUDA_STANDARD 14`.
+* Changed `CCCL_MINIMUM_VERSION` to `2.8.2` to align with CUB.
+* Changed `cmake_minimum_required` from `3.16` to `3.18`, in order to support `CUDA_STANDARD 17` as a valid value.
+* Added support for large `num_items` in `DeviceScan`, `DevicePartition`, and `Reduce::{ArgMin, ArgMax}`.
+* Added tests for large `num_items`.
+* The previous dependency-related build option `DEPENDENCIES_FORCE_DOWNLOAD` has been renamed `EXTERNAL_DEPS_FORCE_DOWNLOAD` to differentiate it from the new rocPRIM dependency option described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark and Google Test) to be downloaded rather than searching for installed packages. This option defaults to `OFF`.
+
 #### Removed

 * Removed `TexRefInputIterator`, which was removed from CUB after CCCL's 2.6.0 release. This API should have already been removed, but somehow it remained and was not tested.
@@ -824,16 +838,6 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
 * Deprecated almost all functions in `projects/hipcub/hipcub/include/hipcub/backend/rocprim/util_ptx.hpp`.
 * Deprecated hipCUB macros: `HIPCUB_MAX`, `HIPCUB_MIN`, `HIPCUB_QUOTIENT_FLOOR`, `HIPCUB_QUOTIENT_CEILING`, `HIPCUB_ROUND_UP_NEAREST` and `HIPCUB_ROUND_DOWN_NEAREST`.

-#### Changed
-
-* Changed include headers to avoid relative includes that have slipped in.
-* Changed `CUDA_STANDARD` for tests in `test/hipcub`, due to C++17 APIs such as `std::exclusive_scan` is used in some tests. Still use `CUDA_STANDARD 14` for `test/extra`.
-* Changed `CCCL_MINIMUM_VERSION` to `2.8.2` to align with CUB.
-* Changed `cmake_minimum_required` from `3.16` to `3.18`, in order to support `CUDA_STANDARD 17` as a valid value.
-* Add support for large num_items `DeviceScan`, `DevicePartition` and `Reduce::{ArgMin, ArgMax}`.
-* Added tests for large num_items.
-* The previous dependency-related build option `DEPENDENCIES_FORCE_DOWNLOAD` has been renamed `EXTERNAL_DEPS_FORCE_DOWNLOAD` to differentiate it from the new rocPRIM dependency option described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark and Google Test) to be downloaded rather than searching for installed packages. This option defaults to `OFF`.
-
 #### Known issues

 * The `__half` template specializations of Simd operators are currently disabled due to possible build issues with PyTorch.
@@ -875,12 +879,13 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc

 #### Added

-* Brain half float mixed precision to `hipsparseAxpby` where X and Y use bfloat16 and result and the compute type use float.
-* Brain half float mixed precision to `hipsparseSpVV` where X and Y use bfloat16 and result and the compute type use float.
-* Brain half float mixed precision to `hipsparseSpMV` where A and X use bfloat16 and Y and the compute type use float.
-* Brain half float mixed precision to `hipsparseSpMM` where A and B use bfloat16 and C and the compute type use float.
-* Brain half float mixed precision to `hipsparseSDDMM` where A and B use bfloat16 and C and the compute type use float.
-* Brain half float mixed precision to `hipsparseSDDMM` where A and B and C use bfloat16 and the compute type use float.
+* Brain half float mixed precision for the following routines:
+    * `hipsparseAxpby` where X and Y use bfloat16 and result and the compute type use float.
+    * `hipsparseSpVV` where X and Y use bfloat16 and result and the compute type use float.
+    * `hipsparseSpMV` where A and X use bfloat16 and Y and the compute type use float.
+    * `hipsparseSpMM` where A and B use bfloat16 and C and the compute type use float.
+    * `hipsparseSDDMM` where A and B use bfloat16 and C and the compute type use float.
+    * `hipsparseSDDMM` where A and B and C use bfloat16 and the compute type use float.
 * Half float mixed precision to `hipsparseSDDMM` where A and B and C use float16 and the compute type use float.
 * Brain half float uniform precision to `hipsparseScatter` and `hipsparseGather` routines.
 * Documentation for installing and building hipSPARSE on Microsoft Windows.
@@ -1031,6 +1036,23 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
 * Symmetric memory kernels are currently disabled due to ongoing CUMEM enablement work.
 * When running this version of RCCL using ROCm versions earlier than 6.4.0, the user must set the environment flag `HSA_NO_SCRATCH_RECLAIM=1`.

+### **ROCm Data Center Tool** (1.2.0)
+
+#### Added
+
+- CPU monitoring support with 30+ CPU field definitions through AMD SMI integration.
+- CPU partition format support (c0.0, c1.0) for monitoring AMD EPYC processors.
+- Mixed GPU/CPU monitoring in single `rdci dmon` command.
+
+#### Optimized
+
+- Improved profiler metrics path detection for counter definitions.
+
+#### Resolved issues
+
+- Group management issues with listing created/non-created groups.
+- ECC_UNCORRECT field behavior.
+
 ### **rocAL** (2.4.0)

 #### Added
@@ -1126,79 +1148,34 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
 * Live attach/detach feature that allows coupling with a workload process, without controlling its start or end.
  * Use '--attach-pid' to specify the target process ID.
  * Use '--attach-duration-msec' to specify time duration.
-
 * `rocpd` choice for `--format-rocprof-output` option in profile mode.
-
 * `--retain-rocpd-output` option in profile mode to save large raw rocpd databases in workload directory.
-
-* Feature to show description of metrics during analysis
+* Feature to show description of metrics during analysis.
  * Use `--include-cols Description` to show the Description column, which is excluded by default from the
  ROCm Compute Profiler CLI output.
 * `--set` filtering option in profile mode to enable single-pass counter collection for predefined subsets of metrics.
 * `--list-sets` filtering option in profile mode to list the sets available for single pass counter collection.
-
 * Missing counters based on register specification which enables missing metrics.
-  * Enable `SQC_DCACHE_INFLIGHT_LEVEL` counter and associated metrics.
-  * Enable `TCP_TCP_LATENCY` counter and associated counter for all GPUs except MI300.
-
-* Added interactive metric descriptions in TUI analyze mode.
+  * Enabled `SQC_DCACHE_INFLIGHT_LEVEL` counter and associated metrics.
+  * Enabled `TCP_TCP_LATENCY` counter and associated counter for all GPUs except MI300.
+* Interactive metric descriptions in TUI analyze mode.
  * You can now left click on any metric cell to view detailed descriptions in the dedicated `METRIC DESCRIPTION` tab.
-
 * Support for analysis report output as a sqlite database using ``--output-format db`` analysis mode option.
+* `Compute Throughput` panel to TUI's `High Level Analysis` category with the following metrics: VALU FLOPs, VALU IOPs, MFMA FLOPs (F8), MFMA FLOPs (BF16), MFMA FLOPs (F16), MFMA FLOPs (F32), MFMA FLOPs (F64), MFMA FLOPs (F6F4) (in gfx950), MFMA IOPs (Int8), SALU Utilization, VALU Utilization, MFMA Utilization, VMEM Utilization, Branch Utilization, IPC

-* `Compute Throughput` panel to TUI's `High Level Analysis` category with the following metrics:
-  * VALU FLOPs
-  * VALU IOPs
-  * MFMA FLOPs (F8)
-  * MFMA FLOPs (BF16)
-  * MFMA FLOPs (F16)
-  * MFMA FLOPs (F32)
-  * MFMA FLOPs (F64)
-  * MFMA FLOPs (F6F4) (in gfx950)
-  * MFMA IOPs (Int8)
-  * SALU Utilization
-  * VALU Utilization
-  * MFMA Utilization
-  * VMEM Utilization
-  * Branch Utilization
-  * IPC
-
-* `Memory Throughput` panel to TUI's `High Level Analysis` category with the following metrics:
-  * vL1D Cache BW
-  * vL1D Cache Utilization
-  * Theoretical LDS Bandwidth
-  * LDS Utilization
-  * L2 Cache BW
-  * L2 Cache Utilization
-  * L2-Fabric Read BW
-  * L2-Fabric Write BW
-  * sL1D Cache BW
-  * L1I BW
-  * Address Processing Unit Busy
-  * Data-Return Busy
-  * L1I-L2 Bandwidth
-  * sL1D-L2 BW
-
+* `Memory Throughput` panel to TUI's `High Level Analysis` category with the following metrics: vL1D Cache BW, vL1D Cache Utilization, Theoretical LDS Bandwidth, LDS Utilization, L2 Cache BW, L2 Cache Utilization, L2-Fabric Read BW, L2-Fabric Write BW, sL1D Cache BW, L1I BW, Address Processing Unit Busy, Data-Return Busy, L1I-L2 Bandwidth, sL1D-L2 BW
 * Roofline support for Debian 12 and Azure Linux 3.0.
+* Notice for change in default output format to `rocpd` in a future release
+  * This is displayed when `--format-rocprof-output rocpd` is not used in profile mode

 #### Changed

-* On memory chart, long string of numbers are displayed as scientific notation. It also solves the issue of overflow of displaying long number
-
-* Add notice for change in default output format to `rocpd` in a future release
-  * This is displayed when `--format-rocprof-output rocpd` is not used in profile mode
-
-* When `--format-rocprof-output rocpd` is used, only pmc_perf.csv will be written to workload directory instead of mulitple csv files.
-
-* Improve analysis block based filtering to accept metric ID level filtering
-  * This can be used to collect individual metrics from various sections of analysis config
-
-* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
-  * Remove metrics from analysis configuration files which are explicitly marked as empty or None
-
+* In the memory chart, long numbers are now displayed in scientific notation, which also prevents the display from overflowing.
+* When `--format-rocprof-output rocpd` is used, only `pmc_perf.csv` will be written to the workload directory instead of multiple CSV files.
+* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show the Metric ID.
+  * Removed metrics from analysis configuration files which are explicitly marked as empty or None.
 * Changed the basic (default) view of TUI from aggregated analysis data to individual kernel analysis data.
-
-* Update `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`
+* Updated `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`:
  * Theoretical Bandwidth (section 1202)
  * L1I-L2 Bandwidth (section 1303)
  * sL1D-L2 BW (section 1403)
@@ -1209,16 +1186,14 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
  * Bandwidth (section 1703)
  * Atomic/Read/Write Bandwidth (section 1703)
  * Atomic/Read/Write Bandwidth - (HBM/PCIe/Infinity Fabric) (section 1706)
-
-* Add `Utilization` to metric name for the following `Bandwidth` related metrics whose `Unit` is `Percent`
+* Updated the metric name for the following `Bandwidth` related metrics whose `Unit` is `Percent` by adding `Utilization`:
  * Theoretical Bandwidth Utilization (section 1201)
  * L1I-L2 Bandwidth Utilization (section 1301)
  * Bandwidth Utilization (section 1301)
  * Bandwidth Utilization (section 1401)
  * sL1D-L2 BW Utilization (section 1401)
  * Bandwidth Utilization (section 1601)
-
-* Update `System Speed-of-Light` panel to `GPU Speed-of-Light` in TUI with the following metrics:
+* Updated `System Speed-of-Light` panel to `GPU Speed-of-Light` in TUI for the following metrics:
  * Theoretical LDS Bandwidth
  * vL1D Cache BW
  * L2 Cache BW
@@ -1228,44 +1203,43 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
  * Kernel Time (Cycles)
  * SIMD Utilization
  * Clock Rate
-
 * Analysis output:
-  * Replace `-o / --output` analyze mode option with `--output-format` and `--output-name`
-    * Add ``--output-format`` analysis mode option to select the output format of the analysis report.
-    * Add ``--output-name`` analysis mode option to override the default file/folder name.
-  * Replace `--save-dfs` analyze mode option with `--output-format csv`
-
+  * Replaced `-o / --output` analyze mode option with `--output-format` and `--output-name`.
+    * Use ``--output-format`` analysis mode option to select the output format of the analysis report.
+    * Use ``--output-name`` analysis mode option to override the default file/folder name.
+  * Replaced `--save-dfs` analyze mode option with `--output-format csv`.
 * Command-line options:
  * `--list-metrics` and `--config-dir` options moved to general command-line options.
-  * * `--list-metrics` option cannot be used without argument (GPU architecture).
+  * `--list-metrics` option cannot be used without a GPU architecture argument.
  * `--list-metrics` option do not show number of L2 channels.
  * `--list-available-metrics` profile mode option to display the metrics available for profiling in current GPU.
  * `--list-available-metrics` analyze mode option to display the metrics available for analysis.
  * `--block` option cannot be used with `--list-metrics` and `--list-available-metrics`options.
-
-* Default rocprof interface changed from rocprofv3 to rocprofiler-sdk
+* Default `rocprof` interface changed from `rocprofv3` to `rocprofiler-sdk`.
  * Use ROCPROF=rocprofv3 to use rocprofv3 interface
+* Updated metric names for better alignment between analysis configuration and documentation.

 #### Removed

 * Usage of `rocm-smi` in favor of `amd-smi`.
 * Hardware IP block-based filtering has been removed in favor of analysis report block-based filtering.
-* Removed aggregated analysis view from TUI analyze mode.
+* Aggregated analysis view from TUI analyze mode.

 #### Optimized

 * Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
 * Improved logic to obtain rocprof supported counters which prevents unnecessary warnings.
 * Improved post-analysis runtime performance by caching and multi-processing.
+* Improved analysis block-based filtering to accept metric ID-level filtering.
+  * This can be used to collect individual metrics from various sections of the analysis configuration.

 #### Resolved issues

 * Fixed an issue of not detecting the memory clock when using `amd-smi`.
 * Fixed standalone GUI crashing.
-* Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 series accelerators.
-* Update metric names for better alignment between analysis configuration and documentation
+* Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 Series GPUs.
 * Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100.
-* Fixed an issue of kernel filtering not working in the roofline chart
+* Fixed an issue of kernel filtering not working in the roofline chart.

 #### Known issues

@@ -1303,22 +1277,22 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc

 * Changed tests to support `ptr-to-const` output in `/test/rocprim/test_device_batch_memcpy.cpp`.

-#### Optimizations
+#### Optimized

 * Improved performance of many algorithms by updating their tuned configs.
  * 891 specializations have been improved.
  * 399 specializations have been added.

-#### Upcoming changes
-
-* Deprecated the `-&gt;` operator for the `zip_iterator`.
-
 #### Resolved issues

 * Fixed `device_select`, `device_merge`, and `device_merge_sort` not allocating the correct amount of virtual shared memory on the host.
 * Fixed the `-&gt;` operator for the `transform_iterator`, the `texture_cache_iterator`, and the `arg_index_iterator`, by now returning a proxy pointer.
  * The `arg_index_iterator` also now only returns the internal iterator for the `-&gt;`.

+#### Upcoming changes
+
+* Deprecated the `-&gt;` operator for the `zip_iterator`.
+
 ### **ROCProfiler** (2.0.0)

 #### Removed
@@ -1340,15 +1314,15 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc

 ### **rocRAND** (4.1.0)

+#### Changed
+
+* Changed the `USE_DEVICE_DISPATCH` flag so it can turn device dispatch off by setting it to zero. Device dispatch should be turned off when building for SPIRV.
+
 #### Resolved issues

 * Updated error handling for several rocRAND unit tests to accommodate the new `hipGetLastError` behavior that was introduced in ROCm 7.0.
 As of ROCm 7.0, the internal error state is cleared on each call to `hipGetLastError` rather than on every HIP API call.

-#### Changed
-
-* Changed the `USE_DEVICE_DISPATCH` flag so it can turn device dispatch off by setting it to zero. Device dispatch should be turned off when building for SPIRV.
-
 ### **rocSOLVER** (3.30.0)

 #### Added
@@ -1393,6 +1367,14 @@ Improved the performance of:
 * Introduced `libhipcxx` as a soft dependency. When `liphipcxx` can be included, rocThrust can use structs and methods defined in `libhipcxx`. This allows for a more complete behavior parity with CCCL and mirrors CCCL's thrust own dependency on `libcudacxx`.
 * Added a new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from `ROCm` libraries provided by the system.

+#### Changed
+
+* The previously hidden CMake build option `FORCE_DEPENDENCIES_DOWNLOAD` has been unhidden and renamed `EXTERNAL_DEPS_FORCE_DOWNLOAD` to differentiate it from the new rocPRIM and rocRAND dependency options described above. Its behavior remains the same: it forces non-ROCm dependencies (Google Benchmark, Google Test, and SQLite) to be downloaded instead of searching for existing installed packages. This option defaults to `OFF`.
+
+#### Removed
+
+* The previous dependency-related build options `DOWNLOAD_ROCPRIM` and `DOWNLOAD_ROCRAND` have been removed. Use `ROCPRIM_FETCH_METHOD=DOWNLOAD` and `ROCRAND_FETCH_METHOD=DOWNLOAD` instead.
+
 #### Known issues

 * `event` test is failing on CI and local runs on MI300, MI250 and MI210.
@@ -1405,44 +1387,46 @@ Improved the performance of:
    * `DOWNLOAD` - downloads the dependency from the rocm-libraries repository. If git >= 2.25 is present, this option uses a sparse checkout that avoids downloading more than it needs to. If not, the whole monorepo is downloaded (this may take some time).
    * `MONOREPO` - this option is intended to be used if you are building rocThrust from within a copy of the rocm-libraries repository that you have cloned (and therefore already contains the dependencies rocPRIM and rocRAND). When selected, the build will try to find the dependency in the local repository tree. If it cannot be found, the build will attempt to add it to the local tree using a sparse-checkout. If that also fails, it will fall back to using the `DOWNLOAD` option.

-#### Changed
-
-* The previously hidden cmake build option `FORCE_DEPENDENCIES_DOWNLOAD` has been unhidden and renamed `EXTERNAL_DEPS_FORCE_DOWNLOAD` to differentiate it from the new rocPRIM and rocRAND dependency options described above. Its behavior remains the same - it forces non-ROCm dependencies (Google Benchmark, Google Test, and SQLite) to be downloaded instead of searching for existing installed packages. This option defaults to `OFF`.
-
-#### Removed
-
-* The previous dependency-related build options `DOWNLOAD_ROCPRIM` and `DOWNLOAD_ROCRAND` have been removed. Use `ROCPRIM_FETCH_METHOD=DOWNLOAD` and `ROCRAND_FETCH_METHOD=DOWNLOAD` instead.
-
 ### **RPP** (2.1.0)

 #### Added

-* Solarize augmentation for HOST and HIP
-* Hue and Saturation adjustment augmentations for HOST and HIP 
-* Find RPP - cmake module
-* Posterize augmentation for HOST and HIP
+* Solarize augmentation for HOST and HIP.
+* Hue and Saturation adjustment augmentations for HOST and HIP.
+* Find RPP - cmake module.
+* Posterize augmentation for HOST and HIP.

 #### Changed

-* HALF - Fix half.hpp path updates
-* Box filter - padding updates
+* HALF - Fix `half.hpp` path updates.
+* Box filter - padding updates.

 #### Removed

-* Packaging - Remove Meta Package dependency for HIP
-* SLES 15 SP6 support
+* Packaging - Removed Meta Package dependency for HIP.
+* SLES 15 SP6 support.

 #### Resolved issues

-* Test Suite - Fixes for accuracy
-* HIP Backend - Check return status warning fixes
-* Bugfix - HIP vector types init
+* Test Suite - Fixes for accuracy.
+* HIP Backend - Check return status warning fixes.
+* Bugfix - HIP vector types init.

 ## ROCm known issues

 ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
 issues related to individual components, review the [Detailed component changes](#detailed-component-changes).

+### MIGraphX Python API will fail when running on Python 3.13
+
+Applications using the MIGraphX Python API will fail when running on Python 3.13 and return the error message `AttributeError: module 'migraphx' has no attribute 'parse_onnx'`. The issue does not occur when you manually build MIGraphX. For detailed instructions, see [Building from source](https://rocm.docs.amd.com/projects/AMDMIGraphX/en/latest/install/building_migraphx.html). As a workaround, change the Python version to the one found in the installed location:
+
+```shell
+ls -l /opt/rocm-7.0.0/lib/libmigraphx_py_*.so
+```
+The issue will be resolved in a future ROCm release. See [GitHub issue #5500](https://github.com/ROCm/ROCm/issues/5500).
+
+
 ## ROCm upcoming changes

 The following changes to the ROCm software stack are anticipated for future releases.
--- a/docs/compatibility/compatibility-matrix-historical-6.0.csv
+++ b/docs/compatibility/compatibility-matrix-historical-6.0.csv
@@ -4,10 +4,10 @@ ROCm Version,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6
      ,,,,,,,,,,,,,,,,"Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5","Ubuntu 20.04.6, 20.04.5"
      ,"RHEL 10.0 [#rhel-10-702-past-60]_, 9.6 [#rhel-10-702-past-60]_, 9.4 [#rhel-94-702-past-60]_","RHEL 10.0 [#rhel-10-702-past-60]_, 9.6 [#rhel-10-702-past-60]_, 9.4 [#rhel-94-702-past-60]_","RHEL 9.6 [#rhel-10-702-past-60]_, 9.4 [#rhel-94-702-past-60]_","RHEL 9.6, 9.4","RHEL 9.6, 9.4","RHEL 9.6, 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.5, 9.4","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.4, 9.3, 9.2","RHEL 9.3, 9.2","RHEL 9.3, 9.2"
      ,RHEL 8.10 [#rhel-700-past-60]_,RHEL 8.10 [#rhel-700-past-60]_,RHEL 8.10 [#rhel-700-past-60]_,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,RHEL 8.10,"RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.10, 8.9","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8","RHEL 8.9, 8.8"
-      ,SLES 15 SP7 [#sles-db-700-past-60]_,SLES 15 SP7 [#sles-db-700-past-60]_,SLES 15 SP7 [#sles-db-700-past-60]_,"SLES 15 SP7, SP6","SLES 15 SP7, SP6",SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
+      ,SLES 15 SP7 [#sles-710-past-60]_,SLES 15 SP7 [#sles-db-700-past-60]_,SLES 15 SP7 [#sles-db-700-past-60]_,"SLES 15 SP7, SP6","SLES 15 SP7, SP6",SLES 15 SP6,SLES 15 SP6,"SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP6, SP5","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4","SLES 15 SP5, SP4"
      ,,,,,,,,,,,,,,,,,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9,CentOS 7.9
-      ,"Oracle Linux 10, 9, 8 [#ol-700-mi300x-past-60]_","Oracle Linux 10, 9, 8 [#ol-700-mi300x-past-60]_","Oracle Linux 9, 8 [#ol-700-mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_",Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,,,
-      ,"Debian 13 [#db-mi300x-past-60]_, 12 [#sles-db-700-past-60]_","Debian 13 [#db-mi300x-past-60]_, 12 [#sles-db-700-past-60]_",Debian 12 [#sles-db-700-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,,,,,,,,,,,
+      ,"Oracle Linux 10, 9, 8 [#ol-710-mi300x-past-60]_","Oracle Linux 10, 9, 8 [#ol-700-mi300x-past-60]_","Oracle Linux 9, 8 [#ol-700-mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_","Oracle Linux 9, 8 [#mi300x-past-60]_",Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.10 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,Oracle Linux 8.9 [#mi300x-past-60]_,,,
+      ,"Debian 13 [#db-710-mi300x-past-60]_, 12 [#db12-710-past-60]_","Debian 13 [#db-mi300x-past-60]_, 12 [#sles-db-700-past-60]_",Debian 12 [#sles-db-700-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,Debian 12 [#single-node-past-60]_,,,,,,,,,,,
      ,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-past-60]_,Azure Linux 3.0 [#az-mi300x-630-past-60]_,Azure Linux 3.0 [#az-mi300x-630-past-60]_,,,,,,,,,,,,
      ,Rocky Linux 9 [#rl-700-past-60]_,Rocky Linux 9 [#rl-700-past-60]_,Rocky Linux 9 [#rl-700-past-60]_,,,,,,,,,,,,,,,,,,
      ,.. _architecture-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,
@@ -19,15 +19,15 @@ ROCm Version,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6
      ,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3,RDNA3
      ,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2,RDNA2
      ,.. _gpu-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,
-      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx950 [#mi350x-os-past-60]_,gfx950 [#mi350x-os-past-60]_,gfx950 [#mi350x-os-past-60]_,,,,,,,,,,,,,,,,,,
+      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx950 [#mi350x-os-710-past-60]_,gfx950 [#mi350x-os-700-past-60]_,gfx950 [#mi350x-os-700-past-60]_,,,,,,,,,,,,,,,,,,
      ,gfx1201 [#RDNA-OS-700-past-60]_,gfx1201 [#RDNA-OS-700-past-60]_,gfx1201 [#RDNA-OS-700-past-60]_,gfx1201 [#RDNA-OS-past-60]_,gfx1201 [#RDNA-OS-past-60]_,gfx1201 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
      ,gfx1200 [#RDNA-OS-700-past-60]_,gfx1200 [#RDNA-OS-700-past-60]_,gfx1200 [#RDNA-OS-700-past-60]_,gfx1200 [#RDNA-OS-past-60]_,gfx1200 [#RDNA-OS-past-60]_,gfx1200 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
      ,gfx1101 [#RDNA-OS-700-past-60]_ [#rd-v710-past-60]_,gfx1101 [#RDNA-OS-700-past-60]_ [#rd-v710-past-60]_,gfx1101 [#RDNA-OS-700-past-60]_ [#rd-v710-past-60]_,gfx1101 [#RDNA-OS-past-60]_ [#7700XT-OS-past-60]_,gfx1101 [#RDNA-OS-past-60]_ [#7700XT-OS-past-60]_,gfx1101 [#RDNA-OS-past-60]_,,,,,,,,,,,,,,,
      ,gfx1100 [#RDNA-OS-700-past-60]_,gfx1100 [#RDNA-OS-700-past-60]_,gfx1100 [#RDNA-OS-700-past-60]_,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100,gfx1100
      ,gfx1030 [#RDNA-OS-700-past-60]_ [#rd-v620-past-60]_,gfx1030 [#RDNA-OS-700-past-60]_ [#rd-v620-past-60]_,gfx1030 [#RDNA-OS-700-past-60]_ [#rd-v620-past-60]_,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030,gfx1030
-      ,gfx942 [#mi325x-os-past-60]_ [#mi300x-os-past-60]_ [#mi300A-os-past-60]_,gfx942 [#mi325x-os-past-60]_ [#mi300x-os-past-60]_ [#mi300A-os-past-60]_,gfx942 [#mi325x-os-past-60]_ [#mi300x-os-past-60]_ [#mi300A-os-past-60]_,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942 [#mi300_624-past-60]_,gfx942 [#mi300_622-past-60]_,gfx942 [#mi300_621-past-60]_,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
+      ,gfx942 [#mi325x-os-710past-60]_ [#mi300x-os-past-60]_ [#mi300A-os-past-60]_,gfx942 [#mi325x-os-past-60]_ [#mi300x-os-past-60]_ [#mi300A-os-past-60]_,gfx942 [#mi325x-os-past-60]_ [#mi300x-os-past-60]_ [#mi300A-os-past-60]_,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942,gfx942 [#mi300_624-past-60]_,gfx942 [#mi300_622-past-60]_,gfx942 [#mi300_621-past-60]_,gfx942 [#mi300_620-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_612-past-60]_, gfx942 [#mi300_611-past-60]_, gfx942 [#mi300_610-past-60]_, gfx942 [#mi300_602-past-60]_, gfx942 [#mi300_600-past-60]_
      ,gfx90a [#mi200x-os-past-60]_,gfx90a [#mi200x-os-past-60]_,gfx90a [#mi200x-os-past-60]_,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a,gfx90a
-      ,gfx908 [#mi100-os-past-60]_,gfx908 [#mi100-os-past-60]_,gfx908 [#mi100-os-past-60]_,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
+      ,gfx908 [#mi100-710-os-past-60]_,gfx908 [#mi100-os-past-60]_,gfx908 [#mi100-os-past-60]_,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908,gfx908
      ,,,,,,,,,,,,,,,,,,,,,
      FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix-past-60:,,,,,,,,,,,,,,,,,,,,
      :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.7, 2.6, 2.5","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.6, 2.5, 2.4, 2.3","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 1.13","2.4, 2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.3, 2.2, 2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13","2.1, 2.0, 1.13"
@@ -96,7 +96,7 @@ ROCm Version,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6
      :doc:`rocThrust <rocthrust:index>`,4.1.0,4.0.0,4.0.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.3.0,3.1.1,3.1.0,3.1.0,3.0.1,3.0.1,3.0.1,3.0.1,3.0.1,3.0.0,3.0.0
      ,,,,,,,,,,,,,,,,,,,,,
      SUPPORT LIBS,,,,,,,,,,,,,,,,,,,,,
-      `hipother <https://github.com/ROCm/hipother>`_,7.1.25414,7.0.51830,7.0.51830,6.4.43483,6.4.43483,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
+      `hipother <https://github.com/ROCm/hipother>`_,7.1.25414,7.0.51831,7.0.51830,6.4.43483,6.4.43483,6.4.43483,6.4.43482,6.3.42134,6.3.42134,6.3.42133,6.3.42131,6.2.41134,6.2.41134,6.2.41134,6.2.41133,6.1.40093,6.1.40093,6.1.40092,6.1.40091,6.1.32831,6.1.32830
      `rocm-core <https://github.com/ROCm/rocm-core>`_,7.1.0,7.0.2,7.0.1/7.0.0,6.4.3,6.4.2,6.4.1,6.4.0,6.3.3,6.3.2,6.3.1,6.3.0,6.2.4,6.2.2,6.2.1,6.2.0,6.1.5,6.1.2,6.1.1,6.1.0,6.0.2,6.0.0
      `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,N/A [#ROCT-rocr-past-60]_,20240607.5.7,20240607.5.7,20240607.4.05,20240607.1.4246,20240125.5.08,20240125.5.08,20240125.5.08,20240125.3.30,20231016.2.245,20231016.2.245
      ,,,,,,,,,,,,,,,,,,,,,
--- a/docs/compatibility/compatibility-matrix.rst
+++ b/docs/compatibility/compatibility-matrix.rst
@@ -30,9 +30,9 @@ compatibility and system requirements.
      ,Ubuntu 22.04.5,Ubuntu 22.04.5,Ubuntu 22.04.5
      ,"RHEL 10.0 [#rhel-10-702]_, 9.6 [#rhel-10-702]_, 9.4 [#rhel-94-702]_","RHEL 10.0 [#rhel-10-702]_, 9.6 [#rhel-10-702]_, 9.4 [#rhel-94-702]_","RHEL 9.5, 9.4"
      ,RHEL 8.10 [#rhel-700]_,RHEL 8.10 [#rhel-700]_,RHEL 8.10
-      ,SLES 15 SP7 [#sles-db-700]_,SLES 15 SP7 [#sles-db-700]_,SLES 15 SP6
-      ,"Oracle Linux 10, 9, 8 [#ol-700-mi300x]_","Oracle Linux 10, 9, 8 [#ol-700-mi300x]_","Oracle Linux 9, 8 [#ol-mi300x]_"
-      ,"Debian 13 [#db-mi300x]_, 12 [#sles-db-700]_","Debian 13 [#db-mi300x]_, 12 [#sles-db-700]_",Debian 12 [#single-node]_
+      ,SLES 15 SP7 [#sles-710]_,SLES 15 SP7 [#sles-db-700]_,SLES 15 SP6
+      ,"Oracle Linux 10, 9, 8 [#ol-710-mi300x]_","Oracle Linux 10, 9, 8 [#ol-700-mi300x]_","Oracle Linux 9, 8 [#ol-mi300x]_"
+      ,"Debian 13 [#db-710-mi300x]_, 12 [#db12-710]_","Debian 13 [#db-mi300x]_, 12 [#sles-db-700]_",Debian 12 [#single-node]_
      ,Azure Linux 3.0 [#az-mi300x]_,Azure Linux 3.0 [#az-mi300x]_,Azure Linux 3.0 [#az-mi300x]_
      ,Rocky Linux 9 [#rl-700]_,Rocky Linux 9 [#rl-700]_,
      ,.. _architecture-support-compatibility-matrix:,,
@@ -44,15 +44,15 @@ compatibility and system requirements.
      ,RDNA3,RDNA3,RDNA3
      ,RDNA2,RDNA2,RDNA2
      ,.. _gpu-support-compatibility-matrix:,,
-      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx950 [#mi350x-os]_,gfx950 [#mi350x-os]_,
+      :doc:`GPU / LLVM target <rocm-install-on-linux:reference/system-requirements>`,gfx950 [#mi350x-os-710]_,gfx950 [#mi350x-os-700]_,
      ,gfx1201 [#RDNA-OS-700]_,gfx1201 [#RDNA-OS-700]_,
      ,gfx1200 [#RDNA-OS-700]_,gfx1200 [#RDNA-OS-700]_,
      ,gfx1101 [#RDNA-OS-700]_ [#rd-v710]_,gfx1101 [#RDNA-OS-700]_ [#rd-v710]_,
      ,gfx1100 [#RDNA-OS-700]_,gfx1100 [#RDNA-OS-700]_,gfx1100
      ,gfx1030 [#RDNA-OS-700]_ [#rd-v620]_,gfx1030 [#RDNA-OS-700]_ [#rd-v620]_,gfx1030
-      ,gfx942 [#mi325x-os]_ [#mi300x-os]_ [#mi300A-os]_,gfx942 [#mi325x-os]_ [#mi300x-os]_ [#mi300A-os]_,gfx942
+      ,gfx942 [#mi325x-os-710]_ [#mi300x-os]_ [#mi300A-os]_,gfx942 [#mi325x-os]_ [#mi300x-os]_ [#mi300A-os]_,gfx942
      ,gfx90a [#mi200x-os]_,gfx90a [#mi200x-os]_,gfx90a
-      ,gfx908 [#mi100-os]_,gfx908 [#mi100-os]_,gfx908
+      ,gfx908 [#mi100-710-os]_,gfx908 [#mi100-os]_,gfx908
      ,,,
      FRAMEWORK SUPPORT,.. _framework-support-compatibility-matrix:,,
      :doc:`PyTorch <../compatibility/ml-compatibility/pytorch-compatibility>`,"2.8, 2.7, 2.6","2.8, 2.7, 2.6","2.6, 2.5, 2.4, 2.3"
@@ -114,7 +114,7 @@ compatibility and system requirements.
      :doc:`rocThrust <rocthrust:index>`,4.1.0,4.0.0,3.3.0
      ,,,
      SUPPORT LIBS,,,
-      `hipother <https://github.com/ROCm/hipother>`_,7.1.25414,7.0.51830,6.4.43482
+      `hipother <https://github.com/ROCm/hipother>`_,7.1.25414,7.0.51831,6.4.43482
      `rocm-core <https://github.com/ROCm/rocm-core>`_,7.1.0,7.0.2,6.4.0
      `ROCT-Thunk-Interface <https://github.com/ROCm/ROCT-Thunk-Interface>`_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_,N/A [#ROCT-rocr]_
      ,,,
@@ -160,22 +160,29 @@ compatibility and system requirements.
 .. [#rhel-10-702] RHEL 10.0 and RHEL 9.6 are supported on all listed :ref:`supported_GPUs` except AMD Radeon PRO V620 GPU.
 .. [#rhel-94-702] RHEL 9.4 is supported on all AMD Instinct GPUs listed under :ref:`supported_GPUs`.
 .. [#rhel-700] RHEL 8.10 is supported only on AMD Instinct MI300X, MI300A, MI250X, MI250, MI210, and MI100 GPUs.
+.. [#sles-710] **For ROCm 7.1.x** - SLES 15 SP7 is supported only on AMD Instinct MI325X, MI300X, MI300A, MI250X, MI250, MI210, and MI100 GPUs.
+.. [#sles-db-700] **For ROCm 7.0.x** - SLES 15 SP7 and Debian 12 are supported only on AMD Instinct MI300X, MI300A, MI250X, MI250, and MI210 GPUs.
+.. [#ol-710-mi300x] **For ROCm 7.1.x** - Oracle Linux 10 and 9 are supported only on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs. Oracle Linux 8 is supported only on AMD Instinct MI300X GPU.
 .. [#ol-700-mi300x] **For ROCm 7.0.x** - Oracle Linux 10 and 9 are supported only on AMD Instinct MI355X, MI350X, and MI300X GPUs. Oracle Linux 8 is supported only on AMD Instinct MI300X GPU.
 .. [#ol-mi300x] **Prior ROCm 7.0.0** - Oracle Linux is supported only on AMD Instinct MI300X GPUs.
+.. [#db-710-mi300x] **For ROCm 7.1.x** - Debian 13 is supported only on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs.
+.. [#db12-710] **For ROCm 7.1.x** - Debian 12 is supported only on AMD Instinct MI325X, MI300X, MI300A, MI250X, MI250, and MI210 GPUs.
 .. [#db-mi300x] **For ROCm 7.0.2** - Debian 13 is supported only on AMD Instinct MI300X GPUs.
-.. [#sles-db-700] **For ROCm 7.0.x** - SLES 15 SP7 and Debian 12 are supported only on AMD Instinct MI300X, MI300A, MI250X, MI250, and MI210 GPUs.
 .. [#az-mi300x] Starting ROCm 6.4.0, Azure Linux 3.0 is supported only on AMD Instinct MI300X and AMD Radeon PRO V710 GPUs.
 .. [#rl-700] Rocky Linux 9 is supported only on AMD Instinct MI300X and MI300A GPUs.
 .. [#single-node] **Prior to ROCm 7.0.0** - Debian 12 is supported only on AMD Instinct MI300X GPUs for single-node functionality.
-.. [#mi350x-os] AMD Instinct MI355X (gfx950) and MI350X(gfx950) GPUs are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, Oracle Linux 10, and Oracle Linux 9.
-.. [#RDNA-OS-700] **For ROCm 7.0.x** - AMD Radeon PRO AI PRO R9700 (gfx1201), AMD Radeon RX 9070 XT (gfx1201), AMD Radeon RX 9070 GRE (gfx1201), AMD Radeon RX 9070 (gfx1201), AMD Radeon RX 9060 XT (gfx1200), AMD Radeon RX 9060 (gfx1200), AMD Radeon RX 7800 XT (gfx1101), AMD Radeon RX 7700 XT (gfx1101), AMD Radeon PRO W7700 (gfx1101), and AMD Radeon PRO W6800 (gfx1030) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, and RHEL 9.6.
-.. [#rd-v710] **For ROCm 7.0.x** - AMD Radeon PRO V710 (gfx1101) GPUs are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, and Azure Linux 3.0.
-.. [#rd-v620] **For ROCm 7.0.x** - AMD Radeon PRO V620 (gfx1030) GPUs are supported only on Ubuntu 24.04.3 and Ubuntu 22.04.5.
-.. [#mi325x-os] **For ROCm 7.0.x** - AMD Instinct MI325X GPUs (gfx942) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
-.. [#mi300x-os] **For ROCm 7.0.x** - AMD Instinct MI300X GPUs (gfx942) are supported on all listed :ref:`supported_distributions`.
-.. [#mi300A-os] **For ROCm 7.0.x** - AMD Instinct MI300A GPUs (gfx942) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, SLES 15 SP7, Debian 12, and Rocky Linux 9.
-.. [#mi200x-os] **For ROCm 7.0.x** - AMD Instinct MI200 Series GPUs (gfx90a) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, SLES 15 SP7, and Debian 12.
-.. [#mi100-os] **For ROCm 7.0.x** - AMD Instinct MI100 GPUs (gfx908) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, and RHEL 8.10.
+.. [#mi350x-os-710] AMD Instinct MI355X (gfx950) and MI350X (gfx950) GPUs support all listed :ref:`supported_distributions` except RHEL 8.10, SLES 15 SP7, Debian 12, Rocky Linux 9, Azure Linux 3.0, and Oracle Linux 8.
+.. [#mi350x-os-700] AMD Instinct MI355X (gfx950) and MI350X (gfx950) GPUs support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, Oracle Linux 10, and Oracle Linux 9.
+.. [#RDNA-OS-700] **For ROCm 7.0.x** - AMD Radeon PRO AI PRO R9700 (gfx1201), AMD Radeon RX 9070 XT (gfx1201), AMD Radeon RX 9070 GRE (gfx1201), AMD Radeon RX 9070 (gfx1201), AMD Radeon RX 9060 XT (gfx1200), AMD Radeon RX 9060 (gfx1200), AMD Radeon RX 7800 XT (gfx1101), AMD Radeon RX 7700 XT (gfx1101), AMD Radeon PRO W7700 (gfx1101), and AMD Radeon PRO W6800 (gfx1030) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, and RHEL 9.6.
+.. [#rd-v710] **For ROCm 7.0.x** - AMD Radeon PRO V710 (gfx1101) GPUs support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, and Azure Linux 3.0.
+.. [#rd-v620] **For ROCm 7.0.x** - AMD Radeon PRO V620 (gfx1030) GPUs support only Ubuntu 24.04.3 and Ubuntu 22.04.5.
+.. [#mi325x-os-710] **For ROCm 7.1.x** - AMD Instinct MI325X GPUs (gfx942) support all listed :ref:`supported_distributions` except RHEL 8.10, Rocky Linux 9, Azure Linux 3.0, and Oracle Linux 8.
+.. [#mi325x-os] **For ROCm 7.0.x** - AMD Instinct MI325X GPUs (gfx942) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
+.. [#mi300x-os] **Starting ROCm 7.0.x** - AMD Instinct MI300X GPUs (gfx942) support all listed :ref:`supported_distributions`.
+.. [#mi300A-os] **Starting ROCm 7.0.x** - AMD Instinct MI300A GPUs (gfx942) support all listed :ref:`supported_distributions` except Debian 13, Azure Linux 3.0, Oracle Linux 10, Oracle Linux 9, and Oracle Linux 8.
+.. [#mi200x-os] **For ROCm 7.0.x** - AMD Instinct MI200 Series GPUs (gfx90a) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, SLES 15 SP7, and Debian 12.
+.. [#mi100-710-os] **For ROCm 7.1.x** - AMD Instinct MI100 GPUs (gfx908) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, and SLES 15 SP7.
+.. [#mi100-os] **For ROCm 7.0.x** - AMD Instinct MI100 GPUs (gfx908) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, and RHEL 8.10.
 .. [#tf-mi350] TensorFlow 2.17.1 is not supported on AMD Instinct MI350 Series GPUs. Use TensorFlow 2.19.1 or 2.18.1 with MI350 Series GPUs instead.
 .. [#dgl_compat] DGL is supported only on ROCm 6.4.0.
 .. [#llama-cpp_compat] llama.cpp is supported only on ROCm 7.0.0 and ROCm 6.4.x.
@@ -259,25 +266,32 @@ Expand for full historical view of:
   .. [#rhel-10-702-past-60] RHEL 10.0 and RHEL 9.6 are supported on all listed :ref:`supported_GPUs` except AMD Radeon PRO V620 GPU.
   .. [#rhel-94-702-past-60] RHEL 9.4 is supported on all AMD Instinct GPUs listed under :ref:`supported_GPUs`.
   .. [#rhel-700-past-60] **For ROCm 7.0.x** - RHEL 8.10 is supported only on AMD Instinct MI300X, MI300A, MI250X, MI250, MI210, and MI100 GPUs.
+   .. [#sles-710-past-60] **For ROCm 7.1.x** - SLES 15 SP7 is supported only on AMD Instinct MI325X, MI300X, MI300A, MI250X, MI250, MI210, and MI100 GPUs.
+   .. [#sles-db-700-past-60] **For ROCm 7.0.x** - SLES 15 SP7 and Debian 12 are supported only on AMD Instinct MI300X, MI300A, MI250X, MI250, and MI210 GPUs.
+   .. [#ol-710-mi300x-past-60] **For ROCm 7.1.x** - Oracle Linux 10 and 9 are supported only on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs. Oracle Linux 8 is supported only on AMD Instinct MI300X GPU.
   .. [#ol-700-mi300x-past-60] **For ROCm 7.0.x** - Oracle Linux 10 and 9 are supported only on AMD Instinct MI355X, MI350X, and MI300X GPUs. Oracle Linux 8 is supported only on AMD Instinct MI300X GPU.
   .. [#mi300x-past-60] **Prior ROCm 7.0.0** - Oracle Linux is supported only on AMD Instinct MI300X GPUs.
+   .. [#db-710-mi300x-past-60] **For ROCm 7.1.x** - Debian 13 is supported only on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs.
+   .. [#db12-710-past-60] **For ROCm 7.1.x** - Debian 12 is supported only on AMD Instinct MI325X, MI300X, MI300A, MI250X, MI250, and MI210 GPUs.
   .. [#db-mi300x-past-60] **For ROCm 7.0.2** - Debian 13 is supported only on AMD Instinct MI300X GPUs.
-   .. [#sles-db-700-past-60] **For ROCm 7.0.x** - SLES 15 SP7 and Debian 12 are supported only on AMD Instinct MI300X, MI300A, MI250X, MI250, and MI210 GPUs.
   .. [#single-node-past-60] **Prior to ROCm 7.0.0** - Debian 12 is supported only on AMD Instinct MI300X GPUs for single-node functionality.
   .. [#az-mi300x-past-60] Starting from ROCm 6.4.0, Azure Linux 3.0 is supported only on AMD Instinct MI300X and AMD Radeon PRO V710 GPUs.
   .. [#az-mi300x-630-past-60] **Prior ROCm 6.4.0**- Azure Linux 3.0 is supported only on AMD Instinct MI300X GPUs.
   .. [#rl-700-past-60] Rocky Linux 9 is supported only on AMD Instinct MI300X and MI300A GPUs.
-   .. [#mi350x-os-past-60] AMD Instinct MI355X (gfx950) and MI350X(gfx950) GPUs are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.4, and Oracle Linux 9.
-   .. [#RDNA-OS-700-past-60] **For ROCm 7.0.x** AMD Radeon PRO AI PRO R9700 (gfx1201), AMD Radeon RX 9070 XT (gfx1201), AMD Radeon RX 9070 GRE (gfx1201), AMD Radeon RX 9070 (gfx1201), AMD Radeon RX 9060 XT (gfx1200), AMD Radeon RX 9060 (gfx1200), AMD Radeon RX 7800 XT (gfx1101), AMD Radeon RX 7700 XT (gfx1101), AMD Radeon PRO W7700 (gfx1101), and AMD Radeon PRO W6800 (gfx1030) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, Oracle Linux 10, and Oracle Linux 9.
-   .. [#RDNA-OS-past-60] **Prior ROCm 7.0.0** - Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) are supported only on Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
-   .. [#rd-v710-past-60] **For ROCm 7.0.x** - AMD Radeon PRO V710 (gfx1101) is supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, and Azure Linux 3.0.
-   .. [#rd-v620-past-60] **For ROCm 7.0.x** - AMD Radeon PRO V620 (gfx1030) is supported only on Ubuntu 24.04.3 and Ubuntu 22.04.5.
-   .. [#mi325x-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI325X GPU (gfx942) is supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
-   .. [#mi300x-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI300X GPU (gfx942) is supported on all listed :ref:`supported_distributions`.
-   .. [#mi300A-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI300A GPU (gfx942) is supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, SLES 15 SP7, Debian 12, and Rocky Linux 9.
-   .. [#mi200x-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI200 Series GPUs (gfx90a) are supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, SLES 15 SP7, and Debian 12.
-   .. [#mi100-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI100 GPU (gfx908) is supported only on Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, and RHEL 8.10.
-   .. [#7700XT-OS-past-60] **Prior to ROCm 7.0.0** - Radeon RX 7700 XT (gfx1101) is supported only on Ubuntu 24.04.2 and RHEL 9.6.
+   .. [#mi350x-os-710-past-60] **For ROCm 7.1.x** - AMD Instinct MI355X (gfx950) and MI350X (gfx950) GPUs support all listed :ref:`supported_distributions` except RHEL 8.10, SLES 15 SP7, Debian 12, Rocky Linux 9, Azure Linux 3.0, and Oracle Linux 8.
+   .. [#mi350x-os-700-past-60] **For ROCm 7.0.x** - AMD Instinct MI355X (gfx950) and MI350X (gfx950) GPUs support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 9.6, RHEL 9.4, and Oracle Linux 9.
+   .. [#RDNA-OS-700-past-60] **Starting with ROCm 7.0.x** - AMD Radeon AI PRO R9700 (gfx1201), AMD Radeon RX 9070 XT (gfx1201), AMD Radeon RX 9070 GRE (gfx1201), AMD Radeon RX 9070 (gfx1201), AMD Radeon RX 9060 XT (gfx1200), AMD Radeon RX 9060 (gfx1200), AMD Radeon RX 7800 XT (gfx1101), AMD Radeon RX 7700 XT (gfx1101), AMD Radeon PRO W7700 (gfx1101), and AMD Radeon PRO W6800 (gfx1030) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, and RHEL 9.4.
+   .. [#RDNA-OS-past-60] **Prior to ROCm 7.0.0** - Radeon AI PRO R9700, Radeon RX 9070 XT (gfx1201), Radeon RX 9060 XT (gfx1200), Radeon PRO W7700 (gfx1101), and Radeon RX 7800 XT (gfx1101) support only Ubuntu 24.04.2, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
+   .. [#rd-v710-past-60] **Starting with ROCm 7.0.x** - AMD Radeon PRO V710 (gfx1101) supports only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, and Azure Linux 3.0.
+   .. [#rd-v620-past-60] **Starting with ROCm 7.0.x** - AMD Radeon PRO V620 (gfx1030) supports only Ubuntu 24.04.3 and Ubuntu 22.04.5.
+   .. [#mi325x-os-710past-60] **For ROCm 7.1.x** - AMD Instinct MI325X GPU (gfx942) supports all listed :ref:`supported_distributions` except RHEL 8.10, Rocky Linux 9, Azure Linux 3.0, and Oracle Linux 8.
+   .. [#mi325x-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI325X GPU (gfx942) supports only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 9.6, and RHEL 9.4.
+   .. [#mi300x-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI300X GPU (gfx942) supports all listed :ref:`supported_distributions`.
+   .. [#mi300A-os-past-60] **Starting with ROCm 7.0.x** - AMD Instinct MI300A GPUs (gfx942) support all listed :ref:`supported_distributions` except Debian 13, Azure Linux 3.0, Oracle Linux 10, Oracle Linux 9, and Oracle Linux 8.
+   .. [#mi200x-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI200 Series GPUs (gfx90a) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, SLES 15 SP7, and Debian 12.
+   .. [#mi100-710-os-past-60] **For ROCm 7.1.x** - AMD Instinct MI100 GPUs (gfx908) support only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, RHEL 8.10, and SLES 15 SP7.
+   .. [#mi100-os-past-60] **For ROCm 7.0.x** - AMD Instinct MI100 GPU (gfx908) supports only Ubuntu 24.04.3, Ubuntu 22.04.5, RHEL 10.0, RHEL 9.6, RHEL 9.4, and RHEL 8.10.
+   .. [#7700XT-OS-past-60] **Prior to ROCm 7.0.0** - Radeon RX 7700 XT (gfx1101) supports only Ubuntu 24.04.2 and RHEL 9.6.
   .. [#mi300_624-past-60] **For ROCm 6.2.4** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
   .. [#mi300_622-past-60] **For ROCm 6.2.2** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].
   .. [#mi300_621-past-60] **For ROCm 6.2.1** - MI300X (gfx942) is supported on listed operating systems *except* Ubuntu 22.04.5 [6.8 HWE] and Ubuntu 22.04.4 [6.5 HWE].