Updates for 6.3.2 in Release Notes (#271)

* Initial 6.3.2 changes in Release Notes

* version 6.3.2 updated

* Update RELEASE.md

add HIP 6.3.2 changes to Release Notes

* Minor update after adding HIP component

* Compatibility matrix update for 6.3.2 (#272)

* External CI: add missing rocAL functionalities (#4238)

* Compatibility matrix update for 6.3.2

* Indentation and reference fixed

* Missing reference added

* Footnote fix

* Blank space in tables removed

* Table fixes

* Pytorch and JAX ref updated

---------

Co-authored-by: Daniel Su <danielsu@amd.com>

* Document update and release highlight updated

* Add TensorFlow compatibility docs (#4247)

* Add Tensorflow

* WIP

* WIP

* minor fmt

* PR feedbacks

* fix missed inconsistent formatting

* WIP

WIP

WIP

WIP

* minor formatting

update tensorflow-rocm docker images to rocm6.3.1

fix urls

* WIP

* fix typo and update wordlist

* fix tables not rendering

* fix table headings

* add period

* update tf dockers

* fix link

* fix link

* wording

* update historical compat

* fix tensile link

---------

Co-authored-by: Mátyás Aradi <matyas@streamhpc.com>
Co-authored-by: Istvan Kiss <neon60@gmail.com>

* Conflict resolved

* version 6.3.2 updated

* Compatibility matrix update for 6.3.2 (#272)

* External CI: add missing rocAL functionalities (#4238)

* Compatibility matrix update for 6.3.2

* Indentation and reference fixed

* Missing reference added

* Footnote fix

* Blank space in tables removed

* Table fixes

* Pytorch and JAX ref updated

---------

Co-authored-by: Daniel Su <danielsu@amd.com>

* Document update and release highlight updated

* Documentation update added

* Merge conflict resolved

* hipfort change updated

* Compatibility Matrix updated for version change

* PyTorch version updated

* ROCm Systems Profiler version updated

* historical matrix updated

* Blank space removed

* Changelog for ROCProfiler added

* ROCm System Profiler changelog added

---------

Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Mátyás Aradi <matyas@streamhpc.com>
Co-authored-by: Istvan Kiss <neon60@gmail.com>
Pratik Basyal
2025-01-10 11:20:16 -05:00
committed by GitHub
parent 26553d725b
commit d2035f0018
4 changed files with 196 additions and 284 deletions


@@ -10,7 +10,7 @@
<!-- markdownlint-disable reference-links-images -->
<!-- markdownlint-disable no-missing-space-atx -->
<!-- spellcheck-disable -->
# ROCm 6.3.1 release notes
# ROCm 6.3.2 release notes
The release notes provide a summary of notable changes since the previous ROCm release.
@@ -35,57 +35,42 @@ documentation to verify compatibility and system requirements.
```
## Release highlights
The following are notable new features and improvements in ROCm 6.3.1. For changes to individual components, see
The following are notable new features and improvements in ROCm 6.3.2. For changes to individual components, see
[Detailed component changes](#detailed-component-changes).
### Per queue resiliency for Instinct MI300 accelerators
### ROCm Offline Installer Creator updates
The ROCm Offline Installer Creator 6.3.2 introduces new feature [add-content]. For more information, see the [ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/rocm-offline-installer.html) documentation.
### ROCm Runfile Installer updates
The ROCm Runfile Installer 6.3.2 introduces new feature [add-content]. For more information, see the [ROCm Runfile Installer](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/rocm-runfile-installer.html) documentation.
The AMDGPU driver now includes enhanced resiliency for misbehaving applications on AMD Instinct MI300 accelerators. This helps isolate the impact of misbehaving applications, ensuring other workloads running on the same accelerator are unaffected.
### ROCm Runfile Installer
ROCm 6.3.1 introduces the ROCm Runfile Installer, with initial support for Ubuntu 22.04. The ROCm Runfile Installer facilitates ROCm installation without using a native Linux package management system, with or without network or internet access. For more information, see the [ROCm Runfile Installer documentation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/rocm-runfile-installer.html).
### ROCm documentation updates
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
* Added documentation on training a model with ROCm Megatron-LM. AMD offers a Docker image for MI300X accelerators
containing essential components to get started, including ROCm libraries, PyTorch, and Megatron-LM utilities. See
[Training a model using ROCm Megatron-LM](https://rocm.docs.amd.com/en/docs-6.3.1/how-to/rocm-for-ai/train-a-model.html)
to get started.
* Documentation about ROCm compatibility with deep learning frameworks has been added. These topics outline ROCm-enabled features for each deep learning framework, key ROCm libraries that can influence the capabilities, docker image tags validated, and features supported across the available ROCm and framework versions. For more information, see:
The new ROCm Megatron-LM training Docker accompanies the [ROCm vLLM inference
Docker](https://rocm.docs.amd.com/en/docs-6.3.1/how-to/performance-validation/mi300x/vllm-benchmark.html)
as a set of ready-to-use containerized solutions to get started with using ROCm
for AI.
* [PyTorch compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/pytorch-compatibility.html)
* Updated the [Instinct MI300X workload tuning
guide](https://rocm.docs.amd.com/en/docs-6.3.1/how-to/tuning-guides/mi300x/workload.html) with more current optimization
strategies. The updated sections include guidance on vLLM optimization, PyTorch TunableOp, and hipBLASLt tuning.
* [JAX compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/jax-compatibility.html)
* HIP graph-safe libraries operate safely in HIP execution graphs. [HIP graphs](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.1/how-to/hip_runtime_api/hipgraph.html#how-to-hip-graph) are an alternative way of executing tasks on a GPU that can provide performance benefits over launching kernels using the standard method via streams. A topic that shows whether a [ROCm library is graph-safe](https://rocm.docs.amd.com/en/docs-6.3.1/reference/graph-safe-support.html) has been added.
* [TensorFlow compatibility](https://rocm.docs.amd.com/en/latest/compatibility/ml-compatibility/tensorflow-compatibility.html)
* The [Device memory](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.1/how-to/hip_runtime_api/memory_management/device_memory.html) topic in the HIP memory management section has been updated.
* The HIP documentation has expanded with new resources for developers:
* [Multi device management](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.1/how-to/hip_runtime_api/multi_device.html)
* [OpenGL interoperability](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.1/how-to/hip_runtime_api/opengl_interop.html)
* The [HIP C++ language extensions](https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_cpp_language_extensions.html) and [Kernel language C++ support](https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/kernel_language_cpp_support.html) topics have been reorganized to make them easier to find and review. The topics have also been enhanced with new content.
## Operating system and hardware support changes
ROCm 6.3.1 adds support for Debian 12 (kernel: 6.1). Debian is supported only on AMD Instinct accelerators. See the installation instructions at [Debian native installation](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.1/install/native-install/debian.html).
ROCm 6.3.1 enables support for the AMD Instinct MI325X accelerator. For more information, see [AMD Instinct™ MI325X Accelerators](https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html).
Operating system and hardware support remains unchanged in this release.
See the [Compatibility
matrix](https://rocm.docs.amd.com/en/docs-6.3.1/compatibility/compatibility-matrix.html)
matrix](https://rocm-stg.amd.com/en/latest/compatibility/compatibility-matrix.html)
for more information about operating system and hardware compatibility.
## ROCm components
The following table lists the versions of ROCm components for ROCm 6.3.1, including any version
changes from 6.3.0 to 6.3.1. Click the component's updated version to go to a list of its changes.
The following table lists the versions of ROCm components for ROCm 6.3.2, including any version
changes from 6.3.1 to 6.3.2. Click the component's updated version to go to a list of its changes.
Click {fab}`github` to go to the component's source code on GitHub.
<div class="pst-scrollable-table-container">
@@ -123,7 +108,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/MIVisionX/en/docs-6.3.1/index.html">MIVisionX</a></td>
<td>3.1.0&nbsp;&Rightarrow;&nbsp;<a href="#mivisionx-3-1-0">3.1.0</a></td>
<td>3.1.0</td>
<td><a href="https://github.com/ROCm/MIVisionX"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
@@ -157,7 +142,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
<th rowspan="1"></th>
<th rowspan="1">Communication</th>
<td><a href="https://rocm.docs.amd.com/projects/rccl/en/docs-6.3.1/index.html">RCCL</a></td>
<td>2.21.5&nbsp;&Rightarrow;&nbsp;<a href="#rccl-2-21-5">2.21.5</a></td>
<td>2.21.5</td>
<td><a href="https://github.com/ROCm/rccl"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
</tbody>
@@ -181,7 +166,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/hipfort/en/docs-6.3.1/index.html">hipfort</a></td>
<td>0.5.0</td>
<td>0.5.0&nbsp;&Rightarrow;&nbsp;<a href="#hipfort-0-5-1">0.5.1</a></td>
<td><a href="https://github.com/ROCm/hipfort"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
@@ -274,7 +259,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
<th rowspan="7">Tools</th>
<th rowspan="7">System management</th>
<td><a href="https://rocm.docs.amd.com/projects/amdsmi/en/docs-6.3.1/index.html">AMD SMI</a></td>
<td>24.7.1&nbsp;&Rightarrow;&nbsp;<a href="#amd-smi-24-7-1">24.7.1</a></td>
<td>24.7.1</td>
<td><a href="https://github.com/ROCm/amdsmi"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
@@ -310,19 +295,19 @@ Click {fab}`github` to go to the component's source code on GitHub.
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.3.1/index.html">ROCm Compute Profiler</a></td>
<td>3.0.0&nbsp;&Rightarrow;&nbsp<a href="#rocm-compute-profiler-3-0-0">3.0.0</a></td>
<td>3.0.0</td>
<td><a href="https://github.com/ROCm/rocprofiler-compute"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler-systems/en/docs-6.3.1/index.html">ROCm Systems Profiler</a></td>
<td>0.1.0&nbsp;&Rightarrow;&nbsp<a href="#rocm-systems-profiler-0-1-0">0.1.0</a></td>
<td>0.1.0&nbsp;&Rightarrow;&nbsp;<a href="#rocm-systems-profiler-0-1-1">0.1.1</a></td>
<td><a href="https://github.com/ROCm/rocprofiler-systems"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
<td><a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.3.1/index.html">ROCProfiler</a></td>
<td>2.0.0</td>
<td>2.0.0&nbsp;&Rightarrow;&nbsp;<a href="#rocprofiler-2-0-0">2.0.0</a></td>
<td><a href="https://github.com/ROCm/ROCProfiler/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
@@ -344,7 +329,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
<th rowspan="5"></th>
<th rowspan="5">Development</th>
<td><a href="https://rocm.docs.amd.com/projects/HIPIFY/en/docs-6.3.1/index.html">HIPIFY</a></td>
<td>18.0.0&nbsp;&Rightarrow;&nbsp;<a href="#hipify-18-0-0">18.0.0</a></td>
<td>18.0.0</td>
<td><a href="https://github.com/ROCm/HIPIFY/"><i
class="fab fa-github fa-lg"></i></a></td>
</tr>
@@ -394,7 +379,7 @@ Click {fab}`github` to go to the component's source code on GitHub.
<tr>
<th rowspan="2" colspan="2">Runtimes</th>
<td><a href="https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.1/index.html">HIP</a></td>
<td>6.3.0&nbsp;&Rightarrow;&nbsp;<a href="#hip-6-3-1">6.3.1</a></td>
<td>6.3.1&nbsp;&Rightarrow;&nbsp;<a href="#hip-6-3-2">6.3.2</a></td>
<td><a href="https://github.com/ROCm/HIP/"><i class="fab fa-github fa-lg"></i></a></td>
</tr>
<tr>
@@ -410,134 +395,69 @@ Click {fab}`github` to go to the component's source code on GitHub.
The following sections describe key changes to ROCm components.
### **AMD SMI** (24.7.1)
#### Changed
* `amd-smi monitor` displays `VCLOCK` and `DCLOCK` instead of `ENC_CLOCK` and `DEC_CLOCK`.
#### Resolved issues
* Fixed `amd-smi monitor`'s reporting of encode and decode information. `VCLOCK` and `DCLOCK` are
now associated with both `ENC_UTIL` and `DEC_UTIL`.
```{note}
See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/6.3.x/CHANGELOG.md) for more details and examples.
```
### **HIP** (6.3.1)
### **HIP** (6.3.2)
#### Added
* An activeQueues set that tracks only the queues that have a command submitted to them, which allows fast iteration in ``waitActiveStreams``.
* Tracking of Heterogeneous System Architecture (HSA) handlers:
- Adds an atomic counter to track the outstanding HSA handlers.
- Waits on CPU for the callbacks if the number exceeds the defined value.
* Code to capture Architected Queueing Language (AQL) packets for HIP graph memory copy nodes between host and device. HIP enqueues AQL packets during graph launch.
* Control to use system pool implementation in runtime commands handling. By default, it is disabled.
* A new path to avoid `WaitAny` calls in `AsyncEventsLoop`. The new path is selected by default.
* Runtime control on decrement counter only if the event is popped. There is a new way to restore dead signals cleanup for the old path.
* A new logic in runtime to track the age of events from the kernel mode driver.
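The HSA handler tracking item above describes a general pattern: count the outstanding handlers atomically and make the submitting CPU thread wait once the count reaches a limit. The following is a minimal C++11 sketch of that pattern only, not the HIP runtime's implementation. `OutstandingLimiter`, `run_handlers`, the limit value, and the use of `std::this_thread::yield` as the wait strategy are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper (not a HIP or HSA API): caps the number of in-flight handlers.
class OutstandingLimiter {
public:
    explicit OutstandingLimiter(int limit) : limit_(limit) {}

    // Reserve a slot; the calling CPU thread waits while the limit is reached.
    void acquire() {
        int current = outstanding_.load(std::memory_order_acquire);
        for (;;) {
            if (current >= limit_) {
                std::this_thread::yield();
                current = outstanding_.load(std::memory_order_acquire);
                continue;
            }
            // On failure, compare_exchange_weak reloads `current` for the next attempt.
            if (outstanding_.compare_exchange_weak(current, current + 1,
                                                   std::memory_order_acq_rel,
                                                   std::memory_order_acquire)) {
                break;
            }
        }
        record_peak(current + 1);
    }

    // Called when a handler completes; frees one slot.
    void release() { outstanding_.fetch_sub(1, std::memory_order_acq_rel); }

    int outstanding() const { return outstanding_.load(std::memory_order_acquire); }
    int peak() const { return peak_.load(std::memory_order_acquire); }

private:
    // Atomically raise the recorded peak if `value` is larger.
    void record_peak(int value) {
        int seen = peak_.load(std::memory_order_relaxed);
        while (seen < value &&
               !peak_.compare_exchange_weak(seen, value, std::memory_order_relaxed)) {
        }
    }

    const int limit_;
    std::atomic<int> outstanding_{0};
    std::atomic<int> peak_{0};
};

// Simulates `tasks` handlers completing on worker threads; returns the observed peak.
int run_handlers(OutstandingLimiter& limiter, int tasks) {
    std::vector<std::thread> workers;
    for (int i = 0; i < tasks; ++i) {
        limiter.acquire();  // the submitter waits here when too many handlers are pending
        workers.emplace_back([&limiter] {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            limiter.release();
        });
    }
    for (auto& worker : workers) worker.join();
    assert(limiter.outstanding() == 0);
    return limiter.peak();
}
```

Calling `run_handlers(limiter, 8)` from `main` with a limiter constructed as `OutstandingLimiter limiter(2)` never lets more than two simulated handlers be pending at once. A production runtime would more likely block on a signal or condition variable than yield in a loop.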
#### Optimized
* HSA callback performance. The HIP runtime creates and submits commands in the queue and interacts with HSA through a callback function. HIP waits for the CPU status from HSA to optimize the handling of events, profiling, commands, and HSA signals for higher performance.
* Runtime optimization which combines all logic of `WaitAny` in a single processing loop and avoids extra memory allocations or reference counting. The runtime won't spin on the CPU if all events are busy.
* Multi-threaded dispatches for performance improvement.
* Command submissions and processing between CPU and GPU by introducing a way to limit the software batch size.
* Bookkeeping logic for streams accessed from multiple threads simultaneously, by switching to `std::shared_mutex`. This improves performance in specific customer applications.
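The `std::shared_mutex` item above relies on a reader/writer lock: many threads can read shared bookkeeping state concurrently, while modifications take exclusive access. The following is a minimal C++17 sketch of that idea, assuming a hypothetical `StreamRegistry` that tracks integer stream IDs. The class, its methods, and `count_hits` are illustrative names, not HIP runtime code.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a runtime's stream bookkeeping; not HIP code.
class StreamRegistry {
public:
    // Writers take the lock exclusively.
    void add(int stream_id) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        streams_.insert(stream_id);
    }

    void remove(int stream_id) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        streams_.erase(stream_id);
    }

    // Readers share the lock, so concurrent lookups do not serialize each other.
    bool contains(int stream_id) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return streams_.count(stream_id) != 0;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return streams_.size();
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_set<int> streams_;
};

// Runs `readers` threads that each look up IDs 0..ids-1; returns the total number of hits.
std::size_t count_hits(const StreamRegistry& registry, int readers, int ids) {
    std::vector<std::size_t> hits(static_cast<std::size_t>(readers), 0);
    std::vector<std::thread> threads;
    for (int r = 0; r < readers; ++r) {
        threads.emplace_back([&registry, &hits, r, ids] {
            for (int id = 0; id < ids; ++id) {
                if (registry.contains(id)) ++hits[static_cast<std::size_t>(r)];
            }
        });
    }
    for (auto& thread : threads) thread.join();

    std::size_t total = 0;
    for (std::size_t h : hits) total += h;
    return total;
}
```

A reader/writer lock pays off when lookups greatly outnumber modifications. If writes dominate, a plain `std::mutex` is usually simpler and no slower.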
#### Resolved issues
* A deadlock in a specific customer application, by preventing `hipLaunchKernel` latency from degrading as the number of idle streams grows.
* Race condition in multi-threaded producer/consumer scenario with `hipMallocFromPoolAsync`.
* Segmentation fault with `hipStreamLegacy` while using the API `hipStreamWaitEvent`.
* Usage of `hipStreamLegacy` in HIP event record.
### **HIPIFY** (18.0.0)
### **hipfort** (0.5.1)
#### Added
* Support for:
* NVIDIA CUDA 12.6.2
* cuDNN 9.5.1
* LLVM 19.1.3
* Full `hipBLAS` 64-bit APIs
* Full `rocBLAS` 64-bit APIs
* Support for building with LLVM Flang.
#### Resolved issues
* Added missing support for device intrinsics and built-ins: `__all_sync`, `__any_sync`, `__ballot_sync`, `__activemask`, `__match_any_sync`, `__match_all_sync`, `__shfl_sync`, `__shfl_up_sync`, `__shfl_down_sync`, and `__shfl_xor_sync`.
* Fixed the hipSPARSE CMake target.
### **MIVisionX** (3.1.0)
#### Changed
* AMD Clang is now the default CXX and C compiler.
* The dependency on rocDecode has been removed and automatic rocDecode installation is now disabled in the setup script.
### **ROCm Systems Profiler** (0.1.1)
#### Resolved issues
* Canny failure on Instinct MI300 has been fixed.
* Ubuntu 24.04 CTest failures have been fixed.
* Fixed an error when building from source on some SUSE and RHEL systems when using the `ROCPROFSYS_BUILD_DYNINST` option.
#### Known issues
* CentOS, Red Hat, and SLES require the manual installation of `OpenCV` and `FFMPEG`.
* Hardware decode requires that ROCm is installed with `--usecase=graphics`.
#### Upcoming changes
* Optimized audio augmentations support for VX_RPP.
### **RCCL** (2.21.5)
#### Changed
* Enhanced the user documentation.
#### Resolved issues
* Corrected some user help strings in `install.sh`.
### **ROCm Compute Profiler** (3.0.0)
#### Resolved issues
* Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from `omniperf`.
See [ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues](#rocm-compute-profiler-and-rocm-systems-profiler-post-upgrade-issues).
### **ROCm Systems Profiler** (0.1.0)
### **ROCProfiler** (2.0.0)
#### Added
* Improvements to support OMPT target offload.
* SIMD_UTILIZATION metric.
#### Resolved issues
#### Changed
* Fixed an issue with generated Perfetto files. See [issue #3767](https://github.com/ROCm/ROCm/issues/3767) for more information.
* Fixed an issue with merging multiple `.proto` files.
* Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.
* Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from `omnitrace`.
See [ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues](#rocm-compute-profiler-and-rocm-systems-profiler-post-upgrade-issues).
* Activity metrics.
## ROCm known issues
ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
issues related to individual components, review the [Detailed component changes](#detailed-component-changes).
### PCI Express Qualification Tool failure on Debian 12
The PCI Express Qualification Tool (PEQT) module present in the ROCm Validation Suite (RVS) might fail due to the segmentation issue in Debian 12 (bookworm). This will result in failure to determine the characteristics of the PCIe interconnect between the host platform and the GPU like support for Gen 3 atomic completers, DMA transfer statistics, link speed, and link width. The standard PCIe command `lspci` can be used as an alternative to view the characteristics of the PCIe bus interconnect with the GPU. This issue is under investigation and will be addressed in a future release. See [GitHub issue #4175](https://github.com/ROCm/ROCm/issues/4175).
## ROCm resolved issues
The following are previously known issues resolved in this release. For resolved issues related to
individual components, review the [Detailed component changes](#detailed-component-changes).
### Instinct MI300 series: backward weights convolution performance issue
Fixed a performance issue affecting certain tensor shapes during backward weights convolution when using FP16 or FP32 data types on Instinct MI300 series accelerators. See [GitHub issue #4080](https://github.com/ROCm/ROCm/issues/4080).
### ROCm Compute Profiler and ROCm Systems Profiler post-upgrade issues
Packaging metadata for ROCm Compute Profiler (`rocprofiler-compute`) and ROCm Systems Profiler
(`rocprofiler-systems`) has been updated to handle the renaming from Omniperf and Omnitrace,
respectively. This fixes minor issues when upgrading from ROCm 6.2 to 6.3. For more information, see the GitHub issues
[#4082](https://github.com/ROCm/ROCm/issues/4082) and
[#4083](https://github.com/ROCm/ROCm/issues/4083).
### Stale file due to OpenCL ICD loader deprecation
When upgrading from ROCm 6.2.x to ROCm 6.3.0, the issue of removal of the `rocm-icd-loader` package
leaving a stale file in the old `rocm-6.2.x` directory has been resolved. The stale files left during
the upgrade from ROCm 6.2.x to ROCm 6.3.0 will be removed when upgrading to ROCm 6.3.1. For more
information, see [GitHub issue #4084](https://github.com/ROCm/ROCm/issues/4084).
## ROCm upcoming changes
The following changes to the ROCm software stack are anticipated for future releases.