mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-10 15:18:11 -05:00
update release note files (#2617)
--------- Co-authored-by: Sam Wu <sam.wu2@amd.com> Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
@@ -1,4 +1,4 @@
|
||||
# Release Notes
|
||||
# Release notes
|
||||
<!-- Do not edit this file! This file is autogenerated with -->
|
||||
<!-- tools/autotag/tag_script.py -->
|
||||
|
||||
@@ -16,7 +16,7 @@
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
|
||||
The release notes for the ROCm platform.
|
||||
This page contains the release notes for the ROCm platform.
|
||||
|
||||
{%- for version, release in releases %}
|
||||
|
||||
@@ -27,7 +27,7 @@ The release notes for the ROCm platform.
|
||||
{%- set rocm_changes = "./rocm_changes/" ~ version ~ ".md" %}
|
||||
{% include rocm_changes ignore missing %}
|
||||
|
||||
### Library Changes in ROCM {{version}}
|
||||
### Library changes in ROCm {{version}}
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
|
||||
@@ -264,13 +264,17 @@ typedef enum hipDeviceAttribute_t {
|
||||
|
||||
#### Incorrect dGPU behavior when using AMDVBFlash tool
|
||||
|
||||
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not communicate with the ROM Controller specifically when the driver is present. This is because the driver, as part of its runtime power management feature, puts the dGPU to a sleep state.
|
||||
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not communicate with the
ROM Controller specifically when the driver is present. This is because the driver, as part of its runtime
power management feature, puts the dGPU into a sleep state.
|
||||
|
||||
As a workaround, users can run amdgpu.runpm=0, which temporarily disables the runtime power management feature from the driver and dynamically changes some power control-related sysfs files.
|
||||
As a workaround, users can set the `amdgpu.runpm=0` kernel module parameter, which temporarily
disables the runtime power management feature from the driver and dynamically changes some power
control-related sysfs files.
|
||||
|
||||
#### Issue with START timestamp in ROCProfiler
|
||||
|
||||
Users may encounter an issue with the enabled timestamp functionality for monitoring one or multiple counters. ROCProfiler outputs the following four timestamps for each kernel:
|
||||
Users may encounter an issue with the enabled timestamp functionality for monitoring one or multiple
|
||||
counters. ROCProfiler outputs the following four timestamps for each kernel:
|
||||
|
||||
* Dispatch
|
||||
* Start
|
||||
@@ -279,7 +283,8 @@ Users may encounter an issue with the enabled timestamp functionality for monito
|
||||
|
||||
##### Issue
|
||||
|
||||
This defect is related to the Start timestamp functionality, which incorrectly shows an earlier time than the Dispatch timestamp.
|
||||
This defect is related to the Start timestamp functionality, which incorrectly shows an earlier time than
|
||||
the Dispatch timestamp.
|
||||
|
||||
To reproduce the issue,
|
||||
|
||||
@@ -301,20 +306,22 @@ The correct order is:
|
||||
|
||||
Dispatch < Start < End < Complete
|
||||
|
||||
Users cannot use ROCProfiler to measure the time spent on each kernel because of the incorrect timestamp with counter collection enabled.
|
||||
Users cannot use ROCProfiler to measure the time spent on each kernel because of the incorrect
|
||||
timestamp with counter collection enabled.
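
The expected ordering can also be checked programmatically. The following is a minimal sketch, assuming the four per-kernel timestamps have already been parsed from the ROCProfiler output into a hypothetical `KernelTimestamps` struct. The struct and the `timestamps_ordered` function are illustrative names and are not part of ROCProfiler.

```cpp
#include <cstdint>

// Hypothetical container for the four per-kernel timestamps, in nanoseconds.
// The field names are illustrative; they are not a ROCProfiler API.
struct KernelTimestamps {
    std::uint64_t dispatch_ns;
    std::uint64_t begin_ns;
    std::uint64_t end_ns;
    std::uint64_t complete_ns;
};

// Returns true only when Dispatch < Start < End < Complete holds.
bool timestamps_ordered(const KernelTimestamps& t) {
    return t.dispatch_ns < t.begin_ns &&
           t.begin_ns < t.end_ns &&
           t.end_ns < t.complete_ns;
}
```

A record in which the start time precedes the dispatch time, as described in this issue, makes `timestamps_ordered` return `false`.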
|
||||
|
||||
##### Recommended workaround
|
||||
|
||||
Users are recommended to collect kernel execution timestamps without monitoring counters, as follows:
|
||||
Users should collect kernel execution timestamps without monitoring counters, as follows:
|
||||
|
||||
1. Enable timing using the --timestamp on flag, and run the application.
|
||||
|
||||
2. Rerun the application using the -i option with the input filename that contains the name of the counter(s) to monitor, and save this to a different output file using the -o flag.
|
||||
2. Rerun the application using the -i option with the input filename that contains the name of the
|
||||
counter(s) to monitor, and save this to a different output file using the -o flag.
|
||||
|
||||
3. Check the output result file from step 1.
|
||||
|
||||
4. The order of timestamps correctly displays as:
|
||||
DispatchNS < BeginNS < EndNS < CompleteNS
|
||||
4. The order of timestamps correctly displays as: DispatchNS < BeginNS < EndNS < CompleteNS
|
||||
|
||||
5. Users can find the values of the collected counters in the output file generated in step 2.
|
||||
|
||||
@@ -322,17 +329,21 @@ Users are recommended to collect kernel execution timestamps without monitoring
|
||||
|
||||
##### No support for SMI and ROCDebugger on SRIOV
|
||||
|
||||
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV environment on any GPU. For more information, refer to the Systems Management Interface documentation.
|
||||
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV environment
|
||||
on any GPU. For more information, refer to the Systems Management Interface documentation.
|
||||
|
||||
### Deprecations and warnings
|
||||
|
||||
#### ROCm libraries changes – deprecations and deprecation removal
|
||||
|
||||
* The hipFFT.h header is now provided only by the hipFFT package. Up to ROCm 5.0, users would get hipFFT.h in the rocFFT package too.
|
||||
* The `hipFFT.h` header is now provided only by the hipFFT package. Up to ROCm 5.0, users would get
|
||||
`hipFFT.h` in the rocFFT package too.
|
||||
|
||||
* The GlobalPairwiseAMG class is now entirely removed, users should use the PairwiseAMG class instead.
|
||||
* The GlobalPairwiseAMG class is now entirely removed; users should use the PairwiseAMG class
instead.
|
||||
|
||||
* The rocsparse_spmm signature in 5.0 was changed to match that of rocsparse_spmm_ex. In 5.0, rocsparse_spmm_ex is still present, but deprecated. Signature diff for rocsparse_spmm
|
||||
* The rocsparse_spmm signature in 5.0 was changed to match that of rocsparse_spmm_ex. In 5.0,
|
||||
rocsparse_spmm_ex is still present, but deprecated. Signature diff for rocsparse_spmm
|
||||
rocsparse_spmm in 5.0
|
||||
|
||||
```cpp
|
||||
@@ -374,11 +385,15 @@ System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV
|
||||
|
||||
In this release, arithmetic operators of HIP complex and vector types are deprecated.
|
||||
|
||||
* As alternatives to arithmetic operators of HIP complex types, users can use arithmetic operators of `std::complex` types.
|
||||
* As alternatives to arithmetic operators of HIP complex types, users can use arithmetic operators of
|
||||
`std::complex` types.
|
||||
|
||||
* As alternatives to arithmetic operators of HIP vector types, users can use the operators of the native clang vector type associated with the data member of HIP vector types.
|
||||
* As alternatives to arithmetic operators of HIP vector types, users can use the operators of the native
|
||||
clang vector type associated with the data member of HIP vector types.
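
As an illustration of the first alternative, the following host-side sketch performs complex arithmetic with `std::complex<float>` operators rather than the deprecated HIP complex operators. The helper name `axpy_complex` is hypothetical, and the sketch makes no claim about using `std::complex` operators in device code.

```cpp
#include <complex>

// Hypothetical helper: computes a * x + y with std::complex operators
// instead of the deprecated operators on HIP complex types.
std::complex<float> axpy_complex(std::complex<float> a,
                                 std::complex<float> x,
                                 std::complex<float> y) {
    return a * x + y;
}
```

For example, with `a = 1 + 2i`, `x = 3 + 4i`, and `y = 1 + 1i`, the product `a * x` is `-5 + 10i`, so the function returns `-4 + 11i`.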
|
||||
|
||||
During the deprecation, two macros `_HIP_ENABLE_COMPLEX_OPERATORS` and `_HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally enable arithmetic operators of HIP complex or vector types.
|
||||
During the deprecation, two macros `_HIP_ENABLE_COMPLEX_OPERATORS` and
|
||||
`_HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally enable arithmetic
|
||||
operators of HIP complex or vector types.
|
||||
|
||||
Note, the two macros are mutually exclusive and, by default, set to Off.
|
||||
|
||||
@@ -388,7 +403,8 @@ Refer to the HIP API Guide for more information.
|
||||
|
||||
#### Warning - compiler-generated code object version 4 deprecation
|
||||
|
||||
Support for loading compiler-generated code object version 4 will be deprecated in a future release with no release announcement and replaced with code object 5 as the default version.
|
||||
Support for loading compiler-generated code object version 4 will be deprecated in a future release
|
||||
with no release announcement and replaced with code object 5 as the default version.
|
||||
|
||||
The current default is code object version 4.
|
||||
|
||||
|
||||
@@ -3,10 +3,17 @@
|
||||
|
||||
#### Refactor of HIPCC/HIPCONFIG
|
||||
|
||||
In prior ROCm releases, by default, the hipcc/hipconfig Perl scripts were used to identify and set target compiler options, target platform, compiler, and runtime appropriately.
|
||||
In prior ROCm releases, by default, the hipcc/hipconfig Perl scripts were used to identify and set target
|
||||
compiler options, target platform, compiler, and runtime appropriately.
|
||||
|
||||
In ROCm v5.0.1, hipcc.bin and hipconfig.bin have been added as the compiled binary implementations of the hipcc and hipconfig. These new binaries are currently a work-in-progress, considered, and marked as experimental. ROCm plans to fully transition to hipcc.bin and hipconfig.bin in the a future ROCm release. The existing hipcc and hipconfig Perl scripts are renamed to hipcc.pl and hipconfig.pl respectively. New top-level hipcc and hipconfig Perl scripts are created, which can switch between the Perl script or the compiled binary based on the environment variable HIPCC_USE_PERL_SCRIPT.
|
||||
In ROCm v5.0.1, hipcc.bin and hipconfig.bin have been added as the compiled binary implementations
of hipcc and hipconfig. These new binaries are currently a work in progress and are considered, and
marked as, experimental. ROCm plans to fully transition to hipcc.bin and hipconfig.bin in a future
ROCm release. The existing hipcc and hipconfig Perl scripts are renamed to `hipcc.pl` and `hipconfig.pl`,
respectively. New top-level hipcc and hipconfig Perl scripts are created, which can switch between the
Perl script and the compiled binary based on the environment variable `HIPCC_USE_PERL_SCRIPT`.
|
||||
|
||||
In ROCm 5.0.1, by default, this environment variable is set to use hipcc and hipconfig through the Perl scripts.
|
||||
In ROCm 5.0.1, by default, this environment variable is set to use hipcc and hipconfig through the Perl
|
||||
scripts.
|
||||
|
||||
Subsequently, Perl scripts will no longer be available in ROCm in a future release.
|
||||
Subsequently, the Perl scripts will no longer be available in a future ROCm release.
|
||||
|
||||
@@ -1,18 +1,26 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
The following defects are fixed in the ROCm v5.0.2 release.
|
||||
|
||||
#### Issue with hostcall facility in HIP runtime
|
||||
|
||||
In ROCm v5.0, when using the “assert()” call in a HIP kernel, the compiler may sometimes fail to emit kernel metadata related to the hostcall facility, which results in incomplete initialization of the hostcall facility in the HIP runtime. This can cause the HIP kernel to crash when it attempts to execute the “assert()” call.
|
||||
In ROCm v5.0, when using the “assert()” call in a HIP kernel, the compiler may sometimes fail to emit
|
||||
kernel metadata related to the hostcall facility, which results in incomplete initialization of the hostcall
|
||||
facility in the HIP runtime. This can cause the HIP kernel to crash when it attempts to execute the
|
||||
“assert()” call.
|
||||
|
||||
The root cause was an incorrect check in the compiler to determine whether the hostcall facility is required by the kernel. This is fixed in the ROCm v5.0.2 release.
|
||||
The root cause was an incorrect check in the compiler to determine whether the hostcall facility is
|
||||
required by the kernel. This is fixed in the ROCm v5.0.2 release.
|
||||
|
||||
The resolution includes a compiler change, which emits the required metadata by default, unless the compiler can prove that the hostcall facility is not required by the kernel. This ensures that the “assert()” call never fails.
|
||||
The resolution includes a compiler change, which emits the required metadata by default, unless the
|
||||
compiler can prove that the hostcall facility is not required by the kernel. This ensures that the
|
||||
“assert()” call never fails.
|
||||
|
||||
Note:
|
||||
This fix may lead to breakage in some OpenMP offload use cases, which use print inside a target region and result in an abort in device code. The issue will be fixed in a future release.
|
||||
Compatibility Matrix Updates to the [Deep-learning guide](./how-to/deep-learning-rocm.md)
|
||||
```note
|
||||
This fix may lead to breakage in some OpenMP offload use cases, which use print inside a target region
|
||||
and result in an abort in device code. The issue will be fixed in a future release.
|
||||
```
|
||||
|
||||
The compatibility matrix in the [Deep-learning guide](./how-to/deep-learning-rocm.md) is updated for ROCm v5.0.2.
|
||||
The compatibility matrix in the [Deep-learning guide](./how-to/deep-learning-rocm.md) is updated for
|
||||
ROCm v5.0.2.
|
||||
|
||||
@@ -8,7 +8,8 @@ The ROCm v5.1 release consists of the following HIP enhancements.
|
||||
|
||||
##### HIP installation guide updates
|
||||
|
||||
The HIP Installation Guide is updated to include installation and building HIP from source on the AMD and NVIDIA platforms.
|
||||
The HIP installation guide now includes information on installing and building HIP from source on
|
||||
AMD and NVIDIA platforms.
|
||||
|
||||
Refer to the HIP Installation Guide v5.1 for more details.
|
||||
|
||||
@@ -20,11 +21,14 @@ ROCm v5.1 extends support for HIP Graph.
|
||||
|
||||
###### Separation of hiprtc (libhiprtc) library from hip runtime (amdhip64)
|
||||
|
||||
On ROCm/Linux, to maintain backward compatibility, the hipruntime library (amdhip64) will continue to include hiprtc symbols in future releases. The backward compatible support may be discontinued by removing hiprtc symbols from the hipruntime library (amdhip64) in the next major release.
|
||||
On ROCm/Linux, to maintain backward compatibility, the hipruntime library (amdhip64) will continue
|
||||
to include hiprtc symbols in future releases. The backward compatible support may be discontinued by
|
||||
removing hiprtc symbols from the hipruntime library (amdhip64) in the next major release.
|
||||
|
||||
###### hipDeviceProp_t structure enhancements
|
||||
|
||||
Changes to the hipDeviceProp_t structure in the next major release may result in backward incompatibility. More details on these changes will be provided in subsequent releases.
|
||||
Changes to the hipDeviceProp_t structure in the next major release may result in backward
|
||||
incompatibility. More details on these changes will be provided in subsequent releases.
|
||||
|
||||
#### ROCDebugger enhancements
|
||||
|
||||
@@ -34,15 +38,19 @@ The compiler now generates a source-level variable and function argument debug i
|
||||
|
||||
The accuracy is guaranteed if the compiler options `-g -O0` are used and apply only to HIP.
|
||||
|
||||
This enhancement enables ROCDebugger users to interact with the HIP source-level variables and function arguments.
|
||||
This enhancement enables ROCDebugger users to interact with the HIP source-level variables and
|
||||
function arguments.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> The newly-suggested compiler -g option must be used instead of the previously-suggested `-ggdb` option. Although the effect of these two options is currently equivalent, this is not guaranteed for the future and might get changed by the upstream LLVM community.
|
||||
```note
|
||||
The newly suggested compiler `-g` option must be used instead of the previously suggested `-ggdb`
option. Although the effect of these two options is currently equivalent, this is not guaranteed for the
future, as changes might be made by the upstream LLVM community.
|
||||
```
|
||||
|
||||
##### Machine interface lanes support
|
||||
|
||||
ROCDebugger Machine Interface (MI) extends support to lanes. The following enhancements are made:
|
||||
ROCDebugger Machine Interface (MI) extends support to lanes, which includes the following
|
||||
enhancements:
|
||||
|
||||
* Added a new -lane-info command, listing the current thread's lanes.
|
||||
|
||||
@@ -52,24 +60,29 @@ ROCDebugger Machine Interface (MI) extends support to lanes. The following enhan
|
||||
-thread-select -l LANE THREAD
|
||||
```
|
||||
|
||||
* The =thread-selected notification gained a lane-id attribute. This enables the frontend to know which lane of the thread was selected.
|
||||
* The =thread-selected notification gained a lane-id attribute. This enables the frontend to know which
|
||||
lane of the thread was selected.
|
||||
|
||||
* The *stopped asynchronous record gained lane-id and hit-lanes attributes. The former indicates which lane is selected, and the latter indicates which lanes explain the stop.
|
||||
* The *stopped asynchronous record gained lane-id and hit-lanes attributes. The former indicates
|
||||
which lane is selected, and the latter indicates which lanes explain the stop.
|
||||
|
||||
* MI commands now accept a global --lane option, similar to the global --thread and --frame options.
|
||||
|
||||
* MI varobjs are now lane-aware.
|
||||
|
||||
For more information, refer to the ROC Debugger User Guide at
|
||||
{doc}`ROCgdb <rocgdb:index>`.
|
||||
For more information, refer to the ROC Debugger User Guide at {doc}`ROCgdb <rocgdb:index>`.
|
||||
|
||||
##### Enhanced - clone-inferior command
|
||||
|
||||
The clone-inferior command now ensures that the TTY, CMD, ARGS, and AMDGPU PRECISE-MEMORY settings are copied from the original inferior to the new one. All modifications to the environment variables done using the 'set environment' or 'unset environment' commands are also copied to the new inferior.
|
||||
The clone-inferior command now ensures that the TTY, CMD, ARGS, and AMDGPU PRECISE-MEMORY
|
||||
settings are copied from the original inferior to the new one. All modifications to the environment
|
||||
variables done using the 'set environment' or 'unset environment' commands are also copied to the
|
||||
new inferior.
|
||||
|
||||
#### MIOpen support for RDNA GPUs
|
||||
|
||||
This release includes support for AMD Radeon™ Pro W6800, in addition to other bug fixes and performance improvements as listed below:
|
||||
This release includes support for AMD Radeon™ Pro W6800, in addition to other bug fixes and
|
||||
performance improvements as listed below:
|
||||
|
||||
* MIOpen now supports RDNA GPUs!! (via MIOpen PRs 973, 780, 764, 740, 739, 677, 660, 653, 493, 498)
|
||||
|
||||
@@ -87,11 +100,13 @@ For more information, see {doc}`Documentation <miopen:index>`.
|
||||
|
||||
#### Checkpoint restore support with CRIU
|
||||
|
||||
The new Checkpoint Restore in Userspace (CRIU) functionality is implemented to support AMD GPU and ROCm applications.
|
||||
The new Checkpoint Restore in Userspace (CRIU) functionality is implemented to support AMD GPU
|
||||
and ROCm applications.
|
||||
|
||||
CRIU is a userspace tool to Checkpoint and Restore an application.
|
||||
|
||||
CRIU lacked the support for checkpoint restore applications that used device files such as a GPU. With this ROCm release, CRIU is enhanced with a new plugin to support AMD GPUs, which includes:
|
||||
Previously, CRIU could not checkpoint and restore applications that used device files, such as those for
a GPU. With this ROCm release, CRIU is enhanced with a new plugin to support AMD GPUs, which includes:
|
||||
|
||||
* Single and Multi GPU systems (Gfx9)
|
||||
* Checkpoint / Restore on a different system
|
||||
@@ -100,15 +115,19 @@ CRIU lacked the support for checkpoint restore applications that used device fil
|
||||
* TensorFlow
|
||||
* Using CRIU Image Streamer
|
||||
|
||||
For more information, refer to <https://github.com/checkpoint-restore/criu/tree/criu-dev/plugins/amdgpu>
|
||||
For more information, refer to
|
||||
<https://github.com/checkpoint-restore/criu/tree/criu-dev/plugins/amdgpu>
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> The CRIU plugin (amdgpu_plugin) is merged upstream with the CRIU repository. The KFD kernel patches are also available upstream with the amd-staging-drm-next branch (public) and the ROCm 5.1 release branch.
|
||||
```note
|
||||
The CRIU plugin (amdgpu_plugin) is merged upstream with the CRIU repository. The KFD kernel
|
||||
patches are also available upstream with the amd-staging-drm-next branch (public) and the ROCm 5.1
|
||||
release branch.
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> This is a Beta release of the Checkpoint and Restore functionality, and some features are not available in this release.
|
||||
```note
|
||||
This is a Beta release of the Checkpoint and Restore functionality, and some features are not available
|
||||
in this release.
|
||||
```
|
||||
|
||||
For more information, refer to the following websites:
|
||||
|
||||
@@ -116,7 +135,7 @@ For more information, refer to the following websites:
|
||||
|
||||
* <https://criu.org/Main_Page>
|
||||
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
The following defects are fixed in this release.
|
||||
|
||||
@@ -126,37 +145,48 @@ The issue with the driver failing to load after ROCm installation is now fixed.
|
||||
|
||||
The driver installs successfully, and the server reboots with working rocminfo and clinfo.
|
||||
|
||||
#### ROCDebugger fixed defects
|
||||
#### ROCDebugger defect fixes
|
||||
|
||||
##### Breakpoints in GPU kernel code before kernel is loaded
|
||||
|
||||
Previously, setting a breakpoint in device code by line number before the device code was loaded into the program resulted in ROCgdb incorrectly moving the breakpoint to the first following line that contains host code.
|
||||
Previously, setting a breakpoint in device code by line number before the device code was loaded into
|
||||
the program resulted in ROCgdb incorrectly moving the breakpoint to the first following line that
|
||||
contains host code.
|
||||
|
||||
Now, the breakpoint is left pending. When the GPU kernel gets loaded, the breakpoint resolves to a location in the kernel.
|
||||
Now, the breakpoint is left pending. When the GPU kernel gets loaded, the breakpoint resolves to a
|
||||
location in the kernel.
|
||||
|
||||
##### Registers invalidated after write
|
||||
|
||||
Previously, the stale just-written value was presented as a current value.
|
||||
|
||||
ROCgdb now invalidates the cached values of registers whose content might differ after being written. For example, registers with read-only bits.
|
||||
ROCgdb now invalidates the cached values of registers whose content might differ after being written.
|
||||
For example, registers with read-only bits.
|
||||
|
||||
ROCgdb also invalidates all volatile registers when a volatile register is written. For example, writing VCC invalidates the content of STATUS as STATUS.VCCZ may change.
|
||||
ROCgdb also invalidates all volatile registers when a volatile register is written. For example, writing
|
||||
VCC invalidates the content of STATUS as STATUS.VCCZ may change.
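
The reason for invalidating on write can be pictured with a small model. This is not ROCgdb code; it is a minimal sketch in which `FakeRegister` and `CachedRegister` are hypothetical names, and the register is assumed to have one read-only bit that always reads back as 1.

```cpp
#include <cstdint>

// Hypothetical stand-in for a hardware register whose bit 0 is read-only
// and always reads back as 1, whatever value is written.
struct FakeRegister {
    std::uint32_t value = 0x1u;
    void write(std::uint32_t v) { value = (v & ~0x1u) | 0x1u; }
    std::uint32_t read() const { return value; }
};

// Hypothetical debugger-side cache in front of the register.
class CachedRegister {
public:
    explicit CachedRegister(FakeRegister& hw) : hw_(hw) {}

    std::uint32_t read() {
        if (!valid_) {             // cache miss: refill from the hardware
            cached_ = hw_.read();
            valid_ = true;
        }
        return cached_;
    }

    void write(std::uint32_t v) {
        hw_.write(v);
        valid_ = false;            // invalidate: hardware may not hold exactly v
    }

private:
    FakeRegister& hw_;
    std::uint32_t cached_ = 0;
    bool valid_ = false;
};
```

In this model, writing `0x10` and then reading returns `0x11`, because the read goes back to the hardware. A cache that kept the just-written value would report the stale `0x10` instead.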
|
||||
|
||||
##### Scheduler-locking and GPU wavefronts
|
||||
|
||||
When scheduler-locking is in effect, new wavefronts created by a resumed thread, CPU, or GPU wavefront, are held in the halt state. For example, the "set scheduler-locking" command.
|
||||
When scheduler-locking is in effect (for example, after using the "set scheduler-locking" command), new
wavefronts created by a resumed thread (CPU thread or GPU wavefront) are held in the halt state.
|
||||
|
||||
##### ROCDebugger fails before completion of kernel execution
|
||||
|
||||
It was possible (although erroneous) for a debugger to load GPU code in memory, send it to the device, start executing a kernel on the device, and dispose of the original code before the kernel had finished execution. If a breakpoint was hit after this point, the debugger failed with an internal error while trying to access the debug information.
|
||||
It was possible (although erroneous) for a debugger to load GPU code in memory, send it to the
|
||||
device, start executing a kernel on the device, and dispose of the original code before the kernel had
|
||||
finished execution. If a breakpoint was hit after this point, the debugger failed with an internal error
|
||||
while trying to access the debug information.
|
||||
|
||||
This issue is now fixed by ensuring that the debugger keeps a local copy of the original code and debug information.
|
||||
This issue is now fixed by ensuring that the debugger keeps a local copy of the original code and
|
||||
debug information.
|
||||
|
||||
### Known issues
|
||||
|
||||
#### Random memory access fault errors observed while running math libraries unit tests
|
||||
|
||||
**Issue:** Random memory access fault issues are observed while running Math libraries unit tests. This issue is encountered in ROCm v5.0, ROCm v5.0.1, and ROCm v5.0.2.
|
||||
**Issue:** Random memory access fault issues are observed while running Math libraries unit tests.
|
||||
This issue is encountered in ROCm v5.0, ROCm v5.0.1, and ROCm v5.0.2.
|
||||
|
||||
Note, the faults only occur in the SRIOV environment.
|
||||
|
||||
@@ -178,13 +208,15 @@ Where expectation is 0.
|
||||
|
||||
#### CU masking causes application to freeze
|
||||
|
||||
Using CU Masking results in an application freeze or runs exceptionally slowly. This issue is noticed only in the GFX10 suite of products. Note, this issue is observed only in GFX10 suite of products.
|
||||
Using CU masking can cause an application to freeze or run exceptionally slowly. This issue is observed
only in the GFX10 suite of products.
|
||||
|
||||
This issue is under active investigation at this time.
|
||||
|
||||
#### Failed checkpoint in Docker containers
|
||||
|
||||
A defect with Ubuntu images kernel-5.13-30-generic and kernel-5.13-35-generic with Overlay FS results in incorrect reporting of the mount ID.
|
||||
A defect with Ubuntu images kernel-5.13-30-generic and kernel-5.13-35-generic with Overlay FS
|
||||
results in incorrect reporting of the mount ID.
|
||||
|
||||
This issue with Ubuntu causes CRIU checkpointing to fail in Docker containers.
|
||||
|
||||
@@ -192,8 +224,8 @@ As a workaround, use an older version of the kernel. For example, Ubuntu 5.11.0-
|
||||
|
||||
#### Issue with restoring workloads using cooperative groups feature
|
||||
|
||||
Workloads that use the cooperative groups function to ensure all waves can be resident at the same time may fail to restore correctly.
|
||||
This issue is under investigation and will be fixed in a future release.
|
||||
Workloads that use the cooperative groups function to ensure all waves can be resident at the same
|
||||
time may fail to restore correctly. This issue is under investigation and will be fixed in a future release.
|
||||
|
||||
#### Radeon Pro V620 and W6800 workstation GPUs
|
||||
|
||||
|
||||
@@ -8,21 +8,26 @@ The ROCm v5.2 release consists of the following HIP enhancements:
|
||||
|
||||
##### HIP installation guide updates
|
||||
|
||||
The HIP Installation Guide is updated to include building HIP tests from source on the AMD and NVIDIA platforms.
|
||||
The HIP Installation Guide is updated to include building HIP tests from source on the AMD and
|
||||
NVIDIA platforms.
|
||||
|
||||
For more details, refer to the HIP Installation Guide v5.2.
|
||||
|
||||
##### Support for device-side malloc on HIP-Clang
|
||||
|
||||
HIP-Clang now supports device-side malloc. This implementation does not require the use of `hipDeviceSetLimit(hipLimitMallocHeapSize,value)` nor respect any setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
|
||||
HIP-Clang now supports device-side malloc. This implementation does not require the use of
`hipDeviceSetLimit(hipLimitMallocHeapSize,value)`, nor does it respect any such setting. The heap is fully
dynamic and can grow until the available free memory on the device is consumed.
|
||||
|
||||
The test codes at the following link show how to implement applications using malloc and free functions in device kernels:
|
||||
The test code at the following link shows how to implement applications using malloc and free
functions in device kernels:
|
||||
|
||||
<https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp>
|
||||
|
||||
##### New HIP APIs in this release
|
||||
|
||||
The following new HIP APIs are available in the ROCm v5.2 release. Note that this is a pre-official version (beta) release of the new APIs:
|
||||
The following new HIP APIs are available in the ROCm v5.2 release. Note that these APIs are released
as a pre-official (beta) version:
|
||||
|
||||
###### Device management HIP APIs
|
||||
|
||||
@@ -34,13 +39,11 @@ The new device management HIP APIs are as follows:
|
||||
hipError_t hipDeviceGetUuid(hipUUID* uuid, hipDevice_t device);
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> This new API corresponds to the following CUDA API:
|
||||
>
|
||||
> ```cpp
|
||||
> CUresult cuDeviceGetUuid(CUuuid* uuid, CUdevice dev);
|
||||
> ```
|
||||
Note that this new API corresponds to the following CUDA API:
|
||||
|
||||
```cpp
|
||||
CUresult cuDeviceGetUuid(CUuuid* uuid, CUdevice dev);
|
||||
```
|
||||
|
||||
* Gets default memory pool of the specified device
|
||||
|
||||
@@ -62,7 +65,7 @@ The new device management HIP APIs are as follows:
|
||||
|
||||
###### New HIP runtime APIs in memory management
|
||||
|
||||
The new Stream Ordered Memory Allocator functions of HIP runtime APIs in memory management are as follows:
|
||||
The new Stream Ordered Memory Allocator functions of HIP runtime APIs in memory management are:
|
||||
|
||||
* Allocates memory with stream ordered semantics
|
||||
|
||||
@@ -180,7 +183,7 @@ The new HIP Graph Management APIs are as follows:
|
||||
* Gets a node attribute
|
||||
|
||||
```cpp
|
||||
hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, hipKernelNodeAttrValue* value);
|
||||
hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, hipKernelNodeAttrValue* value);
|
||||
```
|
||||
|
||||
###### Support for virtual memory management APIs
|
||||
@@ -244,7 +247,7 @@ The new APIs for virtual memory management are as follows:
|
||||
* Maps or unmaps subregions of sparse HIP arrays and sparse HIP mipmapped arrays
|
||||
|
||||
```cpp
|
||||
hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count, hipStream_t stream);
|
||||
hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count, hipStream_t stream);
|
||||
```
|
||||
|
||||
* Releases a memory handle representing a memory allocation that was previously allocated through hipMemCreate
|
||||
@@ -276,41 +279,67 @@ For more information, refer to the HIP API documentation at
|
||||
|
||||
##### Planned HIP changes in future releases
|
||||
|
||||
Changes to `hipDeviceProp_t`, `HIPMEMCPY_3D`, and `hipArray` structures (and related HIP APIs) are planned in the next major release. These changes may impact backward compatibility.
|
||||
Changes to `hipDeviceProp_t`, `HIPMEMCPY_3D`, and `hipArray` structures (and related HIP APIs) are
|
||||
planned in the next major release. These changes may impact backward compatibility.
|
||||
|
||||
Refer to the Release Notes document in subsequent releases for more information.
|
||||
ROCm Math and Communication Libraries
|
||||
Refer to the release notes in subsequent releases for more information.
|
||||
|
||||
In this release, ROCm Math and Communication Libraries consist of the following enhancements and fixes:
|
||||
New rocWMMA for Matrix Multiplication and Accumulation Operations Acceleration
|
||||
#### ROCm math and communication libraries
|
||||
|
||||
This release introduces a new ROCm C++ library for accelerating mixed-precision matrix multiplication and accumulation (MFMA) operations leveraging specialized GPU matrix cores. rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.
|
||||
In this release, ROCm math and communication libraries consist of the following enhancements and
|
||||
fixes:
|
||||
|
||||
rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
|
||||
* New rocWMMA for matrix multiplication and accumulation operations acceleration
|
||||
|
||||
For more information, refer to
|
||||
[Communication Libraries](./docs/reference/library-index.md)
|
||||
This release introduces a new ROCm C++ library for accelerating mixed-precision matrix multiplication
|
||||
and accumulation (MFMA) operations leveraging specialized GPU matrix cores. rocWMMA provides a
|
||||
C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using
|
||||
them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a
|
||||
header library of GPU device code, meaning matrix core acceleration may be compiled directly into
|
||||
your kernel device code. This can benefit from compiler optimization in the generation of kernel
|
||||
assembly and does not incur additional overhead costs of linking to external runtime libraries or having
|
||||
to launch separate kernels.
|
||||
|
||||
rocWMMA is released as a header library and includes test and sample projects to validate and
|
||||
illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation
|
||||
given the heavy precedent for the library. However, the usage portfolio is growing significantly and
|
||||
demonstrates different ways rocWMMA may be consumed.
|
||||
|
||||
For more information, refer to [Communication Libraries](../reference/library-index.md).
|
||||
|
||||
#### OpenMP enhancements in this release
|
||||
|
||||
##### OMPT target support
|
||||
|
||||
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as described in the OpenMP specification document. These are APIs that allow first-party tools to examine the profile and traces for kernels that execute on a device. A tool may register callbacks for data transfer and kernel dispatch entry points. A tool may use APIs to start and stop tracing for device-related activities such as data transfer and kernel dispatch timings and associated metadata. If device tracing is enabled, trace records for device activities are collected during program execution and returned to the tool using the APIs described in the specification.
|
||||
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as described in the
|
||||
OpenMP specification document. These are APIs that allow first-party tools to examine the profile
|
||||
and traces for kernels that execute on a device. A tool may register callbacks for data transfer and
|
||||
kernel dispatch entry points. A tool may use APIs to start and stop tracing for device-related activities,
|
||||
such as data transfer and kernel dispatch timings and associated metadata. If device tracing is enabled,
|
||||
trace records for device activities are collected during program execution and returned to the tool
|
||||
using the APIs described in the specification.
|
||||
|
||||
Following is an example demonstrating how a tool would use the OMPT target APIs supported. The README in /opt/rocm/llvm/examples/tools/ompt outlines the steps to follow, and you can run the provided example as indicated below:
|
||||
Following is an example demonstrating how a tool would use the OMPT target APIs supported. The
|
||||
README in /opt/rocm/llvm/examples/tools/ompt outlines the steps to follow, and you can run the
|
||||
provided example as indicated below:
|
||||
|
||||
```sh
|
||||
cd /opt/rocm/llvm/examples/tools/ompt/veccopy-ompt-target-tracing
|
||||
make run
|
||||
```
|
||||
|
||||
The file `veccopy-ompt-target-tracing.c` simulates how a tool would initiate device activity tracing. The file `callbacks.h` shows the callbacks that may be registered and implemented by the tool.
|
||||
The file `veccopy-ompt-target-tracing.c` simulates how a tool would initiate device activity tracing. The
|
||||
file `callbacks.h` shows the callbacks that may be registered and implemented by the tool.
|
||||
|
||||
### Deprecations and warnings
|
||||
|
||||
#### Linux file system hierarchy standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and backward compatibility.
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to
|
||||
ensure ROCm components follow open source conventions for Linux-based distributions. While
|
||||
moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or
|
||||
older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and
|
||||
backward compatibility.
|
||||
|
||||
##### New file system hierarchy
|
||||
|
||||
@@ -346,23 +375,26 @@ The following is the new file system hierarchy:
|
||||
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major release.
|
||||
```note
|
||||
ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major
|
||||
release.
|
||||
```
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward compatibility with older file systems
|
||||
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and
|
||||
included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
|
||||
```note
|
||||
ROCm will continue supporting backward compatibility until the next major release.
|
||||
```
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a
|
||||
warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the
|
||||
example below:
|
||||
|
||||
```cpp
|
||||
// Code snippet from hip_runtime.h
|
||||
@@ -379,7 +411,8 @@ The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library
|
||||
location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -392,7 +425,9 @@ lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64
|
||||
|
||||
##### CMake config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For
|
||||
backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of
|
||||
a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -404,20 +439,26 @@ lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake
|
||||
|
||||
#### Planned deprecation of hip-rocclr and hip-base packages
|
||||
|
||||
In the ROCm v5.2 release, hip-rocclr and hip-base packages (Debian and RPM) are planned for deprecation and will be removed in a future release. hip-runtime-amd and hip-dev(el) will replace these packages respectively. Users of hip-rocclr must install two packages, hip-runtime-amd and hip-dev, to get the same set of packages installed by hip-rocclr previously.
|
||||
In the ROCm v5.2 release, hip-rocclr and hip-base packages (Debian and RPM) are planned for
|
||||
deprecation and will be removed in a future release. hip-runtime-amd and hip-dev(el) will replace
|
||||
these packages respectively. Users of hip-rocclr must install two packages, hip-runtime-amd and
|
||||
hip-dev, to get the same set of packages installed by hip-rocclr previously.
|
||||
|
||||
Currently, both package names hip-rocclr (or) hip-runtime-amd and hip-base (or) hip-dev(el) are supported.
|
||||
Deprecation of Integrated HIP Directed Tests
|
||||
Currently, both package names hip-rocclr (or) hip-runtime-amd and hip-base (or) hip-dev(el) are
|
||||
supported.
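
The package mapping described above can be sketched as a small lookup. This is only an illustration: the function name `replacement_packages` is hypothetical, and reading `hip-dev(el)` as `hip-dev` on Debian-based systems and `hip-devel` on RPM-based systems is an assumption.

```sh
# Hypothetical helper: print the package(s) that replace a deprecated HIP package.
# $1: old package name; $2: package family, "deb" (default) or "rpm".
replacement_packages() {
    old=$1
    family=${2:-deb}
    # Assumption: "hip-dev(el)" means hip-dev for Debian and hip-devel for RPM.
    if [ "$family" = "rpm" ]; then dev=hip-devel; else dev=hip-dev; fi
    case "$old" in
        hip-rocclr) printf '%s\n' "hip-runtime-amd $dev" ;;
        hip-base)   printf '%s\n' "$dev" ;;
        *)          printf '%s\n' "$old" ;;   # not deprecated: keep the name
    esac
}
```

The output is a space-separated list that can be handed to the system package manager.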
|
||||
|
||||
The integrated HIP directed tests, which are currently built by default, are deprecated in this release. The default building and execution support through CMake will be removed in future release.
|
||||
#### Deprecation of integrated HIP directed tests
|
||||
|
||||
### Fixed defects
|
||||
The integrated HIP directed tests, which are currently built by default, are deprecated in this release.
|
||||
The default building and execution support through CMake will be removed in a future release.
|
||||
|
||||
| Fixed Defect | Fix |
|
||||
|------------------------------------------------------------------------------|----------|
|
||||
| ROCmInfo does not list gpus | Code fix |
|
||||
| Hang observed while restoring cooperative group samples | Code fix |
|
||||
| ROCM-SMI over SRIOV: Unsupported commands do not return proper error message | Code fix |
|
||||
### Defect fixes
|
||||
|
||||
| Defect | Fix |
|
||||
|--------|------|
|
||||
| ROCmInfo does not list gpus | code fix |
|
||||
| Hang observed while restoring cooperative group samples | code fix |
|
||||
| ROCM-SMI over SRIOV: Unsupported commands do not return proper error message | code fix |
|
||||
|
||||
### Known issues
|
||||
|
||||
@@ -427,35 +468,44 @@ This section consists of known issues in this release.
|
||||
|
||||
##### Issue
|
||||
|
||||
A compiler error occurs when using -O0 flag to compile code for gfx1030 that calls atomicAddNoRet, which is defined in amd_hip_atomic.h. The compiler generates an illegal instruction for gfx1030.
|
||||
A compiler error occurs when using -O0 flag to compile code for gfx1030 that calls atomicAddNoRet,
|
||||
which is defined in amd_hip_atomic.h. The compiler generates an illegal instruction for gfx1030.
|
||||
|
||||
##### Workaround
|
||||
|
||||
The workaround is not to use the -O0 flag for this case. For higher optimization levels, the compiler does not generate an invalid instruction.
|
||||
The workaround is not to use the -O0 flag for this case. For higher optimization levels, the compiler
|
||||
does not generate an invalid instruction.
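
A build script could guard against the problematic combination along these lines. This is a minimal sketch: the helper name `safe_opt_flag` is hypothetical, the choice of `-O1` as the substitute is an assumption, and the helper applies to the whole target rather than only to code that calls `atomicAddNoRet`.

```sh
# Hypothetical build helper: print the optimization flag to pass to the compiler.
# $1: target architecture; $2: requested optimization flag.
safe_opt_flag() {
    arch=$1
    requested=$2
    if [ "$arch" = "gfx1030" ] && [ "$requested" = "-O0" ]; then
        printf '%s\n' "-O1"          # assumption: any level above -O0 avoids the issue
    else
        printf '%s\n' "$requested"   # other combinations are left unchanged
    fi
}

# Example use (kernel.cpp is a placeholder file name):
#   hipcc "$(safe_opt_flag gfx1030 -O0)" --offload-arch=gfx1030 -c kernel.cpp
```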
|
||||
|
||||
#### System freeze observed during CUDA memtest checkpoint
|
||||
|
||||
##### Issue
|
||||
|
||||
Checkpoint/Restore in Userspace (CRIU) requires 20 MB of VRAM approximately to checkpoint and restore. The CRIU process may freeze if the maximum amount of available VRAM is allocated to checkpoint applications.
|
||||
Checkpoint/Restore in Userspace (CRIU) requires 20 MB of VRAM approximately to checkpoint and
|
||||
restore. The CRIU process may freeze if the maximum amount of available VRAM is allocated to
|
||||
checkpoint applications.
|
||||
|
||||
##### Workaround
|
||||
|
||||
To use CRIU to checkpoint and restore your application, limit the amount of VRAM the application uses to ensure at least 20 MB is available.
|
||||
To use CRIU to checkpoint and restore your application, limit the amount of VRAM the application uses
|
||||
to ensure at least 20 MB is available.
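
One way to check the headroom before checkpointing is sketched below. The helper name `vram_headroom_ok` is hypothetical, the 20 MB figure is treated as 20 MiB, and the `card0` sysfs path is an assumption that may not match the device on a given system. The reading is a snapshot, so usage can still change before the checkpoint starts.

```sh
# Hypothetical helper: succeed when at least 20 MiB of VRAM is free.
# $1: total VRAM in bytes; $2: used VRAM in bytes.
vram_headroom_ok() {
    total_bytes=$1
    used_bytes=$2
    required_bytes=$((20 * 1024 * 1024))   # approximate CRIU requirement from the note above
    free_bytes=$((total_bytes - used_bytes))
    [ "$free_bytes" -ge "$required_bytes" ]
}

# Assumed amdgpu sysfs location; card0 may not be the right device.
dev=/sys/class/drm/card0/device
if [ -r "$dev/mem_info_vram_total" ] && [ -r "$dev/mem_info_vram_used" ]; then
    if vram_headroom_ok "$(cat "$dev/mem_info_vram_total")" "$(cat "$dev/mem_info_vram_used")"; then
        echo "enough free VRAM for CRIU"
    else
        echo "less than 20 MB of VRAM free; CRIU may freeze"
    fi
fi
```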
|
||||
|
||||
#### HPC test fails with the “HSA_STATUS_ERROR_MEMORY_FAULT” error
|
||||
|
||||
##### Issue
|
||||
|
||||
The compiler may incorrectly compile a program that uses the `__shfl_sync(mask, value, srcLane)` function when the "value" parameter to the function is undefined along some path to the function. For most functions, uninitialized inputs cause undefined behavior, but the definition for `__shfl_sync` should allow for undefined values.
|
||||
The compiler may incorrectly compile a program that uses the `__shfl_sync(mask, value, srcLane)`
|
||||
function when the "value" parameter to the function is undefined along some path to the function. For
|
||||
most functions, uninitialized inputs cause undefined behavior, but the definition for `__shfl_sync` should
|
||||
allow for undefined values.
|
||||
|
||||
##### Workaround
|
||||
|
||||
The workaround is to initialize the parameters to `__shfl_sync`.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> When the `-Wall` compilation flag is used, the compiler generates a warning indicating the variable is initialized along some path.
|
||||
```note
|
||||
When the `-Wall` compilation flag is used, the compiler generates a warning indicating the variable is
|
||||
uninitialized along some path.
|
||||
```
|
||||
|
||||
Example:
|
||||
|
||||
@@ -471,24 +521,32 @@ res = __shfl_sync(mask, res, 0);
|
||||
|
||||
##### Issue
|
||||
|
||||
In recent changes to Clang, insertion of the noundef attribute to all the function arguments has been enabled by default.
|
||||
In recent changes to Clang, insertion of the noundef attribute to all the function arguments has been
|
||||
enabled by default.
|
||||
|
||||
In the HIP kernel, variable var in shfl_sync may not be initialized, so LLVM IR treats it as undef.
|
||||
|
||||
So, the function argument that is potentially undef (because it is not intialized) has always been assumed to be noundef by LLVM IR (since Clang has inserted noundef attribute). This leads to ambiguous kernel execution.
|
||||
So, the function argument that is potentially undef (because it is not initialized) has always been
|
||||
assumed to be noundef by LLVM IR (since Clang has inserted the noundef attribute). This leads to
|
||||
ambiguous kernel execution.
|
||||
|
||||
##### Workaround
|
||||
|
||||
* Skip adding `noundef` attribute to functions tagged with convergent attribute. Refer to <https://reviews.llvm.org/D124158> for more information.
|
||||
* Skip adding `noundef` attribute to functions tagged with convergent attribute. Refer to
|
||||
<https://reviews.llvm.org/D124158> for more information.
|
||||
|
||||
* Introduce shuffle attribute and add it to `__shfl` like APIs at hip headers. Clang can skip adding noundef attribute, if it finds that argument is tagged with shuffle attribute. Refer to <https://reviews.llvm.org/D125378> for more information.
|
||||
* Introduce shuffle attribute and add it to `__shfl` like APIs at hip headers. Clang can skip adding the
|
||||
`noundef` attribute, if it finds that argument is tagged with shuffle attribute. Refer to
|
||||
<https://reviews.llvm.org/D125378> for more information.
|
||||
|
||||
* Introduce clang builtin for `__shfl` to identify it and skip adding `noundef` attribute.
|
||||
|
||||
* Introduce `__builtin_freeze` to use on the relevant arguments in library wrappers. The library/header need to insert freezes on the relevant inputs.
|
||||
* Introduce `__builtin_freeze` to use on the relevant arguments in library wrappers. The library/header
|
||||
need to insert freezes on the relevant inputs.
|
||||
|
||||
#### Issue with applications triggering oversubscription
|
||||
|
||||
There is a known issue with applications that trigger oversubscription. A hardware hang occurs when ROCgdb is used on AMD Instinct™ MI50 and MI100 systems.
|
||||
There is a known issue with applications that trigger oversubscription. A hardware hang occurs when
|
||||
ROCgdb is used on AMD Instinct™ MI50 and MI100 systems.
|
||||
|
||||
This issue is under investigation and will be fixed in a future release.
|
||||
|
||||
@@ -3,25 +3,28 @@
|
||||
|
||||
#### Ubuntu 18.04 end-of-life announcement
|
||||
|
||||
Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not provide prebuilt packages for Ubuntu 18.04.
|
||||
HIP and Other Runtimes
|
||||
Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not provide prebuilt
|
||||
packages for Ubuntu 18.04.
|
||||
|
||||
#### HIP Runtime
|
||||
#### HIP runtime
|
||||
|
||||
##### Fixes
|
||||
|
||||
* A bug was discovered in the HIP graph capture implementation in the ROCm v5.2.0 release. If the same kernel is called twice (with different argument values) in a graph capture, the implementation only kept the argument values for the second kernel call.
|
||||
* A bug was discovered in the HIP graph capture implementation in the ROCm v5.2.0 release. If the
|
||||
same kernel is called twice (with different argument values) in a graph capture, the implementation
|
||||
only kept the argument values for the second kernel call.
|
||||
|
||||
* A bug was introduced in the hiprtc implementation in the ROCm v5.2.0 release. This bug caused the `hiprtcGetLoweredName` call to fail for named expressions with whitespace in it.
|
||||
* A bug was introduced in the hiprtc implementation in the ROCm v5.2.0 release. This bug caused the
|
||||
`hiprtcGetLoweredName` call to fail for named expressions with whitespace in them.
|
||||
|
||||
Example:
|
||||
|
||||
The named expression `my_sqrt<complex<double>>` passed but `my_sqrt<complex<double >>` failed.
|
||||
ROCm Libraries
|
||||
The named expression `my_sqrt<complex<double>>` passed but `my_sqrt<complex<double >>`
|
||||
failed.
|
||||
|
||||
#### RCCL
|
||||
|
||||
##### Added
|
||||
##### Additions
|
||||
|
||||
Compatibility with NCCL 2.12.10
|
||||
|
||||
@@ -33,9 +36,11 @@ Compatibility with NCCL 2.12.10
|
||||
|
||||
* Added experimental support for using multiple ranks per device
|
||||
|
||||
* Requires using a new interface to create communicator (ncclCommInitRankMulti), refer to the interface documentation for details.
|
||||
* Requires using a new interface to create communicator (ncclCommInitRankMulti), refer to the
|
||||
interface documentation for details.
|
||||
|
||||
* To avoid potential deadlocks, user might have to set an environment variables increasing the number of hardware queues. For example,
|
||||
* To avoid potential deadlocks, users might have to set an environment variable that increases the
|
||||
number of hardware queues. For example,
|
||||
|
||||
```sh
|
||||
export GPU_MAX_HW_QUEUES=16
|
||||
@@ -45,20 +50,23 @@ export GPU_MAX_HW_QUEUES=16
|
||||
|
||||
* Opt-in with NCCL_IB_SOCK_CLIENT_PORT_REUSE=1 and NCCL_IB_SOCK_SERVER_PORT_REUSE=1
|
||||
|
||||
* When "Call to bind failed: Address already in use" error happens in large-scale AlltoAll(for example, >=64 MI200 nodes), users are suggested to opt-in either one or both of the options to resolve the massive port usage issue
|
||||
* When "Call to bind failed: Address already in use" error happens in large-scale AlltoAll (for example,
|
||||
\>=64 MI200 nodes), users are advised to opt in to either one or both of the options to resolve the
|
||||
massive port usage issue
|
||||
|
||||
* Avoid using NCCL_IB_SOCK_SERVER_PORT_REUSE when NCCL_NCHANNELS_PER_NET_PEER is tuned >1
|
||||
* Avoid using NCCL_IB_SOCK_SERVER_PORT_REUSE when NCCL_NCHANNELS_PER_NET_PEER is tuned
|
||||
\>1
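
The port reuse guidance above could be applied in a job launcher roughly as follows. This is a sketch only: the function name `configure_rccl_port_reuse` is hypothetical, and treating an unset `NCCL_NCHANNELS_PER_NET_PEER` as "not tuned above 1" is an assumption.

```sh
# Hypothetical launcher snippet: opt in to IB socket port reuse.
configure_rccl_port_reuse() {
    export NCCL_IB_SOCK_CLIENT_PORT_REUSE=1
    # Avoid the server-side option when NCCL_NCHANNELS_PER_NET_PEER is tuned above 1.
    if [ "${NCCL_NCHANNELS_PER_NET_PEER:-1}" -gt 1 ]; then
        unset NCCL_IB_SOCK_SERVER_PORT_REUSE
    else
        export NCCL_IB_SOCK_SERVER_PORT_REUSE=1
    fi
}
```

Call the function before starting the application so that the exported variables are inherited by the RCCL processes.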
|
||||
|
||||
##### Removed
|
||||
##### Removals
|
||||
|
||||
* Removed experimental clique-based kernels
|
||||
|
||||
#### Development tools
|
||||
|
||||
No notable changes in this release for development tools, including the compiler, profiler, and debugger
|
||||
Deployment and Management Tools
|
||||
No notable changes in this release for development tools, including the compiler, profiler, and
|
||||
debugger.

#### Deployment and management tools
|
||||
|
||||
No notable changes in this release for deployment and management tools.
|
||||
Older ROCm Releases
|
||||
|
||||
For release information for older ROCm releases, refer to <https://github.com/RadeonOpenCompute/ROCm/blob/master/CHANGELOG.md>
|
||||
For release information for older ROCm releases, refer to
|
||||
<https://github.com/RadeonOpenCompute/ROCm/blob/master/CHANGELOG.md>
|
||||
|
||||
@@ -3,15 +3,24 @@
|
||||
|
||||
#### HIP Perl scripts deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be
|
||||
available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```note
|
||||
There will be a transition period where the Perl scripts and compiled binaries are available before the
|
||||
scripts are removed. There will be no functional difference between the Perl scripts and their compiled
|
||||
binary counterpart. No user action is required. Once these are available, users can optionally switch to
|
||||
`hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from
|
||||
`hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```
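
During the transition period, it may be useful to see whether a given `hipcc` or `hipconfig` path is a soft link or a regular file. The sketch below assumes nothing about the final layout: the helper name `describe_tool` is hypothetical and the `/opt/rocm/bin/hipcc` path in the usage comment is an assumption.

```sh
# Hypothetical helper: describe what a hipcc-style path points to.
describe_tool() {
    tool=$1
    if [ -L "$tool" ]; then
        printf '%s is a soft link to %s\n' "$tool" "$(readlink "$tool")"
    elif [ -e "$tool" ]; then
        printf '%s is a regular file\n' "$tool"
    else
        printf '%s not found\n' "$tool"
    fi
}

# Example use (assumed install path):
#   describe_tool /opt/rocm/bin/hipcc
```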
|
||||
|
||||
#### Linux file system hierarchy standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and backward compatibility.
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to
|
||||
ensure ROCm components follow open source conventions for Linux-based distributions. While
|
||||
moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or
|
||||
older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and
|
||||
backward compatibility.
|
||||
|
||||
##### New file system hierarchy
|
||||
|
||||
@@ -47,23 +56,27 @@ The following is the new file system hierarchy:
|
||||
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major release.
|
||||
```note
|
||||
ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major
|
||||
release.
|
||||
```
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward compatibility with older file systems
|
||||
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and
|
||||
included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
```note
|
||||
ROCm will continue supporting backward compatibility until the next major release.
|
||||
```
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a
|
||||
warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the
|
||||
example below:
|
||||
|
||||
```cpp
|
||||
// Code snippet from hip_runtime.h
|
||||
@@ -80,7 +93,8 @@ The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library
|
||||
location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -93,7 +107,9 @@ lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64
|
||||
|
||||
##### CMake config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For
|
||||
backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of
|
||||
a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -103,23 +119,29 @@ total 0
|
||||
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
The following defects are fixed in this release.
|
||||
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are fixed in the ROCm v5.3 release.
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are
|
||||
fixed in the ROCm v5.3 release.
|
||||
|
||||
#### Kernel produces incorrect results with ROCm 5.2
|
||||
|
||||
User code did not initialize certain data constructs, leading to a correctness issue. A strict reading of the C++ standard suggests that failing to initialize these data constructs is undefined behavior. However, a special case was added for a specific compiler builtin to handle the uninitialized data in a defined manner.
|
||||
User code did not initialize certain data constructs, leading to a correctness issue. A strict reading of
|
||||
the C++ standard suggests that failing to initialize these data constructs is undefined behavior.
|
||||
However, a special case was added for a specific compiler builtin to handle the uninitialized data in a
|
||||
defined manner.
|
||||
|
||||
The compiler fix consists of the following patches:
|
||||
|
||||
* A new `noundef` attribute is added. This attribute denotes when a function call argument or return val may never contain uninitialized bits.
|
||||
For more information, see <https://reviews.llvm.org/D81678>
|
||||
* The application of this attribute was refined such that it was not added to a specific compiler builtin where the compiler knows that inactive lanes do not impact program execution.
|
||||
|
||||
For more information, see <https://github.com/RadeonOpenCompute/llvm-project/commit/accf36c58409268ca1f216cdf5ad812ba97ceccd>.
|
||||
* A new `noundef` attribute is added. This attribute denotes when a function call argument or return
|
||||
value may never contain uninitialized bits. For more information, see
|
||||
<https://reviews.llvm.org/D81678>
|
||||
* The application of this attribute was refined such that it was not added to a specific compiler built-in
|
||||
where the compiler knows that inactive lanes do not impact program execution. For more
|
||||
information, see
|
||||
<https://github.com/RadeonOpenCompute/llvm-project/commit/accf36c58409268ca1f216cdf5ad812ba97ceccd>.
|
||||
|
||||
### Known issues
|
||||
|
||||
@@ -127,7 +149,10 @@ This section consists of known issues in this release.
|
||||
|
||||
#### Issue with OpenMP-extras package upgrade
|
||||
|
||||
The `openmp-extras` package has been split into runtime (`openmp-extras-runtime`) and dev (`openmp-extras-devel`) packages. This change has broken the upgrade support for the `openmp-extras` package in RHEL/SLES.
|
||||
The `openmp-extras` package has been split into runtime (`openmp-extras-runtime`) and dev
|
||||
(`openmp-extras-devel`) packages. This change has broken the upgrade support for the
|
||||
`openmp-extras` package in RHEL/SLES.
|
||||
|
||||
An available workaround in RHEL is to use the following command for upgrades:
|
||||
|
||||
```sh
|
||||
@@ -143,16 +168,21 @@ zypper update --force-resolution <meta-package>
|
||||
|
||||
#### AMD Instinct™ MI200 SRIOV virtualization issue
|
||||
|
||||
There is a known issue in this ROCm v5.3 release with all AMD Instinct™ MI200 devices running within a virtual function (VF) under SRIOV virtualization. This issue will likely impact the functionality of SRIOV-based workloads, but does not impact Discrete Device Assignment (DDA) or Bare Metal.
|
||||
There is a known issue in this ROCm v5.3 release with all AMD Instinct™ MI200 devices running within
|
||||
a virtual function (VF) under SRIOV virtualization. This issue will likely impact the functionality of
|
||||
SRIOV-based workloads, but does not impact Discrete Device Assignment (DDA) or Bare Metal.
|
||||
|
||||
Until a fix is provided, users should rely on ROCm v5.2.3 to support their SRIOV workloads.
|
||||
|
||||
#### System crash when IOMMU is enabled
|
||||
|
||||
If input-output memory management unit (IOMMU) is enabled in SBIOS and ROCm is installed, the system may report the following failure or errors when running workloads such as bandwidth test, clinfo, and HelloWord.cl and cause a system crash.
|
||||
If input-output memory management unit (IOMMU) is enabled in SBIOS and ROCm is installed, the
|
||||
system may report the following failure or errors when running workloads such as bandwidth test,
|
||||
clinfo, and HelloWorld.cl, and may then crash.
|
||||
|
||||
* IO PAGE FAULT
|
||||
* IRQ remapping does not support X2APIC mode
|
||||
* NMI error
|
||||
|
||||
Workaround: To avoid the system crash, add `amd_iommu=on iommu=pt` to the kernel boot parameters, as indicated in the warning message.
|
||||
Workaround: To avoid the system crash, add `amd_iommu=on iommu=pt` to the kernel boot parameters, as
|
||||
indicated in the warning message.
|
||||
|
||||
@@ -1,13 +1,14 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
The following known issues in ROCm v5.3.2 are fixed in this release.
|
||||
|
||||
#### Peer-to-peer DMA mapping errors with SLES and RHEL
|
||||
|
||||
Peer-to-Peer Direct Memory Access (DMA) mapping errors on Dell systems (R7525 and R750XA) with SLES 15 SP3/SP4 and RHEL 9.0 are fixed in this release.
|
||||
Peer-to-Peer Direct Memory Access (DMA) mapping errors on Dell systems (R7525 and R750XA) with
|
||||
SLES 15 SP3/SP4 and RHEL 9.0 are fixed in this release.
|
||||
|
||||
Previously, running rocminfo resulted in Peer-to-Peer DMA mapping errors.
|
||||
Previously, running `rocminfo` resulted in Peer-to-Peer DMA mapping errors.
|
||||
|
||||
#### RCCL tuning table
|
||||
|
||||
@@ -15,7 +16,8 @@ The RCCL tuning table is updated for supported platforms.
|
||||
|
||||
#### SGEMM (F32 GEMM) routines in rocBLAS
|
||||
|
||||
Functional correctness failures in SGEMM (F32 GEMM) routines in rocBLAS for certain problem sizes and ranges are fixed in this release.
|
||||
Functional correctness failures in SGEMM (F32 GEMM) routines in rocBLAS for certain problem sizes
|
||||
and ranges are fixed in this release.
|
||||
|
||||
### Known issues
|
||||
|
||||
@@ -23,7 +25,9 @@ This section consists of known issues in this release.
|
||||
|
||||
#### AMD Instinct™ MI200 SRIOV virtualization issue
|
||||
|
||||
There is a known issue in this ROCm v5.3 release with all AMD Instinct™ MI200 devices running within a virtual function (VF) under SRIOV virtualization. This issue will likely impact the functionality of SRIOV-based workloads but does not impact Discrete Device Assignment (DDA) or bare metal.
|
||||
There is a known issue in this ROCm v5.3 release with all AMD Instinct™ MI200 devices running within
|
||||
a virtual function (VF) under SRIOV virtualization. This issue will likely impact the functionality of
|
||||
SRIOV-based workloads but does not impact Discrete Device Assignment (DDA) or bare metal.
|
||||
|
||||
Until a fix is provided, users should rely on ROCm v5.2.3 to support their SRIOV workloads.
|
||||
|
||||
@@ -31,14 +35,18 @@ Until a fix is provided, users should rely on ROCm v5.2.3 to support their SRIOV
|
||||
|
||||
Customers cannot update the Integrated Firmware Image (IFWI) for AMD Instinct™ MI200 accelerators.
|
||||
|
||||
An updated firmware maintenance bundle consisting of an installation tool and images specific to AMD Instinct™ MI200 accelerators is under planning and will be available soon.
|
||||
An updated firmware maintenance bundle consisting of an installation tool and images specific to
|
||||
AMD Instinct™ MI200 accelerators is under planning and will be available soon.
|
||||
|
||||
#### Known issue with rocThrust and rocPRIM libraries
|
||||
|
||||
There is a known issue with rocThrust and rocPRIM libraries supporting iterator and types in ROCm v5.3.x releases.
|
||||
There is a known issue with rocThrust and rocPRIM libraries supporting iterator and types in
|
||||
ROCm v5.3.x releases.
|
||||
|
||||
* thrust::merge no longer correctly supports different iterator types for `keys_input1` and `keys_input2`.
|
||||
* `thrust::merge` no longer correctly supports different iterator types for `keys_input1` and
|
||||
`keys_input2`.
|
||||
|
||||
* rocprim::device_merge no longer correctly supports using different types for `keys_input1` and `keys_input2`.
|
||||
* `rocprim::device_merge` no longer correctly supports using different types for `keys_input1` and
|
||||
`keys_input2`.
|
||||
|
||||
This issue is currently under investigation and will be resolved in a future release.
|
||||
|
||||
@@ -1,12 +1,15 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
#### Issue with rocTHRUST and rocPRIM libraries
|
||||
|
||||
There was a known issue with rocTHRUST and rocPRIM libraries supporting iterator and types in ROCm v5.3.x releases.
|
||||
There was a known issue with rocTHRUST and rocPRIM libraries supporting iterator and types in ROCm
|
||||
v5.3.x releases.
|
||||
|
||||
* `thrust::merge` no longer correctly supports different iterator types for `keys_input1` and `keys_input2`.
|
||||
* `rocprim::device_merge` no longer correctly supports using different types for `keys_input1` and `keys_input2`.
|
||||
* `thrust::merge` no longer correctly supports different iterator types for `keys_input1` and
|
||||
`keys_input2`.
|
||||
* `rocprim::device_merge` no longer correctly supports using different types for `keys_input1` and
|
||||
`keys_input2`.
|
||||
|
||||
This issue is resolved with the following fixes to compilation failures:
|
||||
|
||||
|
||||
@@ -8,13 +8,15 @@ The ROCm v5.4 release consists of the following HIP enhancements:
|
||||
|
||||
##### Support for wall_clock64
|
||||
|
||||
A new timer function wall_clock64() is supported, which returns wall clock count at a constant frequency on the device.
|
||||
A new timer function wall_clock64() is supported, which returns wall clock count at a constant
|
||||
frequency on the device.
|
||||
|
||||
```cpp
|
||||
long long int wall_clock64();
|
||||
```
|
||||
|
||||
It returns wall clock count at a constant frequency on the device, which can be queried via HIP API with the hipDeviceAttributeWallClockRate attribute of the device in the HIP application code.
|
||||
It returns wall clock count at a constant frequency on the device, which can be queried via HIP API with
|
||||
the hipDeviceAttributeWallClockRate attribute of the device in the HIP application code.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -25,19 +27,23 @@ int wallClkRate = 0; //in kilohertz
|
||||
|
||||
Where hipDeviceAttributeWallClockRate is a device attribute.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> The wall clock frequency is a per-device attribute.
|
||||
```note
|
||||
The wall clock frequency is a per-device attribute.
|
||||
```
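As a rough illustration of how the two values fit together, the sketch below converts a pair of `wall_clock64()` readings into milliseconds. It assumes the rate reported by `hipDeviceAttributeWallClockRate` is in kilohertz (ticks per millisecond), as the example's comment indicates. The helper name `wall_ticks_to_ms` is hypothetical and not part of HIP. It is plain host-side arithmetic and does not call the HIP runtime.

```cpp
#include <cassert>

// Hypothetical helper (not a HIP API): converts a tick delta taken from two
// wall_clock64() readings into milliseconds.
// rate_khz is assumed to be the value of hipDeviceAttributeWallClockRate,
// expressed in kilohertz, that is, ticks per millisecond.
double wall_ticks_to_ms(long long start_ticks, long long stop_ticks, int rate_khz) {
    assert(rate_khz > 0);                // a zero rate means the query failed
    assert(stop_ticks >= start_ticks);   // readings must be in order
    return static_cast<double>(stop_ticks - start_ticks) /
           static_cast<double>(rate_khz);
}
```

Because the frequency is a per-device attribute, the rate should be queried from the same device on which the ticks were recorded.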
|
||||
|
||||
##### New registry added for GPU_MAX_HW_QUEUES
|
||||
|
||||
The GPU_MAX_HW_QUEUES registry defines the maximum number of independent hardware queues allocated per process per device.
|
||||
The GPU_MAX_HW_QUEUES registry defines the maximum number of independent hardware queues
|
||||
allocated per process per device.
|
||||
|
||||
The environment variable controls how many independent hardware queues HIP runtime can create per process, per device. If the application allocates more HIP streams than this number, then the HIP runtime reuses the same hardware queues for the new streams in a round-robin manner.
|
||||
The environment variable controls how many independent hardware queues HIP runtime can create
|
||||
per process, per device. If the application allocates more HIP streams than this number, then the HIP
|
||||
runtime reuses the same hardware queues for the new streams in a round-robin manner.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> This maximum number does not apply to hardware queues created for CU-masked HIP streams or cooperative queues for HIP Cooperative Groups (there is only one queue per device).
|
||||
```note
|
||||
This maximum number does not apply to hardware queues created for CU-masked HIP streams or
|
||||
cooperative queues for HIP Cooperative Groups (there is only one queue per device).
|
||||
```
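The round-robin reuse described above can be pictured with a small model. This is only a sketch: it assumes streams are assigned to queues strictly in creation order, which the release notes do not spell out, and the function name `assign_streams_to_queues` is hypothetical. It does not call the HIP runtime.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not HIP runtime code): stream i, in creation order,
// is mapped to hardware queue i % max_hw_queues. max_hw_queues stands in
// for the value of GPU_MAX_HW_QUEUES.
std::vector<int> assign_streams_to_queues(int num_streams, int max_hw_queues) {
    assert(num_streams >= 0);
    assert(max_hw_queues > 0);
    std::vector<int> queue_of_stream;
    queue_of_stream.reserve(static_cast<std::size_t>(num_streams));
    for (int s = 0; s < num_streams; ++s) {
        queue_of_stream.push_back(s % max_hw_queues);  // wrap around once all queues are in use
    }
    return queue_of_stream;
}
```

With six streams and a limit of four queues, this model places the fifth and sixth streams back on the first and second queues, so work on those streams may be serialized with work on the streams already using them.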
|
||||
|
||||
For more details, refer to the HIP Programming Guide.
|
||||
|
||||
@@ -81,7 +87,8 @@ This release consists of the following OpenMP enhancements:
|
||||
|
||||
* Enable new device RTL in libomptarget as default.
|
||||
* New flag `-fopenmp-target-fast` to imply `-fopenmp-target-ignore-env-vars -fopenmp-assume-no-thread-state -fopenmp-assume-no-nested-parallelism`.
|
||||
* Support for the collapse clause and non-unit stride in cases where the no-loop specialized kernel is generated.
|
||||
* Support for the collapse clause and non-unit stride in cases where the no-loop specialized kernel is
|
||||
generated.
|
||||
* Initial implementation of optimized cross-team sum reduction for float and double type scalars.
|
||||
* Pool-based optimization in the OpenMP runtime to reduce locking during data transfer.
|
||||
|
||||
@@ -89,15 +96,24 @@ This release consists of the following OpenMP enhancements:
|
||||
|
||||
#### HIP Perl scripts deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be
|
||||
available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```note
|
||||
There will be a transition period where the Perl scripts and compiled binaries are available before the
|
||||
scripts are removed. There will be no functional difference between the Perl scripts and their compiled
|
||||
binary counterpart. No user action is required. Once these are available, users can optionally switch to
|
||||
`hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from
|
||||
`hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```
|
||||
|
||||
##### Linux file system hierarchy standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and backward compatibility.
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to
|
||||
ensure ROCm components follow open source conventions for Linux-based distributions. While
|
||||
moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or
|
||||
older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and
|
||||
backward compatibility.
|
||||
|
||||
##### New file system hierarchy
|
||||
|
||||
@@ -133,23 +149,27 @@ The following is the new file system hierarchy:
|
||||
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major release.
|
||||
```note
|
||||
ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major
|
||||
release.
|
||||
```
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward compatibility with older file systems
|
||||
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and
|
||||
included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
```note
|
||||
ROCm will continue supporting backward compatibility until the next major release.
|
||||
```
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a
|
||||
warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the
|
||||
example below:
|
||||
|
||||
```cpp
|
||||
// Code snippet from hip_runtime.h
|
||||
@@ -166,7 +186,8 @@ The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library
|
||||
location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -179,7 +200,9 @@ lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64
|
||||
|
||||
##### CMake config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For
|
||||
backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of
|
||||
a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -189,21 +212,24 @@ total 0
|
||||
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
The following defects are fixed in this release.
|
||||
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are fixed in this release.
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are
|
||||
fixed in this release.
|
||||
|
||||
#### Memory allocated using hipHostMalloc() with flags didn't exhibit fine-grain behavior
|
||||
|
||||
##### Issue
|
||||
|
||||
The test was incorrectly using the `hipDeviceAttributePageableMemoryAccess` device attribute to determine coherent support.
|
||||
The test was incorrectly using the `hipDeviceAttributePageableMemoryAccess` device attribute to
|
||||
determine coherent support.
|
||||
|
||||
##### Fix
|
||||
|
||||
`hipHostMalloc()` allocates memory with fine-grained access by default when the environment variable `HIP_HOST_COHERENT=1` is used.
|
||||
`hipHostMalloc()` allocates memory with fine-grained access by default when the environment variable
|
||||
`HIP_HOST_COHERENT=1` is used.
|
||||
|
||||
For more information, refer to {doc}`hip:.doxygen/docBin/html/index`.
|
||||
|
||||
@@ -212,14 +238,19 @@ For more information, refer to {doc}`hip:.doxygen/docBin/html/index`.
|
||||
|
||||
##### Issue
|
||||
|
||||
On GFX10 GPUs, kernel execution hangs when it is launched on streams created using `hipStreamWithCUMask`.
|
||||
On GFX10 GPUs, kernel execution hangs when it is launched on streams created using
|
||||
`hipStreamWithCUMask`.
|
||||
|
||||
##### Fix
|
||||
|
||||
On GFX10 GPUs, each workgroup processor encompasses two compute units, and the compute units must be enabled as a pair. The `hipStreamWithCUMask` API unit test cases are updated to set compute unit mask (cuMask) in pairs for GFX10 GPUs.
|
||||
On GFX10 GPUs, each workgroup processor encompasses two compute units, and the compute units
|
||||
must be enabled as a pair. The `hipStreamWithCUMask` API unit test cases are updated to set compute
|
||||
unit mask (cuMask) in pairs for GFX10 GPUs.
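The pairing rule can be sketched as plain bit manipulation. The example below assumes that workgroup processor k owns compute units 2k and 2k+1, and that the mask is an array of 32-bit words with one bit per compute unit. Both are assumptions about the layout rather than statements from the release notes, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a CU mask enabling the first num_wgps workgroup
// processors, assuming WGP k owns CUs 2k and 2k+1 (an assumed bit layout).
std::vector<std::uint32_t> make_paired_cu_mask(unsigned num_wgps) {
    const unsigned num_cus = 2 * num_wgps;
    std::vector<std::uint32_t> mask((num_cus + 31) / 32, 0u);
    for (unsigned cu = 0; cu < num_cus; ++cu) {
        mask[cu / 32] |= (1u << (cu % 32));  // set the bit for this compute unit
    }
    return mask;
}

// Returns true if every adjacent CU pair (2k, 2k+1) is either fully enabled
// or fully disabled, which is the condition described for GFX10 GPUs.
bool cu_mask_is_paired(const std::vector<std::uint32_t>& mask) {
    for (std::uint32_t word : mask) {
        const std::uint32_t even_bits = word & 0x55555555u;
        const std::uint32_t odd_bits = (word >> 1) & 0x55555555u;
        if (even_bits != odd_bits) {
            return false;  // one CU of a pair is enabled without its partner
        }
    }
    return true;
}
```

A mask such as `0x1` enables one compute unit without its partner and fails the check, whereas `0x3` passes. The resulting word array has the general shape that a CU-masked stream creation call would consume, but the exact API signature should be confirmed in the HIP documentation.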
|
||||
|
||||
#### ROCm tools GPU IDs
|
||||
|
||||
The HIP language device IDs are not the same as the GPU IDs reported by the tools. GPU IDs are globally unique and guaranteed to be consistent across APIs and processes.
|
||||
The HIP language device IDs are not the same as the GPU IDs reported by the tools. GPU IDs are
|
||||
globally unique and guaranteed to be consistent across APIs and processes.
|
||||
|
||||
GPU IDs reported by ROCTracer and ROCProfiler or ROCm Tools are the HSA Driver Node ID of that GPU, as it is a unique ID for that device in that particular node.
|
||||
GPU IDs reported by ROCTracer and ROCProfiler or ROCm Tools are the HSA Driver Node ID of that GPU,
|
||||
as it is a unique ID for that device in that particular node.
|
||||
|
||||
@@ -25,30 +25,40 @@ This swaps the stream capture mode of a thread.
|
||||
|
||||
This function returns `hipSuccess` or `hipErrorInvalidValue`.
|
||||
|
||||
For more information, refer to the HIP API documentation at /bundle/HIP_API_Guide/page/modules.html.
|
||||
For more information, refer to the HIP API documentation at
|
||||
/bundle/HIP_API_Guide/page/modules.html.
|
||||
|
||||
### Deprecations and warnings
|
||||
|
||||
#### HIP Perl scripts deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be
|
||||
available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```note
|
||||
There will be a transition period where the Perl scripts and compiled binaries are available before the
|
||||
scripts are removed. There will be no functional difference between the Perl scripts and their compiled
|
||||
binary counterpart. No user action is required. Once these are available, users can optionally switch to
|
||||
`hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from
|
||||
`hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```
|
||||
|
||||
### IFWI fixes
|
||||
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are fixed in this release.
|
||||
AMD Instinct™ MI200 Firmware IFWI Maintenance Update #3
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are
|
||||
fixed in this release.
|
||||
|
||||
#### AMD Instinct™ MI200 firmware IFWI maintenance update #3
|
||||
|
||||
This IFWI release fixes the following issue in AMD Instinct™ MI210/MI250 Accelerators.
|
||||
|
||||
After prolonged periods of operation, certain MI200 Instinct™ Accelerators may perform in a degraded way resulting in application failures.
|
||||
After prolonged periods of operation, certain MI200 Instinct™ Accelerators may perform in a degraded
|
||||
way resulting in application failures.
|
||||
|
||||
In this package, AMD delivers a new firmware version for MI200 GPU accelerators and a firmware installation tool – AMD FW FLASH 1.2.
|
||||
In this package, AMD delivers a new firmware version for MI200 GPU accelerators and a firmware
|
||||
installation tool – AMD FW FLASH 1.2.
|
||||
|
||||
| GPU | Production Part Number | SKU | IFWI Name |
|
||||
| GPU | Production part number | SKU | IFWI name |
|
||||
|-------|------------|--------|---------------|
|
||||
| MI210 | 113-D673XX | D67302 | D6730200V.110 |
|
||||
| MI210 | 113-D673XX | D67301 | D6730100V.073 |
|
||||
@@ -61,4 +71,5 @@ Instructions on how to download and apply MI200 maintenance updates are availabl
|
||||
|
||||
#### AMD Instinct™ MI200 SRIOV virtualization support
|
||||
|
||||
Maintenance update #3, combined with ROCm 5.4.1, now provides SRIOV virtualization support for all AMD Instinct™ MI200 devices.
|
||||
Maintenance update #3, combined with ROCm 5.4.1, now provides SRIOV virtualization support for all
|
||||
AMD Instinct™ MI200 devices.
|
||||
|
||||
@@ -3,23 +3,32 @@
|
||||
|
||||
#### HIP Perl scripts deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be
|
||||
available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```note
|
||||
There will be a transition period where the Perl scripts and compiled binaries are available before the
|
||||
scripts are removed. There will be no functional difference between the Perl scripts and their compiled
|
||||
binary counterpart. No user action is required. Once these are available, users can optionally switch to
|
||||
`hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from
|
||||
`hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```
|
||||
|
||||
#### `hipcc` options deprecation
|
||||
|
||||
The following hipcc options are being deprecated and will be removed in a future release:
|
||||
|
||||
* The `--amdgpu-target` option is being deprecated, and users must use the `--offload-arch` option to specify the GPU architecture.
|
||||
* The `--amdhsa-code-object-version` option is being deprecated. Users can use the Clang/LLVM option `-mllvm -mcode-object-version` to debug issues related to code object versions.
|
||||
* The `--hipcc-func-supp`/`--hipcc-no-func-supp` options are being deprecated, as the function calls are already supported in production on AMD GPUs.
|
||||
* The `--amdgpu-target` option is being deprecated, and users must use the `--offload-arch` option to
|
||||
specify the GPU architecture.
|
||||
* The `--amdhsa-code-object-version` option is being deprecated. Users can use the Clang/LLVM
|
||||
option `-mllvm -mcode-object-version` to debug issues related to code object versions.
|
||||
* The `--hipcc-func-supp`/`--hipcc-no-func-supp` options are being deprecated, as the function calls
|
||||
are already supported in production on AMD GPUs.
|
||||
|
||||
### Known issues
|
||||
|
||||
Under certain circumstances typified by high register pressure, users may encounter a compiler abort with one of the following error messages:
|
||||
Under certain circumstances typified by high register pressure, users may encounter a compiler abort
|
||||
with one of the following error messages:
|
||||
|
||||
* > `error: unhandled SGPR spill to memory`
|
||||
|
||||
|
||||
@@ -3,15 +3,24 @@
|
||||
|
||||
#### HIP Perl scripts deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be
|
||||
available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```note
|
||||
There will be a transition period where the Perl scripts and compiled binaries are available before the
|
||||
scripts are removed. There will be no functional difference between the Perl scripts and their compiled
|
||||
binary counterpart. No user action is required. Once these are available, users can optionally switch to
|
||||
`hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from
|
||||
`hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
```
|
||||
|
||||
##### Linux file system hierarchy standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and backward compatibility.
|
||||
ROCm packages have adopted the Linux foundation file system hierarchy standard in this release to
|
||||
ensure ROCm components follow open source conventions for Linux-based distributions. While
|
||||
moving to a new file system hierarchy, ROCm ensures backward compatibility with its 5.1 version or
|
||||
older file system hierarchy. See below for a detailed explanation of the new file system hierarchy and
|
||||
backward compatibility.
|
||||
|
||||
##### New file system hierarchy
|
||||
|
||||
@@ -47,23 +56,27 @@ The following is the new file system hierarchy:4
|
||||
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major release.
|
||||
```note
|
||||
ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major
|
||||
release.
|
||||
```
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward compatibility with older file systems
|
||||
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and
|
||||
included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
```note
|
||||
ROCm will continue supporting backward compatibility until the next major release.
|
||||
```
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a
|
||||
warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the
|
||||
example below:
|
||||
|
||||
```cpp
|
||||
// Code snippet from hip_runtime.h
|
||||
@@ -80,7 +93,8 @@ The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library
|
||||
location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -93,7 +107,9 @@ lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64
|
||||
|
||||
##### CMake config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For
|
||||
backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of
|
||||
a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -103,7 +119,7 @@ total 0
|
||||
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
#### Compiler improvements
|
||||
|
||||
@@ -117,6 +133,8 @@ In ROCm v5.4.3, improvements to the compiler address errors with the following s
|
||||
|
||||
#### Compiler option error at runtime
|
||||
|
||||
Some users may encounter a “Cannot find Symbol” error at runtime when using `-save-temps`. While most `-save-temps` use cases work correctly, this error may appear occasionally.
|
||||
Some users may encounter a “Cannot find Symbol” error at runtime when using `-save-temps`. While
|
||||
most `-save-temps` use cases work correctly, this error may appear occasionally.
|
||||
|
||||
This issue is under investigation, and the known workaround is not to use `-save-temps` when the error appears.
|
||||
This issue is under investigation, and the known workaround is not to use `-save-temps` when the error
|
||||
appears.
|
||||
|
||||
@@ -15,12 +15,14 @@ Applications requiring to update the stack size can use hipDeviceSetLimit API.
|
||||
|
||||
The following hipcc changes are implemented in this release:
|
||||
|
||||
* `hipcc` will not implicitly link to `libpthread` and `librt`, as they are no longer a link-time dependency for HIP programs. Applications that depend on these libraries must explicitly link to them.
|
||||
* `hipcc` will not implicitly link to `libpthread` and `librt`, as they are no longer a link-time dependency
|
||||
for HIP programs. Applications that depend on these libraries must explicitly link to them.
|
||||
* `-use-staticlib` and `-use-sharedlib` options are deprecated.
|
||||
|
||||
##### Future changes
|
||||
|
||||
* Separation of `hipcc` binaries (Perl scripts) from HIP to the `hipcc` project. Users will access a separate `hipcc` package for installing `hipcc` binaries in future ROCm releases.
|
||||
* Separation of `hipcc` binaries (Perl scripts) from HIP to `hipcc` project. Users will access separate
|
||||
`hipcc` package for installing `hipcc` binaries in future ROCm releases.
|
||||
|
||||
* In a future ROCm release, the following samples will be removed from the `hip-tests` project.
|
||||
* `hipBusBandwidth` at <https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/shipBusBandwidth>
|
||||
@@ -53,9 +55,9 @@ The following hipcc changes are implemented in this release:
|
||||
|
||||
##### New HIP APIs in this release
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> This is a pre-official version (beta) release of the new APIs and may contain unresolved issues.
|
||||
```note
|
||||
This is a pre-official version (beta) release of the new APIs and may contain unresolved issues.
|
||||
```
|
||||
|
||||
###### Memory management HIP APIs
|
||||
|
||||
@@ -71,21 +73,23 @@ The new memory management HIP API is as follows:
|
||||
|
||||
The new module management HIP APIs are as follows:
|
||||
|
||||
* Launches kernel $f$ with launch parameters and shared memory on stream with arguments passed to `kernelParams`, where thread blocks can cooperate and synchronize as they execute.
|
||||
* Launches kernel $f$ with launch parameters and shared memory on stream with arguments passed
|
||||
to `kernelParams`, where thread blocks can cooperate and synchronize as they run.
|
||||
|
||||
```cpp
|
||||
hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams);
|
||||
```
|
||||
|
||||
* Launches kernels on multiple devices where thread blocks can cooperate and synchronize as they execute.
|
||||
* Launches kernels on multiple devices where thread blocks can cooperate and synchronize as they
|
||||
run.
|
||||
|
||||
```cpp
|
||||
hipError_t hipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags);
|
||||
```
|
||||
|
||||
###### HIP Graph Management APIs
|
||||
###### HIP graph management APIs
|
||||
|
||||
The new HIP Graph Management APIs are as follows:
|
||||
The new HIP graph management APIs are as follows:
|
||||
|
||||
* Creates a memory allocation node and adds it to a graph \[BETA]
|
||||
|
||||
@@ -136,21 +140,27 @@ The new HIP Graph Management APIs are as follows:
|
||||
```
|
||||
|
||||
##### OpenMP enhancements
|
||||
|
||||
This release consists of the following OpenMP enhancements:
|
||||
|
||||
* Additional support for OMPT functions `get_device_time` and `get_record_type`.
|
||||
* Add support for min/max fast fp atomics on AMD GPUs.
|
||||
Fix the use of the abs function in C device regions.
|
||||
* Additional support for OMPT functions `get_device_time` and `get_record_type`
|
||||
* Added support for min/max fast fp atomics on AMD GPUs
|
||||
* Fixed the use of the abs function in C device regions
|
||||
|
||||
### Deprecations and warnings
|
||||
|
||||
#### HIP deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be
|
||||
available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterparts. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft links will be updated to point to the respective compiled binaries as the default option.
|
||||
```note
|
||||
There will be a transition period where the Perl scripts and compiled binaries are available before the
|
||||
scripts are removed. There will be no functional difference between the Perl scripts and their compiled
|
||||
binary counterpart. No user action is required. Once these are available, users can optionally switch to
|
||||
`hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft links will be updated to point to the
respective compiled binaries as the default option.
|
||||
```
|
||||
|
||||
##### Linux file system hierarchy standard for ROCm
|
||||
|
||||
@@ -190,15 +200,17 @@ The following is the new file system hierarchy:4
|
||||
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major release.
|
||||
```note
|
||||
ROCm will not support backward compatibility with the v5.1 (old) file system hierarchy in its next major
|
||||
release.
|
||||
```
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward compatibility with older file systems
|
||||
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
ROCm has moved header files and libraries to its new location as indicated in the above structure and
|
||||
included symbolic-link and wrapper header files in its old location for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
@@ -206,7 +218,9 @@ ROCm has moved header files and libraries to its new location as indicated in th
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a
|
||||
warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the
|
||||
example below:
|
||||
|
||||
```cpp
|
||||
// Code snippet from hip_runtime.h
|
||||
@@ -223,7 +237,8 @@ The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library
|
||||
location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -237,7 +252,8 @@ lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64
|
||||
##### CMake config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder.
|
||||
For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`)
|
||||
consist of a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
@@ -253,7 +269,8 @@ Support for Code Object v3 is deprecated and will be removed in a future release
|
||||
|
||||
#### Comgr V3.0 changes
|
||||
|
||||
The following APIs and macros have been marked as deprecated. These are expected to be removed in a future ROCm release, coinciding with the release of Comgr v3.0.
|
||||
The following APIs and macros have been marked as deprecated. These are expected to be removed in
|
||||
a future ROCm release, coinciding with the release of Comgr v3.0.
|
||||
|
||||
##### API changes
|
||||
|
||||
@@ -265,7 +282,8 @@ The following APIs and macros have been marked as deprecated. These are expected
|
||||
* `AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES`
|
||||
* `AMD_COMGR_ACTION_COMPILE_SOURCE_TO_FATBIN`
|
||||
|
||||
For replacements, see the `AMD_COMGR_ACTION_INFO_GET`/`SET_OPTION_LIST` APIs, and the `AMD_COMGR_ACTION_COMPILE_SOURCE_(WITH_DEVICE_LIBS)_TO_BC` macros.
|
||||
For replacements, see the `AMD_COMGR_ACTION_INFO_GET`/`SET_OPTION_LIST` APIs, and the
|
||||
`AMD_COMGR_ACTION_COMPILE_SOURCE_(WITH_DEVICE_LIBS)_TO_BC` macros.
|
||||
|
||||
#### Deprecated environment variables
|
||||
|
||||
|
||||
@@ -21,4 +21,5 @@ The following HIP API is updated in the ROCm 5.5.1 release:
|
||||
|
||||
##### `hipDeviceSetCacheConfig`
|
||||
|
||||
* The return value for `hipDeviceSetCacheConfig` is updated from `hipErrorNotSupported` to `hipSuccess`
|
||||
* The return value for `hipDeviceSetCacheConfig` is updated from `hipErrorNotSupported` to
|
||||
`hipSuccess`
|
||||
|
||||
@@ -3,27 +3,37 @@
|
||||
<!-- markdownlint-disable header-increment -->
|
||||
### Release highlights
|
||||
|
||||
ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A few examples include:
|
||||
ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base. A
|
||||
few examples include:
|
||||
|
||||
* New documentation portal at https://rocm.docs.amd.com
|
||||
* Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
|
||||
* Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test
|
||||
suite
|
||||
* OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
|
||||
* Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
|
||||
* New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
|
||||
* Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger,
|
||||
profiler, and docker containers
|
||||
* New pseudorandom generators are available in rocRAND. Added support for half-precision
|
||||
transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in
|
||||
rocSOLVER.
|
||||
|
||||
### OS and GPU support changes
|
||||
|
||||
* SLES15 SP5 support was added in this release. SLES15 SP3 support was dropped.
|
||||
* AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will be entering the maintenance mode starting Q3 2023. This will be aligned with ROCm 5.7 GA release date.
|
||||
* No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
|
||||
* Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (End of Maintenance \[EOM])(will be aligned with the closest ROCm release)
|
||||
* AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs)
|
||||
will be entering the maintenance mode starting Q3 2023. This will be aligned with ROCm 5.7 GA
|
||||
release date.
|
||||
* No new features and performance optimizations will be supported for the gfx906 GPUs beyond
|
||||
ROCm 5.7
|
||||
* Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024
|
||||
(EOM will be aligned with the closest ROCm release)
|
||||
* Bug fixes during the maintenance will be made to the next ROCm point release
|
||||
* Bug fixes will not be backported to older ROCm releases for this SKU
|
||||
* Distro / Operating system updates will continue per the ROCm release cadence for gfx906 GPUs till EOM.
|
||||
* Distro / Operating system updates will continue per the ROCm release cadence for gfx906 GPUs till
|
||||
EOM.
|
||||
|
||||
### AMDSMI CLI 23.0.0.4
|
||||
|
||||
#### Added
|
||||
#### Additions
|
||||
|
||||
* AMDSMI CLI tool enabled for Linux Bare Metal & Guest
|
||||
|
||||
@@ -39,7 +49,8 @@ ROCm 5.6 consists of several AI software ecosystem improvements to our fast-grow
|
||||
|
||||
#### Fixes
|
||||
|
||||
* Stability fix for multi GPU system reproducible via ROCm_Bandwidth_Test as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
|
||||
* Stability fix for multi GPU system reproducible via ROCm_Bandwidth_Test as reported in
|
||||
[Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
|
||||
|
||||
### HIP 5.6 (for ROCm 5.6)
|
||||
|
||||
@@ -48,7 +59,7 @@ ROCm 5.6 consists of several AI software ecosystem improvements to our fast-grow
|
||||
* Consolidation of hipamd, rocclr and OpenCL projects in clr
|
||||
* Optimized lock for graph global capture mode
|
||||
|
||||
#### Added
|
||||
#### Additions
|
||||
|
||||
* Added hipRTC support for amd_hip_fp16
|
||||
* Added hipStreamGetDevice implementation to get the device associated with the stream
|
||||
@@ -57,14 +68,14 @@ ROCm 5.6 consists of several AI software ecosystem improvements to our fast-grow
|
||||
* hipArrayGetDescriptor for getting 1D or 2D array descriptor
|
||||
* hipArray3DGetDescriptor to get 3D array descriptor
|
||||
|
||||
#### Changed
|
||||
#### Changes
|
||||
|
||||
* hipMallocAsync to return success for zero size allocation to match hipMalloc
|
||||
* Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
|
||||
* Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
|
||||
* Removed hipBusBandwidth and hipCommander samples from hip-tests
|
||||
|
||||
#### Fixed
|
||||
#### Fixes
|
||||
|
||||
* Fixed regression in hipMemCpyParam3D when offset is applied
|
||||
|
||||
@@ -98,11 +109,11 @@ ROCm 5.6 consists of several AI software ecosystem improvements to our fast-grow
|
||||
|
||||
### ROCgdb-13 (For ROCm 5.6.0)
|
||||
|
||||
#### Optimized
|
||||
#### Optimizations
|
||||
|
||||
* Improved performance when handling the end of a process with a large number of threads.
|
||||
|
||||
Known Issues
|
||||
#### Known issues
|
||||
|
||||
* On certain configurations, ROCgdb can show the following warning message:
|
||||
|
||||
@@ -176,15 +187,15 @@ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
|
||||
|
||||
#### Optimized
|
||||
#### Optimizations
|
||||
|
||||
* Improved Test Suite
|
||||
|
||||
#### Added
|
||||
#### Additions
|
||||
|
||||
* 'end_time' need to be disabled in roctx_trace.txt
|
||||
|
||||
#### Fixed
|
||||
#### Fixes
|
||||
|
||||
* rocprof in ROCm/5.4.0 GPU selector broken.
|
||||
* rocprof in ROCm/5.4.1 fails to generate kernel info.
|
||||
|
||||
@@ -7,9 +7,9 @@ ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
#### HIP 5.6.1 (for ROCm 5.6.1)
|
||||
|
||||
### Fixed defects
|
||||
### Defect fixes
|
||||
|
||||
* *hipMemcpy* device-to-device (inter-device) is now asynchronous with respect to the host
|
||||
* `hipMemcpy` device-to-device (inter-device) is now asynchronous with respect to the host
|
||||
* Enabled xnack+ check in HIP catch2 tests hang when executing tests
|
||||
* Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
|
||||
* Using *hipGraphAddMemFreeNode* no longer results in a crash
|
||||
* Using `hipGraphAddMemFreeNode` no longer results in a crash
|
||||
|
||||
@@ -3,27 +3,45 @@
|
||||
|
||||
### Release highlights for ROCm 5.7
|
||||
|
||||
ROCm 5.7.0 includes many new features. These include: a new library (hipTensor), and optimizations for rocRAND and MIVisionX. Address sanitizer for host and device code (GPU) is now available as a beta. Note that ROCm 5.7.0 is EOS for MI50. 5.7 versions of ROCm are the last major release in the ROCm 5 series. This release is Linux-only.
|
||||
New features include:
|
||||
|
||||
Important: The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series. Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime API, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.
|
||||
* A new library (hipTensor)
|
||||
* Optimizations for rocRAND and MIVisionX
|
||||
* AddressSanitizer for host and device code (GPU) is now available as a beta
|
||||
|
||||
Note that ROCm 5.7.0 is EOS for MI50. 5.7 versions of ROCm are the last major releases in the ROCm 5
|
||||
series. This release is Linux-only.
|
||||
|
||||
```important
|
||||
The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series.
|
||||
Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime
|
||||
API, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.
|
||||
```
|
||||
|
||||
#### AMD Instinct™ MI50 end-of-support notice
|
||||
|
||||
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) will enter maintenance mode starting Q3 2023.
|
||||
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) will enter
|
||||
maintenance mode starting Q3 2023.
|
||||
|
||||
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 will be the final release for gfx906 GPUs to be in a fully supported state.
|
||||
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 will be the
|
||||
final release for gfx906 GPUs to be in a fully supported state.
|
||||
|
||||
* ROCm 6.0 release will show MI50s as "under maintenance" for [Linux](../about/compatibility/linux-support.md) and [Windows](../about/compatibility/windows-support.md)
|
||||
* ROCm 6.0 release will show MI50s as "under maintenance" for
|
||||
[Linux](../about/compatibility/linux-support.md) and
|
||||
[Windows](../about/compatibility/windows-support.md)
|
||||
|
||||
* No new features and performance optimizations will be supported for the gfx906 GPUs beyond this major release (ROCm 5.7).
|
||||
* No new features and performance optimizations will be supported for the gfx906 GPUs beyond this
|
||||
major release (ROCm 5.7).
|
||||
|
||||
* Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2 2024 (EOM (End of Maintenance) will be aligned with the closest ROCm release).
|
||||
* Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
|
||||
2024 (end of maintenance \[EOM] will be aligned with the closest ROCm release).
|
||||
|
||||
* Bug fixes during the maintenance will be made to the next ROCm point release.
|
||||
|
||||
* Bug fixes will not be backported to older ROCm releases for gfx906.
|
||||
|
||||
* Distribution and operating system updates will continue per the ROCm release cadence for gfx906 GPUs until EOM.
|
||||
* Distribution and operating system updates will continue per the ROCm release cadence for gfx906
|
||||
GPUs until EOM.
|
||||
|
||||
#### Feature updates
|
||||
|
||||
@@ -31,40 +49,60 @@ As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), RO
|
||||
|
||||
**Current behavior**
|
||||
|
||||
The current version of HIP printf relies on hostcalls, which, in turn, rely on PCIe atomics. However, PCIe atomics are unavailable in some environments, and, as a result, HIP-printf does not work in those environments. Users may see the following error from runtime (with AMD_LOG_LEVEL 1 and above):
|
||||
The current version of HIP printf relies on hostcalls, which, in turn, rely on PCIe atomics. However, PCIe
|
||||
atomics are unavailable in some environments, and, as a result, HIP-printf does not work in those
|
||||
environments. Users may see the following error from runtime (with AMD_LOG_LEVEL 1 and above):
|
||||
|
||||
```
|
||||
```shell
|
||||
Pcie atomics not enabled, hostcall not supported
|
||||
```
|
||||
|
||||
**Workaround**
|
||||
|
||||
The ROCm 5.7 release introduces an alternative to the current hostcall-based implementation that leverages an older OpenCL-based printf scheme, which does not rely on hostcalls/PCIe atomics.
|
||||
The ROCm 5.7 release introduces an alternative to the current hostcall-based implementation that
|
||||
leverages an older OpenCL-based printf scheme, which does not rely on hostcalls/PCIe atomics.
|
||||
|
||||
Note: This option is less robust than the hostcall-based implementation and is intended to be a workaround when hostcalls do not work.
|
||||
Note: This option is less robust than the hostcall-based implementation and is intended to be a
|
||||
workaround when hostcalls do not work.
|
||||
|
||||
The printf variant is now controlled via a new compiler option, `-mprintf-kind=<value>`. This is supported only for HIP programs and takes the following values:
|
||||
The printf variant is now controlled via a new compiler option, `-mprintf-kind=<value>`. This is
supported only for HIP programs and takes the following values:
|
||||
|
||||
* “hostcall” – This currently available implementation relies on hostcalls, which require the system to support PCIe atomics. It is the default scheme.
|
||||
* “hostcall” – This currently available implementation relies on hostcalls, which require the system to
|
||||
support PCIe atomics. It is the default scheme.
|
||||
|
||||
* “buffered” – This implementation leverages the older printf scheme used by OpenCL; it relies on a memory buffer where printf arguments are stored during the kernel execution, and then the runtime handles the actual printing once the kernel finishes execution.
|
||||
* “buffered” – This implementation leverages the older printf scheme used by OpenCL; it relies on a
|
||||
memory buffer where printf arguments are stored during the kernel execution, and then the runtime
|
||||
handles the actual printing once the kernel finishes execution.
|
||||
|
||||
**NOTE**: With the new workaround:
|
||||
|
||||
* The printf buffer is fixed size and non-circular. After the buffer is filled, calls to printf will not result in additional output.
|
||||
* The printf buffer is fixed size and non-circular. After the buffer is filled, calls to printf will not result in
|
||||
additional output.
|
||||
|
||||
* The printf call returns either 0 (on success) or -1 (on failure, due to full buffer), unlike the hostcall scheme that returns the number of characters printed.
|
||||
* The printf call returns either 0 (on success) or -1 (on failure, due to full buffer), unlike the hostcall
|
||||
scheme that returns the number of characters printed.
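
The two notes on the buffered scheme can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `PrintBuffer`, its capacity, and `demo` are hypothetical names invented for illustration, and the code does not reflect how the ROCm runtime actually stores printf arguments. It shows just the behavior described above: a fixed-size, non-circular buffer whose write returns 0 on success and -1 once the remaining space is exhausted, with output produced only when the buffer is flushed afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy host-side model of a fixed-size, non-circular print buffer.
// Hypothetical type for illustration; NOT the ROCm runtime's data structure.
class PrintBuffer {
public:
    explicit PrintBuffer(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Returns 0 when the record fits and -1 when it does not,
    // mirroring the 0 / -1 return convention described in the notes.
    int write(const std::string& record) {
        if (record.size() > storage_.size() - used_) {
            return -1;  // no wrap-around: the record is dropped
        }
        std::copy(record.begin(), record.end(), storage_.begin() + used_);
        used_ += record.size();
        return 0;
    }

    // Stands in for the runtime printing everything after the kernel finishes.
    std::string flush() {
        std::string out(storage_.begin(), storage_.begin() + used_);
        used_ = 0;
        return out;
    }

private:
    std::vector<char> storage_;
    std::size_t used_;
};

// Small self-check of the convention.
void demo() {
    PrintBuffer buf(8);
    assert(buf.write("hello") == 0);   // fits: 5 of 8 bytes used
    assert(buf.write("world") == -1);  // only 3 bytes left, record dropped
    assert(buf.flush() == "hello");    // only the first record is emitted
}
```

Code that inspects the return value of device-side `printf` may therefore need to treat 0 as success under the buffered scheme rather than expecting a character count.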
|
||||
|
||||
##### Beta release of LLVM AddressSanitizer (ASan) with the GPU
|
||||
|
||||
The ROCm 5.7 release introduces the beta release of LLVM AddressSanitizer (ASan) with the GPU. The LLVM ASan provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
|
||||
The ROCm 5.7 release introduces the beta release of LLVM AddressSanitizer (ASan) with the GPU. The
|
||||
LLVM ASan provides a process that allows developers to detect runtime addressing errors in
|
||||
applications and libraries. The detection is achieved using a combination of compiler-added
|
||||
instrumentation and runtime techniques, including function interception and replacement.
|
||||
|
||||
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications like pure CPU applications. However, this simplicity has not been achieved yet.
|
||||
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However,
|
||||
ROCm has extended this mechanism to additionally allow the detection of some addressing errors on
|
||||
the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and
|
||||
OpenMP applications like pure CPU applications. However, this simplicity has not been achieved yet.
|
||||
|
||||
Refer to the documentation on LLVM ASan with the GPU at [LLVM AddressSanitizer User Guide](../conceptual/using_gpu_sanitizer.md).
|
||||
Refer to the documentation on LLVM ASan with the GPU at
|
||||
[LLVM AddressSanitizer User Guide](../conceptual/using_gpu_sanitizer.md).
|
||||
|
||||
**Note**: The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
|
||||
```note
|
||||
The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
|
||||
```
|
||||
|
||||
#### Fixed defects
|
||||
#### Defect fixes
|
||||
|
||||
The following defects are fixed in ROCm v5.7:
|
||||
|
||||
@@ -80,7 +118,7 @@ The following defects are fixed in ROCm v5.7:
|
||||
|
||||
##### Optimizations
|
||||
|
||||
##### Added
|
||||
##### Additions
|
||||
|
||||
* Added `meta_group_size`/`rank` for getting the number of tiles and rank of a tile in the partition
|
||||
|
||||
@@ -98,14 +136,16 @@ The following defects are fixed in ROCm v5.7:
|
||||
|
||||
* `hipMipmappedArrayGetLevel` for getting a mipmapped array on a mipmapped level
|
||||
|
||||
##### Changed
|
||||
##### Changes
|
||||
|
||||
##### Fixed
|
||||
##### Fixes
|
||||
|
||||
##### Known issues
|
||||
|
||||
* HIP memory type enum values currently don't support a value equivalent to `cudaMemoryTypeUnregistered`, due to HIP functionality backward compatibility.
* HIP API `hipPointerGetAttributes` could return an invalid value if the input memory pointer was not allocated through any HIP API on device or host.
|
||||
* HIP memory type enum values currently don't support a value equivalent to
`cudaMemoryTypeUnregistered`, due to HIP functionality backward compatibility.
* HIP API `hipPointerGetAttributes` could return an invalid value if the input memory pointer was not
allocated through any HIP API on device or host.
|
||||
|
||||
##### Upcoming changes for HIP in ROCm 6.0 release
|
||||
|
||||
@@ -139,16 +179,17 @@ The following defects are fixed in ROCm v5.7:
|
||||
|
||||
* Removal of deprecated hip-hcc code from the HIP code tree
|
||||
|
||||
* Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
|
||||
* Correct hipArray usage in HIP APIs such as `hipMemcpyAtoH` and `hipMemcpyHtoA`
|
||||
|
||||
* HIPMEMCPY_3D fields correction to avoid truncation of "size_t" to "unsigned int" inside hipMemcpy3D()
|
||||
* HIPMEMCPY_3D fields correction to avoid truncation of "size_t" to "unsigned int" inside
|
||||
`hipMemcpy3D()`
|
||||
|
||||
* Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
|
||||
* Renaming of 'memoryType' in `hipPointerAttribute_t` structure to 'type'
|
||||
|
||||
* Correct hipGetLastError to return the last error instead of last API call's return code
|
||||
* Correct `hipGetLastError` to return the last error instead of last API call's return code
|
||||
|
||||
* Update hipExternalSemaphoreHandleDesc to add "unsigned int reserved[16]"
|
||||
* Update `hipExternalSemaphoreHandleDesc` to add "unsigned int reserved[16]"
|
||||
|
||||
* Correct handling of flag values in hipIpcOpenMemHandle for hipIpcMemLazyEnablePeerAccess
|
||||
* Correct handling of flag values in `hipIpcOpenMemHandle` for `hipIpcMemLazyEnablePeerAccess`
|
||||
|
||||
* Remove hiparray* and make it opaque with hipArray_t
|
||||
* Remove `hiparray*` and make it opaque with `hipArray_t`
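
The `HIPMEMCPY_3D` item in the list above concerns values of type `size_t` being stored in `unsigned int` fields. The following host-only sketch shows why such narrowing loses information. The helper names are hypothetical and not part of HIP, and the self-check assumes a 64-bit `size_t` with a 32-bit `unsigned int`, which is typical on Linux x86-64 but not guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper (not a HIP API): reports whether storing a size_t
// value in an unsigned int field would lose information.
bool truncates_in_unsigned_int(std::size_t value) {
    return value > std::numeric_limits<unsigned int>::max();
}

// What a narrowing store keeps: the value modulo 2^N,
// where N is the bit width of unsigned int.
unsigned int narrowed(std::size_t value) {
    return static_cast<unsigned int>(value);
}

void demo() {
    const std::size_t small = 4096;
    assert(!truncates_in_unsigned_int(small));
    assert(narrowed(small) == 4096u);

    // Assumption: 64-bit size_t and 32-bit unsigned int.
    const std::size_t five_gib = std::size_t{5} << 30;
    assert(truncates_in_unsigned_int(five_gib));
    assert(narrowed(five_gib) == (1u << 30));  // 5 GiB mod 4 GiB = 1 GiB
}
```

Under these assumptions, a 5 GiB extent silently becomes 1 GiB after narrowing, which is the kind of truncation the field correction is meant to avoid.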
|
||||
|
||||
@@ -5,32 +5,50 @@
|
||||
|
||||
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
#### Installing all GPU Address sanitizer packages with a single command
|
||||
#### Installing all GPU AddressSanitizer packages with a single command
|
||||
|
||||
ROCm 5.7.1 simplifies the installation steps for the optional Address Sanitizer (ASan) packages. This release provides the meta package *rocm-ml-sdk-asan* for ease of ASan installation. The following command can be used to install all ASan packages rather than installing each package separately:
|
||||
ROCm 5.7.1 simplifies the installation steps for the optional AddressSanitizer (ASan) packages. This
|
||||
release provides the meta package *rocm-ml-sdk-asan* for ease of ASan installation. The following
|
||||
command can be used to install all ASan packages rather than installing each package separately:
|
||||
|
||||
    sudo apt-get install rocm-ml-sdk-asan
|
||||
|
||||
For more detailed information about using the GPU AddressSanitizer, refer to the [user guide](https://rocm.docs.amd.com/en/docs-5.7.1/understand/using_gpu_sanitizer.html).
|
||||
For more detailed information about using the GPU AddressSanitizer, refer to the
|
||||
[user guide](https://rocm.docs.amd.com/en/docs-5.7.1/understand/using_gpu_sanitizer.html).
|
||||
|
||||
### ROCm Libraries
|
||||
### ROCm libraries
|
||||
|
||||
#### rocBLAS
|
||||
A new utility, `rocblas-gemm-tune`, and an environment variable, `ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH`, are added to rocBLAS in the ROCm 5.7.1 release.
|
||||
A new utility, `rocblas-gemm-tune`, and an environment variable,
`ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH`, are added to rocBLAS in the ROCm 5.7.1 release.
|
||||
|
||||
*rocblas-gemm-tune* is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, profile logging can be used, by setting the environment variable `ROCBLAS_LAYER=4`.
|
||||
`rocblas-gemm-tune` is used to find the best-performing GEMM kernel for each GEMM problem set. It
|
||||
has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the
|
||||
expected --yaml input, profile logging can be used, by setting the environment variable
|
||||
`ROCBLAS_LAYER=4`.
|
||||
|
||||
For more information on rocBLAS logging, see Logging in rocBLAS, in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
|
||||
For more information on rocBLAS logging, see Logging in rocBLAS, in the
|
||||
[API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
|
||||
|
||||
An example input file: Expected output (note selected GEMM idx may differ): Where the far right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code of directly using kernels via their indices.
|
||||
An example input file: Expected output (note selected GEMM idx may differ): Where the far right values
|
||||
(solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel
|
||||
library. These indices can be directly used in future GEMM calls. See
|
||||
`rocBLAS/samples/example_user_driven_tuning.cpp` for sample code of directly using kernels via their
|
||||
indices.
|
||||
|
||||
If the output is stored in a file, the results can be used to override default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, where points to the stored file.
|
||||
If the output is stored in a file, the results can be used to override default kernel selection with the
|
||||
kernels found by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, which
|
||||
points to the stored file.
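
As a rough illustration of the override step, an application could set the variable for its own process before it starts using rocBLAS. The sketch below only uses the POSIX `setenv` call. The helper name and any file path passed to it are hypothetical, and the release notes do not state at which point rocBLAS reads the variable, so setting it before the first rocBLAS call is an assumption.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: points ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH at a file
// holding stored rocblas-gemm-tune output, for the current process only.
// Returns true if the environment variable was set.
bool set_gemm_override_path(const std::string& tuned_file) {
    // POSIX setenv; the final argument 1 overwrites any existing value.
    return ::setenv("ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH",
                    tuned_file.c_str(), 1) == 0;
}
```

Exporting the variable in the shell that launches the application would have the same effect without any code change.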
|
||||
|
||||
For more details, refer to the [rocBLAS Programmer's Guide.](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune)
|
||||
For more details, refer to the
|
||||
[rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune).
|
||||
|
||||
#### HIP 5.7.1 (for ROCm 5.7.1)
|
||||
|
||||
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
### Fixed defects
|
||||
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.
|
||||
### Defect fixes
|
||||
|
||||
The `hipPointerGetAttributes` API returns the correct HIP memory type as `hipMemoryTypeManaged`
|
||||
for managed memory.
|
||||
|
||||
|
||||