19 KiB
Release Notes
The release notes for the ROCm platform.
ROCm 5.2.0
What's New in This Release
HIP Enhancements
The ROCm v5.2 release consists of the following HIP enhancements:
HIP Installation Guide Updates
The HIP Installation Guide is updated to include building HIP tests from source on the AMD and NVIDIA platforms.
For more details, refer to the HIP Installation Guide v5.2.
Support for device-side malloc on HIP-Clang
HIP-Clang now supports device-side malloc. This implementation does not require the use of hipDeviceSetLimit(hipLimitMallocHeapSize,value) nor respect any setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
The test codes at the following link show how to implement applications using malloc and free functions in device kernels:
https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp
New HIP APIs in This Release
The following new HIP APIs are available in the ROCm v5.2 release. Note that this is a pre-official version (beta) release of the new APIs:
Device management HIP APIs
The new device management HIP APIs are as follows:
-
Gets a UUID for the device. This API returns a UUID for the device.
hipError_t hipDeviceGetUuid(hipUUID* uuid, hipDevice_t device);Note
This new API corresponds to the following CUDA API:
CUresult cuDeviceGetUuid(CUuuid* uuid, CUdevice dev); -
Gets default memory pool of the specified device
hipError_t hipDeviceGetDefaultMemPool(hipMemPool_t* mem_pool, int device); -
Sets the current memory pool of a device
hipError_t hipDeviceSetMemPool(int device, hipMemPool_t mem_pool); -
Gets the current memory pool for the specified device
hipError_t hipDeviceGetMemPool(hipMemPool_t* mem_pool, int device);
New HIP Runtime APIs in Memory Management
The new Stream Ordered Memory Allocator functions of HIP runtime APIs in memory management are as follows:
-
Allocates memory with stream ordered semantics
hipError_t hipMallocAsync(void** dev_ptr, size_t size, hipStream_t stream); -
Frees memory with stream ordered semantics
hipError_t hipFreeAsync(void* dev_ptr, hipStream_t stream); -
Releases freed memory back to the OS
hipError_t hipMemPoolTrimTo(hipMemPool_t mem_pool, size_t min_bytes_to_hold); -
Sets attributes of a memory pool
hipError_t hipMemPoolSetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value); -
Gets attributes of a memory pool
hipError_t hipMemPoolGetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value); -
Controls visibility of the specified pool between devices
hipError_t hipMemPoolSetAccess(hipMemPool_t mem_pool, const hipMemAccessDesc* desc_list, size_t count); -
Returns the accessibility of a pool from a device
hipError_t hipMemPoolGetAccess(hipMemAccessFlags* flags, hipMemPool_t mem_pool, hipMemLocation* location); -
Creates a memory pool
hipError_t hipMemPoolCreate(hipMemPool_t* mem_pool, const hipMemPoolProps* pool_props); -
Destroys the specified memory pool
hipError_t hipMemPoolDestroy(hipMemPool_t mem_pool); -
Allocates memory from a specified pool with stream ordered semantics
hipError_t hipMallocFromPoolAsync(void** dev_ptr, size_t size, hipMemPool_t mem_pool, hipStream_t stream); -
Exports a memory pool to the requested handle type
hipError_t hipMemPoolExportToShareableHandle( void* shared_handle, hipMemPool_t mem_pool, hipMemAllocationHandleType handle_type, unsigned int flags); -
Imports a memory pool from a shared handle
hipError_t hipMemPoolImportFromShareableHandle( hipMemPool_t* mem_pool, void* shared_handle, hipMemAllocationHandleType handle_type, unsigned int flags); -
Exports data to share a memory pool allocation between processes
hipError_t hipMemPoolExportPointer(hipMemPoolPtrExportData* export_data, void* dev_ptr); Import a memory pool allocation from another process.t hipError_t hipMemPoolImportPointer( void** dev_ptr, hipMemPool_t mem_pool, hipMemPoolPtrExportData* export_data);
HIP Graph Management APIs
The new HIP Graph Management APIs are as follows:
-
Enqueues a host function call in a stream
hipError_t hipLaunchHostFunc(hipStream_t stream, hipHostFn_t fn, void* userData); -
Swaps the stream capture mode of a thread
hipError_t hipThreadExchangeStreamCaptureMode(hipStreamCaptureMode* mode); -
Sets a node attribute
hipError_t hipGraphKernelNodeSetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, const hipKernelNodeAttrValue* value); -
Gets a node attribute
hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, hipKernelNodeAttrValue* value);
Support for Virtual Memory Management APIs
The new APIs for virtual memory management are as follows:
-
Frees an address range reservation made via hipMemAddressReserve
hipError_t hipMemAddressFree(void* devPtr, size_t size); -
Reserves an address range
hipError_t hipMemAddressReserve(void** ptr, size_t size, size_t alignment, void* addr, unsigned long long flags); -
Creates a memory allocation described by the properties and size
hipError_t hipMemCreate(hipMemGenericAllocationHandle_t* handle, size_t size, const hipMemAllocationProp* prop, unsigned long long flags); -
Exports an allocation to a requested shareable handle type
hipError_t hipMemExportToShareableHandle(void* shareableHandle, hipMemGenericAllocationHandle_t handle, hipMemAllocationHandleType handleType, unsigned long long flags); -
Gets the access flags set for the given location and ptr
hipError_t hipMemGetAccess(unsigned long long* flags, const hipMemLocation* location, void* ptr); -
Calculates either the minimal or recommended granularity
hipError_t hipMemGetAllocationGranularity(size_t* granularity, const hipMemAllocationProp* prop, hipMemAllocationGranularity_flags option); -
Retrieves the property structure of the given handle
hipError_t hipMemGetAllocationPropertiesFromHandle(hipMemAllocationProp* prop, hipMemGenericAllocationHandle_t handle); -
Imports an allocation from a requested shareable handle type
hipError_t hipMemImportFromShareableHandle(hipMemGenericAllocationHandle_t* handle, void* osHandle, hipMemAllocationHandleType shHandleType); -
Maps an allocation handle to a reserved virtual address range
hipError_t hipMemMap(void* ptr, size_t size, size_t offset, hipMemGenericAllocationHandle_t handle, unsigned long long flags); -
Maps or unmaps subregions of sparse HIP arrays and sparse HIP mipmapped arrays
hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count, hipStream_t stream); -
Release a memory handle representing a memory allocation, that was previously allocated through hipMemCreate
hipError_t hipMemRelease(hipMemGenericAllocationHandle_t handle); -
Returns the allocation handle of the backing memory allocation given the address
hipError_t hipMemRetainAllocationHandle(hipMemGenericAllocationHandle_t* handle, void* addr); -
Sets the access flags for each location specified in desc for the given virtual address range
hipError_t hipMemSetAccess(void* ptr, size_t size, const hipMemAccessDesc* desc, size_t count); -
Unmaps memory allocation of a given address range
hipError_t hipMemUnmap(void* ptr, size_t size);
For more information, refer to the HIP API documentation at
{doc}hip:.doxygen/docBin/html/modules.
Planned HIP Changes in Future Releases
Changes to hipDeviceProp_t, HIPMEMCPY_3D, and hipArray structures (and related HIP APIs) are planned in the next major release. These changes may impact backward compatibility.
Refer to the Release Notes document in subsequent releases for more information. ROCm Math and Communication Libraries
In this release, ROCm Math and Communication Libraries consist of the following enhancements and fixes: New rocWMMA for Matrix Multiplication and Accumulation Operations Acceleration
This release introduces a new ROCm C++ library for accelerating mixed precision matrix multiplication and accumulation (MFMA) operations leveraging specialized GPU matrix cores. rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.
rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
For more information, refer to Communication Libraries.
OpenMP Enhancements in This Release
OMPT Target Support
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as described in the OpenMP specification document. These are APIs that allow first-party tools to examine the profile and traces for kernels that execute on a device. A tool may register callbacks for data transfer and kernel dispatch entry points. A tool may use APIs to start and stop tracing for device-related activities such as data transfer and kernel dispatch timings and associated metadata. If device tracing is enabled, trace records for device activities are collected during program execution and returned to the tool using the APIs described in the specification.
Following is an example demonstrating how a tool would use the OMPT target APIs supported. The README in /opt/rocm/llvm/examples/tools/ompt outlines the steps to follow, and you can run the provided example as indicated below:
cd /opt/rocm/llvm/examples/tools/ompt/veccopy-ompt-target-tracing
make run
The file veccopy-ompt-target-tracing.c simulates how a tool would initiate device activity tracing. The file callbacks.h shows the callbacks that may be registered and implemented by the tool.
Deprecations and Warnings
Linux Filesystem Hierarchy Standard for ROCm
ROCm packages have adopted the Linux foundation filesystem hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new filesystem hierarchy, ROCm ensures backward compatibility with its 5.1 version or older filesystem hierarchy. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
New Filesystem Hierarchy
The following is the new filesystem hierarchy:
/opt/rocm-<ver>
| --bin
| --All externally exposed Binaries
| --libexec
| --<component>
| -- Component specific private non-ISA executables (architecture independent)
| --include
| -- <component>
| --<header files>
| --lib
| --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
(public libraries linked with application)
| --<component> (component specific private library, executable data)
| --<cmake>
| --components
| --<component>.config.cmake
| --share
| --html/<component>/*.html
| --info/<component>/*.[pdf, md, txt]
| --man
| --doc
| --<component>
| --<licenses>
| --<component>
| --<misc files> (arch independent non-executable)
| --samples
Note
ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major release.
For more information, refer to https://refspecs.linuxfoundation.org/fhs.shtml.
Backward Compatibility with Older Filesystems
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
Note
ROCm will continue supporting backward compatibility until the next major release.
Wrapper header files
Wrapper header files are placed in the old location (/opt/rocm-xxx/<component>/include) with a warning message to include files from the new location (/opt/rocm-xxx/include) as shown in the example below:
// Code snippet from hip_runtime.h
#pragma message “This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip”.
#include "hip/hip_runtime.h"
The wrapper header files’ backward compatibility deprecation is as follows:
#pragmamessage announcing deprecation -- ROCm v5.2 release#pragmamessage changed to#warning-- Future release#warningchanged to#error-- Future release- Backward compatibility wrappers removed -- Future release
Library files
Library files are available in the /opt/rocm-xxx/lib folder. For backward compatibility, the old library location (/opt/rocm-xxx/<component>/lib) has a soft link to the library at the new location.
Example:
$ ls -l /opt/rocm/hip/lib/
total 4
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
CMake Config files
All CMake configuration files are available in the /opt/rocm-xxx/lib/cmake/<component> folder. For backward compatibility, the old CMake locations (/opt/rocm-xxx/<component>/lib/cmake) consist of a soft link to the new CMake config.
Example:
$ ls -l /opt/rocm/hip/lib/cmake/hip/
total 0
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
Planned deprecation of hip-rocclr and hip-base packages
In the ROCm v5.2 release, hip-rocclr and hip-base packages (Debian and RPM) are planned for deprecation and will be removed in a future release. hip-runtime-amd and hip-dev(el) will replace these packages respectively. Users of hip-rocclr must install two packages, hip-runtime-amd and hip-dev, to get the same set of packages installed by hip-rocclr previously.
Currently, both package names hip-rocclr (or) hip-runtime-amd and hip-base (or) hip-dev(el) are supported. Deprecation of Integrated HIP Directed Tests
The integrated HIP directed tests, which are currently built by default, are deprecated in this release. The default building and execution support through CMake will be removed in future release.
Fixed Defects
| Fixed Defect | Fix |
|---|---|
| ROCmInfo does not list gpus | Code fix |
| Hang observed while restoring cooperative group samples | Code fix |
| ROCM-SMI over SRIOV: Unsupported commands do not return proper error message | Code fix |
Known Issues
This section consists of known issues in this release.
Compiler Error on gfx1030 When Compiling at -O0
Issue
A compiler error occurs when using -O0 flag to compile code for gfx1030 that calls atomicAddNoRet, which is defined in amd_hip_atomic.h. The compiler generates an illegal instruction for gfx1030.
Workaround
The workaround is not to use the -O0 flag for this case. For higher optimization levels, the compiler does not generate an invalid instruction.
System Freeze Observed During CUDA Memtest Checkpoint
Issue
Checkpoint/Restore in Userspace (CRIU) requires 20 MB of VRAM approximately to checkpoint and restore. The CRIU process may freeze if the maximum amount of available VRAM is allocated to checkpoint applications.
Workaround
To use CRIU to checkpoint and restore your application, limit the amount of VRAM the application uses to ensure at least 20 MB is available.
HPC test fails with the “HSA_STATUS_ERROR_MEMORY_FAULT” error
Issue
The compiler may incorrectly compile a program that uses the __shfl_sync(mask, value, srcLane) function when the "value" parameter to the function is undefined along some path to the function. For most functions, uninitialized inputs cause undefined behavior, but the definition for __shfl_sync should allow for undefined values.
Workaround
The workaround is to initialize the parameters to __shfl_sync.
Note
When the
-Wallcompilation flag is used, the compiler generates a warning indicating the variable is initialized along some path.
Example:
double res = 0.0; // Initialize the input to __shfl_sync.
if (lane == 0) {
res = <some expression>
}
res = __shfl_sync(mask, res, 0);
Kernel produces incorrect result
Issue
In recent changes to Clang, insertion of the noundef attribute to all the function arguments has been enabled by default.
In the HIP kernel, variable var in shfl_sync may not be initialized, so LLVM IR treats it as undef.
So, the function argument that is potentially undef (because it is not intialized) has always been assumed to be noundef by LLVM IR (since Clang has inserted noundef attribute). This leads to ambiguous kernel execution.
Workaround
-
Skip adding
noundefattribute to functions tagged with convergent attribute. Refer to https://reviews.llvm.org/D124158 for more information. -
Introduce shuffle attribute and add it to
__shfllike APIs at hip headers. Clang can skip adding noundef attribute, if it finds that argument is tagged with shuffle attribute. Refer to https://reviews.llvm.org/D125378 for more information. -
Introduce clang builtin for
__shflto identify it and skip addingnoundefattribute. -
Introduce
__builtin_freezeto use on the relevant arguments in library wrappers. The library/header need to insert freezes on the relevant inputs.
Issue with Applications Triggering Oversubscription
There is a known issue with applications that trigger oversubscription. A hardware hang occurs when ROCgdb is used on AMD Instinct™ MI50 and MI100 systems.
This issue is under investigation and will be fixed in a future release.