Compare commits

...

12 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Sam Wu | 3ef81535df | Update documentation requirements | 2024-09-16 10:12:12 -08:00 |
| Sam Wu | 78380916b3 | Update documentation requirements | 2024-06-06 16:58:16 -06:00 |
| Sam Wu | 9f6cef51e1 | Fix RTD config | 2024-05-02 08:53:26 -06:00 |
| Sam Wu | 3f5f3a6fc7 | Update documentation requirements | 2024-05-01 16:58:35 -06:00 |
| Sam Wu | 55711837bc | Update documentation requirements | 2024-05-01 16:50:34 -06:00 |
| Sam Wu | 4595b88df7 | add version to html title | 2023-08-04 17:14:50 -06:00 |
| Sam Wu | 5ff050428b | rocm-docs-core v0.18.3 | 2023-06-30 09:31:15 -06:00 |
| Máté Ferenc Nagy-Egri | 0dd3fc9eb4 | Downgrade license notice to 5.1.0 | 2023-06-22 18:46:13 +02:00 |
| Máté Ferenc Nagy-Egri | 0908ed22b1 | Downgrade changelog to 5.1.0 | 2023-06-22 18:46:13 +02:00 |
| Máté Ferenc Nagy-Egri | 683b940a89 | Downgrade install instructions to 5.1.0 | 2023-06-22 18:46:12 +02:00 |
| Máté Ferenc Nagy-Egri | 3f60f1df3b | Downgrade OS support to 5.1.0 | 2023-06-22 18:46:12 +02:00 |
| Máté Ferenc Nagy-Egri | d9543485ce | Downgrade release notes to 5.1.0 | 2023-06-22 18:46:12 +02:00 |
15 changed files with 310 additions and 1488 deletions


@@ -3,12 +3,19 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.10"
apt_packages:
- "doxygen"
- "graphviz" # For dot graphs in doxygen
python:
install:
- requirements: docs/sphinx/requirements.txt
sphinx:
configuration: docs/conf.py
formats: [htmlzip, pdf, epub]
python:
version: "3.8"
install:
- requirements: docs/sphinx/requirements.txt
formats: []


@@ -15,827 +15,6 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.2.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
#### HIP Enhancements
The ROCm v5.2 release consists of the following HIP enhancements:
##### HIP Installation Guide Updates
The HIP Installation Guide is updated to include building HIP tests from source on the AMD and NVIDIA platforms.
For more details, refer to the HIP Installation Guide v5.2.
##### Support for device-side malloc on HIP-Clang
HIP-Clang now supports device-side malloc. This implementation does not require `hipDeviceSetLimit(hipLimitMallocHeapSize, value)` and does not respect any such setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
The test codes at the following link show how to implement applications using malloc and free functions in device kernels:
<https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp>
##### New HIP APIs in This Release
The following new HIP APIs are available in the ROCm v5.2 release. Note that this is a pre-official version (beta) release of the new APIs:
###### Device management HIP APIs
The new device management HIP APIs are as follows:
- Gets a UUID for the device
```h
hipError_t hipDeviceGetUuid(hipUUID* uuid, hipDevice_t device);
```
> **Note**
>
> This new API corresponds to the following CUDA API:
>
> ```h
> CUresult cuDeviceGetUuid(CUuuid* uuid, CUdevice dev);
> ```
- Gets default memory pool of the specified device
```h
hipError_t hipDeviceGetDefaultMemPool(hipMemPool_t* mem_pool, int device);
```
- Sets the current memory pool of a device
```h
hipError_t hipDeviceSetMemPool(int device, hipMemPool_t mem_pool);
```
- Gets the current memory pool for the specified device
```h
hipError_t hipDeviceGetMemPool(hipMemPool_t* mem_pool, int device);
```
###### New HIP Runtime APIs in Memory Management
The new Stream Ordered Memory Allocator functions of HIP runtime APIs in memory management are as follows:
- Allocates memory with stream ordered semantics
```h
hipError_t hipMallocAsync(void** dev_ptr, size_t size, hipStream_t stream);
```
- Frees memory with stream ordered semantics
```h
hipError_t hipFreeAsync(void* dev_ptr, hipStream_t stream);
```
- Releases freed memory back to the OS
```h
hipError_t hipMemPoolTrimTo(hipMemPool_t mem_pool, size_t min_bytes_to_hold);
```
- Sets attributes of a memory pool
```h
hipError_t hipMemPoolSetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value);
```
- Gets attributes of a memory pool
```h
hipError_t hipMemPoolGetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value);
```
- Controls visibility of the specified pool between devices
```h
hipError_t hipMemPoolSetAccess(hipMemPool_t mem_pool, const hipMemAccessDesc* desc_list, size_t count);
```
- Returns the accessibility of a pool from a device
```h
hipError_t hipMemPoolGetAccess(hipMemAccessFlags* flags, hipMemPool_t mem_pool, hipMemLocation* location);
```
- Creates a memory pool
```h
hipError_t hipMemPoolCreate(hipMemPool_t* mem_pool, const hipMemPoolProps* pool_props);
```
- Destroys the specified memory pool
```h
hipError_t hipMemPoolDestroy(hipMemPool_t mem_pool);
```
- Allocates memory from a specified pool with stream ordered semantics
```h
hipError_t hipMallocFromPoolAsync(void** dev_ptr, size_t size, hipMemPool_t mem_pool, hipStream_t stream);
```
- Exports a memory pool to the requested handle type
```h
hipError_t hipMemPoolExportToShareableHandle(
void* shared_handle,
hipMemPool_t mem_pool,
hipMemAllocationHandleType handle_type,
unsigned int flags);
```
- Imports a memory pool from a shared handle
```h
hipError_t hipMemPoolImportFromShareableHandle(
hipMemPool_t* mem_pool,
void* shared_handle,
hipMemAllocationHandleType handle_type,
unsigned int flags);
```
- Exports data to share a memory pool allocation between processes
```h
hipError_t hipMemPoolExportPointer(hipMemPoolPtrExportData* export_data, void* dev_ptr);
```
- Imports a memory pool allocation from another process
```h
hipError_t hipMemPoolImportPointer(
void** dev_ptr,
hipMemPool_t mem_pool,
hipMemPoolPtrExportData* export_data);
```
###### HIP Graph Management APIs
The new HIP Graph Management APIs are as follows:
- Enqueues a host function call in a stream
```h
hipError_t hipLaunchHostFunc(hipStream_t stream, hipHostFn_t fn, void* userData);
```
- Swaps the stream capture mode of a thread
```h
hipError_t hipThreadExchangeStreamCaptureMode(hipStreamCaptureMode* mode);
```
- Sets a node attribute
```h
hipError_t hipGraphKernelNodeSetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, const hipKernelNodeAttrValue* value);
```
- Gets a node attribute
```h
hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, hipKernelNodeAttrValue* value);
```
###### Support for Virtual Memory Management APIs
The new APIs for virtual memory management are as follows:
- Frees an address range reservation made via hipMemAddressReserve
```h
hipError_t hipMemAddressFree(void* devPtr, size_t size);
```
- Reserves an address range
```h
hipError_t hipMemAddressReserve(void** ptr, size_t size, size_t alignment, void* addr, unsigned long long flags);
```
- Creates a memory allocation described by the properties and size
```h
hipError_t hipMemCreate(hipMemGenericAllocationHandle_t* handle, size_t size, const hipMemAllocationProp* prop, unsigned long long flags);
```
- Exports an allocation to a requested shareable handle type
```h
hipError_t hipMemExportToShareableHandle(void* shareableHandle, hipMemGenericAllocationHandle_t handle, hipMemAllocationHandleType handleType, unsigned long long flags);
```
- Gets the access flags set for the given location and ptr
```h
hipError_t hipMemGetAccess(unsigned long long* flags, const hipMemLocation* location, void* ptr);
```
- Calculates either the minimal or recommended granularity
```h
hipError_t hipMemGetAllocationGranularity(size_t* granularity, const hipMemAllocationProp* prop, hipMemAllocationGranularity_flags option);
```
- Retrieves the property structure of the given handle
```h
hipError_t hipMemGetAllocationPropertiesFromHandle(hipMemAllocationProp* prop, hipMemGenericAllocationHandle_t handle);
```
- Imports an allocation from a requested shareable handle type
```h
hipError_t hipMemImportFromShareableHandle(hipMemGenericAllocationHandle_t* handle, void* osHandle, hipMemAllocationHandleType shHandleType);
```
- Maps an allocation handle to a reserved virtual address range
```h
hipError_t hipMemMap(void* ptr, size_t size, size_t offset, hipMemGenericAllocationHandle_t handle, unsigned long long flags);
```
- Maps or unmaps subregions of sparse HIP arrays and sparse HIP mipmapped arrays
```h
hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count, hipStream_t stream);
```
- Releases a memory handle representing a memory allocation that was previously allocated through hipMemCreate
```h
hipError_t hipMemRelease(hipMemGenericAllocationHandle_t handle);
```
- Returns the allocation handle of the backing memory allocation given the address
```h
hipError_t hipMemRetainAllocationHandle(hipMemGenericAllocationHandle_t* handle, void* addr);
```
- Sets the access flags for each location specified in desc for the given virtual address range
```h
hipError_t hipMemSetAccess(void* ptr, size_t size, const hipMemAccessDesc* desc, size_t count);
```
- Unmaps memory allocation of a given address range
```h
hipError_t hipMemUnmap(void* ptr, size_t size);
```
For more information, refer to the HIP API documentation at <https://docs.amd.com/bundle/HIP_API_Guide/page/modules.html>
##### Planned HIP Changes in Future Releases
Changes to `hipDeviceProp_t`, `HIPMEMCPY_3D`, and `hipArray` structures (and related HIP APIs) are planned in the next major release. These changes may impact backward compatibility.
Refer to the Release Notes document in subsequent releases for more information.
#### ROCm Math and Communication Libraries
In this release, ROCm Math and Communication Libraries consist of the following enhancements and fixes:
##### New rocWMMA for Matrix Multiplication and Accumulation Operations Acceleration
This release introduces a new ROCm C++ library for accelerating mixed precision matrix multiplication and accumulation (MFMA) operations leveraging specialized GPU matrix cores. rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.
rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
For more information, refer to <https://docs.amd.com/category/libraries>.
#### OpenMP Enhancements in This Release
##### OMPT Target Support
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as described in the OpenMP specification document. These are APIs that allow first-party tools to examine the profile and traces for kernels that execute on a device. A tool may register callbacks for data transfer and kernel dispatch entry points. A tool may use APIs to start and stop tracing for device-related activities such as data transfer and kernel dispatch timings and associated metadata. If device tracing is enabled, trace records for device activities are collected during program execution and returned to the tool using the APIs described in the specification.
The following example demonstrates how a tool would use the supported OMPT target APIs. The README in /opt/rocm/llvm/examples/tools/ompt outlines the steps to follow, and you can run the provided example as indicated below:
```sh
cd /opt/rocm/llvm/examples/tools/ompt/veccopy-ompt-target-tracing
make run
```
The file `veccopy-ompt-target-tracing.c` simulates how a tool would initiate device activity tracing. The file `callbacks.h` shows the callbacks that may be registered and implemented by the tool.
### Deprecations and Warnings
#### Linux Filesystem Hierarchy Standard for ROCm
ROCm packages have adopted the Linux Foundation Filesystem Hierarchy Standard (FHS) in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to the new filesystem hierarchy, ROCm maintains backward compatibility with the filesystem hierarchy of version 5.1 and older. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
##### New Filesystem Hierarchy
The following is the new filesystem hierarchy:
```text
/opt/rocm-<ver>
| --bin
| --All externally exposed Binaries
| --libexec
| --<component>
| -- Component specific private non-ISA executables (architecture independent)
| --include
| -- <component>
| --<header files>
| --lib
| --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
(public libraries linked with application)
| --<component> (component specific private library, executable data)
| --<cmake>
| --components
| --<component>.config.cmake
| --share
| --html/<component>/*.html
| --info/<component>/*.[pdf, md, txt]
| --man
| --doc
| --<component>
| --<licenses>
| --<component>
| --<misc files> (arch independent non-executable)
| --samples
```
> **Note**
>
> ROCm will not support backward compatibility with the v5.1 (old) filesystem hierarchy in its next major release.
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
##### Backward Compatibility with Older Filesystems
ROCm has moved header files and libraries to their new locations, as indicated in the structure above, and has placed symbolic links and wrapper header files in the old locations for backward compatibility.
> **Note**
>
> ROCm will continue supporting backward compatibility until the next major release.
##### Wrapper header files
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
```h
// Code snippet from hip_runtime.h
#pragma message("This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip")
#include "hip/hip_runtime.h"
```
The deprecation schedule for the backward-compatibility wrapper header files is as follows:
- `#pragma` message announcing deprecation -- ROCm v5.2 release
- `#pragma` message changed to `#warning` -- Future release
- `#warning` changed to `#error` -- Future release
- Backward compatibility wrappers removed -- Future release
##### Library files
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
Example:
```log
$ ls -l /opt/rocm/hip/lib/
total 4
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
```
##### CMake Config files
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) contain a soft link to the new CMake config.
Example:
```log
$ ls -l /opt/rocm/hip/lib/cmake/hip/
total 0
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
```
#### Planned deprecation of hip-rocclr and hip-base packages
In the ROCm v5.2 release, hip-rocclr and hip-base packages (Debian and RPM) are planned for deprecation and will be removed in a future release. hip-runtime-amd and hip-dev(el) will replace these packages respectively. Users of hip-rocclr must install two packages, hip-runtime-amd and hip-dev, to get the same set of packages installed by hip-rocclr previously.
Currently, both the old and new package names are supported: hip-rocclr or hip-runtime-amd, and hip-base or hip-dev(el).
#### Deprecation of Integrated HIP Directed Tests
The integrated HIP directed tests, which are currently built by default, are deprecated in this release. Default build and execution support through CMake will be removed in a future release.
### Fixed Defects
| Fixed Defect | Fix |
|------------------------------------------------------------------------------|----------|
| ROCmInfo does not list GPUs | Code fix |
| Hang observed while restoring cooperative group samples | Code fix |
| ROCM-SMI over SRIOV: Unsupported commands do not return proper error message | Code fix |
### Known Issues
This section consists of known issues in this release.
#### Compiler Error on gfx1030 When Compiling at -O0
##### Issue
A compiler error occurs when using -O0 flag to compile code for gfx1030 that calls atomicAddNoRet, which is defined in amd_hip_atomic.h. The compiler generates an illegal instruction for gfx1030.
##### Workaround
The workaround is not to use the -O0 flag for this case. For higher optimization levels, the compiler does not generate an invalid instruction.
#### System Freeze Observed During CUDA Memtest Checkpoint
##### Issue
Checkpoint/Restore in Userspace (CRIU) requires approximately 20 MB of VRAM to checkpoint and restore. The CRIU process may freeze if the maximum amount of available VRAM is allocated to checkpoint applications.
##### Workaround
To use CRIU to checkpoint and restore your application, limit the amount of VRAM the application uses to ensure at least 20 MB is available.
#### HPC test fails with the “HSA_STATUS_ERROR_MEMORY_FAULT” error
##### Issue
The compiler may incorrectly compile a program that uses the `__shfl_sync(mask, value, srcLane)` function when the "value" parameter to the function is undefined along some path to the function. For most functions, uninitialized inputs cause undefined behavior, but the definition for `__shfl_sync` should allow for undefined values.
##### Workaround
The workaround is to initialize the parameters to `__shfl_sync`.
> **Note**
>
> When the `-Wall` compilation flag is used, the compiler generates a warning indicating that the variable may be uninitialized along some path.
Example:
```cpp
double res = 0.0; // Initialize the input to __shfl_sync.
if (lane == 0) {
res = <some expression>;
}
res = __shfl_sync(mask, res, 0);
```
#### Kernel produces incorrect result
##### Issue
In recent changes to Clang, insertion of the `noundef` attribute on all function arguments has been enabled by default.
In a HIP kernel, the variable passed to `__shfl_sync` may not be initialized, so LLVM IR treats it as `undef`.
As a result, a function argument that is potentially `undef` (because it is not initialized) is assumed to be `noundef` by LLVM IR (because Clang inserted the `noundef` attribute). This leads to ambiguous kernel execution.
##### Workaround
- Skip adding `noundef` attribute to functions tagged with convergent attribute. Refer to <https://reviews.llvm.org/D124158> for more information.
- Introduce shuffle attribute and add it to `__shfl` like APIs at hip headers. Clang can skip adding noundef attribute, if it finds that argument is tagged with shuffle attribute. Refer to <https://reviews.llvm.org/D125378> for more information.
- Introduce clang builtin for `__shfl` to identify it and skip adding `noundef` attribute.
- Introduce `__builtin_freeze` to use on the relevant arguments in library wrappers. The library/header needs to insert freezes on the relevant inputs.
#### Issue with Applications Triggering Oversubscription
There is a known issue with applications that trigger oversubscription. A hardware hang occurs when ROCgdb is used on AMD Instinct™ MI50 and MI100 systems.
This issue is under investigation and will be fixed in a future release.
### Library Changes in ROCm 5.2.0
| Library | Version |
|---------|---------|
| hipBLAS | 0.50.0 ⇒ [0.51.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.2.0) |
| hipCUB | 2.11.0 ⇒ [2.11.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.2.0) |
| hipFFT | 1.0.7 ⇒ [1.0.8](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.2.0) |
| hipSOLVER | 1.3.0 ⇒ [1.4.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.2.0) |
| hipSPARSE | 2.1.0 ⇒ [2.2.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.2.0) |
| rccl | [2.11.4](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.2.0) |
| rocALUTION | 2.0.2 ⇒ [2.0.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.2.0) |
| rocBLAS | 2.43.0 ⇒ [2.44.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.2.0) |
| rocFFT | 1.0.16 ⇒ [1.0.17](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.2.0) |
| rocPRIM | 2.10.13 ⇒ [2.10.14](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.2.0) |
| rocRAND | 2.10.13 ⇒ [2.10.14](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.2.0) |
| rocSOLVER | 3.17.0 ⇒ [3.18.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.2.0) |
| rocSPARSE | 2.1.0 ⇒ [2.2.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.2.0) |
| rocThrust | 2.14.0 ⇒ [2.15.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.2.0) |
| rocWMMA | ⇒ [0.7](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.2.0) |
| Tensile | 4.32.0 ⇒ [4.33.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.2.0) |
#### hipBLAS 0.51.0
hipBLAS 0.51.0 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
- Added File/Folder Reorg Changes with backward compatibility support enabled using ROCM-CMAKE wrapper functions
- Added user-specified initialization option to hipblas-bench
##### Fixed
- Fixed version gathering in performance measuring script
#### hipCUB 2.11.1
hipCUB 2.11.1 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
#### hipFFT 1.0.8
hipFFT 1.0.8 for ROCm 5.2.0
##### Added
- Added File/Folder Reorg Changes with backward compatibility support using ROCM-CMAKE wrapper functions.
- Packages for test and benchmark executables on all supported OSes using CPack.
#### hipSOLVER 1.4.0
hipSOLVER 1.4.0 for ROCm 5.2.0
##### Added
- Package generation for test and benchmark executables on all supported OSes using CPack.
- File/Folder Reorg
  - Added File/Folder Reorg Changes with backward compatibility support using ROCM-CMAKE wrapper functions.
##### Fixed
- Fixed the ReadTheDocs documentation generation.
#### hipSPARSE 2.2.0
hipSPARSE 2.2.0 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
#### rocALUTION 2.0.3
rocALUTION 2.0.3 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
#### rocBLAS 2.44.0
rocBLAS 2.44.0 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
- Added Denormal number detection to the Numerical checking helper function to detect denormal/subnormal numbers in the input and the output vectors of rocBLAS level 1 and 2 functions.
- Added Denormal number detection to the Numerical checking helper function to detect denormal/subnormal numbers in the input and the output general matrices of rocBLAS level 2 and 3 functions.
- Added NaN initialization tests to the yaml files of Level 2 rocBLAS batched and strided-batched functions for testing purposes.
- Added memory allocation check to avoid disk swapping during rocblas-test runs by skipping tests.
##### Optimizations
- Improved performance of non-batched and batched her2 for all sizes and data types.
- Improved performance of non-batched and batched amin for all data types using shuffle reductions.
- Improved performance of non-batched and batched amax for all data types using shuffle reductions.
- Improved performance of trsv for all sizes and data types.
##### Changed
- Modifying gemm_ex for HBH (High-precision F16). The alpha/beta data type remains as F32 without narrowing to F16 and expanding back to F32 in the kernel. This change prevents rounding errors due to alpha/beta conversion in situations where alpha/beta are not exactly represented as an F16.
- Modified non-batched and batched asum, nrm2 functions to use shuffle instruction based reductions.
- For gemm, gemm_ex, gemm_ex2 internal API use rocblas_stride datatype for offset.
- For symm, hemm, syrk, herk, dgmm, geam internal API use rocblas_stride datatype for offset.
- AMD copyright year for all rocBLAS files.
- For gemv (transpose-case), cast the 'lda' (offset) datatype to size_t during offset calculation to avoid overflow and remove duplicate template functions.
##### Fixed
- For function her2 avoid overflow in offset calculation.
- For trsm when alpha == 0 and on host, allow A to be nullptr.
- Fixed memory access issue in trsv.
- Fixed git pre-commit script to update only AMD copyright year.
- Fixed dgmm, geam test functions to set correct stride values.
- For functions ssyr2k and dsyr2k allow trans == rocblas_operation_conjugate_transpose.
- Fixed compilation error for clients-only build.
##### Removed
- Remove Navi12 (gfx1011) from fat binary.
#### rocFFT 1.0.17
rocFFT 1.0.17 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
- Added File/Folder Reorg Changes with backward compatibility support using ROCM-CMAKE wrapper functions.
##### Changed
- Improved reuse of twiddle memory between plans.
- Set a default load/store callback when only one callback
type is set via the API for improved performance.
##### Optimizations
- Introduced a new access pattern of lds (non-linear) and applied it on
sbcc kernels len 64 to get performance improvement.
##### Fixed
- Fixed plan creation failure in cases where SBCC kernels would need to write to non-unit-stride buffers.
#### rocPRIM 2.10.14
rocPRIM 2.10.14 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
- Added File/Folder Reorg Changes and Enabled Backward compatibility support using wrapper headers.
#### rocRAND 2.10.14
rocRAND 2.10.14 for ROCm 5.2.0
##### Added
- Backward compatibility for deprecated `#include <rocrand.h>` using wrapper header files.
- Packages for test and benchmark executables on all supported OSes using CPack.
#### rocSOLVER 3.18.0
rocSOLVER 3.18.0 for ROCm 5.2.0
##### Added
- Partial eigenvalue decomposition routines:
  - STEBZ
  - STEIN
- Package generation for test and benchmark executables on all supported OSes using CPack.
- Added tests for multi-level logging
- Added tests for rocsolver-bench client
- File/Folder Reorg
  - Added File/Folder Reorg Changes with backward compatibility support using ROCM-CMAKE wrapper functions.
##### Fixed
- Fixed compatibility with libfmt 8.1
#### rocSPARSE 2.2.0
rocSPARSE 2.2.0 for ROCm 5.2.0
##### Added
- batched SpMM for CSR, COO and Blocked ELL formats.
- Packages for test and benchmark executables on all supported OSes using CPack.
- Clients file importers and exporters.
##### Improved
- Clients code size reduction.
- Clients error handling.
- Clients benchmarking for performance tracking.
##### Changed
- Test adjustments due to roundoff errors.
- Fixing API calls compatibility with rocPRIM.
##### Known Issues
- none
#### rocThrust 2.15.0
rocThrust 2.15.0 for ROCm 5.2.0
##### Added
- Packages for test and benchmark executables on all supported OSes using CPack.
#### rocWMMA 0.7
rocWMMA 0.7 for ROCm 5.2.0
##### Added
- Added unit tests for DLRM kernels
- Added GEMM sample
- Added DLRM sample
- Added SGEMV sample
- Added unit tests for cooperative wmma load and stores
- Added unit tests for IOBarrier.h
- Added wmma load/ store tests for different matrix types (A, B and Accumulator)
- Added more block sizes 1, 2, 4, 8 to test MmaSyncMultiTest
- Added block sizes 4, 8 to test MmaSyncMultiLdsTest
- Added support for wmma load / store layouts with block dimension greater than 64
- Added IOShape structure to define the attributes of mapping and layouts for all wmma matrix types
- Added CI testing for rocWMMA
##### Changed
- Renamed wmma to rocwmma in cmake, header files and documentation
- Renamed library files
- Modified Layout.h to use different matrix offset calculations (base offset, incremental offset and cumulative offset)
  - Opaque load/store continue to use incremental offsets as they fill the entire block
  - Cooperative load/store use cumulative offsets as they fill only small portions of the entire block
- Increased Max split counts to 64 for cooperative load/store
- Moved all the wmma definitions, API headers to rocwmma namespace
- Modified wmma fill unit tests to validate all matrix types (A, B, Accumulator)
#### Tensile 4.33.0
Tensile 4.33.0 for ROCm 5.2.0
##### Added
- TensileUpdateLibrary for updating old library logic files
- Support for TensileRetuneLibrary to use sizes from separate file
- ZGEMM DirectToVgpr/DirectToLds/StoreCInUnroll/MIArchVgpr support
- Tests for denorm correctness
- Option to write different architectures to different TensileLibrary files
##### Optimizations
- Optimize MessagePackLoadLibraryFile by switching to fread
- DGEMM tail loop optimization for PrefetchAcrossPersistentMode=1/DirectToVgpr
##### Changed
- Alpha/beta datatype remains as F32 for HPA HGEMM
- Force assembly kernels to not flush denorms
- Use hipDeviceAttributePhysicalMultiProcessorCount as multiProcessorCount
##### Fixed
- Fix segmentation fault when running the i8 datatype with TENSILE_DB=0x80
-------------------
## ROCm 5.1.3
### Library Changes in ROCm 5.1.3
| Library | Version |
|---------|---------|
| hipBLAS | [0.50.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.1.3) |
| hipCUB | [2.11.0](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.1.3) |
| hipFFT | [1.0.7](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.1.3) |
| hipSOLVER | [1.3.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.1.3) |
| hipSPARSE | [2.1.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.1.3) |
| rccl | [2.11.4](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.1.3) |
| rocALUTION | [2.0.2](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.1.3) |
| rocBLAS | [2.43.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.1.3) |
| rocFFT | [1.0.16](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.1.3) |
| rocPRIM | [2.10.13](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.1.3) |
| rocRAND | [2.10.13](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.1.3) |
| rocSOLVER | [3.17.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.1.3) |
| rocSPARSE | [2.1.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.1.3) |
| rocThrust | [2.14.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.1.3) |
| Tensile | [4.32.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.1.3) |
-------------------
## ROCm 5.1.1
### Library Changes in ROCm 5.1.1
| Library | Version |
|---------|---------|
| hipBLAS | [0.50.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.1.1) |
| hipCUB | [2.11.0](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.1.1) |
| hipFFT | [1.0.7](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.1.1) |
| hipSOLVER | [1.3.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.1.1) |
| hipSPARSE | [2.1.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.1.1) |
| rccl | [2.11.4](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.1.1) |
| rocALUTION | [2.0.2](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.1.1) |
| rocBLAS | [2.43.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.1.1) |
| rocFFT | [1.0.16](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.1.1) |
| rocPRIM | [2.10.13](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.1.1) |
| rocRAND | [2.10.13](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.1.1) |
| rocSOLVER | [3.17.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.1.1) |
| rocSPARSE | [2.1.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.1.1) |
| rocThrust | [2.14.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.1.1) |
| Tensile | [4.32.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.1.1) |
-------------------
## ROCm 5.1.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-blanks-blockquote -->


@@ -15,498 +15,219 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.2.0
## ROCm 5.1.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
#### HIP Enhancements
The ROCm v5.2 release consists of the following HIP enhancements:
The ROCm v5.1 release consists of the following HIP enhancements.
##### HIP Installation Guide Updates
The HIP Installation Guide is updated to include building HIP tests from source on the AMD and NVIDIA platforms.
The HIP Installation Guide is updated to include installation and building HIP from source on the AMD and NVIDIA platforms.
For more details, refer to the HIP Installation Guide v5.2.
Refer to the HIP Installation Guide v5.1 for more details.
##### Support for device-side malloc on HIP-Clang
##### Support for HIP Graph
HIP-Clang now supports device-side malloc. This implementation does not require the use of `hipDeviceSetLimit(hipLimitMallocHeapSize,value)` nor respect any setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
ROCm v5.1 extends support for HIP Graph.
The test codes at the following link show how to implement applications using malloc and free functions in device kernels:
##### Planned Changes for HIP in Future Releases
<https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp>
###### Separation of hiprtc (libhiprtc) library from hip runtime (amdhip64)
##### New HIP APIs in This Release
On ROCm/Linux, to maintain backward compatibility, the hipruntime library (amdhip64) will continue to include hiprtc symbols in future releases. The backward compatible support may be discontinued by removing hiprtc symbols from the hipruntime library (amdhip64) in the next major release.
The following new HIP APIs are available in the ROCm v5.2 release. Note that this is a pre-official version (beta) release of the new APIs:
###### hipDeviceProp_t Structure Enhancements
###### Device management HIP APIs
Changes to the hipDeviceProp_t structure in the next major release may result in backward incompatibility. More details on these changes will be provided in subsequent releases.
The new device management HIP APIs are as follows:
#### ROCDebugger Enhancements
- Gets a UUID for the device. This API returns a UUID for the device.
##### Multi-language Source Level Debugger
```h
hipError_t hipDeviceGetUuid(hipUUID* uuid, hipDevice_t device);
```
The compiler now generates source-level variable and function argument debug information.
> **Note**
>
> This new API corresponds to the following CUDA API:
>
> ```h
> CUresult cuDeviceGetUuid(CUuuid* uuid, CUdevice dev);
> ```
Accuracy is guaranteed if the compiler options `-g -O0` are used, and this applies only to HIP.
- Gets default memory pool of the specified device
```h
hipError_t hipDeviceGetDefaultMemPool(hipMemPool_t* mem_pool, int device);
```
- Sets the current memory pool of a device
```h
hipError_t hipDeviceSetMemPool(int device, hipMemPool_t mem_pool);
```
- Gets the current memory pool for the specified device
```h
hipError_t hipDeviceGetMemPool(hipMemPool_t* mem_pool, int device);
```
###### New HIP Runtime APIs in Memory Management
The new Stream Ordered Memory Allocator functions of HIP runtime APIs in memory management are as follows:
- Allocates memory with stream ordered semantics
```h
hipError_t hipMallocAsync(void** dev_ptr, size_t size, hipStream_t stream);
```
- Frees memory with stream ordered semantics
```h
hipError_t hipFreeAsync(void* dev_ptr, hipStream_t stream);
```
- Releases freed memory back to the OS
```h
hipError_t hipMemPoolTrimTo(hipMemPool_t mem_pool, size_t min_bytes_to_hold);
```
- Sets attributes of a memory pool
```h
hipError_t hipMemPoolSetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value);
```
- Gets attributes of a memory pool
```h
hipError_t hipMemPoolGetAttribute(hipMemPool_t mem_pool, hipMemPoolAttr attr, void* value);
```
- Controls visibility of the specified pool between devices
```h
hipError_t hipMemPoolSetAccess(hipMemPool_t mem_pool, const hipMemAccessDesc* desc_list, size_t count);
```
- Returns the accessibility of a pool from a device
```h
hipError_t hipMemPoolGetAccess(hipMemAccessFlags* flags, hipMemPool_t mem_pool, hipMemLocation* location);
```
- Creates a memory pool
```h
hipError_t hipMemPoolCreate(hipMemPool_t* mem_pool, const hipMemPoolProps* pool_props);
```
- Destroys the specified memory pool
```h
hipError_t hipMemPoolDestroy(hipMemPool_t mem_pool);
```
- Allocates memory from a specified pool with stream ordered semantics
```h
hipError_t hipMallocFromPoolAsync(void** dev_ptr, size_t size, hipMemPool_t mem_pool, hipStream_t stream);
```
- Exports a memory pool to the requested handle type
```h
hipError_t hipMemPoolExportToShareableHandle(
void* shared_handle,
hipMemPool_t mem_pool,
hipMemAllocationHandleType handle_type,
unsigned int flags);
```
- Imports a memory pool from a shared handle
```h
hipError_t hipMemPoolImportFromShareableHandle(
hipMemPool_t* mem_pool,
void* shared_handle,
hipMemAllocationHandleType handle_type,
unsigned int flags);
```
- Exports data to share a memory pool allocation between processes
```h
hipError_t hipMemPoolExportPointer(hipMemPoolPtrExportData* export_data, void* dev_ptr);
// Imports a memory pool allocation from another process
hipError_t hipMemPoolImportPointer(
void** dev_ptr,
hipMemPool_t mem_pool,
hipMemPoolPtrExportData* export_data);
```
###### HIP Graph Management APIs
The new HIP Graph Management APIs are as follows:
- Enqueues a host function call in a stream
```h
hipError_t hipLaunchHostFunc(hipStream_t stream, hipHostFn_t fn, void* userData);
```
- Swaps the stream capture mode of a thread
```h
hipError_t hipThreadExchangeStreamCaptureMode(hipStreamCaptureMode* mode);
```
- Sets a node attribute
```h
hipError_t hipGraphKernelNodeSetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, const hipKernelNodeAttrValue* value);
```
- Gets a node attribute
```h
hipError_t hipGraphKernelNodeGetAttribute(hipGraphNode_t hNode, hipKernelNodeAttrID attr, hipKernelNodeAttrValue* value);
```
###### Support for Virtual Memory Management APIs
The new APIs for virtual memory management are as follows:
- Frees an address range reservation made via hipMemAddressReserve
```h
hipError_t hipMemAddressFree(void* devPtr, size_t size);
```
- Reserves an address range
```h
hipError_t hipMemAddressReserve(void** ptr, size_t size, size_t alignment, void* addr, unsigned long long flags);
```
- Creates a memory allocation described by the properties and size
```h
hipError_t hipMemCreate(hipMemGenericAllocationHandle_t* handle, size_t size, const hipMemAllocationProp* prop, unsigned long long flags);
```
- Exports an allocation to a requested shareable handle type
```h
hipError_t hipMemExportToShareableHandle(void* shareableHandle, hipMemGenericAllocationHandle_t handle, hipMemAllocationHandleType handleType, unsigned long long flags);
```
- Gets the access flags set for the given location and ptr
```h
hipError_t hipMemGetAccess(unsigned long long* flags, const hipMemLocation* location, void* ptr);
```
- Calculates either the minimal or recommended granularity
```h
hipError_t hipMemGetAllocationGranularity(size_t* granularity, const hipMemAllocationProp* prop, hipMemAllocationGranularity_flags option);
```
- Retrieves the property structure of the given handle
```h
hipError_t hipMemGetAllocationPropertiesFromHandle(hipMemAllocationProp* prop, hipMemGenericAllocationHandle_t handle);
```
- Imports an allocation from a requested shareable handle type
```h
hipError_t hipMemImportFromShareableHandle(hipMemGenericAllocationHandle_t* handle, void* osHandle, hipMemAllocationHandleType shHandleType);
```
- Maps an allocation handle to a reserved virtual address range
```h
hipError_t hipMemMap(void* ptr, size_t size, size_t offset, hipMemGenericAllocationHandle_t handle, unsigned long long flags);
```
- Maps or unmaps subregions of sparse HIP arrays and sparse HIP mipmapped arrays
```h
hipError_t hipMemMapArrayAsync(hipArrayMapInfo* mapInfoList, unsigned int count, hipStream_t stream);
```
- Releases a memory handle representing a memory allocation that was previously allocated through hipMemCreate
```h
hipError_t hipMemRelease(hipMemGenericAllocationHandle_t handle);
```
- Returns the allocation handle of the backing memory allocation given the address
```h
hipError_t hipMemRetainAllocationHandle(hipMemGenericAllocationHandle_t* handle, void* addr);
```
- Sets the access flags for each location specified in desc for the given virtual address range
```h
hipError_t hipMemSetAccess(void* ptr, size_t size, const hipMemAccessDesc* desc, size_t count);
```
- Unmaps memory allocation of a given address range
```h
hipError_t hipMemUnmap(void* ptr, size_t size);
```
For more information, refer to the HIP API documentation at
{doc}`hip:.doxygen/docBin/html/modules`.
##### Planned HIP Changes in Future Releases
Changes to `hipDeviceProp_t`, `HIPMEMCPY_3D`, and `hipArray` structures (and related HIP APIs) are planned in the next major release. These changes may impact backward compatibility.
Refer to the Release Notes document in subsequent releases for more information.
#### ROCm Math and Communication Libraries
In this release, ROCm Math and Communication Libraries consist of the following enhancements and fixes:
##### New rocWMMA for Matrix Multiplication and Accumulation Operations Acceleration
This release introduces a new ROCm C++ library for accelerating mixed precision matrix multiplication and accumulation (MFMA) operations leveraging specialized GPU matrix cores. rocWMMA provides a C++ API to facilitate breaking down matrix multiply accumulate problems into fragments and using them in block-wise operations that are distributed in parallel across GPU wavefronts. The API is a header library of GPU device code, meaning matrix core acceleration may be compiled directly into your kernel device code. This can benefit from compiler optimization in the generation of kernel assembly and does not incur additional overhead costs of linking to external runtime libraries or having to launch separate kernels.
rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
For more information, refer to
[Communication Libraries](../../../../docs/reference/gpu_libraries/communication.md).
#### OpenMP Enhancements in This Release
##### OMPT Target Support
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as described in the OpenMP specification document. These are APIs that allow first-party tools to examine the profile and traces for kernels that execute on a device. A tool may register callbacks for data transfer and kernel dispatch entry points. A tool may use APIs to start and stop tracing for device-related activities such as data transfer and kernel dispatch timings and associated metadata. If device tracing is enabled, trace records for device activities are collected during program execution and returned to the tool using the APIs described in the specification.
Following is an example demonstrating how a tool would use the OMPT target APIs supported. The README in /opt/rocm/llvm/examples/tools/ompt outlines the steps to follow, and you can run the provided example as indicated below:
```sh
cd /opt/rocm/llvm/examples/tools/ompt/veccopy-ompt-target-tracing
make run
```
The file `veccopy-ompt-target-tracing.c` simulates how a tool would initiate device activity tracing. The file `callbacks.h` shows the callbacks that may be registered and implemented by the tool.
### Deprecations and Warnings
#### Linux Filesystem Hierarchy Standard for ROCm
ROCm packages have adopted the Linux foundation filesystem hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new filesystem hierarchy, ROCm ensures backward compatibility with its 5.1 version or older filesystem hierarchy. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
##### New Filesystem Hierarchy
The following is the new filesystem hierarchy:
```text
/opt/rocm-<ver>
| --bin
| --All externally exposed Binaries
| --libexec
| --<component>
| -- Component specific private non-ISA executables (architecture independent)
| --include
| -- <component>
| --<header files>
| --lib
| --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
(public libraries linked with application)
| --<component> (component specific private library, executable data)
| --<cmake>
| --components
| --<component>.config.cmake
| --share
| --html/<component>/*.html
| --info/<component>/*.[pdf, md, txt]
| --man
| --doc
| --<component>
| --<licenses>
| --<component>
| --<misc files> (arch independent non-executable)
| --samples
```
This enhancement enables ROCDebugger users to interact with the HIP source-level variables and function arguments.
> **Note**
>
> ROCm will not support backward compatibility with the v5.1(old) file system hierarchy in its next major release.
> The newly-suggested compiler -g option must be used instead of the previously-suggested `-ggdb` option. Although the effect of these two options is currently equivalent, this is not guaranteed for the future and might get changed by the upstream LLVM community.
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
##### Machine Interface Lanes Support
##### Backward Compatibility with Older Filesystems
ROCDebugger Machine Interface (MI) extends support to lanes. The following enhancements are made:
ROCm has moved header files and libraries to its new location as indicated in the above structure and included symbolic-link and wrapper header files in its old location for backward compatibility.
- Added a new -lane-info command, listing the current thread's lanes.
- The -thread-select command now supports a lane switch to switch to a specific lane of a thread:
```sh
-thread-select -l LANE THREAD
```
- The =thread-selected notification gained a lane-id attribute. This enables the frontend to know which lane of the thread was selected.
- The *stopped asynchronous record gained lane-id and hit-lanes attributes. The former indicates which lane is selected, and the latter indicates which lanes explain the stop.
- MI commands now accept a global --lane option, similar to the global --thread and --frame options.
- MI varobjs are now lane-aware.
For more information, refer to the ROC Debugger User Guide at
{doc}`ROCgdb <rocgdb:index>`.
##### Enhanced - clone-inferior Command
The clone-inferior command now ensures that the TTY, CMD, ARGS, and AMDGPU PRECISE-MEMORY settings are copied from the original inferior to the new one. All modifications to the environment variables done using the 'set environment' or 'unset environment' commands are also copied to the new inferior.
#### MIOpen Support for RDNA GPUs
This release includes support for AMD Radeon™ Pro W6800, in addition to other bug fixes and performance improvements as listed below:
- MIOpen now supports RDNA GPUs!! (via MIOpen PRs 973, 780, 764, 740, 739, 677, 660, 653, 493, 498)
- Fixed a correctness issue with ImplicitGemm algorithm
- Updated the performance data for new kernel versions
- Improved MIOpen build time by splitting large kernel header files
- Fixed an issue in reduction kernels for padded tensors
- Various other bug fixes and performance improvements
For more information, see {doc}`Documentation <miopen:index>`.
#### Checkpoint Restore Support With CRIU
The new Checkpoint Restore in Userspace (CRIU) functionality is implemented to support AMD GPU and ROCm applications.
CRIU is a userspace tool to Checkpoint and Restore an application.
Previously, CRIU lacked support for checkpointing and restoring applications that use device files, such as those of a GPU. With this ROCm release, CRIU is enhanced with a new plugin to support AMD GPUs, which includes:
- Single and Multi GPU systems (Gfx9)
- Checkpoint / Restore on a different system
- Checkpoint / Restore inside a docker container
- PyTorch
- Tensorflow
- Using CRIU Image Streamer
For more information, refer to <https://github.com/checkpoint-restore/criu/tree/criu-dev/plugins/amdgpu>
> **Note**
>
> ROCm will continue supporting backward compatibility until the next major release.
> The CRIU plugin (amdgpu_plugin) is merged upstream with the CRIU repository. The KFD kernel patches are also available upstream with the amd-staging-drm-next branch (public) and the ROCm 5.1 release branch.
##### Wrapper header files
> **Note**
>
> This is a Beta release of the Checkpoint and Restore functionality, and some features are not available in this release.
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
For more information, refer to the following websites:
```h
// Code snippet from hip_runtime.h
#pragma message "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip."
#include "hip/hip_runtime.h"
```
- <https://github.com/RadeonOpenCompute/criu/blob/amdgpu_plugin-03252022/Documentation/amdgpu_plugin.txt>
The wrapper header files backward compatibility deprecation is as follows:
- `#pragma` message announcing deprecation -- ROCm v5.2 release
- `#pragma` message changed to `#warning` -- Future release
- `#warning` changed to `#error` -- Future release
- Backward compatibility wrappers removed -- Future release
##### Library files
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
Example:
```log
$ ls -l /opt/rocm/hip/lib/
total 4
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
```
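To confirm that a legacy library path still resolves to the new location, a script can follow the soft link. The following is a minimal sketch, not part of the ROCm tooling: it builds a throwaway directory tree that imitates the layout above (the temporary directory and the empty `libamdhip64.so` file are placeholders, not a real installation) and resolves the legacy path with `readlink -f`.

```shell
#!/bin/sh
# Sketch only: imitate the old and new library locations in a scratch directory.
# The tree under $root stands in for /opt/rocm-xxx; it is not a real install.
root=$(mktemp -d)
root=$(cd "$root" && pwd -P)          # canonical path, so the comparison is stable
mkdir -p "$root/lib" "$root/hip/lib"
: > "$root/lib/libamdhip64.so"        # empty stand-in for the real library

# The old location holds a relative soft link to the new location.
ln -s ../../lib/libamdhip64.so "$root/hip/lib/libamdhip64.so"

# Resolve the legacy path; the result should be the file under $root/lib.
resolved=$(readlink -f "$root/hip/lib/libamdhip64.so")
echo "$resolved"
```

On an installed system, the same `readlink -f` call on a path under `/opt/rocm/hip/lib/` would be expected to print the corresponding path under `/opt/rocm/lib/`, assuming the soft links described above are present.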
##### CMake Config files
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
Example:
```log
$ ls -l /opt/rocm/hip/lib/cmake/hip/
total 0
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
```
#### Planned deprecation of hip-rocclr and hip-base packages
In the ROCm v5.2 release, hip-rocclr and hip-base packages (Debian and RPM) are planned for deprecation and will be removed in a future release. hip-runtime-amd and hip-dev(el) will replace these packages respectively. Users of hip-rocclr must install two packages, hip-runtime-amd and hip-dev, to get the same set of packages installed by hip-rocclr previously.
Currently, both package names hip-rocclr (or) hip-runtime-amd and hip-base (or) hip-dev(el) are supported.
#### Deprecation of Integrated HIP Directed Tests
The integrated HIP directed tests, which are currently built by default, are deprecated in this release. The default building and execution support through CMake will be removed in a future release.
- <https://criu.org/Main_Page>
### Fixed Defects
| Fixed Defect | Fix |
|------------------------------------------------------------------------------|----------|
| ROCmInfo does not list gpus | Code fix |
| Hang observed while restoring cooperative group samples | Code fix |
| ROCM-SMI over SRIOV: Unsupported commands do not return proper error message | Code fix |
The following defects are fixed in this release.
#### Driver Fails To Load after Installation
The issue with the driver failing to load after ROCm installation is now fixed.
The driver installs successfully, and the server reboots with working rocminfo and clinfo.
#### ROCDebugger Fixed Defects
##### Breakpoints in GPU kernel code Before Kernel Is Loaded
Previously, setting a breakpoint in device code by line number before the device code was loaded into the program resulted in ROCgdb incorrectly moving the breakpoint to the first following line that contains host code.
Now, the breakpoint is left pending. When the GPU kernel gets loaded, the breakpoint resolves to a location in the kernel.
##### Registers Invalidated After Write
Previously, the stale just-written value was presented as a current value.
ROCgdb now invalidates the cached values of registers whose content might differ after being written, for example, registers with read-only bits.
ROCgdb also invalidates all volatile registers when a volatile register is written. For example, writing VCC invalidates the content of STATUS as STATUS.VCCZ may change.
##### Scheduler-locking and GPU Wavefronts
When scheduler-locking is in effect (for example, after the "set scheduler-locking" command), new wavefronts created by a resumed thread, whether a CPU thread or a GPU wavefront, are held in the halt state.
##### ROCDebugger Fails Before Completion of Kernel Execution
It was possible (although erroneous) for a debugger to load GPU code in memory, send it to the device, start executing a kernel on the device, and dispose of the original code before the kernel had finished execution. If a breakpoint was hit after this point, the debugger failed with an internal error while trying to access the debug information.
This issue is now fixed by ensuring that the debugger keeps a local copy of the original code and debug information.
### Known Issues
This section consists of known issues in this release.
#### Random Memory Access Fault Errors Observed While Running Math Libraries Unit Tests
#### Compiler Error on gfx1030 When Compiling at -O0
**Issue:** Random memory access fault issues are observed while running Math libraries unit tests. This issue is encountered in ROCm v5.0, ROCm v5.0.1, and ROCm v5.0.2.
##### Issue
Note, the faults only occur in the SRIOV environment.
A compiler error occurs when the `-O0` flag is used to compile code for gfx1030 that calls atomicAddNoRet, which is defined in amd_hip_atomic.h. The compiler generates an illegal instruction for gfx1030.
**Workaround:** Use SDMA to update the page table. The Guest set up steps are as follows:
##### Workaround
The workaround is not to use the -O0 flag for this case. For higher optimization levels, the compiler does not generate an invalid instruction.
#### System Freeze Observed During CUDA Memtest Checkpoint
##### Issue
Checkpoint/Restore in Userspace (CRIU) requires approximately 20 MB of VRAM to checkpoint and restore. The CRIU process may freeze if the maximum amount of available VRAM is allocated to checkpoint applications.
##### Workaround
To use CRIU to checkpoint and restore your application, limit the amount of VRAM the application uses to ensure at least 20 MB is available.
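One way to apply this workaround is to check the free VRAM before checkpointing. The sketch below is illustrative only and rests on several assumptions: the helper name `vram_headroom_ok` is hypothetical, 20 MB is taken as 20 × 1024 × 1024 bytes, and the sysfs files `mem_info_vram_total` and `mem_info_vram_used` under `/sys/class/drm/card0/device/` are assumed to be exposed by the amdgpu driver on the target system (the card index may differ).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when at least 20 MB of VRAM remains free.
# Arguments: total VRAM in bytes, used VRAM in bytes.
vram_headroom_ok() {
    required=$((20 * 1024 * 1024))   # assumed meaning of "20 MB"
    free=$(( $1 - $2 ))
    [ "$free" -ge "$required" ]
}

# Assumed sysfs location; adjust the card index for the system at hand.
dev=/sys/class/drm/card0/device
if [ -r "$dev/mem_info_vram_total" ] && [ -r "$dev/mem_info_vram_used" ]; then
    if vram_headroom_ok "$(cat "$dev/mem_info_vram_total")" "$(cat "$dev/mem_info_vram_used")"; then
        echo "enough VRAM headroom for CRIU"
    else
        echo "less than 20 MB of VRAM free; reduce application usage before checkpointing"
    fi
fi
```

The check is a snapshot: an application that keeps allocating after the check can still exhaust VRAM, so limiting the application's own usage remains the actual workaround.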
#### HPC test fails with the “HSA_STATUS_ERROR_MEMORY_FAULT” error
##### Issue
The compiler may incorrectly compile a program that uses the `__shfl_sync(mask, value, srcLane)` function when the "value" parameter to the function is undefined along some path to the function. For most functions, uninitialized inputs cause undefined behavior, but the definition for `__shfl_sync` should allow for undefined values.
##### Workaround
The workaround is to initialize the parameters to `__shfl_sync`.
> **Note**
>
> When the `-Wall` compilation flag is used, the compiler generates a warning indicating the variable is uninitialized along some path.
Example:
```cpp
double res = 0.0; // Initialize the input to __shfl_sync.
if (lane == 0) {
res = <some expression>
}
res = __shfl_sync(mask, res, 0);
```sh
sudo modprobe amdgpu vm_update_mode=0
```
#### Kernel produces incorrect result
To verify, use
##### Issue
**Guest:**
In recent changes to Clang, insertion of the noundef attribute to all the function arguments has been enabled by default.
```sh
cat /sys/module/amdgpu/parameters/vm_update_mode
```
In the HIP kernel, variable var in shfl_sync may not be initialized, so LLVM IR treats it as undef.
The expected output is 0.
So, the function argument that is potentially undef (because it is not initialized) is always assumed to be noundef by LLVM IR (since Clang has inserted the noundef attribute). This leads to ambiguous kernel execution.
#### CU Masking Causes Application to Freeze
##### Workaround
Using CU masking causes an application to freeze or run exceptionally slowly. This issue is observed only in the GFX10 suite of products.
- Skip adding the `noundef` attribute to functions tagged with the convergent attribute. Refer to <https://reviews.llvm.org/D124158> for more information.
This issue is under active investigation at this time.
- Introduce a shuffle attribute and add it to `__shfl`-like APIs in the HIP headers. Clang can skip adding the noundef attribute if it finds that the argument is tagged with the shuffle attribute. Refer to <https://reviews.llvm.org/D125378> for more information.
#### Failed Checkpoint in Docker Containers
- Introduce a Clang builtin for `__shfl` to identify it and skip adding the `noundef` attribute.
A defect with Ubuntu images kernel-5.13-30-generic and kernel-5.13-35-generic with Overlay FS results in incorrect reporting of the mount ID.
- Introduce `__builtin_freeze` to use on the relevant arguments in library wrappers. The library or header needs to insert freezes on the relevant inputs.
This issue with Ubuntu causes CRIU checkpointing to fail in Docker containers.
#### Issue with Applications Triggering Oversubscription
As a workaround, use an older version of the kernel. For example, Ubuntu 5.11.0-46-generic.
There is a known issue with applications that trigger oversubscription. A hardware hang occurs when ROCgdb is used on AMD Instinct™ MI50 and MI100 systems.
#### Issue with Restoring Workloads Using Cooperative Groups Feature
Workloads that use the cooperative groups function to ensure all waves can be resident at the same time may fail to restore correctly.
This issue is under investigation and will be fixed in a future release.
#### Radeon Pro V620 and W6800 Workstation GPUs
##### No Support for ROCDebugger on SRIOV
ROCDebugger is not supported in the SRIOV environment on any GPU.
This is a known issue and will be fixed in a future release.
#### Random Error Messages in ROCm SMI for SR-IOV
Random error messages are generated by unsupported functions or commands.
This is a known issue and will be fixed in a future release.


@@ -14,6 +14,13 @@ shutil.copy2('../RELEASE.md','./release.md')
# Keep capitalization due to similar linking on GitHub's markdown preview.
shutil.copy2('../CHANGELOG.md','./CHANGELOG.md')
# configurations for PDF output by Read the Docs
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = "5.1.0"
release = "5.1.0"
setting_all_article_info = True
all_article_info_os = ["linux"]
all_article_info_author = ""
@@ -57,7 +64,7 @@ article_pages = [
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs("ROCm Documentation Home")
docs_core = ROCmDocs("ROCm 5.1.0 Documentation Home")
docs_core.setup()
external_projects_current_project = "rocm"


@@ -18,8 +18,8 @@ following commands based on your distribution.
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/22.20/ubuntu/bionic/amdgpu-install_22.20.50200-1_all.deb
sudo apt install ./amdgpu-install_22.20.50200-1_all.deb
wget https://repo.radeon.com/amdgpu-install/22.10/ubuntu/bionic/amdgpu-install_22.10.50100-1_all.deb
sudo apt install ./amdgpu-install_22.10.50100-1_all.deb
```
:::
@@ -28,8 +28,8 @@ sudo apt install ./amdgpu-install_22.20.50200-1_all.deb
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/22.20/ubuntu/focal/amdgpu-install_22.20.50200-1_all.deb
sudo apt install ./amdgpu-install_22.20.50200-1_all.deb
wget https://repo.radeon.com/amdgpu-install/22.10/ubuntu/focal/amdgpu-install_22.10.50100-1_all.deb
sudo apt install ./amdgpu-install_22.10.50100-1_all.deb
```
:::
@@ -56,15 +56,6 @@ sudo yum install https://repo.radeon.com/amdgpu-install/22.20/rhel/7.9/amdgpu-in
sudo yum install https://repo.radeon.com/amdgpu-install/22.20/rhel/8.5/amdgpu-install-22.20.50200-1.el8.noarch.rpm
```
:::
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/22.20/rhel/8.6/amdgpu-install-22.20.50200-1.el8.noarch.rpm
```
:::
::::
:::::
@@ -72,19 +63,11 @@ sudo yum install https://repo.radeon.com/amdgpu-install/22.20/rhel/8.6/amdgpu-in
:sync: SLES15
::::{tab-set}
:::{tab-item} Service Pack 4
:sync: SLES15-SP4
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/22.20/sle/15.4/amdgpu-install-22.20.50200-1.noarch.rpm
```
:::
:::{tab-item} Service Pack 3
:sync: SLES15-SP3
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/22.20/sle/15.3/amdgpu-install-22.20.50200-1.noarch.rpm
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/22.10/sle/15/amdgpu-install-22.10.50100-1.noarch.rpm
```
:::
@@ -163,9 +146,9 @@ the installer script will install packages in the single-version layout.
For the multi-version ROCm installation you must use the installer script from
the latest release of ROCm that you wish to install.
**Example:** If you want to install ROCm releases 5.1.3 and 5.2
**Example:** If you want to install ROCm releases 5.0.2 and 5.1
simultaneously, you are required to download the installer from the latest ROCm
release v5.2.
release v5.1.
### Add Required Repositories
@@ -184,7 +167,7 @@ Run the following commands based on your distribution to add the repositories:
:sync: ubuntu-18.04
```shell
for ver in 5.1.3; do
for ver in 5.0.2; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver bionic main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -196,7 +179,7 @@ sudo apt update
:sync: ubuntu-20.04
```shell
for ver in 5.1.3; do
for ver in 5.0.2; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -214,7 +197,7 @@ sudo apt update
:sync: RHEL-7
```shell
for ver in 5.1.3; do
for ver in 5.0.2; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -233,7 +216,7 @@ sudo yum clean all
:sync: RHEL-8
```shell
for ver in 5.1.3; do
for ver in 5.0.2; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -258,27 +241,10 @@ sudo yum clean all
:sync: SLES15-SP3
```shell
for ver in 5.1.3; do
for ver in 5.0.2; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
name=rocm
baseurl=https://repo.radeon.com/rocm/$ver/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
done
sudo zypper ref
```
:::
:::{tab-item} Service Pack 4
:sync: SLES15-SP4
```shell
for ver in 5.1.3; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
name=rocm
baseurl=https://repo.radeon.com/rocm/$ver/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/rocm/$ver/sle/15/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -308,8 +274,8 @@ driver, associated with the ROCm release v5.3, will be installed as its latest
release in the list.
```none
sudo amdgpu-install --usecase=rocm --rocmrelease=5.1.3
sudo amdgpu-install --usecase=rocm --rocmrelease=5.2.0
sudo amdgpu-install --usecase=rocm --rocmrelease=5.0.2
sudo amdgpu-install --usecase=rocm --rocmrelease=5.1.0
```
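The invocations above differ only in the `--rocmrelease` value, so they can be driven from a list of releases. The following is a minimal sketch, assuming the release pair this change downgrades to (5.0.2 and 5.1.0). The `install_rocm_releases` function name and the `DRY_RUN` variable are illustrative and are not part of `amdgpu-install`.

```shell
# Hypothetical helper: install several ROCm releases side by side.
# With DRY_RUN=1 the commands are printed instead of executed.
install_rocm_releases() {
    for ver in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo amdgpu-install --usecase=rocm --rocmrelease=$ver"
        else
            sudo amdgpu-install --usecase=rocm --rocmrelease="$ver"
        fi
    done
}

# Preview the commands first; drop DRY_RUN=1 to actually run the installer.
DRY_RUN=1 install_rocm_releases 5.0.2 5.1.0
```

Previewing with `DRY_RUN=1` makes it easy to confirm the release list before any packages are installed.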
## Additional options


@@ -62,9 +62,20 @@ sudo apt update
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```shell
# amdgpu repository for bionic
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu bionic main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
:::
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```shell
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.20/ubuntu focal main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu focal main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -91,7 +102,7 @@ To add the ROCm repository, use the following steps:
```shell
# ROCm repositories for bionic
for ver in 5.1.3 5.2; do
for ver in 5.0.2 5.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver bionic main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
@@ -106,7 +117,7 @@ sudo apt update
```shell
# ROCm repositories for focal
for ver in 5.1.3 5.2; do
for ver in 5.0.2 5.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
@@ -136,7 +147,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo apt install rocm-hip-sdk5.2.0 rocm-hip-sdk5.1.3
sudo apt install rocm-hip-sdk5.1.0 rocm-hip-sdk5.0.2
```
:::::
@@ -160,7 +171,7 @@ section.
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/rhel/7.9/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/7.9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -179,7 +190,7 @@ sudo yum clean all
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/rhel/8.5/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/8.5/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -189,26 +200,6 @@ sudo yum clean all
```
:::
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
::::
Install the kernel mode driver and reboot the system using the following
@@ -229,7 +220,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
:sync: RHEL-7
```shell
for ver in 5.2.1 5.2; do
for ver in 5.0.2 5.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -248,7 +239,7 @@ sudo yum clean all
:sync: RHEL-8
```shell
for ver in 5.1.3 5.2; do
for ver in 5.0.2 5.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -283,7 +274,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo yum install rocm-hip-sdk5.2.0 rocm-hip-sdk5.1.3
sudo yum install rocm-hip-sdk5.1.0 rocm-hip-sdk5.0.2
```
:::::
@@ -306,23 +297,7 @@ section.
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
:::{tab-item} Service Pack 4
:sync: SLES15-SP4
```shell
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/22.10/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -347,7 +322,7 @@ sudo reboot
To add the ROCm repository, use the following steps:
```shell
for ver in 5.1.3 5.2; do
for ver in 5.0.2 5.1; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -379,7 +354,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.2.0 rocm-hip-sdk5.1.3
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.1.0 rocm-hip-sdk5.0.2
```
:::::
@@ -416,7 +391,7 @@ but are generally useful. Verification of the install is advised.
2. Add binary paths to the `PATH` environment variable.
```shell
export PATH=$PATH:/opt/rocm-5.2.0/bin:/opt/rocm-5.2.0/opencl/bin
export PATH=$PATH:/opt/rocm-5.1.0/bin:/opt/rocm-5.1.0/opencl/bin
```
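Running the `export` above in every new shell, or sourcing a profile more than once, appends the same directories repeatedly. The following sketch adds each directory only when it is missing. The `path_append` helper and the `ROCM_ROOT` variable are assumptions made for illustration; only the `/opt/rocm-5.1.0` paths come from the text above.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

ROCM_ROOT=/opt/rocm-5.1.0            # install prefix used in the example above
path_append "$ROCM_ROOT/bin"
path_append "$ROCM_ROOT/opencl/bin"
export PATH
```

Wrapping `PATH` in colons before matching avoids false positives on directories that merely share a prefix, such as `/opt/rocm-5.1.0/bin2`.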
```{attention}


@@ -26,7 +26,7 @@ repository to the new release.
```shell
# amdgpu repository for bionic
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.20.3/ubuntu bionic main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu bionic main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -37,7 +37,7 @@ sudo apt update
```shell
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.20/ubuntu focal main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu focal main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -57,7 +57,7 @@ sudo apt update
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/rhel/7.9/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/7.9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -75,7 +75,7 @@ sudo yum clean all
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/rhel/8.5/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/8.5/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -85,24 +85,8 @@ sudo yum clean all
```
:::
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
::::
:::::
:::::{tab-item} SUSE Linux Enterprise Server 15
:sync: SLES15
@@ -115,23 +99,7 @@ sudo yum clean all
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
:::{tab-item} Service Pack 4
:sync: SLES15-SP4
```shell
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/22.20/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/22.10/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -194,7 +162,7 @@ repository to the new release.
:sync: ubuntu-18.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.2.3 bionic main" \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.1 bionic main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -206,7 +174,7 @@ sudo apt update
:sync: ubuntu-20.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.2.3 focal main" \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.1 focal main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -225,9 +193,9 @@ sudo apt update
```shell
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.2]
name=ROCm5.2
baseurl=https://repo.radeon.com/rocm/yum/5.2/main
[ROCm-5.1]
name=ROCm5.1
baseurl=https://repo.radeon.com/rocm/yum/5.1/main
enabled=1
priority=50
gpgcheck=1
@@ -242,9 +210,9 @@ sudo yum clean all
```shell
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.2]
name=ROCm5.2
baseurl=https://repo.radeon.com/rocm/rhel8/5.2/main
[ROCm-5.1]
name=ROCm5.1
baseurl=https://repo.radeon.com/rocm/rhel8/5.1/main
enabled=1
priority=50
gpgcheck=1
@@ -261,10 +229,10 @@ sudo yum clean all
```shell
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-5.2]
name=ROCm5.2
[ROCm-5.1]
name=ROCm5.1
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/5.2/main
baseurl=https://repo.radeon.com/rocm/zyp/5.1/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key


@@ -10,17 +10,17 @@ AMD's library for high performance machine learning primitives.
:::
:::{grid-item-card} {doc}`Composable Kernel <composable-kernel:index>`
:::{grid-item-card} {doc}`Composable Kernel <composable_kernel:index>`
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
- {doc}`Documentation <composable-kernel:index>`
- {doc}`Documentation <composable_kernel:index>`
:::
:::{grid-item-card} {doc}`MIGraphX <migraphx:index>`
:::{grid-item-card} {doc}`MIGraphX <amdmigraphx:index>`
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
- {doc}`Documentation <migraphx:index>`
- {doc}`Documentation <amdmigraphx:index>`
:::


@@ -42,8 +42,8 @@ Inter and intra-node communication is supported by the following projects:
Libraries related to AI.
- {doc}`MIOpen <miopen:index>`
- {doc}`Composable Kernel <composable-kernel:index>`
- {doc}`MIGraphX <migraphx:index>`
- {doc}`Composable Kernel <composable_kernel:index>`
- {doc}`MIGraphX <amdmigraphx:index>`
:::
@@ -80,7 +80,7 @@ Computer vision related projects.
:::{grid-item-card} [Validation Tools](validation_tools)
- {doc}`ROCm Validation Suite <rocm-validation-suite:index>`
- {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
- {doc}`TransferBench <transferbench:index>`
:::


@@ -3,10 +3,10 @@
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} {doc}`RVS <rocm-validation-suite:index>`
:::{grid-item-card} {doc}`RVS <rocmvalidationsuite:index>`
The ROCm Validation Suite is a system administrator's and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
- {doc}`Documentation <rocm-validation-suite:index>`
- {doc}`Documentation <rocmvalidationsuite:index>`
:::


@@ -8,13 +8,12 @@ AMD ROCm™ Platform supports the following Linux distributions.
| Distribution |Processor Architectures| Validated Kernel |
|--------------------|-----------------------|--------------------|
| CentOS 8.4 | x86-64 | 4.18 |
| CentOS 7.9 | x86-64 | 3.10 |
| RHEL 8.6 to 8.5 | x86-64 | 4.18 |
| RHEL 8.5 | x86-64 | 4.18 |
| RHEL 7.9 | x86-64 | 3.10 |
| SLES 15 SP4 | x86-64 | 5.14.21 |
| SLES 15 SP3 | x86-64 | 5.3.18 |
| Ubuntu 20.04.4 LTS | x86-64 | 5.13 |
| Ubuntu 20.04.3 LTS | x86-64 | 5.11 |
| Ubuntu 18.04.5 LTS | x86-64 | 5.4.0 |
## Virtualization Support


@@ -121,4 +121,4 @@ following location: `/opt/rocm/share/doc/<component-name>/`
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.2.0/share/doc/amd_comgr/LICENSE.txt`
`/opt/rocm-5.1.0/share/doc/amd_comgr/LICENSE.txt`
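The same directory layout can be walked to list the license file of every installed component. The following is a rough sketch, assuming each component follows the `share/doc/<component-name>/LICENSE.txt` pattern described above; components that name the file differently would be missed. The `list_rocm_licenses` function is a hypothetical helper, not a ROCm tool.

```shell
# Hypothetical helper: print every component LICENSE.txt under a ROCm install root.
list_rocm_licenses() {
    root=${1:-/opt/rocm}
    for f in "$root"/share/doc/*/LICENSE.txt; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

list_rocm_licenses /opt/rocm-5.1.0
```

If the given root does not exist, the function prints nothing and still returns success.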


@@ -146,9 +146,9 @@ subtrees:
- title: MIOpen - Machine Intelligence
url: ${project:miopen}
- title: Composable Kernel
url: ${project:composable-kernel}
url: ${project:composable_kernel}
- title: MIGraphX - Graph Optimization
url: ${project:migraphx}
url: ${project:amdmigraphx}
- file: reference/computer_vision
subtrees:
- entries:
@@ -171,7 +171,7 @@ subtrees:
title: Validation Tools
subtrees:
- entries:
- url: ${project:rocm-validation-suite}
- url: ${project:rocmvalidationsuite}
title: RVS
- url: ${project:transferbench}
title: TransferBench


@@ -1 +1,2 @@
rocm-docs-core==0.16.0
rocm-docs-core==1.8.0
sphinx-reredirects

View File

@@ -1,114 +1,106 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile docs/sphinx/requirements.in
# pip-compile requirements.in
#
accessible-pygments==0.0.3
accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==0.7.13
alabaster==1.0.0
# via sphinx
babel==2.11.0
babel==2.16.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.11.2
beautifulsoup4==4.12.3
# via pydata-sphinx-theme
breathe==4.34.0
breathe==4.35.0
# via rocm-docs-core
certifi==2022.12.7
certifi==2024.8.30
# via requests
cffi==1.15.1
cffi==1.17.1
# via
# cryptography
# pynacl
charset-normalizer==2.1.1
charset-normalizer==3.3.2
# via requests
click==8.1.3
click==8.1.7
# via sphinx-external-toc
colorama==0.4.6
# via
# click
# sphinx
cryptography==40.0.2
cryptography==43.0.1
# via pyjwt
deprecated==1.2.13
deprecated==1.2.14
# via pygithub
docutils==0.19
docutils==0.21.2
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# sphinx
fastjsonschema==2.16.3
fastjsonschema==2.20.0
# via rocm-docs-core
gitdb==4.0.10
gitdb==4.0.11
# via gitpython
gitpython==3.1.30
gitpython==3.1.43
# via rocm-docs-core
idna==3.4
idna==3.10
# via requests
imagesize==1.4.1
# via sphinx
jinja2==3.1.2
jinja2==3.1.4
# via
# myst-parser
# sphinx
linkify-it-py==1.0.3
# via myst-parser
markdown-it-py==2.2.0
markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.2
markupsafe==2.1.5
# via jinja2
mdit-py-plugins==0.3.4
mdit-py-plugins==0.4.2
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-parser[linkify]==1.0.0
myst-parser==4.0.0
# via rocm-docs-core
packaging==23.0
packaging==24.1
# via
# pydata-sphinx-theme
# sphinx
pycparser==2.21
pycparser==2.22
# via cffi
pydata-sphinx-theme==0.13.3
pydata-sphinx-theme==0.15.4
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==1.58.1
pygithub==2.4.0
# via rocm-docs-core
pygments==2.14.0
pygments==2.18.0
# via
# accessible-pygments
# pydata-sphinx-theme
# sphinx
pyjwt[crypto]==2.6.0
pyjwt[crypto]==2.9.0
# via pygithub
pynacl==1.5.0
# via pygithub
pytz==2022.7.1
# via babel
pyyaml==6.0
pyyaml==6.0.2
# via
# myst-parser
# rocm-docs-core
# sphinx-external-toc
requests==2.28.1
requests==2.32.3
# via
# pygithub
# sphinx
rocm-docs-core==0.16.0
# via -r docs/sphinx/requirements.in
smmap==5.0.0
rocm-docs-core==1.8.0
# via -r requirements.in
smmap==5.0.1
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.4
soupsieve==2.6
# via beautifulsoup4
sphinx==5.3.0
sphinx==8.0.2
# via
# breathe
# myst-parser
@@ -119,33 +111,40 @@ sphinx==5.3.0
# sphinx-design
# sphinx-external-toc
# sphinx-notfound-page
sphinx-book-theme==1.0.1
# sphinx-reredirects
sphinx-book-theme==1.1.3
# via rocm-docs-core
sphinx-copybutton==0.5.1
sphinx-copybutton==0.5.2
# via rocm-docs-core
sphinx-design==0.4.1
sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==0.3.1
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-notfound-page==0.8.3
sphinx-notfound-page==1.0.4
# via rocm-docs-core
sphinxcontrib-applehelp==1.0.4
sphinx-reredirects==0.1.5
# via -r requirements.in
sphinxcontrib-applehelp==2.0.0
# via sphinx
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-devhelp==2.0.0
# via sphinx
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-htmlhelp==2.1.0
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-qthelp==2.0.0
# via sphinx
sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
typing-extensions==4.5.0
# via pydata-sphinx-theme
uc-micro-py==1.0.1
# via linkify-it-py
urllib3==1.26.13
# via requests
wrapt==1.14.1
tomli==2.0.1
# via sphinx
typing-extensions==4.12.2
# via
# pydata-sphinx-theme
# pygithub
urllib3==2.2.3
# via
# pygithub
# requests
wrapt==1.16.0
# via deprecated