Documentation Redesign (#1883)

Saad Rahim
2023-03-09 12:02:54 -07:00
committed by GitHub
parent a2790438b5
commit 67cd4c3789
121 changed files with 4061 additions and 269 deletions

.gitignore vendored Normal file

@@ -0,0 +1,3 @@
.vscode
build
_build

.readthedocs.yaml Normal file

@@ -0,0 +1,14 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
version: 2

sphinx:
  configuration: docs/sphinx/conf.py

formats: all

python:
  version: "3.8"
  install:
    - requirements: docs/sphinx/requirements.txt
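For contributors building the documentation locally, the following is a minimal
sketch of the matching commands, assuming the standard Sphinx layout that this
configuration points at (`docs/sphinx/conf.py` and
`docs/sphinx/requirements.txt`); the output directory is one of the paths
ignored by the new `.gitignore` above.

```bash
# Install the pinned documentation dependencies, then build the HTML docs.
pip install -r docs/sphinx/requirements.txt
sphinx-build -b html docs/sphinx _build/html
```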


@@ -1,124 +1,155 @@
# Changelog

## AMD ROCm™ Releases

### AMD ROCm™ V5.2 Release

AMD ROCm v5.2 is now released. The release documentation is available at
<https://docs.amd.com>.

### AMD ROCm™ V5.1.3 Release

AMD ROCm v5.1.3 is now released. The release documentation is available at
<https://docs.amd.com>.

### AMD ROCm™ V5.1.1 Release

AMD ROCm v5.1.1 is now released. The release documentation is available at
<https://docs.amd.com>.

### AMD ROCm™ V5.1 Release

AMD ROCm v5.1 is now released. The release documentation is available at
<https://docs.amd.com>.
### AMD ROCm™ v5.0.2 Release Notes

#### Fixed Defects in This Release

The following defects are fixed in the ROCm v5.0.2 release.

##### Issue with hostcall Facility in HIP Runtime

In ROCm v5.0, when using the `assert()` call in a HIP kernel, the compiler may
sometimes fail to emit kernel metadata related to the hostcall facility, which
results in incomplete initialization of the hostcall facility in the HIP
runtime. This can cause the HIP kernel to crash when it attempts to execute the
`assert()` call.

The root cause was an incorrect check in the compiler to determine whether the
hostcall facility is required by the kernel. This is fixed in the ROCm v5.0.2
release. The resolution includes a compiler change, which emits the required
metadata by default, unless the compiler can prove that the hostcall facility
is not required by the kernel. This ensures that the `assert()` call never
fails.

**Note**: This fix may lead to breakage in some OpenMP offload use cases, which
use print inside a target region and result in an abort in device code. The
issue will be fixed in a future release.
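For illustration only, a minimal, hypothetical HIP kernel of the kind affected
by this defect; the device-side `assert()` is what depends on the hostcall
facility described above.

```cpp
#include <hip/hip_runtime.h>
#include <cassert>

// Hypothetical kernel: the device-side assert() relies on the hostcall
// facility whose metadata the ROCm v5.0 compiler could fail to emit.
__global__ void validate(const int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    assert(data[i] >= 0);
  }
}
```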
##### Compatibility Matrix Updates to ROCm Deep Learning Guide

The compatibility matrix in the AMD Deep Learning Guide is updated for ROCm
v5.0.2.

For more information and documentation updates, refer to <https://docs.amd.com>.
### AMD ROCm™ v5.0.1 Release Notes

#### Deprecations and Warnings

##### Refactor of HIPCC/HIPCONFIG

In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
used to identify and set target compiler options, target platform, compiler,
and runtime appropriately.

In ROCm v5.0.1, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
binary implementations of `hipcc` and `hipconfig`. These new binaries are
currently a work in progress and are marked as experimental. ROCm plans to
fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm release.
The existing `hipcc` and `hipconfig` Perl scripts are renamed to `hipcc.pl` and
`hipconfig.pl`, respectively. New top-level `hipcc` and `hipconfig` Perl
scripts are created, which can switch between the Perl script or the compiled
binary based on the environment variable `HIPCC_USE_PERL_SCRIPT`.

In ROCm 5.0.1, by default, this environment variable is set to use `hipcc` and
`hipconfig` through the Perl scripts. The Perl scripts will no longer be
available in ROCm in a future release.
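As a sketch of how the switch works, assuming the variable takes the values 0
and 1 with the Perl scripts selected by default as described above:

```bash
# Opt in to the experimental compiled binaries instead of the Perl scripts.
export HIPCC_USE_PERL_SCRIPT=0
hipcc --version

# Revert to the default Perl-script behavior.
export HIPCC_USE_PERL_SCRIPT=1
```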
#### ROCm Documentation Updates for ROCm 5.0.1

- ROCm Downloads Guide
- ROCm Installation Guide
- ROCm Release Notes

For more information, see <https://docs.amd.com>.
### AMD ROCm™ v5.0 Release Notes

## ROCm Installation Updates

This document describes the features, fixed issues, and information about
downloading and installing the AMD ROCm™ software.
It also covers known issues and deprecations in this release.
## Notice for Open-source and Closed-source ROCm Repositories in Future Releases
To make a distinction between open-source and closed-source components, all ROCm
repositories will consist of sub-folders in future releases.

- All open-source components will be placed in the `base-url/<rocm-ver>/main`
  sub-folder
- All closed-source components will reside in the
  `base-url/<rocm-ver>/proprietary` sub-folder
### List of Supported Operating Systems
The AMD ROCm platform supports the following operating systems:
| **OS-Version (64-bit)** | **Kernel Versions** |
|:-------------------------------:|:-----------------------------:|
| CentOS 8.3 | `4.18.0-193.el8` |
| CentOS 7.9 | `3.10.0-1127` |
| RHEL 8.5 | `4.18.0-348.7.1.el8_5.x86_64` |
| RHEL 8.4 | `4.18.0-305.el8.x86_64` |
| RHEL 7.9 | `3.10.0-1160.6.1.el7` |
| SLES 15 SP3 | `5.3.18-59.16-default` |
| Ubuntu 20.04.3 | `5.8.0 LTS / 5.11 HWE` |
| Ubuntu 18.04.5 [5.4 HWE kernel] | `5.4.0-71-generic` |
#### Support for RHEL v8.5

This release extends support for RHEL v8.5.

#### Supported GPUs

##### Radeon Pro V620 and W6800 Workstation GPUs

This release extends ROCm support for Radeon Pro V620 and W6800 Workstation
GPUs.

- SRIOV virtualization support for Radeon Pro V620
- KVM Hypervisor (1VF support only) on Ubuntu Host OS with Ubuntu, CentOS, and
  RHEL Guest
- Support for ROCm-SMI in an SRIOV environment. For more details, refer to the
  ROCm SMI API documentation.

**Note:** Radeon Pro V620 is not supported on SLES.
### ROCm Installation Updates for ROCm v5.0

This release has the following ROCm installation enhancements.

#### Support for Kernel Mode Driver

In this release, users can install the kernel-mode driver using the Installer
method. Some of the ROCm-specific use cases that the installer currently
supports are:
- OpenCL (ROCr/KFD based) runtime
- HIP runtimes
@@ -127,56 +158,63 @@ In this release, users can install the kernel-mode driver using the Installer me
- ROCr runtime and thunk
- Kernel-mode driver
#### Support for Multi-version ROCm Installation and Uninstallation

Users now can install multiple ROCm releases simultaneously on a system using
the newly introduced installer script and package manager install mechanism.

Users can also uninstall multi-version ROCm releases using the
`amdgpu-uninstall` script and package manager.
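A hypothetical sketch of the multi-version flow; the `--usecase` and
`--rocmrelease` flags are assumptions based on the installer behavior described
here, so confirm them with `amdgpu-install --help` on your system.

```bash
# Install two ROCm releases side by side, then remove one of them.
sudo amdgpu-install --usecase=rocm --rocmrelease=5.0.0
sudo amdgpu-install --usecase=rocm --rocmrelease=5.0.2
sudo amdgpu-uninstall --rocmrelease=5.0.0
```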
#### Support for Updating Information on Local Repositories

In this release, the `amdgpu-install` script automates the process of updating
local repository information before proceeding to ROCm installation.

#### Support for Release Upgrades

Users can now upgrade an existing ROCm installation to a specific or the latest
ROCm release.
For more details, refer to the AMD ROCm Installation Guide v5.0.
## AMD ROCm V5.0 Documentation Updates

### New AMD ROCm Information Portal ROCm v4.5 and Above

Beginning with ROCm release v5.0, AMD ROCm documentation has a new portal at
<https://docs.amd.com>. This portal consists of ROCm documentation v4.5 and
above.

For documentation prior to ROCm v4.5, you may continue to access
<https://rocmdocs.amd.com>.

### Documentation Updates for ROCm 5.0
#### Deployment Tools

##### ROCm Data Center Tool Documentation Updates

- ROCm Data Center Tool User Guide
- ROCm Data Center Tool API Guide

##### ROCm System Management Interface Updates

- System Management Interface Guide
- System Management Interface API Guide

##### ROCm Command Line Interface Updates

- Command Line Interface Guide

#### Machine Learning/AI Documentation Updates

- Deep Learning Guide
- MIGraphX API Guide
- MIOpen API Guide
- MIVisionX API Guide

#### ROCm Libraries Documentation Updates
- hipSOLVER User Guide
- RCCL User Guide
@@ -188,87 +226,90 @@ For documentation prior to ROCm v4.5, you may continue to access https://rocmdoc
- rocSPARSE User Guide
- rocThrust User Guide
#### Compilers and Tools

##### ROCDebugger Documentation Updates

- ROCDebugger User Guide
- ROCDebugger API Guide

##### ROCTracer

- ROCTracer User Guide
- ROCTracer API Guide

##### Compilers

- AMD Instinct High Performance Computing and Tuning Guide
- AMD Compiler Reference Guide

##### HIPify Documentation

- HIPify User Guide
- HIP Supported CUDA API Reference Guide

##### ROCm Debug Agent

- ROCm Debug Agent Guide
- System Level Debug Guide
- ROCm Validation Suite

#### Programming Models Documentation

##### HIP Documentation

- HIP Programming Guide
- HIP API Guide
- HIP FAQ Guide

##### OpenMP Documentation

- OpenMP Support Guide

#### ROCm Glossary

- ROCm Glossary Terms and Definitions

### AMD ROCm Legacy Documentation Links ROCm v4.3 and Prior
- For AMD ROCm documentation, see <https://rocmdocs.amd.com/en/latest/>
- For installation instructions on supported platforms, see
  <https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html>
- For AMD ROCm binary structure, see
  <https://rocmdocs.amd.com/en/latest/Installation_Guide/Software-Stack-for-AMD-GPU.html>
- For AMD ROCm release history, see
  <https://rocmdocs.amd.com/en/latest/Current_Release_Notes/ROCm-Version-History.html>
## What's New in This Release

### HIP Enhancements

The ROCm v5.0 release consists of the following HIP enhancements.

#### HIP Installation Guide Updates

The HIP Installation Guide is updated to include building HIP from source on the
NVIDIA platform.
Refer to the HIP Installation Guide v5.0 for more details.
#### Managed Memory Allocation

Managed memory, including the `__managed__` keyword, is now supported in the HIP
combined host/device compilation. Through unified memory allocation, managed
memory allows data to be shared and accessible to both the CPU and GPU using a
single pointer. The allocation is managed by the AMD GPU driver using the Linux
Heterogeneous Memory Management (HMM) mechanism. The user can call the managed
memory API `hipMallocManaged` to allocate a large chunk of HMM memory, execute
kernels on a device, and fetch data between the host and device as needed.

**Note:** In a HIP application, it is recommended to do a capability check
before calling the managed memory APIs. For example,

```cpp
int managed_memory = 0;
HIPCHECK(hipDeviceGetAttribute(&managed_memory, hipDeviceAttributeManagedMemory, p_gpuDevice));
@@ -281,89 +322,97 @@ if (!managed_memory) {
}
```
**Note:** The managed memory capability check may not be necessary; however, if
HMM is not supported, managed `malloc` will fall back to using system memory.
Refer to the HIP API documentation for more details on managed memory APIs.
For the application, see
[hipMallocManaged.cpp](https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp)
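A self-contained sketch of the pattern described above (capability check,
`hipMallocManaged`, and a kernel working on the same pointer the host uses); it
mirrors, but is not taken from, the linked sample.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

int main() {
  int managed = 0;
  hipDeviceGetAttribute(&managed, hipDeviceAttributeManagedMemory, 0);
  if (!managed) {
    printf("Managed memory is not supported on this device.\n");
    return 0;
  }

  const int n = 1 << 20;
  float* data = nullptr;
  hipMallocManaged(&data, n * sizeof(float));      // single pointer for CPU and GPU
  for (int i = 0; i < n; ++i) data[i] = 1.0f;      // host writes directly
  scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // device works on the same memory
  hipDeviceSynchronize();
  printf("data[0] = %f\n", data[0]);               // host reads the result back
  hipFree(data);
  return 0;
}
```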
### New Environment Variable

The following new environment variable is added in this release:
| **Environment Variable** | **Value** | **Description** |
|:------------------------:|:---------------------:|:--------------------------------------------------------|
| `HSA_COOP_CU_COUNT` | 0 or 1 (default is 0) | Some processors support more compute units than can reliably be used in a cooperative dispatch. Setting the environment variable `HSA_COOP_CU_COUNT` to 1 will cause ROCr to return the correct CU count for cooperative groups through the `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` attribute of `hsa_agent_get_info()`. Setting `HSA_COOP_CU_COUNT` to other values, or leaving it unset, will cause ROCr to return the same CU count for the attributes `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` and `HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT`. Future ROCm releases will make `HSA_COOP_CU_COUNT = 1` the default. |
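A minimal sketch of reading the cooperative CU count that this variable
influences, assuming the standard HSA agent-info query path; run it with and
without `HSA_COOP_CU_COUNT=1` in the environment to see the difference described
in the table.

```cpp
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <cstdio>

// Print the cooperative-dispatch CU count reported for every GPU agent.
static hsa_status_t print_coop_cus(hsa_agent_t agent, void*) {
  hsa_device_type_t type;
  hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
  if (type == HSA_DEVICE_TYPE_GPU) {
    uint32_t cus = 0;
    hsa_agent_get_info(
        agent,
        (hsa_agent_info_t)HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT,
        &cus);
    printf("Cooperative CU count: %u\n", cus);
  }
  return HSA_STATUS_SUCCESS;
}

int main() {
  hsa_init();
  hsa_iterate_agents(print_coop_cus, nullptr);
  hsa_shut_down();
  return 0;
}
```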
### ROCm Math and Communication Libraries
| **Library** | **Changes** |
|:--------------:|:----------------------------------------------------------------------------------------|
| **rocBLAS** | **Added** <ul><li>Added `rocblas_get_version_string_size` convenience function</li><li>Added `rocblas_xtrmm_outofplace`, an out-of-place version of `rocblas_xtrmm`</li><li>Added hpl and trig initialization for `gemm_ex` to `rocblas-bench`</li><li>Added source code gemm. It can be used as an alternative to Tensile for debugging and development</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li></ul> **Optimizations** <ul><li>Improved performance of non-batched and batched single-precision GER for size m > 1024. Performance enhanced by 5-10% measured on a MI100 (gfx908) GPU.</li><li>Improved performance of non-batched and batched HER for all sizes and data types. Performance enhanced by 2-17% measured on a MI100 (gfx908) GPU.</li></ul> **Changed** <ul><li>Instantiate templated rocBLAS functions to reduce size of librocblas.so</li><li>Removed static library dependency on msgpack</li><li>Removed boost dependencies for clients</li></ul> **Fixed** <ul><li>Option to install script to build only rocBLAS clients with a pre-built rocBLAS library</li><li>Correctly set output of `nrm2_batched_ex` and `nrm2_strided_batched_ex` when given bad input</li><li>Fix for dgmm with side == `rocblas_side_left` and a negative incx</li><li>Fixed out-of-bounds read for small trsm</li><li>Fixed numerical checking for `tbmv_strided_batched`</li></ul> |
| | |
| **hipBLAS** | **Added** <ul><li>Added rocSOLVER functions to hipblas-bench</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li><li>Added compilation warning for future trmm changes</li><li>Added documentation to `hipblas.h`</li><li>Added option to forgo pivoting for getrf and getri when ipiv is `nullptr`</li><li>Added code coverage option</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source.</li><li>Fixed windows packaging</li><li>Allowing negative increments in hipblas-bench</li><li>Removed boost dependency</li></ul> |
| | |
| **rocFFT**     | **Changed** <ul><li>Enabled runtime compilation of single FFT kernels > length 1024.</li><li>Re-aligned split device library into 4 roughly equal libraries.</li><li>Implemented the FuseShim framework to replace the original OptimizePlan</li><li>Implemented the generic buffer-assignment framework. The buffer assignment is no longer performed by each node. A generic algorithm is designed to test and pick the best assignment path. With the help of FuseShim, more kernel-fusions are achieved.</li><li>Do not read the imaginary part of the DC and Nyquist modes for even-length complex-to-real transforms.</li></ul> **Optimizations** <ul><li>Optimized twiddle-conjugation; complex-to-complex inverse transforms have similar performance to forward transforms now.</li><li>Improved performance of single-kernel small 2D transforms.</li></ul> |
| | |
| **hipFFT** | **Fixed** <ul><li>Fixed incorrect reporting of rocFFT version.</li></ul> **Changed** <ul><li>Unconditionally enabled callback functionality. On the CUDA backend, callbacks only run correctly when hipFFT is built as a static library, and is linked against the static cuFFT library.</li></ul> |
| | |
| **rocSPARSE**  | **Added** <ul><li>csrmv, coomv, ellmv, hybmv for (conjugate) transposed matrices</li><li>csrmv for symmetric matrices</li></ul> **Changed** <ul><li>`spmm_ex` is now deprecated and will be removed in the next major release</li></ul> **Improved** <ul><li>Optimization for gtsv</li></ul> |
| | |
| **hipSPARSE** | **Added** <ul><li>Added (conjugate) transpose support for csrmv, hybmv and spmv routines</li></ul> |
| | |
| **rocALUTION** | **Changed** <ul><li>Removed deprecated GlobalPairwiseAMG class, please use PairwiseAMG instead.</li></ul> **Improved** <ul><li>Improved documentation</li></ul> |
| | |
| **rocTHRUST** | **Updates** <ul><li>Updated to match upstream Thrust 1.13.0</li><li>Updated to match upstream Thrust 1.14.0</li><li>Added async scan</li></ul> **Changed** <ul><li>Scan algorithms: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output). And low-res input with high-res output (e.g. float input, double output)</li></ul> |
| | |
| **rocSOLVER** | **Added** <ul><li>Symmetric matrix factorizations: <ul><li>LASYF</li><li>SYTF2, SYTRF (with `batched` and `strided_batched` versions)</li></ul><li>Added `rocsolver_get_version_string_size` to help with version string queries</li><li>Added `rocblas_layer_mode_ex` and the ability to print kernel calls in the trace and profile logs</li><li>Expanded batched and `strided_batched` sample programs.</li></ul> **Optimizations** <ul><li>Improved general performance of LU factorization</li><li>Increased parallelism of specialized kernels when compiling from source, reducing build times on multi-core systems.</li></ul> **Changed** <ul><li>The rocsolver-test client now prints the rocSOLVER version used to run the tests, rather than the version used to build them</li><li>The rocsolver-bench client now prints the rocSOLVER version used in the benchmark</li></ul> **Fixed** <ul><li>Added missing `stdint.h` include to `rocsolver.h`</li></ul> |
| | |
| **hipSOLVER** | **Added** <ul><li>Added SYTRF functions: `hipsolverSsytrf_bufferSize`, `hipsolverDsytrf_bufferSize`, `hipsolverCsytrf_bufferSize`, `hipsolverZsytrf_bufferSize`, `hipsolverSsytrf`, `hipsolverDsytrf`, `hipsolverCsytrf`, `hipsolverZsytrf`</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source</li></ul> |
| | |
| **RCCL** | **Added** <ul><li>Compatibility with NCCL 2.10.3</li></ul> **Known issues** <ul><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
| | |
| **hipCUB**     | **Fixed** <ul><li>Added missing includes to `hipcub.hpp`</li></ul> **Added** <ul><li>Bfloat16 support to test cases (`device_reduce` & `device_radix_sort`)</li><li>Device merge sort</li><li>Block merge sort</li><li>API update to CUB 1.14.0</li></ul> **Changed** <ul><li>The `SetupNVCC.cmake` automatic target selector selects all of the capabilities of all available cards for the NVIDIA backend.</li></ul> |
| | |
| **rocPRIM**    | **Fixed** <ul><li>Enable `bfloat16` tests and reduce threshold for `bfloat16`</li><li>Fix device scan `limit_size` feature</li><li>Non-optimized builds no longer trigger local memory limit errors</li></ul> **Added** <ul><li>Scan size limit feature</li><li>Reduce size limit feature</li><li>Transform size limit feature</li><li>Add `block_load_striped` and `block_store_striped`</li><li>Add `gather_to_blocked` to gather values from other threads into a blocked arrangement</li><li>The block sizes for the device merge sort's initial block sort and its merge steps are now separate in its kernel config (the block sort step supports multiple items per thread)</li></ul> **Changed** <ul><li>`size_limit` for scan, reduce and transform can now be set in the config struct instead of a parameter</li><li>`device_scan` and `device_segmented_scan`: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)</li><li>Revert old Fiji workaround, because the issue was solved at compiler side</li><li>Update `README` cmake minimum version number</li><li>Block sort supports multiple items per thread. Currently only powers of two block sizes and items per thread are supported, and only for full blocks</li><li>Bumped the minimum required version of CMake to 3.16</li></ul> **Known issues** <ul><li>Unit tests may soft hang on MI200 when running in `hipMallocManaged` mode.</li><li>`device_segmented_radix_sort` and `device_scan` unit tests failing for HIP on Windows</li><li>`ReduceEmptyInput` causes random failure with `bfloat16`</li><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
### System Management Interface

#### Clock Throttling for GPU Events

This feature lists GPU events as they occur in real-time and can be used with
`kfdtest` to produce `vm_fault` events for testing.

The command can be called with either `-e` or `--showevents` like this:

```bash
-e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]] Show event list
```

Where `EVENT` is any list combination of `VM_FAULT`, `THERMAL_THROTTLE`, or
`GPU_RESET` and is **NOT** case sensitive.

**Note:** If no event arguments are passed, all events will be watched by
default.

##### CLI Commands
```bash
$ rocm-smi --showevents vm_fault thermal_throttle gpu_reset
======================= ROCm System Management Interface =======================
================================= Show Events ==================================
press 'q' or 'ctrl + c' to quit
DEVICE          TIME        TYPE        DESCRIPTION
============================= End of ROCm SMI Log ==============================
```
(Run `kfdtest` in another window to test for `vm_fault` events.)

**Note:** Unlike other rocm-smi CLI commands, this command does not quit unless
specified by the user. Users may press either `q` or `ctrl + c` to quit.
#### Display XGMI Bandwidth Between Nodes

The `rsmi_minmax_bandwidth_get` API reads the HW Topology file and displays
bandwidth (min-max) between any two NUMA nodes in a matrix format. A minimal
API sketch follows the CLI example below.

The Command Line Interface (CLI) command can be called as follows:

```bash
$ rocm-smi --shownodesbw
======================= ROCm System Management Interface =======================
@@ -381,21 +430,26 @@ Format: min-max; Units: mps
============================= End of ROCm SMI Log ==============================
```
The sample output above shows the maximum theoretical XGMI bandwidth between two
NUMA nodes.
**Note:** "0-0" min-max bandwidth indicates devices are not connected directly.
#### P2P Connection Status

The `rsmi_is_p2p_accessible` API returns `True` if P2P can be implemented
between two nodes, and returns `False` if P2P cannot be implemented between the
two nodes.

The Command Line Interface command can be called as follows:

```bash
rocm-smi --showtopoaccess
```

Sample Output:

```bash
$ rocm-smi --showtopoaccess
======================= ROCm System Management Interface =======================
===================== Link accessibility between two GPUs ======================
@@ -405,13 +459,17 @@ GPU1 True True
============================= End of ROCm SMI Log ==============================
```
## Breaking Changes

### Runtime Breaking Change

Re-ordering of the enumerated type in `hip_runtime_api.h` to better match CUDA.
See below for the difference in enumerated types.

ROCm software will be affected if any of the defined enums listed below are used
in the code. Applications built with ROCm v5.0 enumerated types will work with a
ROCm 4.5.2 driver. However, an undefined behavior error will occur with a ROCm
v4.5.2 application that uses these enumerated types with a ROCm 5.0 runtime. A
short usage sketch follows the enumeration below.
```c
typedef enum hipDeviceAttribute_t {
@@ -605,74 +663,93 @@ typedef enum hipDeviceAttribute_t {
} hipDeviceAttribute_t;
```
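A short sketch of the safe pattern: query attributes through the symbolic enum
names and rebuild against the matching HIP headers, rather than relying on the
old integer values, since those values moved in ROCm 5.0.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  int cu_count = 0;
  // Using the enum identifier keeps the query correct across the re-ordering;
  // a hard-coded integer taken from the 4.5.2 headers may select a different
  // attribute under the 5.0 runtime.
  hipDeviceGetAttribute(&cu_count, hipDeviceAttributeMultiprocessorCount, 0);
  printf("Compute units on device 0: %d\n", cu_count);
  return 0;
}
```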
## Known Issues in This Release

### Incorrect dGPU Behavior When Using AMDVBFlash Tool

The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not
communicate with the ROM Controller specifically when the driver is present.
This is because the driver, as part of its runtime power management feature,
puts the dGPU to a sleep state.

As a workaround, users can set `amdgpu.runpm=0`, which temporarily disables the
runtime power management feature from the driver and dynamically changes some
power control-related sysfs files.
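One hedged way to apply the workaround on a typical Linux installation; the file
name and the GRUB step are assumptions about the local setup, and a reboot or
driver reload is needed for the change to take effect.

```bash
# Option 1: set the module option persistently via modprobe.d.
echo "options amdgpu runpm=0" | sudo tee /etc/modprobe.d/amdgpu-runpm.conf

# Option 2: append amdgpu.runpm=0 to GRUB_CMDLINE_LINUX in /etc/default/grub, then:
sudo update-grub
```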
### Issue with START Timestamp in ROCProfiler

Users may encounter an issue with the enabled timestamp functionality for
monitoring one or multiple counters. ROCProfiler outputs the following four
timestamps for each kernel:

- Dispatch
- Start
- End
- Complete

#### Issue

This defect is related to the Start timestamp functionality, which incorrectly
shows an earlier time than the Dispatch timestamp.

To reproduce the issue,

1. Enable timing using the `--timestamp on` flag.
2. Use the `-i` option with the input filename that contains the name of the
   counter(s) to monitor.
3. Run the program.
4. Check the output result file.

##### Current behavior

`BeginNS` is lower than `DispatchNS`, which is incorrect.

##### Expected behavior

The correct order is:

`Dispatch < Start < End < Complete`

Users cannot use ROCProfiler to measure the time spent on each kernel because of
the incorrect timestamp with counter collection enabled.

##### Recommended Workaround

Users are recommended to collect kernel execution timestamps without monitoring
counters, as follows (a command-line sketch follows the steps):

1. Enable timing using the `--timestamp on` flag, and run the application.
2. Rerun the application using the `-i` option with the input filename that
   contains the name of the counter(s) to monitor, and save this to a different
   output file using the `-o` flag.
3. Check the output result file from step 1.
4. The order of timestamps correctly displays as:
   `DispatchNS < BeginNS < EndNS < CompleteNS`
5. Users can find the values of the collected counters in the output file
   generated in step 2.
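A command-line sketch of the two-pass workaround above, using placeholder file
and application names.

```bash
# Pass 1: timestamps only (use these for timing).
rocprof --timestamp on -o timestamps.csv ./my_app

# Pass 2: counters only (read the counter values from this output).
rocprof --timestamp on -i counters.txt -o counters.csv ./my_app
```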
### No Support for SMI and ROCDebugger on SRIOV
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV
environment on any GPU, including the
**Radeon Pro V620 and W6800 Workstation GPUs**. For more information, refer to
the Systems Management Interface documentation.
## Deprecations and Warnings in This Release

### ROCm Libraries Changes, Deprecations, and Deprecation Removals

- The `hipfft.h` header is now provided only by the `hipfft` package. Up to ROCm
  5.0, users would get `hipfft.h` in the `rocfft` package too.
- The GlobalPairwiseAMG class is now entirely removed; users should use the
  PairwiseAMG class instead.
- The `rocsparse_spmm` signature in 5.0 was changed to match that of
  `rocsparse_spmm_ex`. In 5.0, `rocsparse_spmm_ex` is still present, but
  deprecated. Signature diff for `rocsparse_spmm`:
#### `rocsparse_spmm` in 5.0
```c
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
@@ -690,7 +767,7 @@ rocsparse_status rocsparse_spmm(rocsparse_handle handle,
void* temp_buffer);
```
#### `rocsparse_spmm` in 4.0
```c
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
@@ -707,55 +784,99 @@ rocsparse_status rocsparse_spmm(rocsparse_handle handle,
void* temp_buffer);
```
### HIP API Deprecations and Warnings

#### Warning - Arithmetic Operators of HIP Complex and Vector Types

In this release, arithmetic operators of HIP complex and vector types are
deprecated.

- As alternatives to arithmetic operators of HIP complex types, users can use
  arithmetic operators of `std::complex` types.
- As alternatives to arithmetic operators of HIP vector types, users can use the
  operators of the native clang vector type associated with the data member of
  HIP vector types.

During the deprecation, two macros `__HIP_ENABLE_COMPLEX_OPERATORS` and
`__HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally
enable arithmetic operators of HIP complex or vector types.
Note that the two macros are mutually exclusive and, by default, are set to off.
The arithmetic operators of HIP complex and vector types will be removed in a
future release.
Refer to the HIP API Guide for more information.
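A minimal sketch of the two options during the deprecation window (opting in via
the macro, or moving to `std::complex` on the host); the macro must be defined
before the HIP headers are included.

```cpp
#define __HIP_ENABLE_COMPLEX_OPERATORS  // opt-in: keeps a + b compiling for now
#include <hip/hip_complex.h>
#include <complex>

int main() {
  hipDoubleComplex a = make_hipDoubleComplex(1.0, 2.0);
  hipDoubleComplex b = make_hipDoubleComplex(3.0, 4.0);
  hipDoubleComplex c = hipCadd(a, b);  // always available
  hipDoubleComplex d = a + b;          // compiles only while the macro is defined

  // Recommended long-term alternative on the host side:
  std::complex<double> x(1.0, 2.0), y(3.0, 4.0);
  std::complex<double> z = x + y;

  (void)c; (void)d; (void)z;
  return 0;
}
```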
#### HIPCC/HIPCONFIG Refactoring

In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
used to identify and set target compiler options, target platform, compiler,
and runtime appropriately.

In ROCm v5.0, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
binary implementations of `hipcc` and `hipconfig`. These new binaries are
currently a work in progress and are marked as experimental. ROCm plans to
fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm release.
The existing `hipcc` and `hipconfig` Perl scripts are renamed to `hipcc.pl` and
`hipconfig.pl`, respectively. New top-level `hipcc` and `hipconfig` Perl
scripts are created, which can switch between the Perl script or the compiled
binary based on the environment variable `HIPCC_USE_PERL_SCRIPT`.

In ROCm 5.0, by default, this environment variable is set to use `hipcc` and
`hipconfig` through the Perl scripts. The Perl scripts will no longer be
available in ROCm in a future release.
### Warning - Compiler-Generated Code Object Version 4 Deprecation

Support for loading compiler-generated code object version 4 will be deprecated
in a future release with no release announcement and replaced with code object 5
as the default version.
The current default is code object version 4.
### Warning - MIOpenTensile Deprecation
MIOpenTensile will be deprecated in a future release.
## Archived Documentation
Older ROCm documentation is archived at <https://rocmdocs.amd.com>.
## Disclaimer
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors.
The information contained herein is subject to change and may be rendered
inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes,
BIOS flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated.
AMD assumes no obligation to update or otherwise correct or revise this
information. However, AMD reserves the right to revise this information and to
make changes from time to time to the content hereof without obligation of AMD
to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED
"AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS
HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS
THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR
PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT,
INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY
INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof
are trademarks of Advanced Micro Devices, Inc. Other product names used in this
publication are for identification purposes only and may be trademarks of their
respective companies. © 2021 Advanced Micro Devices, Inc. All rights reserved.
### Third-party Disclaimer
Third-party content is licensed to you directly by the third party that owns the
content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS
PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT
IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO
YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE
FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.


@@ -1,6 +1,6 @@
MIT License
Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

README.md

@@ -1,52 +1,74 @@
# ROCm™ Repository Updates
This repository contains the manifest file for ROCm™ releases, changelogs, and
release information. The file default.xml contains information for all
repositories and the associated commit used to build the current ROCm release.

The default.xml file uses the repo Manifest format.

## ROCm v5.4.3 Release Notes

ROCm v5.4.3 is now released. For ROCm v5.4.3 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.4.2 Release Notes

ROCm v5.4.2 is now released. For ROCm v5.4.2 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.4.1 Release Notes

ROCm v5.4.1 is now released. For ROCm v5.4.1 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.4 Release Notes

ROCm v5.4 is now released. For ROCm v5.4 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.3.3 Release Notes

ROCm v5.3.3 is now released. For ROCm v5.3.3 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.3.2 Release Notes

ROCm v5.3.2 is now released. For ROCm v5.3.2 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.3 Release Notes

ROCm v5.3 is now released. For ROCm v5.3 documentation, refer to
<https://docs.amd.com>.

## ROCm v5.2.3 Release Notes

The ROCm v5.2.3 patch release is now available. The details are listed below.
Highlights of this release include enhancements in RCCL version compatibility
and minor bug fixes in the HIP Runtime.

Additionally, ROCm releases will return to use of the
[ROCm](https://github.com/RadeonOpenCompute/ROCm) repository for
version-controlled release notes henceforth.
**NOTE**: This release of ROCm is validated with the AMDGPU release v22.20.1.
All users of the ROCm v5.2.1 release and below are encouraged to upgrade. Refer to https://docs.amd.com for documentation associated with this release.
All users of the ROCm v5.2.1 release and below are encouraged to upgrade. Refer
to <https://docs.amd.com> for documentation associated with this release.
## Introducing Preview Support for Ubuntu 20.04.5 HWE
Refer to the following article for information on the preview support for Ubuntu 20.04.5 HWE.
https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-22-20
Refer to the following article for information on the preview support for
Ubuntu 20.04.5 HWE.
<https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-22-20>
## Changes in This Release
### Ubuntu 18.04 End of Life
Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not provide prebuilt packages for Ubuntu 18.04.
Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not
provide prebuilt packages for Ubuntu 18.04.
### HIP and Other Runtimes
@@ -54,46 +76,63 @@ Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not
##### Fixes
- A bug was discovered in the HIP graph capture implementation in the ROCm v5.2.0 release. If the same kernel is called twice (with different argument values) in a graph capture, the implementation only kept the argument values for the second kernel call.
- A bug was introduced in the hiprtc implementation in the ROCm v5.2.0 release. This bug caused the *hiprtcGetLoweredName* call to fail for named expressions with whitespace in it.
**Example:** The named expression ```my_sqrt<complex<double>>``` passed but ```my_sqrt<complex<double>>``` failed.
- A bug was discovered in the HIP graph capture implementation in the ROCm
v5.2.0 release. If the same kernel is called twice (with different argument
values) in a graph capture, the implementation only kept the argument values
for the second kernel call.
- A bug was introduced in the `hiprtc` implementation in the ROCm v5.2.0
release. This bug caused the `hiprtcGetLoweredName` call to fail for named
expressions with whitespace in it.
**Example:** The named expression `my_sqrt<complex<double>>` passed, but the
equivalent expression containing whitespace, `my_sqrt<complex<double> >`, failed.
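For reference, a minimal hiprtc sketch of the scenario this fix addresses; `my_sqrt<float>` is a hypothetical template instantiation used here in place of the complex-typed example above, and error handling is omitted:

```cpp
// Sketch: hiprtcGetLoweredName with and without whitespace in the named expression.
#include <hip/hiprtc.h>
#include <cstdio>

static const char* kSource = R"(
template <typename T>
__global__ void my_sqrt(T* x) { *x = *x * *x; }   // placeholder body
)";

int main() {
  hiprtcProgram prog;
  hiprtcCreateProgram(&prog, kSource, "my_sqrt.cu", 0, nullptr, nullptr);

  // Both spellings name the same instantiation; the second contains whitespace.
  const char* plain  = "my_sqrt<float>";
  const char* spaced = "my_sqrt<float >";
  hiprtcAddNameExpression(prog, plain);
  hiprtcAddNameExpression(prog, spaced);

  hiprtcCompileProgram(prog, 0, nullptr);

  const char* lowered = nullptr;
  hiprtcGetLoweredName(prog, plain, &lowered);
  printf("%s -> %s\n", plain, lowered);
  hiprtcGetLoweredName(prog, spaced, &lowered);  // failed before this fix
  printf("%s -> %s\n", spaced, lowered);

  hiprtcDestroyProgram(&prog);
  return 0;
}
```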
### ROCm Libraries
#### RCCL
##### Added
- Compatibility with NCCL 2.12.10
- Packages for test and benchmark executables on all supported OSes using CPack
- Adding custom signal handler - opt-in with RCCL_ENABLE_SIGNALHANDLER=1
- Additional details provided if Binary File Descriptor library (BFD) is pre-installed.
- Adding custom signal handler - opt-in with `RCCL_ENABLE_SIGNALHANDLER=1`
- Additional details provided if Binary File Descriptor library (BFD) is
pre-installed.
- Adding experimental support for using multiple ranks per device
- Requires using a new interface to create communicator (ncclCommInitRankMulti),
refer to the interface documentation for details.
- To avoid potential deadlocks, user might have to set an environment variables increasing
the number of hardware queues. For example,
- Requires using a new interface to create communicator
(`ncclCommInitRankMulti`), refer to the interface documentation for
details.
- To avoid potential deadlocks, users might have to set an environment
variable to increase the number of hardware queues. For example,
```bash
export GPU_MAX_HW_QUEUES=16
```
- Adding support for reusing ports in NET/IB channels
- Opt-in with NCCL_IB_SOCK_CLIENT_PORT_REUSE=1 and NCCL_IB_SOCK_SERVER_PORT_REUSE=1
- When "Call to bind failed: Address already in use" error happens in large-scale AlltoAll
(for example, >=64 MI200 nodes), users are suggested to opt-in either one or both of the options to resolve the massive port usage issue
- Avoid using NCCL_IB_SOCK_SERVER_PORT_REUSE when NCCL_NCHANNELS_PER_NET_PEER is tuned >1
- Opt-in with `NCCL_IB_SOCK_CLIENT_PORT_REUSE=1` and
`NCCL_IB_SOCK_SERVER_PORT_REUSE=1`
- When "`Call to bind failed: Address already in use`" error happens in
large-scale AlltoAll (for example, >=64 MI200 nodes), users are suggested
to opt-in either one or both of the options to resolve the massive port
usage issue
- Avoid using `NCCL_IB_SOCK_SERVER_PORT_REUSE` when
`NCCL_NCHANNELS_PER_NET_PEER` is tuned >1
##### Removed
- Removed experimental clique-based kernels
### Development Tools
No notable changes in this release for development tools, including the compiler, profiler, and debugger.
No notable changes in this release for development tools, including the
compiler, profiler, and debugger.
### Deployment and Management Tools
No notable changes in this release for deployment and management tools.
## Older ROCm™ Releases
For release information for older ROCm™ releases, refer to [CHANGELOG](CHANGELOG.md).
For release information for older ROCm™ releases, refer to
[CHANGELOG](CHANGELOG.md).

1
RELEASE.md Normal file
View File

@@ -0,0 +1 @@
# Release Notes

882
docs/sphinx/CHANGELOG.md Normal file
View File

@@ -0,0 +1,882 @@
# Changelog
--------------------------------------------------------------------------------
## AMD ROCm™ Releases
### AMD ROCm™ V5.2 Release
AMD ROCm v5.2 is now released. The release documentation is available at
<https://docs.amd.com>.
### AMD ROCm™ V5.1.3 Release
AMD ROCm v5.1.3 is now released. The release documentation is available at
<https://docs.amd.com>.
### AMD ROCm™ V5.1.1 Release
AMD ROCm v5.1.1 is now released. The release documentation is available at
<https://docs.amd.com>.
### AMD ROCm™ V5.1 Release
AMD ROCm v5.1 is now released. The release documentation is available at
<https://docs.amd.com>.
### AMD ROCm™ v5.0.2 Release Notes
#### Fixed Defects in This Release
The following defects are fixed in the ROCm v5.0.2 release.
##### Issue with hostcall Facility in HIP Runtime
In ROCm v5.0, when using the `assert()` call in a HIP kernel, the compiler may
sometimes fail to emit kernel metadata related to the hostcall facility, which
results in incomplete initialization of the hostcall facility in the HIP
runtime. This can cause the HIP kernel to crash when it attempts to execute the
`assert()` call. The root cause was an incorrect check in the compiler to
determine whether the hostcall facility is required by the kernel. This is fixed
in the ROCm v5.0.2 release. The resolution includes a compiler change, which
emits the required metadata by default, unless the compiler can prove that the
hostcall facility is not required by the kernel. This ensures that the
`assert()` call never fails.
**Note**: This fix may lead to breakage in some OpenMP offload use cases, which
use print inside a target region and result in an abort in device code.
The issue will be fixed in a future release.
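As an illustrative sketch (not code from the release), a HIP kernel of the kind affected by this defect, where the device-side `assert()` depends on the hostcall facility:

```cpp
// Sketch: a device-side assert() relies on the hostcall facility described above.
#include <hip/hip_runtime.h>
#include <cassert>

__global__ void check_positive(const int* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    assert(data[i] >= 0);  // triggers the hostcall-backed abort path on failure
  }
}

int main() {
  const int n = 256;
  int host[n];
  for (int i = 0; i < n; ++i) host[i] = i;

  int* dev = nullptr;
  hipMalloc(&dev, n * sizeof(int));
  hipMemcpy(dev, host, n * sizeof(int), hipMemcpyHostToDevice);

  hipLaunchKernelGGL(check_positive, dim3(1), dim3(n), 0, 0, dev, n);
  hipDeviceSynchronize();

  hipFree(dev);
  return 0;
}
```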
##### Compatibility Matrix Updates to ROCm Deep Learning Guide
The compatibility matrix in the AMD Deep Learning Guide is updated for ROCm
v5.0.2.
For more information and documentation updates, refer to <https://docs.amd.com>.
### AMD ROCm™ v5.0.1 Release Notes
#### Deprecations and Warnings
##### Refactor of HIPCC/HIPCONFIG
In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
used to identify and set target compiler options, target platform, compiler, and
runtime appropriately.
In ROCm v5.0.1, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
binary implementations of the `hipcc` and `hipconfig`. These new binaries are
currently a work in progress and are considered experimental. ROCm plans
to fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm
release. The existing `hipcc` and `hipconfig` Perl scripts are renamed to
`hipcc.pl` and `hipconfig.pl` respectively. New top-level `hipcc` and
`hipconfig` Perl scripts are created, which can switch between the Perl script
or the compiled binary based on the environment variable
`HIPCC_USE_PERL_SCRIPT`.
In ROCm 5.0.1, by default, this environment variable is set to use `hipcc` and
`hipconfig` through the Perl scripts. Subsequently, Perl scripts will no longer
be available in ROCm in a future release.
#### ROCm Documentation Updates for ROCm 5.0.1
- ROCm Downloads Guide
- ROCm Installation Guide
- ROCm Release Notes
For more information, see <https://docs.amd.com>.
### AMD ROCm™ v5.0 Release Notes
## ROCm Installation Updates
This document describes the features, fixed issues, and information about
downloading and installing the AMD ROCm™ software.
It also covers known issues and deprecations in this release.
## Notice for Open-source and Closed-source ROCm Repositories in Future Releases
To make a distinction between open-source and closed-source components, all ROCm
repositories will consist of sub-folders in future releases.
- All open-source components will be placed in the `base-url/<rocm-ver>/main`
sub-folder
- All closed-source components will reside in the
`base-url/<rocm-ver>/proprietary` sub-folder
### List of Supported Operating Systems
The AMD ROCm platform supports the following operating systems:
| **OS-Version (64-bit)** | **Kernel Versions** |
|:-------------------------------:|:-----------------------------:|
| CentOS 8.3 | `4.18.0-193.el8` |
| CentOS 7.9 | `3.10.0-1127` |
| RHEL 8.5 | `4.18.0-348.7.1.el8_5.x86_64` |
| RHEL 8.4 | `4.18.0-305.el8.x86_64` |
| RHEL 7.9 | `3.10.0-1160.6.1.el7` |
| SLES 15 SP3 | `5.3.18-59.16-default` |
| Ubuntu 20.04.3 | `5.8.0 LTS / 5.11 HWE` |
| Ubuntu 18.04.5 [5.4 HWE kernel] | `5.4.0-71-generic` |
#### Support for RHEL v8.5
This release extends support for RHEL v8.5.
#### Supported GPUs
##### Radeon Pro V620 and W6800 Workstation GPUs
This release extends ROCm support for Radeon Pro V620 and W6800 Workstation
GPUs.
- SRIOV virtualization support for Radeon Pro V620
- KVM Hypervisor (1VF support only) on Ubuntu Host OS with Ubuntu, CentOS, and
RHEL Guest
- Support for ROCm-SMI in an SRIOV environment. For more details, refer to the
ROCm SMI API documentation.
**Note:** Radeon Pro V620 is not supported on SLES.
### ROCm Installation Updates for ROCm v5.0
This release has the following ROCm installation enhancements.
#### Support for Kernel Mode Driver
In this release, users can install the kernel-mode driver using the Installer
method. Some of the ROCm-specific use cases that the installer currently
supports are:
- OpenCL (ROCr/KFD based) runtime
- HIP runtimes
- ROCm libraries and applications
- ROCm Compiler and device libraries
- ROCr runtime and thunk
- Kernel-mode driver
#### Support for Multi-version ROCm Installation and Uninstallation
Users now can install multiple ROCm releases simultaneously on a system using
the newly introduced installer script and package manager install mechanism.
Users can also uninstall multi-version ROCm releases using the
`amdgpu-uninstall` script and package manager.
#### Support for Updating Information on Local Repositories
In this release, the `amdgpu-install` script automates the process of updating
local repository information before proceeding to ROCm installation.
#### Support for Release Upgrades
Users can now upgrade the existing ROCm installation to specific or latest ROCm
releases.
For more details, refer to the AMD ROCm Installation Guide v5.0.
## AMD ROCm V5.0 Documentation Updates
### New AMD ROCm Information Portal ROCm v4.5 and Above
Beginning ROCm release v5.0, AMD ROCm documentation has a new portal at
<https://docs.amd.com>. This portal consists of ROCm documentation v4.5 and
above.
For documentation prior to ROCm v4.5, you may continue to access
<https://rocmdocs.amd.com>.
### Documentation Updates for ROCm 5.0
#### Deployment Tools
##### ROCm Data Center Tool Documentation Updates
- ROCm Data Center Tool User Guide
- ROCm Data Center Tool API Guide
##### ROCm System Management Interface Updates
- System Management Interface Guide
- System Management Interface API Guide
##### ROCm Command Line Interface Updates
- Command Line Interface Guide
#### Machine Learning/AI Documentation Updates
- Deep Learning Guide
- MIGraphX API Guide
- MIOpen API Guide
- MIVisionX API Guide
#### ROCm Libraries Documentation Updates
- hipSOLVER User Guide
- RCCL User Guide
- rocALUTION User Guide
- rocBLAS User Guide
- rocFFT User Guide
- rocRAND User Guide
- rocSOLVER User Guide
- rocSPARSE User Guide
- rocThrust User Guide
#### Compilers and Tools
##### ROCDebugger Documentation Updates
- ROCDebugger User Guide
- ROCDebugger API Guide
##### ROCTracer
- ROCTracer User Guide
- ROCTracer API Guide
##### Compilers
- AMD Instinct High Performance Computing and Tuning Guide
- AMD Compiler Reference Guide
##### HIPify Documentation
- HIPify User Guide
- HIP Supported CUDA API Reference Guide
##### ROCm Debug Agent
- ROCm Debug Agent Guide
- System Level Debug Guide
- ROCm Validation Suite
#### Programming Models Documentation
##### HIP Documentation
- HIP Programming Guide
- HIP API Guide
- HIP FAQ Guide
##### OpenMP Documentation
- OpenMP Support Guide
#### ROCm Glossary
- ROCm Glossary Terms and Definitions
### AMD ROCm Legacy Documentation Links ROCm v4.3 and Prior
- For AMD ROCm documentation, see <https://rocmdocs.amd.com/en/latest/>
- For installation instructions on supported platforms, see
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html>
- For AMD ROCm binary structure, see
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Software-Stack-for-AMD-GPU.html>
- For AMD ROCm release history, see
<https://rocmdocs.amd.com/en/latest/Current_Release_Notes/ROCm-Version-History.html>
## What's New in This Release
### HIP Enhancements
The ROCm v5.0 release consists of the following HIP enhancements.
#### HIP Installation Guide Updates
The HIP Installation Guide is updated to include building HIP from source on the
NVIDIA platform.
Refer to the HIP Installation Guide v5.0 for more details.
#### Managed Memory Allocation
Managed memory, including the `__managed__` keyword, is now supported in the HIP
combined host/device compilation. Through unified memory allocation, managed
memory allows data to be shared and accessible to both the CPU and GPU using a
single pointer. The allocation is managed by the AMD GPU driver using the Linux
Heterogeneous Memory Management (HMM) mechanism. The user can call managed
memory API `hipMallocManaged` to allocate a large chunk of HMM memory, execute
kernels on a device, and fetch data between the host and device as needed.
**Note:** In a HIP application, it is recommended to do a capability check
before calling the managed memory APIs. For example,
```cpp
int managed_memory = 0;
HIPCHECK(hipDeviceGetAttribute(&managed_memory, hipDeviceAttributeManagedMemory, p_gpuDevice));
if (!managed_memory) {
printf ("info: managed memory access not supported on the device %d\n Skipped\n", p_gpuDevice);
} else {
HIPCHECK(hipSetDevice(p_gpuDevice));
HIPCHECK(hipMallocManaged(&Hmm, N * sizeof(T)));
. . .
}
```
**Note:** The managed memory capability check may not be necessary; however, if
HMM is not supported, managed `malloc` will fall back to using system memory.
Refer to the HIP API documentation for more details on managed memory APIs.
For the application, see
[hipMallocManaged.cpp](https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp)
### New Environment Variable
The following new environment variable is added in this release:
| **Environment Variable** | **Value** | **Description** |
|:------------------------:|:---------------------:|:--------------------------------------------------------|
| `HSA_COOP_CU_COUNT` | 0 or 1 (default is 0) | Some processors support more compute units than can reliably be used in a cooperative dispatch. Setting the environment variable `HSA_COOP_CU_COUNT` to 1 will cause ROCr to return the correct CU count for cooperative groups through the `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` attribute of `hsa_agent_get_info()`. Setting `HSA_COOP_CU_COUNT` to other values, or leaving it unset, will cause ROCr to return the same CU count for the attributes `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` and `HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT`. Future ROCm releases will make `HSA_COOP_CU_COUNT = 1` the default. |
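The following sketch shows how the two compute-unit counts mentioned above can be queried through the HSA runtime; it is illustrative only and assumes the attribute names listed in the table and in `hsa_ext_amd.h`:

```cpp
// Sketch: query the CU counts that the HSA_COOP_CU_COUNT setting affects.
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <cstdio>

static hsa_status_t print_cu_counts(hsa_agent_t agent, void*) {
  hsa_device_type_t type;
  hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
  if (type != HSA_DEVICE_TYPE_GPU) return HSA_STATUS_SUCCESS;

  uint32_t cu = 0, coop_cu = 0;
  hsa_agent_get_info(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &cu);
  hsa_agent_get_info(agent,
                     (hsa_agent_info_t)HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT,
                     &coop_cu);
  printf("CU count: %u, cooperative CU count: %u\n", cu, coop_cu);
  return HSA_STATUS_SUCCESS;
}

int main() {
  hsa_init();
  hsa_iterate_agents(print_cu_counts, nullptr);  // visits every HSA agent
  hsa_shut_down();
  return 0;
}
```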
### ROCm Math and Communication Libraries
| **Library** | **Changes** |
|:--------------:|:----------------------------------------------------------------------------------------|
| **rocBLAS** | **Added** <ul><li>Added `rocblas_get_version_string_size` convenience function</li><li>Added `rocblas_xtrmm_outofplace`, an out-of-place version of `rocblas_xtrmm`</li><li>Added hpl and trig initialization for `gemm_ex` to `rocblas-bench`</li><li>Added source code gemm. It can be used as an alternative to Tensile for debugging and development</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li></ul> **Optimizations** <ul><li>Improved performance of non-batched and batched single-precision GER for size m > 1024. Performance enhanced by 5-10% measured on a MI100 (gfx908) GPU.</li><li>Improved performance of non-batched and batched HER for all sizes and data types. Performance enhanced by 2-17% measured on a MI100 (gfx908) GPU.</li></ul> **Changed** <ul><li>Instantiate templated rocBLAS functions to reduce size of librocblas.so</li><li>Removed static library dependency on msgpack</li><li>Removed boost dependencies for clients</li></ul> **Fixed** <ul><li>Option to install script to build only rocBLAS clients with a pre-built rocBLAS library</li><li>Correctly set output of `nrm2_batched_ex` and `nrm2_strided_batched_ex` when given bad input</li><li>Fix for dgmm with side == `rocblas_side_left` and a negative incx</li><li>Fixed out-of-bounds read for small trsm</li><li>Fixed numerical checking for `tbmv_strided_batched`</li></ul> |
| | |
| **hipBLAS** | **Added** <ul><li>Added rocSOLVER functions to hipblas-bench</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li><li>Added compilation warning for future trmm changes</li><li>Added documentation to `hipblas.h`</li><li>Added option to forgo pivoting for getrf and getri when ipiv is `nullptr`</li><li>Added code coverage option</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source.</li><li>Fixed windows packaging</li><li>Allowing negative increments in hipblas-bench</li><li>Removed boost dependency</li></ul> |
| | |
| **rocFFT** | **Changed** <ul><li>Enabled runtime compilation of single FFT kernels > length 1024.</li><li>Re-aligned split device library into 4 roughly equal libraries.</li><li>Implemented the FuseShim framework to replace the original OptimizePlan</li><li>Implemented the generic buffer-assignment framework. The buffer assignment is no longer performed by each node. A generic algorithm is designed to test and pick the best assignment path. With the help of FuseShim, more kernel-fusions are achieved.</li><li>Do not read the imaginary part of the DC and Nyquist modes for even-length complex-to-real transforms.</li></ul> **Optimizations** <ul><li>Optimized twiddle-conjugation; complex-to-complex inverse transforms have similar performance to forward transforms now.</li><li>Improved performance of single-kernel small 2D transforms.</li></ul> |
| | |
| **hipFFT** | **Fixed** <ul><li>Fixed incorrect reporting of rocFFT version.</li></ul> **Changed** <ul><li>Unconditionally enabled callback functionality. On the CUDA backend, callbacks only run correctly when hipFFT is built as a static library, and is linked against the static cuFFT library.</li></ul> |
| | |
| **rocSPARSE** | **Added** <ul><li>csrmv, coomv, ellmv, hybmv for (conjugate) transposed matrices</li><li>csrmv for symmetric matrices</li></ul> **Changed** <ul><li>`spmm_ex` is now deprecated and will be removed in the next major release</li></ul> **Improved** <ul><li>Optimization for gtsv</li></ul> |
| | |
| **hipSPARSE** | **Added** <ul><li>Added (conjugate) transpose support for csrmv, hybmv and spmv routines</li></ul> |
| | |
| **rocALUTION** | **Changed** <ul><li>Removed deprecated GlobalPairwiseAMG class, please use PairwiseAMG instead.</li></ul> **Improved** <ul><li>Improved documentation</li></ul> |
| | |
| **rocTHRUST** | **Updates** <ul><li>Updated to match upstream Thrust 1.13.0</li><li>Updated to match upstream Thrust 1.14.0</li><li>Added async scan</li></ul> **Changed** <ul><li>Scan algorithms: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output). And low-res input with high-res output (e.g. float input, double output)</li></ul> |
| | |
| **rocSOLVER** | **Added** <ul><li>Symmetric matrix factorizations: <ul><li>LASYF</li><li>SYTF2, SYTRF (with `batched` and `strided_batched` versions)</li></ul><li>Added `rocsolver_get_version_string_size` to help with version string queries</li><li>Added `rocblas_layer_mode_ex` and the ability to print kernel calls in the trace and profile logs</li><li>Expanded batched and `strided_batched` sample programs.</li></ul> **Optimizations** <ul><li>Improved general performance of LU factorization</li><li>Increased parallelism of specialized kernels when compiling from source, reducing build times on multi-core systems.</li></ul> **Changed** <ul><li>The rocsolver-test client now prints the rocSOLVER version used to run the tests, rather than the version used to build them</li><li>The rocsolver-bench client now prints the rocSOLVER version used in the benchmark</li></ul> **Fixed** <ul><li>Added missing `stdint.h` include to `rocsolver.h`</li></ul> |
| | |
| **hipSOLVER** | **Added** <ul><li>Added SYTRF functions: `hipsolverSsytrf_bufferSize`, `hipsolverDsytrf_bufferSize`, `hipsolverCsytrf_bufferSize`, `hipsolverZsytrf_bufferSize`, `hipsolverSsytrf`, `hipsolverDsytrf`, `hipsolverCsytrf`, `hipsolverZsytrf`</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source</li></ul> |
| | |
| **RCCL** | **Added** <ul><li>Compatibility with NCCL 2.10.3</li></ul> **Known issues** <ul><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
| | |
| **hipCUB** | **Fixed** <ul><li>Added missing includes to `hipcub.hpp`</li></ul> **Added** <ul><li>Bfloat16 support to test cases (`device_reduce` & `device_radix_sort`)</li><li>Device merge sort</li><li>Block merge sort</li><li>API update to CUB 1.14.0</li></ul> **Changed** <ul><li>The `SetupNVCC.cmake` automatic target selector selects all of the capabilities of all available cards for the NVIDIA backend.</li></ul> |
| | |
| **rocPRIM** | **Fixed** <ul><li>Enable `bfloat16` tests and reduce threshold for `bfloat16`</li><li>Fix device scan `limit_size` feature</li><li>Non-optimized builds no longer trigger local memory limit errors</li></ul> **Added** <ul><li>Scan size limit feature</li><li>Reduce size limit feature</li><li>Transform size limit feature</li><li>Add `block_load_striped` and `block_store_striped`</li><li>Add `gather_to_blocked` to gather values from other threads into a blocked arrangement</li><li>The block sizes for the device merge sort's initial block sort and its merge steps are now separate in its kernel config (the block sort step supports multiple items per thread)</li></ul> **Changed** <ul><li>`size_limit` for scan, reduce and transform can now be set in the config struct instead of a parameter</li><li>`device_scan` and `device_segmented_scan`: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)</li><li>Revert old Fiji workaround, because the issue was solved at compiler side</li><li>Update `README` cmake minimum version number</li><li>Block sort supports multiple items per thread. Currently, only power-of-two block sizes and items per thread are supported, and only for full blocks</li><li>Bumped the minimum required version of CMake to 3.16</li></ul> **Known issues** <ul><li>Unit tests may soft hang on MI200 when running in `hipMallocManaged` mode.</li><li>`device_segmented_radix_sort` and `device_scan` unit tests failing for HIP on Windows</li><li>`ReduceEmptyInput` causes random failures with `bfloat16`</li><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
### System Management Interface
#### Clock Throttling for GPU Events
This feature lists GPU events as they occur in real-time and can be used with
`kfdtest` to produce `vm_fault` events for testing.
The command can be called with either `-e` or `--showevents` like this:
```bash
-e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]] Show event list
```
Where `EVENT` is any list combination of `VM_FAULT`, `THERMAL_THROTTLE`, or
`GPU_RESET` and is **NOT** case sensitive.
**Note:** If no event arguments are passed, all events will be watched by
default.
##### CLI Commands
```bash
$ rocm-smi --showevents vm_fault thermal_throttle gpu_reset
======================= ROCm System Management Interface =======================
================================= Show Events ==================================
press 'q' or 'ctrl + c' to quit
DEVICE TIME TYPE DESCRIPTION
============================= End of ROCm SMI Log ==============================
```
(Run `kfdtest` in another window to test for `vm_fault` events.)
**Note:** Unlike other rocm-smi CLI commands, this command does not exit until
the user quits it. Users may press either `q` or `ctrl + c` to quit.
#### Display XGMI Bandwidth Between Nodes
The `rsmi_minmax_bandwidth_get` API reads the HW Topology file and displays
bandwidth (min-max) between any two NUMA nodes in a matrix format.
The Command Line Interface (CLI) command can be called as follows:
```bash
$ rocm-smi --shownodesbw
======================= ROCm System Management Interface =======================
================================== Bandwidth ===================================
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
GPU0 N/A 50000-200000 50000-50000 0-0 0-0 0-0 50000-100000 0-0
GPU1 50000-200000 N/A 0-0 50000-50000 0-0 50000-50000 0-0 0-0
GPU2 50000-50000 0-0 N/A 50000-200000 50000-100000 0-0 0-0 0-0
GPU3 0-0 50000-50000 50000-200000 N/A 0-0 0-0 0-0 50000-50000
GPU4 0-0 0-0 50000-100000 0-0 N/A 50000-200000 50000-50000 0-0
GPU5 0-0 50000-50000 0-0 0-0 50000-200000 N/A 0-0 50000-50000
GPU6 50000-100000 0-0 0-0 0-0 50000-50000 0-0 N/A 50000-200000
GPU7 0-0 0-0 0-0 50000-50000 0-0 50000-50000 50000-200000 N/A
Format: min-max; Units: mps
============================= End of ROCm SMI Log ==============================
```
The sample output above shows the maximum theoretical XGMI bandwidth between two
NUMA nodes.
**Note:** "0-0" min-max bandwidth indicates devices are not connected directly.
#### P2P Connection Status
The `rsmi_is_p2p_accessible` API returns `True` if P2P can be implemented
between two nodes, and returns `False` if P2P cannot be implemented between the
two nodes.
The Command Line Interface command can be called as follows:
```bash
rocm-smi --showtopoaccess
```
Sample Output:
```bash
$ rocm-smi --showtopoaccess
======================= ROCm System Management Interface =======================
===================== Link accessibility between two GPUs ======================
GPU0 GPU1
GPU0 True True
GPU1 True True
============================= End of ROCm SMI Log ==============================
```
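For completeness, a sketch of calling the two APIs described above (`rsmi_minmax_bandwidth_get` and `rsmi_is_p2p_accessible`) directly from C++; device indices 0 and 1 are hypothetical, and the exact signatures should be verified against `rocm_smi.h`:

```cpp
// Sketch: query XGMI min-max bandwidth and P2P accessibility between two devices.
#include <rocm_smi/rocm_smi.h>
#include <cstdio>

int main() {
  rsmi_init(0);

  uint64_t min_bw = 0, max_bw = 0;
  bool accessible = false;

  rsmi_minmax_bandwidth_get(0, 1, &min_bw, &max_bw);   // GPU0 <-> GPU1 (hypothetical indices)
  rsmi_is_p2p_accessible(0, 1, &accessible);

  printf("GPU0 <-> GPU1: bandwidth %llu-%llu mps, P2P accessible: %s\n",
         (unsigned long long)min_bw, (unsigned long long)max_bw,
         accessible ? "True" : "False");

  rsmi_shut_down();
  return 0;
}
```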
## Breaking Changes
### Runtime Breaking Change
Re-ordering of the enumerated type in `hip_runtime_api.h` to better match CUDA.
See below for the difference in enumerated types.
ROCm software will be affected if any of the defined enums listed below are used
in the code. Applications built with ROCm v5.0 enumerated types will work with a
ROCm 4.5.2 driver. However, an undefined behavior error will occur with a ROCm
v4.5.2 application that uses these enumerated types with a ROCm 5.0 runtime.
```c
typedef enum hipDeviceAttribute_t {
hipDeviceAttributeMaxThreadsPerBlock, // Maximum number of threads per block.
hipDeviceAttributeMaxBlockDimX, // Maximum x-dimension of a block.
hipDeviceAttributeMaxBlockDimY, // Maximum y-dimension of a block.
hipDeviceAttributeMaxBlockDimZ, // Maximum z-dimension of a block.
hipDeviceAttributeMaxGridDimX, // Maximum x-dimension of a grid.
hipDeviceAttributeMaxGridDimY, // Maximum y-dimension of a grid.
hipDeviceAttributeMaxGridDimZ, // Maximum z-dimension of a grid.
hipDeviceAttributeMaxSharedMemoryPerBlock, // Maximum shared memory available per block in bytes.
hipDeviceAttributeTotalConstantMemory, // Constant memory size in bytes.
hipDeviceAttributeWarpSize, // Warp size in threads.
hipDeviceAttributeMaxRegistersPerBlock, // Maximum number of 32-bit registers available to a
// thread block. This number is shared by all thread
// blocks simultaneously resident on a
// multiprocessor.
hipDeviceAttributeClockRate, // Peak clock frequency in kilohertz.
hipDeviceAttributeMemoryClockRate, // Peak memory clock frequency in kilohertz.
hipDeviceAttributeMemoryBusWidth, // Global memory bus width in bits.
hipDeviceAttributeMultiprocessorCount, // Number of multiprocessors on the device.
hipDeviceAttributeComputeMode, // Compute mode that device is currently in.
hipDeviceAttributeL2CacheSize, // Size of L2 cache in bytes. 0 if the device doesn't have L2
// cache.
hipDeviceAttributeMaxThreadsPerMultiProcessor, // Maximum resident threads per
// multiprocessor.
hipDeviceAttributeComputeCapabilityMajor, // Major compute capability version number.
hipDeviceAttributeComputeCapabilityMinor, // Minor compute capability version number.
hipDeviceAttributeConcurrentKernels, // Device can possibly execute multiple kernels
// concurrently.
hipDeviceAttributePciBusId, // PCI Bus ID.
hipDeviceAttributePciDeviceId, // PCI Device ID.
hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, // Maximum Shared Memory Per
// Multiprocessor.
hipDeviceAttributeIsMultiGpuBoard, // Multiple GPU devices.
hipDeviceAttributeIntegrated, // iGPU
hipDeviceAttributeCooperativeLaunch, // Support cooperative launch
hipDeviceAttributeCooperativeMultiDeviceLaunch, // Support cooperative launch on multiple devices
hipDeviceAttributeMaxTexture1DWidth, // Maximum number of elements in 1D images
hipDeviceAttributeMaxTexture2DWidth, // Maximum dimension width of 2D images in image elements
hipDeviceAttributeMaxTexture2DHeight, // Maximum dimension height of 2D images in image elements
hipDeviceAttributeMaxTexture3DWidth, // Maximum dimension width of 3D images in image elements
hipDeviceAttributeMaxTexture3DHeight, // Maximum dimensions height of 3D images in image elements
hipDeviceAttributeMaxTexture3DDepth, // Maximum dimensions depth of 3D images in image elements
hipDeviceAttributeCudaCompatibleBegin = 0,
hipDeviceAttributeHdpMemFlushCntl, // Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
hipDeviceAttributeHdpRegFlushCntl, // Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
hipDeviceAttributeEccEnabled = hipDeviceAttributeCudaCompatibleBegin, // Whether ECC support is enabled.
hipDeviceAttributeAccessPolicyMaxWindowSize, // Cuda only. The maximum size of the window policy in bytes.
hipDeviceAttributeAsyncEngineCount, // Cuda only. Asynchronous engines number.
hipDeviceAttributeCanMapHostMemory, // Whether host memory can be mapped into device address space
hipDeviceAttributeCanUseHostPointerForRegisteredMem, // Cuda only. Device can access host registered memory
// at the same virtual address as the CPU
hipDeviceAttributeClockRate, // Peak clock frequency in kilohertz.
hipDeviceAttributeComputeMode, // Compute mode that device is currently in.
hipDeviceAttributeComputePreemptionSupported, // Cuda only. Device supports Compute Preemption.
hipDeviceAttributeConcurrentKernels, // Device can possibly execute multiple kernels concurrently.
hipDeviceAttributeConcurrentManagedAccess, // Device can coherently access managed memory concurrently with the CPU
hipDeviceAttributeCooperativeLaunch, // Support cooperative launch
hipDeviceAttributeCooperativeMultiDeviceLaunch, // Support cooperative launch on multiple devices
hipDeviceAttributeDeviceOverlap, // Cuda only. Device can concurrently copy memory and execute a kernel.
// Deprecated. Use instead asyncEngineCount.
hipDeviceAttributeDirectManagedMemAccessFromHost, // Host can directly access managed memory on
// the device without migration
hipDeviceAttributeGlobalL1CacheSupported, // Cuda only. Device supports caching globals in L1
hipDeviceAttributeHostNativeAtomicSupported, // Cuda only. Link between the device and the host supports native atomic operations
hipDeviceAttributeIntegrated, // Device is integrated GPU
hipDeviceAttributeIsMultiGpuBoard, // Multiple GPU devices.
hipDeviceAttributeKernelExecTimeout, // Run time limit for kernels executed on the device
hipDeviceAttributeL2CacheSize, // Size of L2 cache in bytes. 0 if the device doesn't have L2 cache.
hipDeviceAttributeLocalL1CacheSupported, // caching locals in L1 is supported
hipDeviceAttributeLuid, // Cuda only. 8-byte locally unique identifier in 8 bytes. Undefined on TCC and non-Windows platforms
hipDeviceAttributeLuidDeviceNodeMask, // Cuda only. Luid device node mask. Undefined on TCC and non-Windows platforms
hipDeviceAttributeComputeCapabilityMajor, // Major compute capability version number.
hipDeviceAttributeManagedMemory, // Device supports allocating managed memory on this system
hipDeviceAttributeMaxBlocksPerMultiProcessor, // Cuda only. Max block size per multiprocessor
hipDeviceAttributeMaxBlockDimX, // Max block size in width.
hipDeviceAttributeMaxBlockDimY, // Max block size in height.
hipDeviceAttributeMaxBlockDimZ, // Max block size in depth.
hipDeviceAttributeMaxGridDimX, // Max grid size in width.
hipDeviceAttributeMaxGridDimY, // Max grid size in height.
hipDeviceAttributeMaxGridDimZ, // Max grid size in depth.
hipDeviceAttributeMaxSurface1D, // Maximum size of 1D surface.
hipDeviceAttributeMaxSurface1DLayered, // Cuda only. Maximum dimensions of 1D layered surface.
hipDeviceAttributeMaxSurface2D, // Maximum dimension (width, height) of 2D surface.
hipDeviceAttributeMaxSurface2DLayered, // Cuda only. Maximum dimensions of 2D layered surface.
hipDeviceAttributeMaxSurface3D, // Maximum dimension (width, height, depth) of 3D surface.
hipDeviceAttributeMaxSurfaceCubemap, // Cuda only. Maximum dimensions of Cubemap surface.
hipDeviceAttributeMaxSurfaceCubemapLayered, // Cuda only. Maximum dimension of Cubemap layered surface.
hipDeviceAttributeMaxTexture1DWidth, // Maximum size of 1D texture.
hipDeviceAttributeMaxTexture1DLayered, // Cuda only. Maximum dimensions of 1D layered texture.
hipDeviceAttributeMaxTexture1DLinear, // Maximum number of elements allocatable in a 1D linear texture.
// Use cudaDeviceGetTexture1DLinearMaxWidth() instead on Cuda.
hipDeviceAttributeMaxTexture1DMipmap, // Cuda only. Maximum size of 1D mipmapped texture.
hipDeviceAttributeMaxTexture2DWidth, // Maximum dimension width of 2D texture.
hipDeviceAttributeMaxTexture2DHeight, // Maximum dimension height of 2D texture.
hipDeviceAttributeMaxTexture2DGather, // Cuda only. Maximum dimensions of 2D texture if gather operations performed.
hipDeviceAttributeMaxTexture2DLayered, // Cuda only. Maximum dimensions of 2D layered texture.
hipDeviceAttributeMaxTexture2DLinear, // Cuda only. Maximum dimensions (width, height, pitch) of 2D textures bound to pitched memory.
hipDeviceAttributeMaxTexture2DMipmap, // Cuda only. Maximum dimensions of 2D mipmapped texture.
hipDeviceAttributeMaxTexture3DWidth, // Maximum dimension width of 3D texture.
hipDeviceAttributeMaxTexture3DHeight, // Maximum dimension height of 3D texture.
hipDeviceAttributeMaxTexture3DDepth, // Maximum dimension depth of 3D texture.
hipDeviceAttributeMaxTexture3DAlt, // Cuda only. Maximum dimensions of alternate 3D texture.
hipDeviceAttributeMaxTextureCubemap, // Cuda only. Maximum dimensions of Cubemap texture
hipDeviceAttributeMaxTextureCubemapLayered, // Cuda only. Maximum dimensions of Cubemap layered texture.
hipDeviceAttributeMaxThreadsDim, // Maximum dimension of a block
hipDeviceAttributeMaxThreadsPerBlock, // Maximum number of threads per block.
hipDeviceAttributeMaxThreadsPerMultiProcessor, // Maximum resident threads per multiprocessor.
hipDeviceAttributeMaxPitch, // Maximum pitch in bytes allowed by memory copies
hipDeviceAttributeMemoryBusWidth, // Global memory bus width in bits.
hipDeviceAttributeMemoryClockRate, // Peak memory clock frequency in kilohertz.
hipDeviceAttributeComputeCapabilityMinor, // Minor compute capability version number.
hipDeviceAttributeMultiGpuBoardGroupID, // Cuda only. Unique ID of device group on the same multi-GPU board
hipDeviceAttributeMultiprocessorCount, // Number of multiprocessors on the device.
hipDeviceAttributeName, // Device name.
hipDeviceAttributePageableMemoryAccess, // Device supports coherently accessing pageable memory
// without calling hipHostRegister on it
hipDeviceAttributePageableMemoryAccessUsesHostPageTables, // Device accesses pageable memory via the host's page tables
hipDeviceAttributePciBusId, // PCI Bus ID.
hipDeviceAttributePciDeviceId, // PCI Device ID.
hipDeviceAttributePciDomainID, // PCI Domain ID.
hipDeviceAttributePersistingL2CacheMaxSize, // Cuda11 only. Maximum l2 persisting lines capacity in bytes
hipDeviceAttributeMaxRegistersPerBlock, // 32-bit registers available to a thread block. This number is shared
// by all thread blocks simultaneously resident on a multiprocessor.
hipDeviceAttributeMaxRegistersPerMultiprocessor, // 32-bit registers available per block.
hipDeviceAttributeReservedSharedMemPerBlock, // Cuda11 only. Shared memory reserved by CUDA driver per block.
hipDeviceAttributeMaxSharedMemoryPerBlock, // Maximum shared memory available per block in bytes.
hipDeviceAttributeSharedMemPerBlockOptin, // Cuda only. Maximum shared memory per block usable by special opt in.
hipDeviceAttributeSharedMemPerMultiprocessor, // Cuda only. Shared memory available per multiprocessor.
hipDeviceAttributeSingleToDoublePrecisionPerfRatio, // Cuda only. Performance ratio of single precision to double precision.
hipDeviceAttributeStreamPrioritiesSupported, // Cuda only. Whether to support stream priorities.
hipDeviceAttributeSurfaceAlignment, // Cuda only. Alignment requirement for surfaces
hipDeviceAttributeTccDriver, // Cuda only. Whether device is a Tesla device using TCC driver
hipDeviceAttributeTextureAlignment, // Alignment requirement for textures
hipDeviceAttributeTexturePitchAlignment, // Pitch alignment requirement for 2D texture references bound to pitched memory;
hipDeviceAttributeTotalConstantMemory, // Constant memory size in bytes.
hipDeviceAttributeTotalGlobalMem, // Global memory available on device.
hipDeviceAttributeUnifiedAddressing, // Cuda only. A unified address space shared with the host.
hipDeviceAttributeUuid, // Cuda only. Unique ID in 16 bytes.
hipDeviceAttributeWarpSize, // Warp size in threads.
hipDeviceAttributeMaxPitch, // Maximum pitch in bytes allowed by memory copies
hipDeviceAttributeTextureAlignment, //Alignment requirement for textures
hipDeviceAttributeTexturePitchAlignment, //Pitch alignment requirement for 2D texture references bound to pitched memory;
hipDeviceAttributeKernelExecTimeout, //Run time limit for kernels executed on the device
hipDeviceAttributeCanMapHostMemory, //Device can map host memory into device address space
hipDeviceAttributeEccEnabled, //Device has ECC support enabled
hipDeviceAttributeCudaCompatibleEnd = 9999,
hipDeviceAttributeAmdSpecificBegin = 10000,
hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, // Supports cooperative launch on multiple
// devices with unmatched functions
hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, // Supports cooperative launch on multiple
// devices with unmatched grid dimensions
hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, // Supports cooperative launch on multiple
// devices with unmatched block dimensions
hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, // Supports cooperative launch on multiple
// devices with unmatched shared memories
hipDeviceAttributeAsicRevision, // Revision of the GPU in this device
hipDeviceAttributeManagedMemory, // Device supports allocating managed memory on this system
hipDeviceAttributeDirectManagedMemAccessFromHost, // Host can directly access managed memory on
// the device without migration
hipDeviceAttributeConcurrentManagedAccess, // Device can coherently access managed memory
// concurrently with the CPU
hipDeviceAttributePageableMemoryAccess, // Device supports coherently accessing pageable memory
// without calling hipHostRegister on it
hipDeviceAttributePageableMemoryAccessUsesHostPageTables, // Device accesses pageable memory via
// the host's page tables
hipDeviceAttributeCanUseStreamWaitValue // '1' if Device supports hipStreamWaitValue32() and
// hipStreamWaitValue64(), '0' otherwise.
hipDeviceAttributeClockInstructionRate = hipDeviceAttributeAmdSpecificBegin, // Frequency in khz of the timer used by the device-side "clock"
hipDeviceAttributeArch, // Device architecture
hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, // Maximum Shared Memory PerMultiprocessor.
hipDeviceAttributeGcnArch, // Device gcn architecture
hipDeviceAttributeGcnArchName, // Device gcnArch name in 256 bytes
hipDeviceAttributeHdpMemFlushCntl, // Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
hipDeviceAttributeHdpRegFlushCntl, // Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, // Supports cooperative launch on multiple
// devices with unmatched functions
hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, // Supports cooperative launch on multiple
// devices with unmatched grid dimensions
hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, // Supports cooperative launch on multiple
// devices with unmatched block dimensions
hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, // Supports cooperative launch on multiple
// devices with unmatched shared memories
hipDeviceAttributeIsLargeBar, // Whether it is LargeBar
hipDeviceAttributeAsicRevision, // Revision of the GPU in this device
hipDeviceAttributeCanUseStreamWaitValue, // '1' if Device supports hipStreamWaitValue32() and
// hipStreamWaitValue64() , '0' otherwise.
hipDeviceAttributeAmdSpecificEnd = 19999,
hipDeviceAttributeVendorSpecificBegin = 20000, // Extended attributes for vendors
} hipDeviceAttribute_t;
```
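To make the impact concrete, the following sketch queries two of the re-ordered attributes. Because the enumerators' numeric values changed in ROCm v5.0, such code must be rebuilt against matching HIP headers and runtime rather than mixing v4.5.2 and v5.0 components:

```cpp
// Sketch: any code passing hipDeviceAttribute_t values across the HIP API is
// affected by the re-numbering, since the attribute is passed by value.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
  int max_threads = 0;
  int shared_mem = 0;
  hipDeviceGetAttribute(&max_threads, hipDeviceAttributeMaxThreadsPerBlock, 0);
  hipDeviceGetAttribute(&shared_mem, hipDeviceAttributeMaxSharedMemoryPerBlock, 0);
  printf("max threads per block: %d, shared memory per block: %d bytes\n",
         max_threads, shared_mem);
  return 0;
}
```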
## Known Issues in This Release
### Incorrect dGPU Behavior When Using AMDVBFlash Tool
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not
communicate with the ROM Controller specifically when the driver is present.
This is because the driver, as part of its runtime power management feature,
puts the dGPU to a sleep state.
As a workaround, users can set the kernel parameter `amdgpu.runpm=0`, which
temporarily disables the runtime power management feature of the driver and
dynamically changes some power control-related sysfs files.
### Issue with START Timestamp in ROCProfiler
Users may encounter an issue with the enabled timestamp functionality for
monitoring one or multiple counters. ROCProfiler outputs the following four
timestamps for each kernel:
- Dispatch
- Start
- End
- Complete
#### Issue
This defect is related to the Start timestamp functionality, which incorrectly
shows an earlier time than the Dispatch timestamp.
To reproduce the issue,
1. Enable timing using the `--timestamp on` flag.
2. Use the `-i` option with the input filename that contains the name of the
counter(s) to monitor.
3. Run the program.
4. Check the output result file.
##### Current behavior
`BeginNS` is lower than `DispatchNS`, which is incorrect.
##### Expected behavior
The correct order is:
`Dispatch < Start < End < Complete`
Users cannot use ROCProfiler to measure the time spent on each kernel because of
the incorrect timestamp with counter collection enabled.
##### Recommended Workaround
Users are recommended to collect kernel execution timestamps without monitoring
counters, as follows:
1. Enable timing using the `--timestamp on` flag, and run the application.
2. Rerun the application using the `-i` option with the input filename that
contains the name of the counter(s) to monitor, and save this to a different
output file using the `-o` flag.
3. Check the output result file from step 1.
4. The order of timestamps correctly displays as:
   `DispatchNS < BeginNS < EndNS < CompleteNS`
5. Users can find the values of the collected counters in the output file
   generated in step 2.
### No Support for SMI and ROCDebugger on SRIOV
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV
environment on any GPU, including the
**Radeon Pro V620 and W6800 Workstation GPUs**. For more information, refer to
the Systems Management Interface documentation.
## Deprecations and Warnings in This Release
### ROCm Libraries Changes Deprecations and Deprecation Removal
- The `hipfft.h` header is now provided only by the `hipfft` package. Up to ROCm
5.0, users would get `hipfft.h` in the rocfft package too.
- The GlobalPairwiseAMG class is now entirely removed, users should use the
PairwiseAMG class instead.
- The `rocsparse_spmm` signature in 5.0 was changed to match that of
`rocsparse_spmm_ex`. In 5.0, `rocsparse_spmm_ex` is still present, but
deprecated. The signature diff for `rocsparse_spmm` is shown below.
#### `rocsparse_spmm` in 5.0
```c
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
rocsparse_operation trans_A,
rocsparse_operation trans_B,
const void* alpha,
const rocsparse_spmat_descr mat_A,
const rocsparse_dnmat_descr mat_B,
const void* beta,
const rocsparse_dnmat_descr mat_C,
rocsparse_datatype compute_type,
rocsparse_spmm_alg alg,
rocsparse_spmm_stage stage,
size_t* buffer_size,
void* temp_buffer);
```
#### `rocsparse_spmm` in 4.0
```c
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
rocsparse_operation trans_A,
rocsparse_operation trans_B,
const void* alpha,
const rocsparse_spmat_descr mat_A,
const rocsparse_dnmat_descr mat_B,
const void* beta,
const rocsparse_dnmat_descr mat_C,
rocsparse_datatype compute_type,
rocsparse_spmm_alg alg,
size_t* buffer_size,
void* temp_buffer);
```
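A hedged sketch of the two-stage calling pattern implied by the 5.0 signature; the descriptors are assumed to be created elsewhere, and the stage and algorithm enumerators used here (`rocsparse_spmm_stage_buffer_size`, `rocsparse_spmm_stage_compute`, `rocsparse_spmm_alg_default`) should be verified against the rocSPARSE headers:

```cpp
// Sketch: the extra 'stage' argument in 5.0 splits the call into a buffer-size
// query followed by the actual computation.
#include <rocsparse/rocsparse.h>
#include <hip/hip_runtime.h>

rocsparse_status spmm_5_0_style(rocsparse_handle handle,
                                rocsparse_spmat_descr mat_A,
                                rocsparse_dnmat_descr mat_B,
                                rocsparse_dnmat_descr mat_C) {
  const float alpha = 1.0f, beta = 0.0f;
  size_t buffer_size = 0;

  // Stage 1: ask how much temporary device memory is needed.
  rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                 &alpha, mat_A, mat_B, &beta, mat_C, rocsparse_datatype_f32_r,
                 rocsparse_spmm_alg_default, rocsparse_spmm_stage_buffer_size,
                 &buffer_size, nullptr);

  void* temp_buffer = nullptr;
  hipMalloc(&temp_buffer, buffer_size);

  // Stage 2: perform the multiplication using the allocated buffer.
  rocsparse_status status = rocsparse_spmm(
      handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A,
      mat_B, &beta, mat_C, rocsparse_datatype_f32_r, rocsparse_spmm_alg_default,
      rocsparse_spmm_stage_compute, &buffer_size, temp_buffer);

  hipFree(temp_buffer);
  return status;
}
```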
### HIP API Deprecations and Warnings
#### Warning - Arithmetic Operators of HIP Complex and Vector Types
In this release, arithmetic operators of HIP complex and vector types are
deprecated.
- As alternatives to arithmetic operators of HIP complex types, users can use
arithmetic operators of `std::complex` types.
- As alternatives to arithmetic operators of HIP vector types, users can use the
operators of the native clang vector type associated with the data member of
HIP vector types.
During the deprecation, two macros `__HIP_ENABLE_COMPLEX_OPERATORS` and
`__HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally
enable arithmetic operators of HIP complex or vector types.
Note, the two macros are mutually exclusive and, by default, set to off.
The arithmetic operators of HIP complex and vector types will be removed in a
future release.
Refer to the HIP API Guide for more information.
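As an illustration of the suggested alternative for complex types, the sketch below performs the arithmetic with `std::complex` on the host and uses the explicit HIP helper `hipCmulf` instead of the deprecated operator overloads:

```cpp
// Sketch: avoid the deprecated operator overloads on hipFloatComplex by doing
// arithmetic with std::complex, or by calling the explicit HIP helpers.
#include <hip/hip_complex.h>
#include <complex>
#include <cstdio>

int main() {
  std::complex<float> a(1.0f, 2.0f), b(3.0f, -1.0f);
  std::complex<float> c = a * b;  // std::complex operators are not deprecated

  // Equivalent product through the explicit HIP helper function.
  hipFloatComplex d = hipCmulf(make_hipFloatComplex(1.0f, 2.0f),
                               make_hipFloatComplex(3.0f, -1.0f));

  printf("std::complex: (%f, %f)  hipCmulf: (%f, %f)\n",
         c.real(), c.imag(), hipCrealf(d), hipCimagf(d));
  return 0;
}
```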
#### HIPCC/HIPCONFIG Refactoring
In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
used to identify and set target compiler options, target platform, compiler, and
runtime appropriately.
In ROCm v5.0, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
binary implementations of the `hipcc` and `hipconfig`. These new binaries are
currently a work in progress and are considered experimental. ROCm plans
to fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm
release. The existing `hipcc` and `hipconfig` Perl scripts are renamed to
`hipcc.pl` and `hipconfig.pl` respectively. New top-level `hipcc` and
`hipconfig` Perl scripts are created, which can switch between the Perl script
or the compiled binary based on the environment variable
`HIPCC_USE_PERL_SCRIPT`.
In ROCm 5.0, by default, this environment variable is set to use `hipcc` and
`hipconfig` through the Perl scripts.
Subsequently, Perl scripts will no longer be available in ROCm in a future
release.
### Warning - Compiler-Generated Code Object Version 4 Deprecation
Support for loading compiler-generated code object version 4 will be deprecated
in a future release with no release announcement and replaced with code object 5
as the default version.
The current default is code object version 4.
### Warning - MIOpenTensile Deprecation
MIOpenTensile will be deprecated in a future release.
## Archived Documentation
Older ROCm documentation is archived at <https://rocmdocs.amd.com>.
## Disclaimer
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors.
The information contained herein is subject to change and may be rendered
inaccurate for many reasons, including but not limited to product and roadmap
changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes,
BIOS flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated.
AMD assumes no obligation to update or otherwise correct or revise this
information. However, AMD reserves the right to revise this information and to
make changes from time to time to the content hereof without obligation of AMD
to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED
"AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS
HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS
THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR
PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT,
INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY
INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof
are trademarks of Advanced Micro Devices, Inc. Other product names used in this
publication are for identification purposes only and may be trademarks of their
respective companies. © [2021] Advanced Micro Devices, Inc. All rights reserved.
### Third-party Disclaimer
Third-party content is licensed to you directly by the third party that owns the
content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS
PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT
IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO
YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE
FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.

8
docs/sphinx/README.md Normal file
View File

@@ -0,0 +1,8 @@
# How to build documentation via Sphinx
```bash
pip3 install -r requirements.txt
python -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```

1
docs/sphinx/RELEASE.md Normal file
View File

@@ -0,0 +1 @@
# Release Notes

View File

@@ -0,0 +1 @@
<svg id="Layer_1" data-name="Layer 1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 139.72 33.32"><defs><style>.cls-1{fill:#fff;}</style></defs><title>AMD-logo-white-v2</title><path class="cls-1" d="M33,31.14H25.21l-2.37-5.72H9.92L7.76,31.14H.14L11.78,2.26h8.34Zm-16.89-22L11.83,20.39h8.89Z" transform="translate(-0.14 -0.03)"/><path class="cls-1" d="M61.1,2.26h6.27V31.14h-7.2v-18l-7.79,9.06h-1.1L43.49,13.1v18h-7.2V2.26h6.27L51.83,13Z" transform="translate(-0.14 -0.03)"/><path class="cls-1" d="M85.61,2.26c10.54,0,16,6.56,16,14.48,0,8.3-5.25,14.4-16.77,14.4H72.86V2.26ZM80.06,25.85h4.7c7.24,0,9.4-4.91,9.4-9.15,0-5-2.67-9.15-9.48-9.15H80.06Z" transform="translate(-0.14 -0.03)"/><polygon class="cls-1" points="130.64 9.08 115.75 9.08 106.68 0 139.72 0 139.72 33.05 130.64 23.97 130.64 9.08"/><polygon class="cls-1" points="115.74 23.98 115.74 10.9 106.4 20.24 106.4 33.33 119.48 33.33 128.82 23.98 115.74 23.98"/></svg>

After

Width:  |  Height:  |  Size: 924 B

View File

@@ -0,0 +1,9 @@
<svg xmlns="http://www.w3.org/2000/svg" version="1.0" preserveAspectRatio="xMidYMid meet" viewBox="1.16 -0.07 462.14 198.07">
<g transform="translate(0.000000,480.000000) scale(0.100000,-0.100000)" fill="#000000" stroke="none">
<path d="M15 4788 c-3 -7 -4 -452 -3 -988 l3 -975 379 -3 c505 -3 622 11 710 88 77 68 105 188 106 456 0 273 -19 354 -106 433 -36 33 -74 56 -119 72 l-65 23 60 22 c161 58 198 133 188 387 -9 216 -55 324 -176 408 -99 69 -251 89 -683 89 -223 0 -291 -3 -294 -12z m645 -363 c24 -24 25 -31 28 -159 5 -206 -9 -236 -114 -236 l-44 0 0 216 0 217 52 -6 c37 -5 59 -14 78 -32z m-12 -726 c12 -5 27 -20 32 -34 13 -34 13 -406 0 -439 -12 -33 -45 -53 -102 -61 l-48 -7 0 281 0 282 48 -7 c26 -4 57 -11 70 -15z"/>
<path d="M1395 4788 c-3 -7 -4 -452 -3 -988 l3 -975 445 0 445 0 3 198 2 197 -190 0 -190 0 0 220 0 220 160 0 160 0 0 179 c0 154 -2 180 -16 185 -9 3 -81 6 -160 6 l-144 0 0 185 0 185 170 0 170 0 0 200 0 200 -425 0 c-331 0 -427 -3 -430 -12z"/>
<path d="M2317 4793 c-4 -3 -7 -93 -7 -199 l0 -193 148 -3 147 -3 5 -785 5 -785 258 -3 257 -2 0 790 0 789 153 3 152 3 3 175 c1 96 0 185 -3 198 l-5 22 -554 0 c-304 0 -556 -3 -559 -7z"/>
<path d="M3595 4788 c-2 -7 -9 -51 -15 -98 -6 -47 -17 -134 -25 -195 -8 -60 -21 -162 -29 -225 -14 -108 -23 -172 -61 -460 -9 -63 -22 -164 -30 -225 -8 -60 -30 -227 -49 -370 -19 -143 -37 -290 -41 -328 l-7 -67 263 2 264 3 13 145 c7 80 15 160 18 178 l5 32 84 0 c56 0 86 -4 92 -12 4 -7 12 -87 18 -178 l10 -165 264 -3 264 -2 -6 42 c-4 24 -20 135 -37 248 -29 192 -78 522 -100 670 -5 36 -23 157 -40 270 -17 113 -42 279 -55 370 -48 322 -54 361 -60 370 -10 16 -734 13 -740 -2z m411 -630 c7 -90 20 -235 29 -323 9 -88 18 -193 22 -233 l6 -73 -84 3 -84 3 3 60 c2 56 33 337 52 485 27 210 33 250 37 246 3 -3 11 -78 19 -168z"/>
</g>
</svg>

After

Width:  |  Height:  |  Size: 1.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 10 KiB

View File

@@ -0,0 +1,22 @@
$(document).ready(() => {
const copy = async(event) => {
return await navigator.clipboard.writeText($(event.target).attr('copydata'));
}
$('.table td code').each( function () {
var text = $(this).text()
$(this).addClass('hovertext')
$(this).attr('copydata', text)
$(this).attr('data-hover', "Click to copy.")
// Insert zero-width spaces after underscores and between camelCase words so long
// identifiers can wrap inside table cells; 'copydata' above keeps the original text for copying.
var new_text = text.replaceAll(/_([^\u200B])/g, '_\u200B$1').replaceAll(/([a-z])([A-Z])/g, '$1\u200B$2')
$(this).text(new_text)
$(this).click((event) => {
copy(event)
$(event.target).attr('data-hover', "Copied!")
$(event.target).on("mouseleave", () => {
$(event.target).attr('data-hover', "Click to copy.")
$(event.target).off("mouseleave")
})
})
})
})

View File

@@ -0,0 +1,72 @@
@import url("theme.css");
:root {
--pst-font-size-base: 11px;
}
div#site-navigation {
height: fit-content;
min-height: calc(100vh - 190px);
}
div.content-container {
overflow-y: clip;
}
.hovertext {
position: relative;
/* border-bottom: 1px dotted black; */
}
.hovertext:before {
content: attr(data-hover);
visibility: hidden;
opacity: 0;
width: 140px;
background-color: black;
color: #fff;
text-align: center;
border-radius: 5px;
padding: 5px 0;
transition: opacity 0.5s ease-in-out;
position: absolute;
z-index: 1;
left: 0;
top: 110%;
}
.hovertext:hover:before {
opacity: 1;
visibility: visible;
}
div#rdc-watermark-container {
pointer-events: none;
position: fixed;
height: 100vh;
width: 100vw;
top: 0;
left: 0;
z-index: 10000;
}
img#rdc-watermark {
pointer-events: none;
position: absolute;
top: 50%;
left: 50%;
transform-origin: center;
transform: translate(-50%, -50%) rotate(-45deg);
opacity: 10%;
z-index: 10000;
max-width: 100%;
max-height: calc(100% - 200px);
object-fit: contain;
width: 45%;
}
ul.bd-breadcrumbs {
margin-bottom: 0;
margin-top: 1px;
}

View File

@@ -0,0 +1,58 @@
.rocm-footer {
background-color: black;
color: white;
display: flex;
flex-wrap: wrap;
border-top: 1px solid hsla(216,3%,63%,.5);
align-items: center;
justify-content: center;
text-align: center;
width: 100%;
padding-top: 5px;
line-height: 20px;
height: 120px;
}
.rocm-footer a, .rocm-footer p {
color: white;
}
.rocm-footer>ul {
border-bottom: 1px solid hsla(216,3%,63%,.5);
justify-content: flex-end;
margin-top:15px;
}
.rocm-footer ul {
display: flex;
flex-direction: row;
flex-wrap: wrap;
font-size: 12px;
padding: 0;
padding-bottom: 12px;
width:98vw;
list-style: none inside none;
margin: 0;
}
.rocm-footer div {
width: 98vw;
}
.rocm-footer div {
text-align: start;
}
.rocm-footer a:hover {
color: #e9ecef;
text-decoration: none;
}
.rocm-footer ul li {
margin-right: 5px;
}
.rocm-footer ul li+li {
margin-left: 10px;
padding-left: 8px;
}

View File

@@ -0,0 +1,108 @@
.rocm-header {
background-color: black;
position: -webkit-sticky; /* Safari */
position: sticky;
top: 0;
width: 100%;
min-height: 50px;
overflow: hidden;
font-family: 'Noto Sans', sans-serif;
font-size: 16px;
text-align: left;
height:70px;
}
.rocm-header a {
color: white;
text-decoration: none;
}
.rocm-header-link p {
margin-top: 1em;
margin-bottom: 1em;
}
.rocm-header img#amd-logo{
margin: 1.5em;
width: 8.25rem;
}
.rocm-header img#rocm-logo{
margin: 0;
max-height: 100%;
}
.rocm-header-buttons {
display: inline-block;
height: fit-content;
max-width: 100%;
width: fit-content;
vertical-align: middle;
}
.rocm-header-link:first-child {
margin-left: 4em;
}
.rocm-header-link {
position: relative;
display: inline-block;
height: fit-content;
text-align: center;
vertical-align: middle;
}
.rocm-header-link.rocm-header-last {
position: absolute;
right: 4em;
top: 50%;
transform: translate(0, -50%);
height: 100%;
}
.rocm-header-link .rocm-link-box, .rocm-header-link p {
vertical-align: middle;
color: white;
}
.rocm-header-link .rocm-link-box {
font-size: x-large;
}
.rocm-header-link p {
font-size: 16px;
}
.rocm-header-link img, .rocm-header-link .rocm-link-box {
max-height: 50px;
margin-left: 2em;
margin-right: 2em;
}
.rocm-header-link .glow-wrap{
overflow: hidden;
position: absolute;
width: 100%;
height: 100%;
top: 0;
}
.rocm-header-link .glow{
display: block;
position:absolute;
width: 20%;
height: 100%;
background: rgba(255,255,255,.2);
top: 0;
left: 0;
transform-origin: right top;
transform: translate(-100%, 0) skew(-45deg);
filter: blur(2px);
transition: all .5s cubic-bezier(0.645, 0.045, 0.355, 1);
}
.rocm-header-link:hover .glow{
transform-origin: left bottom;
transform: translate(1000%, 0) skew(-45deg);
transition: all .5s cubic-bezier(0.645, 0.045, 0.355, 1);
}

View File

@@ -0,0 +1,11 @@
{% if show_copyright and copyright %}
<div class="copyright">
{% if hasdoc('copyright') %}
{% trans path=pathto('copyright'), copyright=copyright|e %}© <a href="{{ path }}">Copyright</a> {{ copyright }}.{% endtrans %}
<br/>
{% else %}
{% trans copyright=copyright|e %}© Copyright {{ copyright }}.{% endtrans %}
<br/>
{% endif %}
</div>
{% endif %}

View File

@@ -0,0 +1,43 @@
<!-- Copied from pydata-sphinx-theme -->
{%- macro icon_link_nav_item(url, icon, name, type, attributes='') -%}
{%- if url | length > 2 %}
<li class="nav-item">
{%- set attributesDefault = { "href": url, "title": name, "class": "nav-link", "rel": "noopener", "target": "_blank", "data-bs-toggle": "tooltip", "data-bs-placement": "bottom"} %}
{%- if attributes %}{% for key, val in attributes.items() %}
{% set _ = attributesDefault.update(attributes) %}
{% endfor %}{% endif -%}
{% set attributeString = [] %}
{% for key, val in attributesDefault.items() %}
{%- set _ = attributeString.append('%s="%s"' % (key, val)) %}
{% endfor %}
{% set attributeString = attributeString | join(" ") -%}
<a {{ attributeString }}>
{%- if type == "fontawesome" -%}
<span><i class="{{ icon }}"></i></span>
<label class="sr-only">{{ _(name) }}</label>
{%- elif type == "local" -%}
<img src="{{ pathto(icon, 1) }}" class="icon-link-image" alt="{{ _(name) }}"/>
{%- elif type == "url" -%}
<img src="{{ icon }}" class="icon-link-image" alt="{{ _(name) }}"/>
{%- else %}
<span>Incorrectly configured icon link. Type must be `fontawesome`, `url` or `local`.</span>
{%- endif -%}
</a>
</li>
{%- endif -%}
{%- endmacro -%}
<ul id="navbar-icon-links"
class="navbar-nav"
aria-label="{{ _(theme_icon_links_label) }}">
{%- block icon_link_shortcuts -%}
{{ icon_link_nav_item("http://www.github.com/AMD", "fab fa-github", "GitHub", "fontawesome") -}}
{{ icon_link_nav_item("http://www.facebook.com/amd", "fab fa-facebook-f", "Facebook", "fontawesome") -}}
{{ icon_link_nav_item("http://www.twitter.com/amd", "fab fa-twitter", "Twitter", "fontawesome") -}}
{{ icon_link_nav_item("http://www.instagram.com/amd", "fab fa-instagram", "Instagram", "fontawesome") -}}
{{ icon_link_nav_item("http://www.linkedin.com/company/amd", "fab fa-linkedin", "LinkedIn", "fontawesome") -}}
{{ icon_link_nav_item("https://www.amd.com/en/corporate/subscriptions", "fa fa-envelope", "Mail", "fontawesome") -}}
{{ icon_link_nav_item("https://www.youtube.com/user/amd?sub_confirmation=1", "fab fa-youtube", "Youtube", "fontawesome") -}}
{{ icon_link_nav_item("https://www.twitch.tv/amd", "fab fa-twitch", "Twitch", "fontawesome") -}}
{% endblock icon_link_shortcuts -%}
</ul>

View File

@@ -0,0 +1,5 @@
{% extends "!layout.html" %}
{%- block footer %}
{%- include "sections/footer.html" %}
{%- endblock %}

View File

@@ -0,0 +1,10 @@
<p>
{%- if last_updated %}
{% trans prefix=translate('Last updated on'), last_updated=last_updated|e %}{{ prefix }} {{ last_updated }}.{% endtrans %}<br/>
{%- endif %}
{%- if theme_extra_footer %}
<div class="extra_footer">
{{ theme_extra_footer }}
</div>
{%- endif %}
</p>

View File

@@ -0,0 +1,20 @@
<div class="rocm-footer">
{%- include "components/social-links.html" with context -%}
{% include 'components/copyright.html' %}
<div class="rocm-footer-links">
<ul>
<li><a href="https://www.amd.com/en/corporate/copyright">Terms and Conditions (AMD)</a></li>
<li><a href="#">Terms and Conditions (ROCm)</a></li>
<li><a href="https://www.amd.com/en/corporate/privacy">Privacy</a></li>
<li><a href="https://www.amd.com/en/corporate/cookies">Cookie Policy</a></li>
<li><a href="https://www.amd.com/en/corporate/trademarks">Trademarks</a></li>
<li><a href="https://www.amd.com/system/files/documents/statement-human-trafficking-forced-labor.pdf">Statement on Forced Labor</a></li>
<li><a href="https://www.amd.com/en/corporate/competition">Fair and Open Competition</a></li>
<li><a href="https://www.amd.com/system/files/documents/amd-uk-tax-strategy.pdf">UK Tax Strategy</a></li>
</ul>
</div>
</div>
<div id="rdc-watermark-container">
<img id="rdc-watermark" src="{{ pathto('rdc-watermark.svg',1) }}" alt="DRAFT watermark"/>
</div>

View File

@@ -0,0 +1,48 @@
<div class="rocm-header">
<div class="rocm-header-buttons">
<a href="https://www.amd.com" class="rocm-header-link">
<img id="amd-logo" alt="Advanced Micro Devices, Inc." src="{{ pathto('amd-header-logo.svg',1) }}"></img>
<div class="glow-wrap">
<i class="glow"></i>
</div>
</a>
<a href="{{ theme_repository_url }}" class="rocm-header-link">
<div class="rocm-link-box">
<p>GitHub</p>
</div>
<div class="glow-wrap">
<i class="glow"></i>
</div>
</a>
<a href="https://github.com/RadeonOpenCompute/ROCm/discussions" class="rocm-header-link">
<div class="rocm-link-box">
<p>Community</p>
</div>
<div class="glow-wrap">
<i class="glow"></i>
</div>
</a>
<a href="https://github.com/RadeonOpenCompute/ROCm/issues/new" class="rocm-header-link">
<div class="rocm-link-box">
<p>Support</p>
</div>
<div class="glow-wrap">
<i class="glow"></i>
</div>
</a>
<a href="https://www.amd.com/en/technologies/infinity-hub" class="rocm-header-link">
<div class="rocm-link-box">
<p>Infinity Hub</p>
</div>
<div class="glow-wrap">
<i class="glow"></i>
</div>
</a>
<a href="https://rocm.amd.com" class="rocm-header-link rocm-header-last" id="rocm-link">
<img id="rocm-logo" alt="ROCm logo" src="{{ pathto('rocm-on.png',1) }}"></img>
<div class="glow-wrap">
<i class="glow"></i>
</div>
</a>
</div>
</div>

165
docs/sphinx/_toc.yml Normal file
View File

@@ -0,0 +1,165 @@
defaults:
numbered: False
maxdepth: 6
root: index
subtrees:
- entries:
- file: release
subtrees:
- entries:
- file: release/gpu_os_support
- file: release/licensing
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
title: Known Issues
- file: release/compatibility
subtrees:
- entries:
- file: reference/framework_compatiblity/framework_compatiblity
- file: reference/kernel_userspace_compatibility/kernel_userspace_comp
- entries:
- file: deploy
subtrees:
- entries:
- file: quick_start
- file: hip_sdk_install_win/hip_sdk_install_win
- file: deploy/docker
- file: deploy/install
- file: deploy/multi
- file: deploy/spack
- file: deploy/build_source
- caption: APIs and Reference
entries:
- file: reference/hip
subtrees:
- entries:
- title: HIP Runtime API
url: https://advanced-micro-devices-hip-saad.readthedocs-hosted.com/en/wip-sphinx/
- title: HIPify - Port Your Code
url: https://advanced-micro-devices-demo--737.com.readthedocs.build/projects/HIPIFY/en/737/
- file: reference/gpu_libraries/math
title: Math Libraries
subtrees:
- entries:
- file: reference/gpu_libraries/blas
subtrees:
- entries:
- title: rocBLAS
url: https://rocmdocs.amd.com/projects/rocBLAS/en/master/
- title: hipBLAS
url: https://rocmdocs.amd.com/projects/hipBLAS/en/master/
- title: rocWMMA
url: https://rocmdocs.amd.com/projects/rocWMMA/en/master/
- file: reference/gpu_libraries/fft
subtrees:
- entries:
- title: rocFFT
url: https://rocmdocs.amd.com/projects/rocFFT/en/master/
- title: hipFFT
url: https://rocmdocs.amd.com/projects/hipFFT/en/master/
- file: reference/gpu_libraries/rand
subtrees:
- entries:
- title: rocRAND
url: https://rocmdocs.amd.com/projects/rocRAND/en/master/
- title: hipRAND
url: https://rocmdocs.amd.com/projects/hipRAND/en/master/
- file: reference/gpu_libraries/solver
subtrees:
- entries:
- title: rocSOLVER
url: https://rocmdocs.amd.com/projects/rocSOLVER/en/master/
- title: hipSOLVER
url: https://rocmdocs.amd.com/projects/hipSOLVER/en/master/
- file: reference/gpu_libraries/sparse
subtrees:
- entries:
- title: rocSPARSE
url: https://rocmdocs.amd.com/projects/rocSPARSE/en/master/
- title: hipSPARSE
url: https://rocmdocs.amd.com/projects/hipSPARSE/en/master/
- file: reference/gpu_libraries/c++_primitives
title: C++ Primitives
subtrees:
- entries:
- url: https://rocmdocs.amd.com/projects/rocPRIM/en/master/
title: rocPRIM
- entries:
- url: https://rocmdocs.amd.com/projects/hipCUB/en/master/
title: hipCUB
- entries:
- url: https://rocmdocs.amd.com/projects/rocThrust/en/master/
title: rocThrust
- file: reference/gpu_libraries/communication
title: Communication Libraries
subtrees:
- entries:
- url: https://rocmdocs.amd.com/projects/RCCL/en/master/
title: RCCL
- url: https://rocmsoftwareplatform.github.io/MIOpen/doc/html/releasenotes.html
title: MIOpen - Machine Intelligence
- url: https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
title: MIGraphX- Graph Optimization
- file: reference/computer_vision
subtrees:
- entries:
- url: https://rocmdocs.amd.com/projects/MIVisionX/en/master/
title: MIVisionX
- entries:
- url: https://rocmdocs.amd.com/projects/rocAL/en/master/
title: rocAL
- file: reference/openmp/openmp
title: OpenMP
- file: reference/compilers
title: Compilers and Tools
subtrees:
- entries:
- file: reference/rocmcc/rocmcc
title: ROCmCC
- url: http://profiler
title: ROCGDB
- url: http://profiler
title: rocprof
- url: http://profiler
title: roctracer
- url: http://profiler
title: ROCdbgapi
- file: reference/management_tools
title: Management Tools
subtrees:
- entries:
- url: http://smi
title: rocmsmi
- file: reference/gpu_arch
- caption: Understand ROCm
entries:
- title: Compiler Disambiguation
file: understand/compiler_disabiguation
- file: isv_deployment_win
- file: understand/deep_learning/deep_learning
- file: understand/cmake_packages
- caption: How to Guides
entries:
- file: how_to/docker_gpu_isolation
- file: how_to/magma_install/magma_install
- file: how_to/pytorch_install/pytorch_install
- file: how_to/tensorflow_install/tensorflow_install
- file: how_to/system_debugging
- caption: Examples
entries:
- title: rocm-examples
url: https://github.com/
- file: examples/ai_ml_inferencing
title: AI/ML/Inferencing
subtrees:
- entries:
- file: examples/inception_casestudy/inception_casestudy
- file: examples/inception_casestudy_migraphx/inception_casestudy_migraphx
- caption: About
entries:
- file: about

170
docs/sphinx/_toc.yml.in Normal file
View File

@@ -0,0 +1,170 @@
# Anywhere {branch} is used, the branch name will be substituted.
# These comments will also be removed.
defaults:
numbered: False
maxdepth: 6
root: index
subtrees:
- entries:
- file: release
subtrees:
- entries:
- file: release/gpu_os_support
- file: release/licensing
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
title: Known Issues
- file: release/compatibility
subtrees:
- entries:
- file: reference/framework_compatiblity/framework_compatiblity
- file: reference/kernel_userspace_compatibility/kernel_userspace_comp
- entries:
- file: deploy
subtrees:
- entries:
- file: quick_start
- file: hip_sdk_install_win/hip_sdk_install_win
- file: deploy/docker
- file: deploy/install
- file: deploy/multi
- file: deploy/spack
- file: deploy/build_source
- caption: APIs and Reference
entries:
- file: reference/hip
subtrees:
- entries:
- title: HIP Runtime API
url: https://advanced-micro-devices-hip-saad.readthedocs-hosted.com/en/wip-sphinx/
- title: HIPify - Port Your Code
url: https://advanced-micro-devices-demo--737.com.readthedocs.build/projects/HIPIFY/en/737/
- file: reference/gpu_libraries/math
title: Math Libraries
subtrees:
- entries:
- file: reference/gpu_libraries/blas
subtrees:
- entries:
- title: rocBLAS
url: https://rocmdocs.amd.com/projects/rocBLAS/en/{branch}/
- title: hipBLAS
url: https://rocmdocs.amd.com/projects/hipBLAS/en/{branch}/
- title: rocWMMA
url: https://rocmdocs.amd.com/projects/rocWMMA/en/{branch}/
- file: reference/gpu_libraries/fft
subtrees:
- entries:
- title: rocFFT
url: https://rocmdocs.amd.com/projects/rocFFT/en/{branch}/
- title: hipFFT
url: https://rocmdocs.amd.com/projects/hipFFT/en/{branch}/
- file: reference/gpu_libraries/rand
subtrees:
- entries:
- title: rocRAND
url: https://rocmdocs.amd.com/projects/rocRAND/en/{branch}/
- title: hipRAND
url: https://rocmdocs.amd.com/projects/hipRAND/en/{branch}/
- file: reference/gpu_libraries/solver
subtrees:
- entries:
- title: rocSOLVER
url: https://rocmdocs.amd.com/projects/rocSOLVER/en/{branch}/
- title: hipSOLVER
url: https://rocmdocs.amd.com/projects/hipSOLVER/en/{branch}/
- file: reference/gpu_libraries/sparse
subtrees:
- entries:
- title: rocSPARSE
url: https://rocmdocs.amd.com/projects/rocSPARSE/en/{branch}/
- title: hipSPARSE
url: https://rocmdocs.amd.com/projects/hipSPARSE/en/{branch}/
- file: reference/gpu_libraries/c++_primitives
title: C++ Primitives
subtrees:
- entries:
- url: https://rocmdocs.amd.com/projects/rocPRIM/en/{branch}/
title: rocPRIM
- entries:
- url: https://rocmdocs.amd.com/projects/hipCUB/en/{branch}/
title: hipCUB
- entries:
- url: https://rocmdocs.amd.com/projects/rocThrust/en/{branch}/
title: rocThrust
- file: reference/gpu_libraries/communication
title: Communication Libraries
subtrees:
- entries:
- url: https://rocmdocs.amd.com/projects/RCCL/en/{branch}/
title: RCCL
- url: https://rocmsoftwareplatform.github.io/MIOpen/doc/html/releasenotes.html
title: MIOpen - Machine Intelligence
- url: https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
title: MIGraphX- Graph Optimization
- file: reference/computer_vision
subtrees:
- entries:
- url: https://rocmdocs.amd.com/projects/MIVisionX/en/{branch}/
title: MIVisionX
- entries:
- url: https://rocmdocs.amd.com/projects/rocAL/en/{branch}/
title: rocAL
- file: reference/openmp/openmp
title: OpenMP
- file: reference/compilers
title: Compilers and Tools
subtrees:
- entries:
- file: reference/rocmcc/rocmcc
title: ROCmCC
- url: http://profiler
title: ROCGDB
- url: http://profiler
title: rocprof
- url: http://profiler
title: roctracer
- url: http://profiler
title: ROCdbgapi
- file: reference/management_tools
title: Management Tools
subtrees:
- entries:
- url: http://smi
title: rocmsmi
- file: reference/gpu_arch
- caption: Understand ROCm
entries:
- title: Compiler Disambiguation
file: understand/compiler_disabiguation
- file: isv_deployment_win
- file: understand/deep_learning/deep_learning
- file: understand/cmake_packages
- caption: How to Guides
entries:
- file: how_to/docker_gpu_isolation
- file: how_to/deep_learning_rocm
subtrees:
- entries:
- file: how_to/magma_install/magma_install
- file: how_to/pytorch_install/pytorch_install
- file: how_to/tensorflow_install/tensorflow_install
- file: how_to/system_debugging
- caption: Examples
entries:
- title: rocm-examples
url: https://github.com/
- file: examples/ai_ml_inferencing
title: AI/ML/Inferencing
subtrees:
- entries:
- file: examples/inception_casestudy/inception_casestudy
- file: examples/inception_casestudy_migraphx/inception_casestudy_migraphx
- caption: About
entries:
- file: about

71
docs/sphinx/about.md Normal file
View File

@@ -0,0 +1,71 @@
# About ROCm Documentation
ROCm documentation is made available under open source [licenses](licensing.md).
Documentation is built using open source toolchains. Contributions to our
documentation are encouraged and welcome. As a contributor, please familiarize
yourself with our documentation toolchain.
## ReadTheDocs
[ReadTheDocs](https://docs.readthedocs.io/en/stable/) is the frontend for our
documentation, that is, the tool that serves our HTML-based documentation to end
users. We are using a paid ReadTheDocs plan. Many projects were using the free
ReadTheDocs plan; all projects should transition to the paid ReadTheDocs site,
as it is ad free. The paid site has additional functionality, including longer
build times, better user monitoring, and the
[rocmdoc.amd.com](https://rocmdoc.amd.com) URL. Please contact the documentation
team or DevOps for ReadTheDocs access.
## Doxygen
[Doxygen](https://www.doxygen.nl/) is the most common inline code documentation
standard. ROCm projects use Doxygen for public API documentation (unless the
upstream project uses a different tool).
## Sphinx
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
originally created for Python. It is now widely used in the open source
community. Originally, Sphinx supported only rst-based documentation; Markdown
support is now available. ROCm documentation plans to default to Markdown for
new projects. Existing projects using rst are under no obligation to convert to
Markdown. New projects that believe Markdown is not suitable should contact the
documentation team prior to selecting rst.
### Sphinx Theme
ROCm is using the
[Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/). This
theme is used by Jupyter books. ROCm documentation applies some customization,
including a header and footer, on top of the Sphinx Book Theme. A custom ROCm
theme will be part of our future documentation goals.
### Sphinx Design
Sphinx Design is an extension for Sphinx-based websites that adds design
functionality. Please see the documentation
[here](https://sphinx-design.readthedocs.io/en/latest/index.html). ROCm
documentation uses Sphinx Design for grids, cards, and synchronized tabs.
Other features may be used in the future.
### Sphinx External TOC
ROCm uses the
[sphinx-external-toc](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
for our navigation. This tool builds the left navigation menu from a yml file.
It was selected for its flexibility, which allows scripts to operate on the yml
file. Please transition each project's navigation to this file. For an example,
see the `_toc.yml.in` file in the docs/sphinx folder of this repository.
### Breathe
Sphinx uses [Breathe](https://www.breathe-doc.org/) to integrate doxygen
content.
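As an illustration only (the project name and XML path below are hypothetical,
not taken from this repository), a Sphinx `conf.py` wired up for Breathe points
the extension at Doxygen's XML output:

```python
# Hypothetical Breathe configuration for a project named "hipExample".
extensions = ["breathe"]

# Map the Breathe project name to the directory holding Doxygen's XML output.
breathe_projects = {"hipExample": "../doxygen/xml"}
breathe_default_project = "hipExample"
```

API pages can then pull individual symbols into Sphinx with Breathe directives
such as `doxygenfunction` or `doxygenclass`.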
## rocm-docs-core pip package
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD
maintained project that applies customization for our documentation. This
project is the tool most ROCm repositories will use as part of the documentation
build.
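For reference, a minimal `conf.py` built on rocm-docs-core looks roughly like
the sketch below; it mirrors the `conf.py` added in this commit, with the
project name being the only per-repository change.

```python
# Minimal sketch of a conf.py that delegates Sphinx configuration to rocm-docs-core.
from rocm_docs import ROCmDocs

docs_core = ROCmDocs("My Project Documentation")  # illustrative project name
docs_core.setup()

# Export the resolved Sphinx settings into this module's namespace.
for sphinx_var in ROCmDocs.SPHINX_VARS:
    globals()[sphinx_var] = getattr(docs_core, sphinx_var)
```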

View File

17
docs/sphinx/conf.py Normal file
View File

@@ -0,0 +1,17 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import shutil
shutil.copy2('../../CHANGELOG.md','./')
shutil.copy2('../../RELEASE.md','./')
from rocm_docs import ROCmDocs
docs_core = ROCmDocs("ROCm Documentation")
docs_core.setup()
for sphinx_var in ROCmDocs.SPHINX_VARS:
globals()[sphinx_var] = getattr(docs_core, sphinx_var)

44
docs/sphinx/deploy.md Normal file
View File

@@ -0,0 +1,44 @@
# Deploy
Please follow the guides below to begin your ROCm journey. ROCm can be consumed
via many mechanisms.
:::::{grid} 1 1 3 3
:gutter: 1
::::{grid-item-card}
:padding: 2
Quick Start
^^^
- [Linux](quick_start)
- [Windows](hip_sdk_install_win/hip_sdk_install_win)
::::
::::{grid-item-card}
:padding: 2
Docker
^^^
- [Guide](deploy/docker)
- [Dockerhub](https://hub.docker.com/u/rocm/#!)
::::
::::{grid-item-card}
:padding: 2
[Advanced](deploy/advanced)
^^^
- [Uninstall](deploy/advanced/uninstall)
- [Multi-ROCm Installations](deploy/advanced/multi)
- [spack](deploy/advanced/spack)
- [Build from Source](deploy/advanced/build_source)
::::
:::::
## Related Information
[Release Information](release)

View File

@@ -0,0 +1 @@
# Build from Source

View File

@@ -0,0 +1 @@
# Docker

View File

@@ -0,0 +1,3 @@
# Basic Installation Guide
This guide explains the basic installation of ROCm. This is the recommended starting point for all users of ROCm.

View File

@@ -0,0 +1 @@
# Multi-ROCm Installation

View File

@@ -0,0 +1 @@
# spack

View File

@@ -0,0 +1 @@
# AI/ML/Inferencing

View File

@@ -0,0 +1,5 @@
# Inception V3 with PyTorch
Pull content from
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Deep_Learning_Training.html>.
Ignore training description.

42
docs/sphinx/format_toc.py Normal file
View File

@@ -0,0 +1,42 @@
import os
from typing import Union
from git import Repo, Remote, RemoteReference
from pathlib import Path
def format_toc(repo_path: Union[str, os.PathLike, None] = None):
pwd = Path(__file__).resolve().parent
if repo_path is None:
repo_path = pwd.parent
at_start = True
repo = Repo(repo_path, search_parent_directories=True)
assert not repo.bare
try:
branch = repo.active_branch.name
except TypeError as exc: # HEAD is detached commit
checked_heads = []
for head in repo.heads:
checked_heads.append(head.name)
if head.commit == repo.head.commit:
branch = head.name
break
else: # loop fell through
for remote in repo.remotes:
remote: Remote
for ref in remote.refs:
ref: RemoteReference
if ref.commit == repo.head.commit:
branch = ref.name.split('/')[-1]
break
else: # loop fell through
raise TypeError("A branch name could not be determined.\n(Checked heads: %s)" % ' '.join(checked_heads)) from exc
with open(pwd / '_toc.yml.in', 'r', encoding='utf-8') as input:
with open(pwd / '_toc.yml', 'w', encoding='utf-8') as output:
for line in input.readlines():
if line[0] == '#' and at_start:
continue
at_start = False
output.write(line.format(branch=branch))
if __name__ == '__main__':
format_toc()
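# Usage sketch (illustrative; the conf.py in this commit does not call this yet):
# a documentation build could regenerate _toc.yml before Sphinx reads it with
#
#     from format_toc import format_toc
#     format_toc()  # rewrites _toc.yml from _toc.yml.in with {branch} filled in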

View File

@@ -0,0 +1 @@
# GPU Libraries

View File

@@ -0,0 +1,179 @@
# Quick Start (Windows)
The steps to install the HIP SDK for Windows are described in this document.
## System Requirements
The HIP SDK is supported on Windows 10 and 11. The HIP SDK may be installed on a
system without AMD GPUs to use the build toolchains. To run HIP applications, a
compatible GPU is required. Please see the supported GPU guide for more details.
TODO: provide link to supported GPU guide.
## SDK Installation
Installation options are listed in Table 1.
| **Table 1. Components for Installation** | | | |
|:------------------------:|:----------------:|:------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| **HIP Components** | **Install Type** | **Display Driver** | **Install Options** |
| **HIP SDK Core** | **Full** | Adrenalin 22.40 | **Full:** Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking your performance on your graphics hardware. |
| **HIP Libraries** | **Full** | | **Minimal:** Provides only the basic controls for AMD Software features and does not include advanced features such as performance tweaking or recording and capturing content. |
| **HIP Runtime Compiler** | **Full** | | **Driver Only:** Provides no user interface for AMD Software features. |
| **Ray Tracing** | **Full** | | **Do Not Install** |
| **BitCode Profiler** | **Full** | | |
TODO: describe each installation option.
## HIP SDK Installer
The AMD HIP SDK Installer manages the installation and uninstallation process of
HIP SDK for Windows. This includes system configuration checks, installing
components, and installing the display driver.
To launch the AMD HIP SDK Installer, click the **Setup** icon shown in Figure 1.
The installer will begin to load and detect your system's configuration and
compatibility, as shown in Figure 2. A completely loaded AMD HIP SDK Installer
window will appear, as shown in Figure 3.
| ![Setup](image/Setup-Icon.png) |
|:------------------------------:|
| **Figure 1. Setup Icon** |
| ![Loading Window](image/Loading-Window.png) |
|:-------------------------------------------:|
| **Figure 2. AMD HIP SDK Loading Window** |
| ![Installer Window](image/Installer-Window.png) |
|:-----------------------------------------------:|
| **Figure 3. AMD HIP SDK Installer Window** |
### Installation Selections
By default, all components are selected for installation. Refer to Figure 3 for
an instance when the Select All option is turned on.
**Note** The Select All option only applies to the installation of HIP
components. To install the AMD Display Driver, manually select the install type.
**Note** To customize the install location on your system, click
**Additional Options** under HIP SDK Core and AMD Radeon Vega 10 Graphics. Refer
to the sections [HIP Components](#hip-components) and
[AMD Display Driver](#amd-display-driver) for more information on each
installation.
To make installation selections and install, follow these steps:
1. Scroll the window to AMD Display Driver and select the desired install type.
Refer to the section [AMD Display Driver](#amd-display-driver) for more
information on installation types.
2. Once selected, click **Install** located in the lower right corner, and skip
to [Installing Components](#installing-components).
#### Deselect All
To select individual components to install on your system, click **Deselect All**
in the upper right corner of the installer window, as seen in Figure 3. Figure 4
shows the installer window once all installation components are deselected.
| ![DeSelect All](image/DeSelectAll.png) |
|:--------------------------------------:|
| **Figure 4. Deselect All Selection** |
#### HIP Components
By default, each HIP component is selected for full installation. Figures 5
through 9 demonstrate the options available to you when you click
**Additional Options** under each component.
| **Table 2. Custom Selections for Installation** | |
|:------------------------------------------------------------------|:---------------------------------------------------- |
| **If:** | **Then:** |
| You intend to make custom selections for this installation | Skip to the section _Deselect All_. |
| You do not intend to make custom selections for this installation | Continue to the section _AMD Display Driver_. |
**Note** You can manually select installation locations for the HIP SDK Core, as
shown in Figure 5.
| ![HIP SDK Core](image/HIP-SDK-Core.png) |
|:---------------------------------------:|
| **Figure 5. HIP SDK Core Option** |
| ![HIP Libraries](image/HIP-Libraries.png) |
|:-----------------------------------------:|
| **Figure 6. HIP Libraries Option** |
| ![HIP Runtime Compiler](image/HIP-Runtime-Compiler.png) |
|:-------------------------------------------------------:|
| **Figure 7. HIP Runtime Compiler Option** |
| ![HIP Ray Tracing](image/HIP-Ray-Tracing.png) |
|:---------------------------------------------:|
| **Figure 8. HIP Ray Tracing** |
| ![BitCode Profiler](image/BitCode-Profiler.png) |
|:-----------------------------------------------:|
| **Figure 9. BitCode Profiler** |
#### AMD Display Driver
The AMD Display Driver offers three install types:
- Full Install
- Minimal Install
- Driver only
Table 3 describes the difference in each option shown in Figure 10.
**Note** You must perform a system restart for a complete installation of the
Display Driver.
**Note** Unless you intend to factory reset your machine, leave the
**Factory Reset (Optional)** box unchecked. A Factory Reset will remove all
prior versions of AMD HIP SDK and drivers. You will not be able to roll back to
previously installed drivers.
| **Table 3. Display Driver Install Options** | |
|:-------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
| **Install Option** | **Description** |
| **Full Install** | Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking the performance on your graphics hardware. |
| **Minimal Install** | Provides only the basic controls for AMD Software features and does not include advanced features such as performance tweaking or recording and capturing content. |
| **Driver Only** | Provides no user interface for AMD Software features. |
| ![Display Driver](image/AMD-Display-Driver.png) |
|:-----------------------------------------------:|
| **Figure 10. AMD Display Driver Options** |
## Installing Components
Please wait while the installation completes, as shown in Figure 11.
| ![Installing](image/Installation.png) |
|:-------------------------------------:|
| **Figure 11. Active Installation** |
### Installation Complete
Once the installation is complete, the installer window may prompt you for a
system restart. Click **Restart** at the lower right corner, shown in Figure 12.
| ![Installation Complete](image/Installation-Complete.png) |
|:---------------------------------------------------------:|
| **Figure 12. Installation Complete** |
## Uninstallation
All components, except the Visual Studio plug-in, should be uninstalled through
Control Panel > Add/Remove Programs. For Visual Studio extension uninstallation,
please refer to
<https://github.com/ROCm-Developer-Tools/HIP-VS/blob/master/README.md>. To
uninstall the HIP SDK Core and drivers, repeat the steps in the sections
[HIP SDK Installer](#hip-sdk-installer) and
[Installing Components](#installing-components).
**Note** Selecting **Install** when ROCm is already installed results in its
uninstallation.
| ![Uninstall](image/Uninstallation.png) |
|:----------------------------------------:|
| **Figure 13. HIP SDK Uninstalling** |

Binary file not shown.

After

Width:  |  Height:  |  Size: 163 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 183 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 40 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 40 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 38 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 407 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 465 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 207 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 461 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 461 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.5 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 412 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 68 KiB

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
# Deep Learning Guide

View File

@@ -0,0 +1,5 @@
# Docker GPU Isolation Techniques
## GPU Passthrough
## Environment Variable

Binary file not shown.

After

Width:  |  Height:  |  Size: 88 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

View File

@@ -0,0 +1,4 @@
[ZoneTransfer]
ZoneId=3
ReferrerUrl=https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html
HostUrl=https://docs-be.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/image.006.png?_LANG=enus

View File

@@ -0,0 +1,471 @@
# Magma Installation for ROCm
Pull content from
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html>
The following sections cover the different framework installations for ROCm and
Deep Learning applications. Figure 5 provides the sequential flow for the use of
each framework. Refer to the ROCm Compatible Frameworks Release Notes for each
framework's most current release notes at
[/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html](/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html).
| ![Figure 5](figures/image.005.png)|
|:--:|
| <b>Figure 5. ROCm Compatible Frameworks Flowchart</b>|
## PyTorch
PyTorch is an open source Machine Learning Python library, primarily differentiated by Tensor computing with GPU acceleration and a tape-based automatic differentiation. Other advanced features include:
- Support for distributed training
- Native ONNX support
- C++ frontend
- The ability to deploy at scale using TorchServe
- A production-ready deployment mechanism through TorchScript
### Installing PyTorch
To install ROCm on bare metal, refer to the section [ROCm Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60). The recommended option to get a PyTorch environment is through Docker. However, installing the PyTorch wheels package on bare metal is also supported.
#### Option 1 (Recommended): Use Docker Image with PyTorch Pre-installed
Using Docker gives you portability and access to a prebuilt Docker container that has been rigorously tested within AMD. This can also save compilation time, and the container should perform as tested, without potential installation issues.
Follow these steps:
1. Pull the latest public PyTorch Docker image.
```
docker pull rocm/pytorch:latest
```
Optionally, you may download a specific and supported configuration with different user-space ROCm versions, PyTorch versions, and supported operating systems. To download the PyTorch Docker image, refer to [https://hub.docker.com/r/rocm/pytorch](https://hub.docker.com/r/rocm/pytorch).
2. Start a Docker container using the downloaded image.
```
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
```
:::{note}
This will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount any data directories from the host onto the container.
:::
#### Option 2: Install PyTorch Using Wheels Package
PyTorch supports the ROCm platform by providing tested wheels packages. To access this feature, refer to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/) and choose the "ROCm" compute platform. Figure 6 is a matrix from pytorch.org that illustrates the installation compatibility between ROCm and the PyTorch build.
| ![Figure 6](figures/image.006.png)|
|:--:|
| <b>Figure 6. Installation Matrix from Pytorch.org</b>|
To install PyTorch using the wheels package, follow these installation steps:
1. Choose one of the following options:
a. Obtain a base Docker image with the correct user-space ROCm version installed from [https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04](https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04).
or
b. Download a base OS Docker image and install ROCm following the installation directions in the section [Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60). ROCm 5.2 is installed in this example, as supported by the installation matrix from pytorch.org.
or
c. Install on bare metal. Skip to Step 3.
2. Start a Docker container using the image from Step 1 (not required for bare-metal installs).
```
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest
```
3. Install any dependencies needed for installing the wheels package.
```
sudo apt update
sudo apt install libjpeg-dev python3-dev
pip3 install wheel setuptools
```
4. Install torch, torchvision, and torchaudio as specified by the installation matrix.
:::{note}
ROCm 5.2 PyTorch wheel in the command below is shown for reference.
:::
```
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.2/
```
#### Option 3: Install PyTorch Using PyTorch ROCm Base Docker Image
A prebuilt base Docker image is used to build PyTorch in this option. The base Docker has all dependencies installed, including:
- ROCm
- Torchvision
- Conda packages
- Compiler toolchain
Additionally, a particular environment flag (BUILD_ENVIRONMENT) is set, and the build scripts utilize that to determine the build environment configuration.
Follow these steps:
1. Obtain the Docker image.
```
docker pull rocm/pytorch:latest-base
```
The above will download the base container, which does not contain PyTorch.
2. Start a Docker container using the image.
```
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base
```
You can also pass the -v argument to mount any data directories from the host onto the container.
3. Clone the PyTorch repository.
```
cd ~
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
```
4. Build PyTorch for ROCm.
:::{note}
By default in the rocm/pytorch:latest-base, PyTorch builds for these architectures simultaneously:
- gfx900
- gfx906
- gfx908
- gfx90a
- gfx1030
:::
5. To determine your AMD uarch, run:
```
rocminfo | grep gfx
```
6. In the event you want to compile only for your uarch, use:
```
export PYTORCH_ROCM_ARCH=<uarch>
```
\<uarch\> is the architecture reported by the rocminfo command.
7. Build PyTorch using the following command:
```
./.jenkins/pytorch/build.sh
```
This will first convert PyTorch sources for HIP compatibility and build the PyTorch framework.
8. Alternatively, build PyTorch by issuing the following commands:
```
python3 tools/amd_build/build_amd.py
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
```
#### Option 4: Install Using PyTorch Upstream Docker File
Instead of using a prebuilt base Docker image, you can build a custom base Docker image using scripts from the PyTorch repository. This will utilize a standard Docker image from operating system maintainers and install all the dependencies required to build PyTorch, including
- ROCm
- Torchvision
- Conda packages
- Compiler toolchain
Follow these steps:
1. Clone the PyTorch repository on the host.
```
cd ~
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
```
2. Build the PyTorch Docker image.
```
cd .circleci/docker
./build.sh pytorch-linux-bionic-rocm<version>-py3.7
# eg. ./build.sh pytorch-linux-bionic-rocm3.10-py3.7
```
This should complete with the message "Successfully built \<image_id\>."
3. Start a Docker container using the image:
```
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G <image_id>
```
You can also pass the -v argument to mount any data directories from the host onto the container.
4. Clone the PyTorch repository.
```
cd ~
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
```
5. Build PyTorch for ROCm.
:::{note}
By default in the rocm/pytorch:latest-base, PyTorch builds for these architectures simultaneously:
- gfx900
- gfx906
- gfx908
- gfx90a
- gfx1030
:::
6. To determine your AMD uarch, run:
```
rocminfo | grep gfx
```
7. If you want to compile only for your uarch:
```
export PYTORCH_ROCM_ARCH=<uarch>
```
\<uarch\> is the architecture reported by the rocminfo command.
8. Build PyTorch using:
```
./.jenkins/pytorch/build.sh
```
This will first convert PyTorch sources to be HIP compatible and then build the PyTorch framework.
Alternatively, build PyTorch by issuing the following commands:
```
python3 tools/amd_build/build_amd.py
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
```
### Test the PyTorch Installation
You can use PyTorch unit tests to validate a PyTorch installation. If using a prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official wheels package, these tests are already run on those configurations. Alternatively, you can manually run the unit tests to validate the PyTorch installation fully.
Follow these steps:
1. Test if PyTorch is installed and accessible by importing the torch package in Python.
:::{note}
Do not run in the PyTorch git folder.
:::
```
python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure'
```
2. Test if the GPU is accessible from PyTorch. In the PyTorch framework, torch.cuda is a generic mechanism to access the GPU; it will access an AMD GPU only if available.
```
python3 -c 'import torch; print(torch.cuda.is_available())'
```
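If the check above returns `True`, a slightly longer snippet (shown here purely as an illustration; it uses only standard `torch.cuda` queries) can confirm which GPUs PyTorch sees:
```python
import torch

# Enumerate the GPUs visible to PyTorch; on ROCm these are AMD devices exposed
# through the same torch.cuda interface used on other platforms.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))
else:
    print("No GPU visible to PyTorch")
```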
3. Run the unit tests to validate the PyTorch installation fully. Run the following command from the PyTorch home directory:
```
BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT:-rocm} ./.jenkins/pytorch/test.sh
```
This ensures that even for wheel installs in a non-controlled environment, the required environment variable will be set to skip certain unit tests for ROCm.
:::{note}
Make sure the PyTorch source code is corresponding to the PyTorch wheel or installation in the Docker image. Incompatible PyTorch source code might give errors when running the unit tests.
:::
This will first install some dependencies, such as a supported torchvision version for PyTorch. Torchvision is used in some PyTorch tests for loading models. Next, this will run all the unit tests.
:::{note}
Some tests may be skipped, as appropriate, based on your system configuration. All features of PyTorch are not supported on ROCm, and the tests that evaluate these features are skipped. In addition, depending on the host memory, or the number of available GPUs, other tests may be skipped. No test should fail if the compilation and installation are correct.
:::
4. Run individual unit tests with the following command:
```
PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose
```
test_nn.py can be replaced with any other test set.
### Run a Basic PyTorch Example
The PyTorch examples repository provides basic examples that exercise the functionality of the framework. MNIST (Modified National Institute of Standards and Technology) database is a collection of handwritten digits that may be used to train a Convolutional Neural Network for handwriting recognition. Alternatively, ImageNet is a database of images used to train a network for visual object recognition.
Follow these steps:
1. Clone the PyTorch examples repository.
```
git clone https://github.com/pytorch/examples.git
```
2. Run the MNIST example.
```
cd examples/mnist
```
3. Follow the instructions in the README file in this folder. In this case:
```
pip3 install -r requirements.txt
python3 main.py
```
4. Run the ImageNet example.
```
cd examples/imagenet
```
5. Follow the instructions in the README file in this folder. In this case:
```
pip3 install -r requirements.txt
python3 main.py
```
## MAGMA for ROCm
Matrix Algebra on GPU and Multicore Architectures, abbreviated as MAGMA, is a collection of next-generation dense linear algebra libraries that is designed for heterogeneous architectures, such as multiple GPUs and multi- or many-core CPUs.
MAGMA provides implementations for CUDA, HIP, Intel Xeon Phi, and OpenCL™. For more information, refer to [https://icl.utk.edu/magma/index.html](https://icl.utk.edu/magma/index.html).
### Using MAGMA for PyTorch
Tensor is fundamental to Deep Learning techniques because it provides extensive representational functionalities and math operations. This data structure is represented as a multidimensional matrix. MAGMA accelerates tensor operations with a variety of solutions including driver routines, computational routines, BLAS routines, auxiliary routines, and utility routines.
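As a purely illustrative example (not part of the build steps below), the kind of operation that benefits from a MAGMA-enabled PyTorch build is dense linear algebra on GPU tensors:
```python
import torch

# Solve A x = b on the GPU; dense solvers like this are the kind of workload a
# MAGMA-backed PyTorch build accelerates.
device = "cuda"  # torch.cuda maps to the AMD GPU on ROCm builds
A = torch.randn(512, 512, device=device)
b = torch.randn(512, 1, device=device)
x = torch.linalg.solve(A, b)
print(x.shape)
```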
### Build MAGMA from Source
To build MAGMA from the source, follow these steps:
1. In the event you want to compile only for your uarch, use:
```
export PYTORCH_ROCM_ARCH=<uarch>
```
\<uarch\> is the architecture reported by the rocminfo command.
2. Use the following:
```
export PYTORCH_ROCM_ARCH=<uarch>
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Fixes memory leaks of magma found while executing linalg UTs
git checkout 5959b8783e45f1809812ed96ae762f38ee701972
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm
```
## TensorFlow
TensorFlow is an open source library for solving Machine Learning, Deep Learning, and Artificial Intelligence problems. It can be used to solve many problems across different sectors and industries but primarily focuses on training and inference in neural networks. It is one of the most popular and in-demand frameworks and is very active in open source contribution and development.
### Installing TensorFlow
The following sections contain options for installing TensorFlow.
#### Option 1: Install TensorFlow Using Docker Image
To install ROCm on bare metal, follow the section [ROCm Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60). The recommended option to get a TensorFlow environment is through Docker.
Using Docker provides portability and access to a prebuilt Docker container that has been rigorously tested within AMD. This might also save compilation time and should perform as tested without facing potential installation issues.
Follow these steps:
1. Pull the latest public TensorFlow Docker image.
```
docker pull rocm/tensorflow:latest
```
2. Once you have pulled the image, run it by using the command below:
```
docker run -it --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/tensorflow:latest
```
#### Option 2: Install TensorFlow Using Wheels Package
To install TensorFlow using the wheels package, follow these steps:
1. Check the Python version.
```
python3 --version
```
| If: | Then: |
| ----------- | ----------- |
| The Python version is less than 3.7 | Upgrade Python. |
| The Python version is 3.7 or greater | Skip this step and go to Step 3. |
:::{note}
The supported Python versions are:
- 3.7
- 3.8
- 3.9
- 3.10
:::
```
sudo apt-get install python3.7 # or python3.8, python3.9, or python3.10
```
2. Set up multiple Python versions using update-alternatives.
```
update-alternatives --query python3
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python[version] [priority]
```
:::{note}
Follow the instruction in Step 2 for incompatible Python versions.
:::
```
sudo update-alternatives --config python3
```
3. Follow the screen prompts, and select the Python version installed in Step 2.
4. Install or upgrade PIP.
```
sudo apt install python3-pip
```
To install PIP, use the following:
```
/usr/bin/python[version] -m pip install --upgrade pip
```
Upgrade PIP for Python version installed in step 2:
```
sudo pip3 install --upgrade pip
```
5. Install TensorFlow for the Python version as indicated in Step 2.
```
/usr/bin/python[version] -m pip install --user tensorflow-rocm==[wheel-version] --upgrade
```
For a valid wheel version for a ROCm release, refer to the tensorflow-rocm and
ROCm compatibility note at the end of this section.
6. Update protobuf to 3.19 or lower.
```
/usr/bin/python3.7 -m pip install protobuf==3.19.0
sudo pip3 install tensorflow
```
7. Set the environment variable PYTHONPATH.
```
export PYTHONPATH="./.local/lib/python[version]/site-packages:$PYTHONPATH" #Use same python version as in step 2
```
8. Install libraries.
```
sudo apt install rocm-libs rccl
```
9. Test installation.
```
python3 -c 'import tensorflow' 2> /dev/null && echo 'Success' || echo 'Failure'
```
:::{note}
For details on tensorflow-rocm wheels and ROCm version compatibility, see: [https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md)
:::
### Test the TensorFlow Installation
To test the installation of TensorFlow, run the container image as specified in the previous section Installing TensorFlow. Ensure you have access to the Python shell in the Docker container.
```
python3 -c 'import tensorflow' 2> /dev/null && echo Success || echo Failure
```
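Beyond the import check, a short snippet (illustrative only; it relies on the standard `tf.config` API) can confirm that the AMD GPU is visible to TensorFlow:
```python
import tensorflow as tf

# List the GPUs TensorFlow can see; on a working ROCm install this should
# include at least one device of type "GPU".
gpus = tf.config.list_physical_devices("GPU")
print(gpus if gpus else "No GPU visible to TensorFlow")
```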
### Run a Basic TensorFlow Example
The TensorFlow examples repository provides basic examples that exercise the framework's functionality. The MNIST database is a collection of handwritten digits that may be used to train a Convolutional Neural Network for handwriting recognition.
Follow these steps:
1. Clone the TensorFlow example repository.
```
cd ~
git clone https://github.com/tensorflow/models.git
```
2. Install the dependencies of the code, and run the code.
```
pip3 install -r requirement.txt
python mnist_tf.py
```

View File

@@ -0,0 +1,6 @@
# PyTorch Installation for ROCm
Pull content from
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html>
TEST

View File

@@ -0,0 +1,3 @@
# System Debugging Guide
Pull from https://docs.amd.com/bundle/ROCm-System-Level-Debug-Guide-v5.2/page/ROCm_System_Level_Debug_Information.html

View File

@@ -0,0 +1,4 @@
# TensorFlow Installation for ROCm
Pull content from
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html>

View File

@@ -0,0 +1,4 @@
# Inference Optimization Using MIGraphX
Pull content from
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Optimization.html>

74
docs/sphinx/index.md Normal file
View File

@@ -0,0 +1,74 @@
# AMD ROCm Documentation
Welcome to AMD ROCm's documentation!
:::::{grid} 1 1 2 2
:gutter: 1
::::{grid-item}
:::{dropdown} [Release Info](release)
- Release Notes
- [GPU and OS Support](gpu_os_support)
- [Known Issues](https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue)
- End Of Life and Support Policies
:::
::::
::::{grid-item}
:::{dropdown} [Deploy ROCm](deploy)
- [Quick Start (Linux)](quick_start)
- [Quick Start (Windows)](hip_sdk_install_win/hip_sdk_install_win)
- [Advanced (Linux)](deploy/advanced)
- [Docker](deploy/docker)
:::
::::
:::::
::::{grid} 1 2 2 2
:class-container: rocm-doc-grid
:::{grid-item-card}
:padding: 2
[APIs and Reference](https://example.com)
^^^
- [HIP](reference/hip)
- [OpenMP](reference/openmp/openmp)
- [Compilers and Tools](reference/compilers)
- [Management Tools](reference/tools)
- [GPU Architecture](reference/gpu_arch)
:::
:::{grid-item-card}
:padding: 2
Understand ROCm
^^^
- What compiler should I choose?
- All Articles
:::
:::{grid-item-card}
:padding: 2
How to Guides
^^^
- [How to Isolate GPUs in Docker?](how_to/docker_gpu_isolation)
- [Magma Installation for ROCm](how_to/magma_install/magma_install)
:::
:::{grid-item-card}
:padding: 2
Examples
^^^
- [rocm-examples](https://github.com/amd/rocm-examples)
:::
::::

View File

@@ -0,0 +1,23 @@
# ISV Deployment Guide (Windows)
## Abstract
ISVs deploying applications using the HIP SDK depend on the AMD GPU Drivers, the HIP Runtime Library, and the HIP SDK Libraries. A compatibility matrix table provides details on AMD's support model. AMD GPU Drivers are distributed with a HIP Runtime included. Each HIP Runtime is associated with a HIP compiler version. Applications built with a particular HIP compiler should document the associated HIP Runtime version and AMD GPU Driver as minimum version requirements for their end users. Applications do not distribute the HIP Runtime; instead, end users rely on the HIP Runtime provided by an AMD GPU Driver. AMD provides backward compatibility for applications dynamically linked to the HIP Runtime based on our Driver and HIP support policy. ISV applications using the HIP SDK Libraries, for example hipBLAS, should distribute the HIP SDK Library as part of their installer package. It is recommended not to require end users to install the HIP SDK. AMD provides backward compatibility of the AMD Driver and HIP Runtime for the HIP SDK Libraries based on our support policy. The AMD support policy for Visual Studio and other third-party compilers is documented here.
## Introduction
This guide is intended for Independent Software Vendors (ISVs) and other software developers intending to build applications with the HIP SDK for Windows. The HIP SDK is intended for developer distribution, in contrast to the AMD GPU driver, which is intended for all end users. The guide discusses how to use and distribute components from the HIP SDK. The HIP SDK is the collection of the AMD GPU Driver, the HIP Runtime, and the HIP Libraries. These three parts are distributed in the HIP SDK installer. The compatibility and versioning relation between these three parts is documented here. AMD's support policies for the developer tools give ISVs the stability to plan the usage of a toolchain.
## Recommended Library Distribution Model
The HIP SDK is distributed via a Windows installer. This distribution mechanism is intended only for software developers and testers. AMD recommends that end users of programs built against HIP SDK components not be required to install the HIP SDK. There are two types of ISV applications that use the HIP SDK, as follows.
The first group of ISV applications has a dependency on the HIP Runtime and select HIP header-only libraries (rocPRIM, hipCUB and rocThrust). This group of ISV applications needs to require that its end users install an AMD GPU Driver. Each AMD GPU driver has a HIP runtime library bundled with it. The ISV application should ensure that the installed HIP runtime library meets its minimum version requirement. As the HIP runtime library does not use semantic versioning, the ISV application cannot check for full compatibility. However, AMD is committed to not breaking API/ABI compatibility unless the major version number of the HIP runtime is incremented. ISV applications may run without user warning if the HIP major version available in the driver is the same as the HIP major version associated with the compiler they were built with. The ISV, at its discretion, may show a warning if the HIP major version is higher than the associated HIP major version of the compiler the application was built with.
The second group of ISV applications has a dependency on the HIP Runtime and one or more dynamically linked HIP Libraries, including the HIP RT library. ISV applications with this dependency need to ensure the end user installs an AMD GPU Driver, and it is recommended that they distribute the dynamically linked HIP libraries in the installer packages of their applications. This allows end users to avoid installing the HIP SDK. One benefit of this model is a smaller disk footprint, as only the required binaries are distributed by the ISV application. It also avoids requiring the end user to agree to licensing agreements for the entire HIP SDK. The version checks recommended for ISV applications that include dynamically linked HIP Libraries follow the same requirements as for ISV applications that only use the HIP Runtime and header-only libraries. In addition, each dynamically linked HIP library also has a minimum HIP runtime requirement. Checks for the minimum HIP version for each dynamically linked HIP library may be added at the ISV's discretion. Usually, the minimum HIP version check for the HIP runtime is sufficient if the dynamically linked HIP libraries come from the same SDK package as the HIP compiler.
Please note that AMD does not support static linking to any component distributed in the HIP SDK.
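The sketch below illustrates the kind of minimum-version check described above. It is a minimal, hypothetical example rather than AMD-mandated logic: it assumes the application is built with a HIP compiler that provides `HIP_VERSION_MAJOR` via `hip/hip_version.h`, and that `hipRuntimeGetVersion()` reports the driver-provided HIP runtime version encoded as `major * 10000000 + minor * 100000 + patch`. Verify both assumptions against the HIP SDK you actually ship with.

```cpp
// Hypothetical startup check an ISV application might perform.
// Assumptions (verify against your HIP SDK): HIP_VERSION_MAJOR comes from
// <hip/hip_version.h>, and hipRuntimeGetVersion() reports the runtime
// bundled with the installed AMD GPU driver, encoded as
// major * 10000000 + minor * 100000 + patch.
#include <hip/hip_runtime.h>
#include <hip/hip_version.h>
#include <iostream>

bool hipRuntimeMeetsMinimum() {
    int encoded = 0;
    if (hipRuntimeGetVersion(&encoded) != hipSuccess) {
        std::cerr << "Unable to query the HIP runtime; is an AMD GPU driver installed?\n";
        return false;
    }
    const int runtimeMajor = encoded / 10000000;  // assumed encoding, see note above
    if (runtimeMajor < HIP_VERSION_MAJOR) {
        std::cerr << "HIP runtime major version " << runtimeMajor
                  << " is older than HIP " << HIP_VERSION_MAJOR
                  << " used to build this application; please update the AMD GPU driver.\n";
        return false;
    }
    if (runtimeMajor > HIP_VERSION_MAJOR) {
        std::cerr << "Warning: HIP runtime major version " << runtimeMajor
                  << " is newer than HIP " << HIP_VERSION_MAJOR
                  << " used to build this application.\n";
    }
    return true;
}
```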
## Conclusion
This guide provides a limited set of guidance for ISV application deployment. Please refer to the HIP API guides for the SDK and the HIP optimization guides for more information.

View File

@@ -0,0 +1 @@
# Kernel and Userspace Compatibility

View File

236
docs/sphinx/quick_start.md Normal file
View File

@@ -0,0 +1,236 @@
# Quick Start (Linux)
## Install Prerequisites
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: key5
Install kernel headers and modules for the active kernel.
```bash
sudo apt install linux-headers-`uname -r` linux-modules-extra-`uname -r`
```
:::
:::{tab-item} Ubuntu 22.04
:sync: key6
Content 3
:::
:::{tab-item} RHEL8
:sync: key1
Content 1
:::
:::{tab-item} RHEL9
:sync: key2
Content 2
:::
:::{tab-item} SLES15 SP3
:sync: key3
Content 3
:::
:::{tab-item} SLES15 SP4
:sync: key4
Content 3
:::
::::
## Add Repositories
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: key5
Add the ROCm GPG key and add the repositories
```bash
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] https://repo.radeon.com/amdgpu/latest/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ focal main' | sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt-get update
```
:::
:::{tab-item} Ubuntu 22.04
:sync: key6
Content 3
:::
:::{tab-item} RHEL8
:sync: key1
Content 1
:::
:::{tab-item} RHEL9
:sync: key2
Content 2
:::
:::{tab-item} SLES15 SP3
:sync: key3
Content 3
:::
:::{tab-item} SLES15 SP4
:sync: key4
Content 3
:::
::::
## Install Drivers
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: key5
Install the amdgpu kernel module, aka driver, on your system.
```bash
sudo apt install amdgpu-dkms
```
:::
:::{tab-item} Ubuntu 22.04
:sync: key6
Content 3
:::
:::{tab-item} RHEL8
:sync: key1
Content 1
:::
:::{tab-item} RHEL9
:sync: key2
Content 2
:::
:::{tab-item} SLES15 SP3
:sync: key3
Content 3
:::
:::{tab-item} SLES15 SP4
:sync: key4
Content 3
:::
::::
## Install ROCm Runtimes
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: key5
Install the `rocm-hip-libraries` metapackage. This contains the dependencies for most
common ROCm applications.
```bash
sudo apt install rocm-hip-libraries
```
:::
:::{tab-item} Ubuntu 22.04
:sync: key6
Content 3
:::
:::{tab-item} RHEL8
:sync: key1
Content 1
:::
:::{tab-item} RHEL9
:sync: key2
Content 2
:::
:::{tab-item} SLES15 SP3
:sync: key3
Content 3
:::
:::{tab-item} SLES15 SP4
:sync: key4
Content 3
:::
::::
## Reboot the system
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: key5
The driver requires a system reboot.
```bash
sudo reboot
```
:::
:::{tab-item} Ubuntu 22.04
:sync: key6
Content 3
:::
:::{tab-item} RHEL8
:sync: key1
Content 1
:::
:::{tab-item} RHEL9
:sync: key2
Content 2
:::
:::{tab-item} SLES15 SP3
:sync: key3
Content 3
:::
:::{tab-item} SLES15 SP4
:sync: key4
Content 3
:::
::::
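After rebooting, you can optionally confirm that the HIP runtime detects your GPU. The short program below is a minimal sketch and not part of the official installation steps; it assumes the HIP development tools are also installed (so that `hipcc` is available under `/opt/rocm/bin`), which the runtime-only metapackage above may not provide.

```cpp
// Minimal device-enumeration check (hypothetical verification step).
// Build and run, assuming HIP development tools are installed:
//   /opt/rocm/bin/hipcc verify_rocm.cpp -o verify_rocm && ./verify_rocm
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        std::printf("No HIP-capable AMD GPU detected.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        hipGetDeviceProperties(&prop, i);
        std::printf("Device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
    }
    return 0;
}
```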

View File

View File

@@ -0,0 +1 @@
# Computer Vision

View File

@@ -0,0 +1 @@
# Development Tools

View File

View File

@@ -0,0 +1,8 @@
# Framework Compatibility
Pull content from
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Prerequisites.html>.
Only the frameworks content. Link to kernel/userspace guide.
Also pull content from
<https://docs.amd.com/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html>

View File

@@ -0,0 +1,10 @@
# GPU Architectures
## ISA Documentation
- [AMD Instinct MI200 Instruction Set Architecture Reference Guide](https://developer.amd.com/wp-content/resources/CDNA2_Shader_ISA_18November2021.pdf)
## Whitepapers
- [AMD CDNA Architecture Whitepaper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
- [AMD CDNA™ 2 Architecture Whitepaper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)

View File

@@ -0,0 +1,27 @@
# Matrix Multiplication
ROCm libraries for BLAS are as follows:
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} hipBLAS
hipBLAS is a compatibility layer for GPU-accelerated BLAS, optimized for AMD GPUs
via rocBLAS and rocSOLVER. hipBLAS provides a common interface for other GPU
BLAS libraries.
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
- [API Reference Manual](https://rocmdocs.amd.com/projects/hipBLAS/en/rtd/)
:::
:::{grid-item-card} rocBLAS
rocBLAS is an AMD GPU optimized library for BLAS.
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
- [API Reference Manual](https://rocmdocs.amd.com/projects/rocBLAS/en/rtd/)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
:::
:::::
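To make this concrete, the sketch below calls SGEMM through rocBLAS directly on an AMD GPU. It is a hedged example, not taken from the official samples (see the rocBLAS examples link above for those), and it assumes the rocBLAS development package is installed and that the header is available as `<rocblas/rocblas.h>` in your ROCm release. rocBLAS, like BLAS, uses column-major storage.

```cpp
// Minimal rocBLAS SGEMM sketch: C = alpha * A * B + beta * C
// (column-major, square matrices for brevity). Error checking omitted.
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>
#include <vector>

int main() {
    const int n = 256;
    const float alpha = 1.0f, beta = 0.0f;
    const size_t bytes = size_t(n) * n * sizeof(float);
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    hipMalloc(&dA, bytes);
    hipMalloc(&dB, bytes);
    hipMalloc(&dC, bytes);
    hipMemcpy(dA, hA.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(dB, hB.data(), bytes, hipMemcpyHostToDevice);
    hipMemcpy(dC, hC.data(), bytes, hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    rocblas_sgemm(handle, rocblas_operation_none, rocblas_operation_none,
                  n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    hipMemcpy(hC.data(), dC, bytes, hipMemcpyDeviceToHost);  // every element == 512
    rocblas_destroy_handle(handle);
    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}
```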

View File

@@ -0,0 +1 @@
# C++ Primitives

View File

@@ -0,0 +1 @@
# Communication Libraries

View File

@@ -0,0 +1,27 @@
# Fast Fourier Transforms
ROCm libraries for FFT are as follows:
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} hipFFT
hipFFT is a compatibility layer for GPU-accelerated FFT, optimized for AMD GPUs
using rocFFT. hipFFT provides a common interface for other, non-AMD GPU
FFT libraries.
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
- [API Reference Manual](https://rocmdocs.amd.com/projects/hipFFT/en/rtd/)
:::
:::{grid-item-card} rocFFT
rocFFT is an AMD GPU optimized library for FFT.
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
- [API Reference Manual](https://rocmdocs.amd.com/projects/rocFFT/en/rtd/)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocFFT)
:::
:::::
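As a hedged illustration (not taken from the official samples), the sketch below runs a single-precision complex-to-complex forward transform through hipFFT. It assumes the hipFFT development package is installed and that the header is available as `<hipfft/hipfft.h>` in your ROCm release.

```cpp
// Minimal hipFFT sketch: 1D complex-to-complex forward FFT, in place.
// Error checking is omitted for brevity.
#include <hip/hip_runtime.h>
#include <hipfft/hipfft.h>
#include <vector>

int main() {
    const int N = 1024;
    std::vector<hipfftComplex> host(N, hipfftComplex{1.0f, 0.0f});

    hipfftComplex* data = nullptr;
    hipMalloc(&data, N * sizeof(hipfftComplex));
    hipMemcpy(data, host.data(), N * sizeof(hipfftComplex), hipMemcpyHostToDevice);

    hipfftHandle plan;
    hipfftPlan1d(&plan, N, HIPFFT_C2C, 1);           // one batch of length N
    hipfftExecC2C(plan, data, data, HIPFFT_FORWARD);  // in-place forward transform
    hipDeviceSynchronize();

    hipMemcpy(host.data(), data, N * sizeof(hipfftComplex), hipMemcpyDeviceToHost);
    hipfftDestroy(plan);
    hipFree(data);
    return 0;
}
```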

View File

@@ -0,0 +1 @@
# Math Libraries

View File

@@ -0,0 +1 @@
# Random Numbers

View File

@@ -0,0 +1 @@
# Linear Solvers

View File

@@ -0,0 +1 @@
# Sparse Matrix Solvers

View File

@@ -0,0 +1,2 @@
# HIP

View File

@@ -0,0 +1 @@
# Kernel Userspace Compatibility Reference

View File

@@ -0,0 +1 @@
# Management Tools

View File

@@ -0,0 +1,4 @@
# OpenMP Support in ROCm
Pull from
<https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.4/page/Introduction_to_OpenMP_Support_Guide.html>

View File

@@ -0,0 +1,23 @@
# Introduction to Compiler Reference Guide
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance computing on AMD GPUs and CPUs and supports various heterogeneous programming models such as HIP, OpenMP, and OpenCL.
ROCmCC is made available via two packages: rocm-llvm and rocm-llvm-alt. The differences are shown in this table:
**Table 1. rocm-llvm vs. rocm-llvm-alt**

| rocm-llvm | rocm-llvm-alt |
| ----------- | ----------- |
| Installed by default when ROCm™ itself is installed | An optional package |
| Provides an open-source compiler | Provides an additional closed-source compiler for users interested in additional CPU optimizations not available in rocm-llvm |

For more details, follow this table:

**Table 2. Details Table**

| For | See |
| ----------- | ----------- |
| The latest usage information for AMD GPU | [https://llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) |
| Usage information for a specific ROCm release | [https://llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) |
| Source code for rocm-llvm | [https://github.com/RadeonOpenCompute/llvm-project](https://github.com/RadeonOpenCompute/llvm-project) |
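As a brief, hedged illustration of the heterogeneous programming that ROCmCC targets, the HIP program below offloads a vector add to the GPU. It is a minimal sketch; the `hipcc` driver referenced in the build comment is assumed to be installed from the ROCm packages.

```cpp
// Minimal HIP vector add, compiled with the ROCmCC-based hipcc driver, e.g.:
//   hipcc vector_add.cpp -o vector_add
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

    float *da, *db, *dc;
    hipMalloc(&da, n * sizeof(float));
    hipMalloc(&db, n * sizeof(float));
    hipMalloc(&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    hipLaunchKernelGGL(vector_add, dim3(blocks), dim3(threads), 0, 0, da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    std::printf("hc[0] = %f\n", hc[0]);  // expect 3.0
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```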

1
docs/sphinx/release.md Normal file
View File

@@ -0,0 +1 @@
# Release Notes

View File

@@ -0,0 +1 @@
# Compatibility

View File

@@ -0,0 +1,95 @@
# GPU and OS Support
## OS Support
ROCm supports the operating systems listed below.
| OS | Validated Kernel |
|:------------------:|:------------------:|
| RHEL 9.1 | `5.14` |
| RHEL 8.6 to 8.7 | `4.18` |
| SLES 15 SP4 | |
| Ubuntu 20.04.5 LTS | `5.15` |
| Ubuntu 22.04.1 LTS | `5.15`, OEM `5.17` |
## Virtualization Support
ROCm supports virtualization for select GPUs only as shown below.
| Hypervisor | Version | GPU | Validated Guest OS (validated kernel) |
|:--------------:|:--------:|:-----:|:--------------------------------------------------------------------------------:|
| VMware | ESXi 8 | MI250 | `Ubuntu 20.04 (5.15.0-56-generic)` |
| VMware | ESXi 8 | MI210 | `Ubuntu 20.04 (5.15.0-56-generic)`, `SLES 15 SP4 (5.14.21-150400.24.18-default)` |
| VMware | ESXi 7 | MI210 | `Ubuntu 20.04 (5.15.0-56-generic)`, `SLES 15 SP4 (5.14.21-150400.24.18-default)` |
## GPU Support Table
::::{tab-set}
:::{tab-item} Instinct™
:sync: instinct
Use Driver Shipped with ROCm
|GPU |Architecture |Product|[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|:----------------:|:--------------:|:----:|:--------------------------------------------------------------------:|:------------------------------------:|:-------:|
|Instinct™ MI250X | CDNA2 |All |gfx90a |Supported |Unsupported |
|Instinct™ MI250 | CDNA2 |All |gfx90a |Supported |Unsupported |
|Instinct™ MI210 | CDNA2 |All |gfx90a |Supported |Unsupported |
|Instinct™ MI100 | CDNA |All|gfx908 |Supported |Unsupported |
|Instinct™ MI50 | Vega |All|gfx906 |Supported |Unsupported |
:::
:::{tab-item} Radeon Pro™
:sync: radeonpro
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|GPU |Architecture |Product|[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|:----------------:|:--------------:|:----:|:--------------------------------------------------------------------:|:------------------------------------:|:-------:|
|Radeon™ Pro W6800 | RDNA2 |All |gfx1030 |Supported |Supported|
|Radeon™ Pro V620 | RDNA2 |All|gfx1030 |Supported |Unsupported|
|Radeon™ RX 6900 XT| RDNA2 |HIP SDK|gfx1030 |Supported |Supported|
|Radeon™ RX 6600 | RDNA2 |HIP|gfx1031 |Supported|Supported|
|Radeon™ R9 Fury | Fiji |All|gfx803 |Community |Unsupported|
:::
:::{tab-item} Radeon™
:sync: radeon
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|GPU |Architecture |Product|[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|:----------------:|:--------------:|:----:|:--------------------------------------------------------------------:|:------------------------------------:|:-------:|
|Radeon™ RX 6900 XT| RDNA2 |HIP SDK|gfx1030 |Supported |Supported|
|Radeon™ RX 6600 | RDNA2 |HIP|gfx1031 |Supported|Supported|
|Radeon™ R9 Fury | Fiji |All|gfx803 |Community |Unsupported|
:::
::::
### Products in ROCm
ROCm software support varies by GPU type and operating system. The Product column in the tables above refers to one of three software stack enablement levels, as described below:
- All includes all software that is part of the ROCm ecosystem. Please see [article](link) for details of ROCm.
- HIP SDK includes the HIP Runtime and a selection of GPU libraries for compute. Please see [article](link) for details of HIP SDK.
- HIP enables the use of the HIP Runtime only.
### GPU Support Levels
GPU support levels in ROCm:
- Supported - AMD enables these GPUs in our software distributions for the corresponding ROCm product.
- Unsupported - This configuration is not enabled in our software distributions.
- Deprecated - Support will be removed in a future release.
- Community - AMD does not enable these GPUs in our software distributions but end users are free to enable these GPUs themselves.
## CPU Support
ROCm requires CPUs that support PCIe™ Atomics. Modern CPUs released after the
1st-generation AMD Zen CPUs and Intel™ Haswell support PCIe Atomics.

View File

@@ -0,0 +1,93 @@
# Licensing Terms
ROCm™ is released by Advanced Micro Devices, Inc. under open source licenses
via public GitHub repositories. The following table lists the ROCm components
with links to their license terms. The list is ordered to follow ROCm's
manifest file.
| Component | License |
|:------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------:|
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
The additional terms and conditions below apply to your use of ROCm technical
documentation.
©2022 Advanced Micro Devices, Inc. All rights reserved.
The information presented in this document is for informational purposes only
and may contain technical inaccuracies, omissions, and typographical errors. The
information contained herein is subject to change and may be rendered inaccurate
for many reasons, including but not limited to product and roadmap changes,
component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS
flashes, firmware upgrades, or the like. Any computer system has risks of
security vulnerabilities that cannot be completely prevented or mitigated. AMD
assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any
person of such revisions or changes.
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
Advanced Micro Devices, Inc. Other product names used in this publication are
for identification purposes only and may be trademarks of their respective
companies.

View File

@@ -0,0 +1 @@
git+https://github.com/RadeonOpenCompute/rocm-docs-core.git

View File

@@ -0,0 +1,275 @@
#
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile requirements.in
#
accessible-pygments==0.0.3
# via pydata-sphinx-theme
alabaster==0.7.13
# via sphinx
asttokens==2.2.1
# via stack-data
attrs==22.2.0
# via
# jsonschema
# jupyter-cache
babel==2.11.0
# via
# pydata-sphinx-theme
# sphinx
backcall==0.2.0
# via ipython
beautifulsoup4==4.11.2
# via pydata-sphinx-theme
breathe==4.34.0
# via rocm-docs-core
certifi==2022.12.7
# via requests
cffi==1.15.1
# via pynacl
charset-normalizer==2.1.1
# via requests
click==8.1.3
# via
# jupyter-cache
# sphinx-external-toc
comm==0.1.2
# via ipykernel
debugpy==1.6.6
# via ipykernel
decorator==5.1.1
# via ipython
deprecated==1.2.13
# via pygithub
docutils==0.16
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# rocm-docs-core
# sphinx
executing==1.2.0
# via stack-data
fastjsonschema==2.16.2
# via nbformat
gitdb==4.0.10
# via gitpython
gitpython==3.1.30
# via rocm-docs-core
greenlet==2.0.2
# via sqlalchemy
idna==3.4
# via requests
imagesize==1.4.1
# via sphinx
importlib-metadata==6.0.0
# via
# jupyter-cache
# jupyter-client
# myst-nb
importlib-resources==5.10.2
# via
# jsonschema
# rocm-docs-core
ipykernel==6.21.2
# via myst-nb
ipython==8.10.0
# via
# ipykernel
# myst-nb
jedi==0.18.2
# via ipython
jinja2==3.1.2
# via
# myst-parser
# sphinx
jsonschema==4.17.3
# via nbformat
jupyter-cache==0.5.0
# via myst-nb
jupyter-client==8.0.3
# via
# ipykernel
# nbclient
jupyter-core==5.2.0
# via
# ipykernel
# jupyter-client
# nbformat
linkify-it-py==1.0.3
# via myst-parser
markdown-it-py==2.2.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.2
# via jinja2
matplotlib-inline==0.1.6
# via
# ipykernel
# ipython
mdit-py-plugins==0.3.4
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-nb==0.17.1
# via rocm-docs-core
myst-parser[linkify]==0.18.1
# via
# myst-nb
# rocm-docs-core
nbclient==0.5.13
# via
# jupyter-cache
# myst-nb
nbformat==5.7.3
# via
# jupyter-cache
# myst-nb
# nbclient
nest-asyncio==1.5.6
# via
# ipykernel
# nbclient
packaging==23.0
# via
# ipykernel
# pydata-sphinx-theme
# sphinx
parso==0.8.3
# via jedi
pexpect==4.8.0
# via ipython
pickleshare==0.7.5
# via ipython
pkgutil-resolve-name==1.3.10
# via jsonschema
platformdirs==3.0.0
# via jupyter-core
prompt-toolkit==3.0.37
# via ipython
psutil==5.9.4
# via ipykernel
ptyprocess==0.7.0
# via pexpect
pure-eval==0.2.2
# via stack-data
pycparser==2.21
# via cffi
pydata-sphinx-theme==0.13.0rc5
# via sphinx-book-theme
pygithub==1.57
# via rocm-docs-core
pygments==2.14.0
# via
# accessible-pygments
# ipython
# pydata-sphinx-theme
# sphinx
pyjwt==2.6.0
# via pygithub
pynacl==1.5.0
# via pygithub
pyrsistent==0.19.3
# via jsonschema
python-dateutil==2.8.2
# via jupyter-client
pytz==2022.7.1
# via babel
pyyaml==6.0
# via
# jupyter-cache
# myst-nb
# myst-parser
# sphinx-external-toc
pyzmq==25.0.0
# via
# ipykernel
# jupyter-client
requests==2.28.1
# via
# pygithub
# sphinx
rocm-docs-core @ git+https://github.com/RadeonOpenCompute/rocm-docs-core.git
# via -r requirements.in
six==1.16.0
# via
# asttokens
# python-dateutil
smmap==5.0.0
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.4
# via beautifulsoup4
sphinx==4.3.1
# via
# breathe
# myst-nb
# myst-parser
# pydata-sphinx-theme
# rocm-docs-core
# sphinx-book-theme
# sphinx-copybutton
# sphinx-design
# sphinx-external-toc
sphinx-book-theme==1.0.0rc2
# via rocm-docs-core
sphinx-copybutton==0.5.1
# via rocm-docs-core
sphinx-design==0.3.0
# via rocm-docs-core
sphinx-external-toc==0.3.1
# via rocm-docs-core
sphinxcontrib-applehelp==1.0.4
# via sphinx
sphinxcontrib-devhelp==1.0.2
# via sphinx
sphinxcontrib-htmlhelp==2.0.1
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==1.0.3
# via sphinx
sphinxcontrib-serializinghtml==1.1.5
# via sphinx
sqlalchemy==1.4.46
# via jupyter-cache
stack-data==0.6.2
# via ipython
tabulate==0.9.0
# via jupyter-cache
tornado==6.2
# via
# ipykernel
# jupyter-client
traitlets==5.9.0
# via
# comm
# ipykernel
# ipython
# jupyter-client
# jupyter-core
# matplotlib-inline
# nbclient
# nbformat
typing-extensions==4.5.0
# via
# myst-nb
# myst-parser
uc-micro-py==1.0.1
# via linkify-it-py
urllib3==1.26.13
# via requests
wcwidth==0.2.6
# via prompt-toolkit
wrapt==1.14.1
# via deprecated
zipp==3.11.0
# via
# importlib-metadata
# importlib-resources
# The following packages are considered to be unsafe in a requirements file:
# setuptools

32
docs/sphinx/rocm_stack.md Normal file
View File

@@ -0,0 +1,32 @@
# The ROCm Stack
ROCm is the GPU computing stack for AMD GPUs. ROCm comprises the
components described on this page.
## Kernel Module (Linux)
## HIP Runtime
## Compiler
### hipcc
### AMD Clang
## GPU Libraries
### Math Libraries
The math libraries are grouped into libraries starting with a roc- prefix and
libraries starting with a hip- prefix. Libraries starting with the hip- prefix
support both AMD GPUs and NVIDIA GPUs. Libraries beginning with the roc- prefix
support AMD GPUs only.
### Compute Primitives
## Communication Libraries
## AI and ML (Linux only)
## Management Tools (Linux)
## Deployment Tools (Linux)

View File

@@ -0,0 +1,165 @@
===========================
Using CMake
===========================
Most components in ROCm support CMake 3.5 or higher out of the box and do not require any special Find modules. A Find module is often used by
downstream projects to find files by guessing their locations with platform-specific hints. Typically, a Find module is required when the
upstream project is not built with CMake or its package configuration files are not available.
ROCm provides the respective *config-file* packages, and this enables ``find_package`` to be used directly. ROCm does not require any Find
module as the *config-file* packages are shipped with the upstream projects.
Finding Dependencies
--------------------
When dependencies are not found in standard locations such as */usr* or */usr/local*, then the ``CMAKE_PREFIX_PATH`` variable can be set to the
installation prefixes. This can be set to multiple locations with a semicolon separating the entries.
There are two ways to set this variable:
- Pass the flag when configuring with ``-DCMAKE_PREFIX_PATH=....`` This approach is preferred when users install the components in custom
locations.
- Append the variable in the CMakeLists.txt file. This is useful if the dependencies are found in a common location. For example, when
the binaries provided on `repo.radeon.com <http://repo.radeon.com>`_ are installed to */opt/rocm*, you can add the following line to a CMakeLists.txt file
::
list (APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
Using HIP in CMake
--------------------
There are two ways to use HIP in CMake:
- Use the HIP API without compiling the GPU device code. As there is no GPU code, any C or C++ compiler can be used.
The ``find_package(hip)`` provides the ``hip::host`` target to use HIP in this context
::
# Search for rocm in common locations
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
# Find hip
find_package(hip)
# Create the library
add_library(myLib ...)
# Link with HIP
target_link_libraries(myLib hip::host)
.. note::
The ``hip::host`` target provides all the usage requirements needed to use HIP without compiling GPU device code.
- Use HIP API and compile GPU device code. This requires using a
device compiler. The compiler for CMake can be set using either the
``CMAKE_C_COMPILER`` and ``CMAKE_CXX_COMPILER`` variable or using the ``CC`` and
``CXX`` environment variables. This can be set when configuring CMake or
put into a CMake toolchain file. The device compiler must be set to a
compiler that supports AMD GPU targets, which is usually Clang.
The ``find_package(hip)`` provides the ``hip::device`` target to add all the
flags for device compilation
::
# Search for rocm in common locations
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
# Find hip
find_package(hip)
# Create library
add_library(myLib ...)
# Link with HIP
target_link_libraries(myLib hip::device)
This project can then be configured with::
cmake -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ ..
This uses the device compiler provided by the binary packages from
`repo.radeon.com <http://repo.radeon.com>`_.
.. note::
Compiling for the GPU device requires at least C++11. This can be
enabled by setting ``CMAKE_CXX_STANDARD`` or setting the correct compiler flags
in the CMake toolchain.
The GPU device code can be built for different GPU architectures by
setting the ``GPU_TARGETS`` variable. By default, this is set to all the
currently supported architectures for AMD ROCm. It can be set by passing
the flag during configuration with ``-DGPU_TARGETS=gfx900``. It can also be
set in the CMakeLists.txt as a cached variable before calling
``find_package(hip)``::
# Set the GPU to compile for
set(GPU_TARGETS "gfx900" CACHE STRING "GPU targets to compile for")
# Search for rocm in common locations
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
# Find hip
find_package(hip)
Using ROCm Libraries
---------------------------
Libraries such as rocBLAS, MIOpen, and others support CMake users as
well.
As illustrated in the example below, to use MIOpen from CMake, you can
call ``find_package(miopen)``, which provides the ``MIOpen`` CMake target. This
can be linked with ``target_link_libraries``::
# Search for rocm in common locations
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
# Find miopen
find_package(miopen)
# Create library
add_library(myLib ...)
# Link with miopen
target_link_libraries(myLib MIOpen)
.. note::
Most libraries are designed as host-only APIs, so using a GPU device
compiler is not necessary for downstream projects unless they use GPU
device code.
ROCm CMake Packages
--------------------
+-----------+----------+-------------------------------------------------------+
| Component | Package | Targets |
+===========+==========+=======================================================+
| HIP | hip | hip::host, hip::device |
+-----------+----------+-------------------------------------------------------+
| rocPRIM | rocprim | roc::rocprim |
+-----------+----------+-------------------------------------------------------+
| rocThrust | rocthrust| roc::rocthrust |
+-----------+----------+-------------------------------------------------------+
| hipCUB | hipcub | hip::hipcub |
+-----------+----------+-------------------------------------------------------+
| rocRAND | rocrand | roc::rocrand |
+-----------+----------+-------------------------------------------------------+
| rocBLAS | rocblas | roc::rocblas |
+-----------+----------+-------------------------------------------------------+
| rocSOLVER | rocsolver| roc::rocsolver |
+-----------+----------+-------------------------------------------------------+
| hipBLAS | hipblas | roc::hipblas |
+-----------+----------+-------------------------------------------------------+
| rocFFT | rocfft | roc::rocfft |
+-----------+----------+-------------------------------------------------------+
| hipFFT | hipfft | hip::hipfft |
+-----------+----------+-------------------------------------------------------+
| rocSPARSE | rocsparse| roc::rocsparse |
+-----------+----------+-------------------------------------------------------+
| hipSPARSE | hipsparse|roc::hipsparse |
+-----------+----------+-------------------------------------------------------+
| rocALUTION|rocalution| roc::rocalution |
+-----------+----------+-------------------------------------------------------+
| RCCL | rccl | rccl |
+-----------+----------+-------------------------------------------------------+
| MIOpen | miopen | MIOpen |
+-----------+----------+-------------------------------------------------------+
| MIGraphX | migraphx | migraphx::migraphx, migraphx::migraphx_c, |
| | | migraphx::migraphx_cpu, migraphx::migraphx_gpu, |
| | | migraphx::migraphx_onnx, migraphx::migraphx_tf |
+-----------+----------+-------------------------------------------------------+

View File

@@ -0,0 +1,4 @@
# ROCm Compilers Disambiguation
Pull from
<https://docs.amd.com/bundle/ROCm-Compiler-Reference-Guide-v5.4.1/page/Glossary_of_Compiler_Terms.html>

Binary file not shown.

After

Width:  |  Height:  |  Size: 58 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 46 KiB

Some files were not shown because too many files have changed in this diff.