Documentation Redesign (#1883)
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
|
||||
.vscode
|
||||
build
|
||||
_build
|
||||
14
.readthedocs.yaml
Normal file
@@ -0,0 +1,14 @@
|
||||
# Read the Docs configuration file
|
||||
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
|
||||
|
||||
version: 2
|
||||
|
||||
sphinx:
|
||||
configuration: docs/sphinx/conf.py
|
||||
|
||||
formats: all
|
||||
|
||||
python:
|
||||
version: "3.8"
|
||||
install:
|
||||
- requirements: docs/sphinx/requirements.txt
|
||||
569
CHANGELOG.md
@@ -1,124 +1,155 @@
|
||||
Changelog
|
||||
------------------
|
||||
# AMD ROCm™ Releases
|
||||
# Changelog
|
||||
|
||||
## AMD ROCm™ V5.2 Release
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
AMD ROCm v5.2 is now released. The release documentation is available at https://docs.amd.com.
|
||||
## AMD ROCm™ Releases
|
||||
|
||||
## AMD ROCm™ V5.1.3 Release
|
||||
### AMD ROCm™ V5.2 Release
|
||||
|
||||
AMD ROCm v5.1.3 is now released. The release documentation is available at https://docs.amd.com.
|
||||
AMD ROCm v5.2 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
## AMD ROCm™ V5.1.1 Release
|
||||
### AMD ROCm™ V5.1.3 Release
|
||||
|
||||
AMD ROCm v5.1.1 is now released. The release documentation is available at https://docs.amd.com.
|
||||
AMD ROCm v5.1.3 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
## AMD ROCm™ V5.1 Release
|
||||
### AMD ROCm™ V5.1.1 Release
|
||||
|
||||
AMD ROCm v5.1 is now released. The release documentation is available at https://docs.amd.com.
|
||||
AMD ROCm v5.1.1 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ V5.1 Release
|
||||
|
||||
## AMD ROCm™ v5.0.2 Release Notes
|
||||
AMD ROCm v5.1 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
### Fixed Defects in This Release
|
||||
### AMD ROCm™ v5.0.2 Release Notes
|
||||
|
||||
#### Fixed Defects in This Release
|
||||
|
||||
The following defects are fixed in the ROCm v5.0.2 release.
|
||||
|
||||
#### Issue with hostcall Facility in HIP Runtime
|
||||
##### Issue with hostcall Facility in HIP Runtime
|
||||
|
||||
In ROCm v5.0, when using the “assert()” call in a HIP kernel, the compiler may sometimes fail to emit kernel metadata related to the hostcall facility, which results in incomplete initialization of the hostcall facility in the HIP runtime. This can cause the HIP kernel to crash when it attempts to execute the “assert()” call.
|
||||
The root cause was an incorrect check in the compiler to determine whether the hostcall facility is required by the kernel. This is fixed in the ROCm v5.0.2 release.
|
||||
The resolution includes a compiler change, which emits the required metadata by default, unless the compiler can prove that the hostcall facility is not required by the kernel. This ensures that the “assert()” call never fails.
|
||||
In ROCm v5.0, when using the `assert()` call in a HIP kernel, the compiler may
|
||||
sometimes fail to emit kernel metadata related to the hostcall facility, which
|
||||
results in incomplete initialization of the hostcall facility in the HIP
|
||||
runtime. This can cause the HIP kernel to crash when it attempts to execute the
|
||||
`assert()` call. The root cause was an incorrect check in the compiler to
|
||||
determine whether the hostcall facility is required by the kernel. This is fixed
|
||||
in the ROCm v5.0.2 release. The resolution includes a compiler change, which
|
||||
emits the required metadata by default, unless the compiler can prove that the
|
||||
hostcall facility is not required by the kernel. This ensures that the
|
||||
`assert()` call never fails.
|
||||
|
||||
**Note**: This fix may lead to breakage in some OpenMP offload use cases, which use print inside a target region and result in an abort in device code. The issue will be fixed in a future release.
|
||||
**Note**: This fix may lead to breakage in some OpenMP offload use cases, which
|
||||
use print inside a target region and result in an abort in device code.
|
||||
The issue will be fixed in a future release.
|
||||
|
||||
#### Compatibility Matrix Updates to ROCm Deep Learning Guide
|
||||
##### Compatibility Matrix Updates to ROCm Deep Learning Guide
|
||||
|
||||
The compatibility matrix in the AMD Deep Learning Guide is updated for ROCm v5.0.2.
|
||||
The compatibility matrix in the AMD Deep Learning Guide is updated for ROCm
|
||||
v5.0.2.
|
||||
|
||||
For more information and documentation updates, refer to https://docs.amd.com.
|
||||
For more information and documentation updates, refer to <https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ v5.0.1 Release Notes
|
||||
|
||||
#### Deprecations and Warnings
|
||||
|
||||
## AMD ROCm™ v5.0.1 Release Notes
|
||||
##### Refactor of HIPCC/HIPCONFIG
|
||||
|
||||
### Deprecations and Warnings
|
||||
In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
|
||||
used to identify and set target compiler options, target platform, compiler, and
|
||||
runtime appropriately.
|
||||
|
||||
#### Refactor of HIPCC/HIPCONFIG
|
||||
In ROCm v5.0.1, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
|
||||
binary implementations of `hipcc` and `hipconfig`. These new binaries are
|
||||
currently a work in progress and are marked as experimental. ROCm plans
|
||||
to fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm
|
||||
release. The existing `hipcc` and `hipconfig` Perl scripts are renamed to
|
||||
`hipcc.pl` and `hipconfig.pl` respectively. New top-level `hipcc` and
|
||||
`hipconfig` Perl scripts are created, which can switch between the Perl script
|
||||
or the compiled binary based on the environment variable
|
||||
`HIPCC_USE_PERL_SCRIPT`.
|
||||
|
||||
In prior ROCm releases, by default, the hipcc/hipconfig Perl scripts were used to identify and set target compiler options, target platform, compiler, and runtime appropriately.
|
||||
In ROCm v5.0.1, hipcc.bin and hipconfig.bin have been added as the compiled binary implementations of the hipcc and hipconfig. These new binaries are currently a work-in-progress, considered, and marked as experimental. ROCm plans to fully transition to hipcc.bin and hipconfig.bin in the a future ROCm release. The existing hipcc and hipconfig Perl scripts are renamed to hipcc.pl and hipconfig.pl respectively. New top-level hipcc and hipconfig Perl scripts are created, which can switch between the Perl script or the compiled binary based on the environment variable HIPCC_USE_PERL_SCRIPT.
|
||||
In ROCm 5.0.1, by default, this environment variable is set to use hipcc and hipconfig through the Perl scripts.
|
||||
Subsequently, Perl scripts will no longer be available in ROCm in a future release.
|
||||
In ROCm 5.0.1, by default, this environment variable is set to use `hipcc` and
|
||||
`hipconfig` through the Perl scripts. The Perl scripts will no longer
|
||||
be available in ROCm in a future release.
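
The switching behavior described above can be pictured with a small sketch. It is an illustration only, not the actual top-level `hipcc` wrapper (which is a Perl script). The function names are hypothetical, and the rule that the value `0` selects the compiled binary while every other value, or an unset variable, keeps the Perl script is an assumption; the release notes state only that `HIPCC_USE_PERL_SCRIPT` selects between the two and that the Perl scripts are the default in ROCm 5.0.1.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the implementation a wrapper might dispatch to.
// `use_perl_script` is the raw value of HIPCC_USE_PERL_SCRIPT, or nullptr
// when the variable is unset.
// ASSUMPTION: only the value "0" opts in to the compiled binary; any other
// value, and an unset variable, keep the Perl default described for 5.0.1.
inline std::string select_hipcc_backend(const char* use_perl_script) {
    if (use_perl_script != nullptr && std::string(use_perl_script) == "0") {
        return "hipcc.bin";
    }
    return "hipcc.pl";
}

// Convenience wrapper that consults the real process environment.
inline std::string select_hipcc_backend_from_env() {
    return select_hipcc_backend(std::getenv("HIPCC_USE_PERL_SCRIPT"));
}
```

Under these assumptions, `select_hipcc_backend(nullptr)` yields `hipcc.pl`, which mirrors the documented default. Check the wrapper shipped with your ROCm release for the values it actually accepts.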
|
||||
|
||||
#### ROCm Documentation Updates for ROCm 5.0.1
|
||||
|
||||
### ROCM DOCUMENTATION UPDATES FOR ROCM 5.0.1
|
||||
- ROCm Downloads Guide
|
||||
|
||||
* ROCm Downloads Guide
|
||||
- ROCm Installation Guide
|
||||
|
||||
* ROCm Installation Guide
|
||||
- ROCm Release Notes
|
||||
|
||||
* ROCm Release Notes
|
||||
For more information, see <https://docs.amd.com>.
|
||||
|
||||
For more information, see https://docs.amd.com.
|
||||
### AMD ROCm™ v5.0 Release Notes
|
||||
|
||||
## ROCm Installation Updates
|
||||
|
||||
|
||||
## AMD ROCm™ v5.0 Release Notes
|
||||
|
||||
|
||||
# ROCm Installation Updates
|
||||
|
||||
This document describes the features, fixed issues, and information about downloading and installing the AMD ROCm™ software.
|
||||
This document describes the features, fixed issues, and information about
|
||||
downloading and installing the AMD ROCm™ software.
|
||||
|
||||
It also covers known issues and deprecations in this release.
|
||||
|
||||
## Notice for Open-source and Closed-source ROCm Repositories in Future Releases
|
||||
|
||||
To make a distinction between open-source and closed-source components, all ROCm repositories will consist of sub-folders in future releases.
|
||||
To make a distinction between open-source and closed-source components, all ROCm
|
||||
repositories will consist of sub-folders in future releases.
|
||||
|
||||
- All open-source components will be placed in the `base-url/<rocm-ver>/main` sub-folder
|
||||
- All closed-source components will reside in the `base-url/<rocm-ver>/proprietary` sub-folder
|
||||
- All open-source components will be placed in the `base-url/<rocm-ver>/main`
|
||||
sub-folder
|
||||
- All closed-source components will reside in the
|
||||
`base-url/<rocm-ver>/proprietary` sub-folder
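
As a rough sketch of the layout in the list above, the helper below assembles the sub-folder path for a component. The function name `component_subfolder` is hypothetical, and `base_url` and `rocm_ver` stand in for the `base-url` and `<rocm-ver>` placeholders; the notes do not give concrete values for either.

```cpp
#include <string>

// Hypothetical helper: builds "<base_url>/<rocm_ver>/main" for open-source
// components and "<base_url>/<rocm_ver>/proprietary" for closed-source ones,
// following the layout announced in the release notes.
inline std::string component_subfolder(const std::string& base_url,
                                       const std::string& rocm_ver,
                                       bool open_source) {
    return base_url + "/" + rocm_ver + "/" +
           (open_source ? "main" : "proprietary");
}
```

For instance, `component_subfolder("base-url", "<rocm-ver>", false)` reproduces the `proprietary` path shown in the list.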
|
||||
|
||||
## List of Supported Operating Systems
|
||||
### List of Supported Operating Systems
|
||||
|
||||
The AMD ROCm platform supports the following operating systems:
|
||||
|
||||
| **OS-Version (64-bit)** | **Kernel Versions** |
|
||||
| --- | --- |
|
||||
| CentOS 8.3 | 4.18.0-193.el8 |
|
||||
| CentOS 7.9 | 3.10.0-1127 |
|
||||
| RHEL 8.5 | 4.18.0-348.7.1.el8\_5.x86\_64 |
|
||||
| RHEL 8.4 | 4.18.0-305.el8.x86\_64 |
|
||||
| RHEL 7.9 | 3.10.0-1160.6.1.el7 |
|
||||
| SLES 15 SP3 | 5.3.18-59.16-default |
|
||||
| Ubuntu 20.04.3 | 5.8.0 LTS / 5.11 HWE |
|
||||
| Ubuntu 18.04.5 [5.4 HWE kernel] | 5.4.0-71-generic |
|
||||
| **OS-Version (64-bit)** | **Kernel Versions** |
|
||||
|:-------------------------------:|:-----------------------------:|
|
||||
| CentOS 8.3 | `4.18.0-193.el8` |
|
||||
| CentOS 7.9 | `3.10.0-1127` |
|
||||
| RHEL 8.5 | `4.18.0-348.7.1.el8_5.x86_64` |
|
||||
| RHEL 8.4 | `4.18.0-305.el8.x86_64` |
|
||||
| RHEL 7.9 | `3.10.0-1160.6.1.el7` |
|
||||
| SLES 15 SP3 | `5.3.18-59.16-default` |
|
||||
| Ubuntu 20.04.3 | `5.8.0 LTS / 5.11 HWE` |
|
||||
| Ubuntu 18.04.5 [5.4 HWE kernel] | `5.4.0-71-generic` |
|
||||
|
||||
### Support for RHEL v8.5
|
||||
#### Support for RHEL v8.5
|
||||
|
||||
This release extends support for RHEL v8.5.
|
||||
|
||||
### Supported GPUs
|
||||
#### Supported GPUs
|
||||
|
||||
#### Radeon Pro V620 and W6800 Workstation GPUs
|
||||
##### Radeon Pro V620 and W6800 Workstation GPUs
|
||||
|
||||
This release extends ROCm support for Radeon Pro V620 and W6800 Workstation GPUs.
|
||||
This release extends ROCm support for Radeon Pro V620 and W6800 Workstation
|
||||
GPUs.
|
||||
|
||||
- SRIOV virtualization support for Radeon Pro V620
|
||||
- KVM Hypervisor (1VF support only) on Ubuntu Host OS with Ubuntu, CentOs, and RHEL Guest
|
||||
- Support for ROCm-SMI in an SRIOV environment. For more details, refer to the ROCm SMI API documentation.
|
||||
- KVM Hypervisor (1VF support only) on Ubuntu Host OS with Ubuntu, CentOS, and
|
||||
RHEL Guest
|
||||
- Support for ROCm-SMI in an SRIOV environment. For more details, refer to the
|
||||
ROCm SMI API documentation.
|
||||
|
||||
**Note:** Radeon Pro V620 is not supported on SLES.
|
||||
|
||||
## ROCm Installation Updates for ROCm v5.0
|
||||
### ROCm Installation Updates for ROCm v5.0
|
||||
|
||||
This release has the following ROCm installation enhancements.
|
||||
|
||||
### Support for Kernel Mode Driver
|
||||
#### Support for Kernel Mode Driver
|
||||
|
||||
In this release, users can install the kernel-mode driver using the Installer method. Some of the ROCm-specific use cases that the installer currently supports are:
|
||||
In this release, users can install the kernel-mode driver using the Installer
|
||||
method. Some of the ROCm-specific use cases that the installer currently
|
||||
supports are:
|
||||
|
||||
- OpenCL (ROCr/KFD based) runtime
|
||||
- HIP runtimes
|
||||
@@ -127,56 +158,63 @@ In this release, users can install the kernel-mode driver using the Installer me
|
||||
- ROCr runtime and thunk
|
||||
- Kernel-mode driver
|
||||
|
||||
### Support for Multi-version ROCm Installation and Uninstallation
|
||||
#### Support for Multi-version ROCm Installation and Uninstallation
|
||||
|
||||
Users now can install multiple ROCm releases simultaneously on a system using the newly introduced installer script and package manager install mechanism.
|
||||
Users can now install multiple ROCm releases simultaneously on a system using
|
||||
the newly introduced installer script and package manager install mechanism.
|
||||
|
||||
Users can also uninstall multi-version ROCm releases using the `amdgpu-uninstall` script and package manager.
|
||||
Users can also uninstall multi-version ROCm releases using the
|
||||
`amdgpu-uninstall` script and package manager.
|
||||
|
||||
### Support for Updating Information on Local Repositories
|
||||
#### Support for Updating Information on Local Repositories
|
||||
|
||||
In this release, the `amdgpu-install` script automates the process of updating local repository information before proceeding to ROCm installation.
|
||||
In this release, the `amdgpu-install` script automates the process of updating
|
||||
local repository information before proceeding to ROCm installation.
|
||||
|
||||
### Support for Release Upgrades
|
||||
#### Support for Release Upgrades
|
||||
|
||||
Users can now upgrade the existing ROCm installation to specific or latest ROCm releases.
|
||||
Users can now upgrade the existing ROCm installation to specific or latest ROCm
|
||||
releases.
|
||||
|
||||
For more details, refer to the AMD ROCm Installation Guide v5.0.
|
||||
|
||||
# AMD ROCm V5.0 Documentation Updates
|
||||
## AMD ROCm V5.0 Documentation Updates
|
||||
|
||||
## New AMD ROCm Information Portal – ROCm v4.5 and Above
|
||||
### New AMD ROCm Information Portal – ROCm v4.5 and Above
|
||||
|
||||
Beginning ROCm release v5.0, AMD ROCm documentation has a new portal at https://docs.amd.com. This portal consists of ROCm documentation v4.5 and above.
|
||||
Beginning with ROCm release v5.0, AMD ROCm documentation has a new portal at
|
||||
<https://docs.amd.com>. This portal hosts ROCm documentation for v4.5 and
|
||||
above.
|
||||
|
||||
For documentation prior to ROCm v4.5, you may continue to access https://rocmdocs.amd.com.
|
||||
For documentation prior to ROCm v4.5, you may continue to access
|
||||
<https://rocmdocs.amd.com>.
|
||||
|
||||
## Documentation Updates for ROCm 5.0
|
||||
### Documentation Updates for ROCm 5.0
|
||||
|
||||
### Deployment Tools
|
||||
#### Deployment Tools
|
||||
|
||||
#### ROCm Data Center Tool Documentation Updates
|
||||
##### ROCm Data Center Tool Documentation Updates
|
||||
|
||||
- ROCm Data Center Tool User Guide
|
||||
- ROCm Data Center Tool API Guide
|
||||
|
||||
#### ROCm System Management Interface Updates
|
||||
##### ROCm System Management Interface Updates
|
||||
|
||||
- System Management Interface Guide
|
||||
- System Management Interface API Guide
|
||||
|
||||
#### ROCm Command Line Interface Updates
|
||||
##### ROCm Command Line Interface Updates
|
||||
|
||||
- Command Line Interface Guide
|
||||
|
||||
### Machine Learning/AI Documentation Updates
|
||||
#### Machine Learning/AI Documentation Updates
|
||||
|
||||
- Deep Learning Guide
|
||||
- MIGraphX API Guide
|
||||
- MIOpen API Guide
|
||||
- MIVisionX API Guide
|
||||
|
||||
### ROCm Libraries Documentation Updates
|
||||
#### ROCm Libraries Documentation Updates
|
||||
|
||||
- hipSOLVER User Guide
|
||||
- RCCL User Guide
|
||||
@@ -188,87 +226,90 @@ For documentation prior to ROCm v4.5, you may continue to access https://rocmdoc
|
||||
- rocSPARSE User Guide
|
||||
- rocThrust User Guide
|
||||
|
||||
### Compilers and Tools
|
||||
#### Compilers and Tools
|
||||
|
||||
#### ROCDebugger Documentation Updates
|
||||
##### ROCDebugger Documentation Updates
|
||||
|
||||
- ROCDebugger User Guide
|
||||
- ROCDebugger API Guide
|
||||
|
||||
#### ROCTracer
|
||||
##### ROCTracer
|
||||
|
||||
- ROCTracer User Guide
|
||||
- ROCTracer API Guide
|
||||
|
||||
#### Compilers
|
||||
##### Compilers
|
||||
|
||||
- AMD Instinct High Performance Computing and Tuning Guide
|
||||
- AMD Compiler Reference Guide
|
||||
|
||||
#### HIPify Documentation
|
||||
##### HIPify Documentation
|
||||
|
||||
- HIPify User Guide
|
||||
- HIP Supported CUDA API Reference Guide
|
||||
|
||||
#### ROCm Debug Agent
|
||||
##### ROCm Debug Agent
|
||||
|
||||
- ROCm Debug Agent Guide
|
||||
- System Level Debug Guide
|
||||
- ROCm Validation Suite
|
||||
|
||||
### Programming Models Documentation
|
||||
#### Programming Models Documentation
|
||||
|
||||
#### HIP Documentation
|
||||
##### HIP Documentation
|
||||
|
||||
- HIP Programming Guide
|
||||
- HIP API Guide
|
||||
- HIP FAQ Guide
|
||||
|
||||
#### OpenMP Documentation
|
||||
##### OpenMP Documentation
|
||||
|
||||
- OpenMP Support Guide
|
||||
|
||||
### ROCm Glossary
|
||||
#### ROCm Glossary
|
||||
|
||||
- ROCm Glossary – Terms and Definitions
|
||||
|
||||
## AMD ROCm Legacy Documentation Links – ROCm v4.3 and Prior
|
||||
### AMD ROCm Legacy Documentation Links – ROCm v4.3 and Prior
|
||||
|
||||
- For AMD ROCm documentation, see
|
||||
|
||||
https://rocmdocs.amd.com/en/latest/
|
||||
- For AMD ROCm documentation, see <https://rocmdocs.amd.com/en/latest/>
|
||||
|
||||
- For installation instructions on supported platforms, see
|
||||
|
||||
https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html
|
||||
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html>
|
||||
|
||||
- For AMD ROCm binary structure, see
|
||||
|
||||
https://rocmdocs.amd.com/en/latest/Installation_Guide/Software-Stack-for-AMD-GPU.html
|
||||
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Software-Stack-for-AMD-GPU.html>
|
||||
|
||||
- For AMD ROCm release history, see
|
||||
<https://rocmdocs.amd.com/en/latest/Current_Release_Notes/ROCm-Version-History.html>
|
||||
|
||||
https://rocmdocs.amd.com/en/latest/Current_Release_Notes/ROCm-Version-History.html
|
||||
## What's New in This Release
|
||||
|
||||
# What's New in This Release
|
||||
|
||||
## HIP Enhancements
|
||||
### HIP Enhancements
|
||||
|
||||
The ROCm v5.0 release consists of the following HIP enhancements.
|
||||
|
||||
### HIP Installation Guide Updates
|
||||
#### HIP Installation Guide Updates
|
||||
|
||||
The HIP Installation Guide is updated to include building HIP from source on the NVIDIA platform.
|
||||
The HIP Installation Guide is updated to include building HIP from source on the
|
||||
NVIDIA platform.
|
||||
|
||||
Refer to the HIP Installation Guide v5.0 for more details.
|
||||
|
||||
### Managed Memory Allocation
|
||||
#### Managed Memory Allocation
|
||||
|
||||
Managed memory, including the `__managed__` keyword, is now supported in the HIP combined host/device compilation. Through unified memory allocation, managed memory allows data to be shared and accessible to both the CPU and GPU using a single pointer. The allocation is managed by the AMD GPU driver using the Linux Heterogeneous Memory Management (HMM) mechanism. The user can call managed memory API hipMallocManaged to allocate a large chunk of HMM memory, execute kernels on a device, and fetch data between the host and device as needed.
|
||||
Managed memory, including the `__managed__` keyword, is now supported in the HIP
|
||||
combined host/device compilation. Through unified memory allocation, managed
|
||||
memory allows data to be shared and accessible to both the CPU and GPU using a
|
||||
single pointer. The allocation is managed by the AMD GPU driver using the Linux
|
||||
Heterogeneous Memory Management (HMM) mechanism. The user can call the managed
|
||||
memory API `hipMallocManaged` to allocate a large chunk of HMM memory, execute
|
||||
kernels on a device, and fetch data between the host and device as needed.
|
||||
|
||||
**Note:** In a HIP application, it is recommended to do a capability check before calling the managed memory APIs. For example,
|
||||
**Note:** In a HIP application, it is recommended to do a capability check
|
||||
before calling the managed memory APIs. For example,
|
||||
|
||||
```c
|
||||
```cpp
|
||||
int managed_memory = 0;
|
||||
HIPCHECK(hipDeviceGetAttribute(&managed_memory, hipDeviceAttributeManagedMemory, p_gpuDevice));
|
||||
|
||||
@@ -281,89 +322,97 @@ if (!managed_memory) {
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** The managed memory capability check may not be necessary; however, if HMM is not supported, managed malloc will fall back to using system memory. Other managed memory API calls will, then, have
|
||||
**Note:** The managed memory capability check may not be necessary; however, if
|
||||
HMM is not supported, managed `malloc` will fall back to using system memory.
|
||||
|
||||
Refer to the HIP API documentation for more details on managed memory APIs.
|
||||
|
||||
For the application, see [hipMallocManaged.cpp](https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp)
|
||||
For the application, see
|
||||
[hipMallocManaged.cpp](https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp)
|
||||
|
||||
## New Environment Variable
|
||||
### New Environment Variable
|
||||
|
||||
The following new environment variable is added in this release:
|
||||
|
||||
| **Environment Variable** | **Value** | **Description** |
|
||||
| --- | --- | --- |
|
||||
| **HSA\_COOP\_CU\_COUNT** | 0 or 1 (default is 0) | Some processors support more compute units than can reliably be used in a cooperative dispatch. Setting the environment variable HSA\_COOP\_CU\_COUNT to 1 will cause ROCr to return the correct CU count for cooperative groups through the HSA\_AMD\_AGENT\_INFO\_COOPERATIVE\_COMPUTE\_UNIT\_COUNT attribute of hsa\_agent\_get\_info(). Setting HSA\_COOP\_CU\_COUNT to other values, or leaving it unset, will cause ROCr to return the same CU count for the attributes HSA\_AMD\_AGENT\_INFO\_COOPERATIVE\_COMPUTE\_UNIT\_COUNT and HSA\_AMD\_AGENT\_INFO\_COMPUTE\_UNIT\_COUNT. Future ROCm releases will make HSA\_COOP\_CU\_COUNT=1 the default.
|
||||
|
|
||||
| **Environment Variable** | **Value** | **Description** |
|
||||
|:------------------------:|:---------------------:|:--------------------------------------------------------|
|
||||
| `HSA_COOP_CU_COUNT` | 0 or 1 (default is 0) | Some processors support more compute units than can reliably be used in a cooperative dispatch. Setting the environment variable `HSA_COOP_CU_COUNT` to 1 will cause ROCr to return the correct CU count for cooperative groups through the `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` attribute of `hsa_agent_get_info()`. Setting `HSA_COOP_CU_COUNT` to other values, or leaving it unset, will cause ROCr to return the same CU count for the attributes `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` and `HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT`. Future ROCm releases will make `HSA_COOP_CU_COUNT = 1` the default. |
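
The rule in the table can be modeled in a few lines. This is a hedged sketch of the described behavior, not ROCr code: the struct, the function name, and the idea of passing both counts in as parameters are assumptions made for illustration. In a real application the values would come from `hsa_agent_get_info()`.

```cpp
#include <cstring>

// Hypothetical model of the two CU counts reported for one agent.
struct ReportedCuCounts {
    // Models HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT.
    unsigned compute_unit_count;
    // Models HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT.
    unsigned cooperative_compute_unit_count;
};

// `hsa_coop_cu_count` is the raw value of HSA_COOP_CU_COUNT (nullptr if unset).
// `total_cus` and `cooperative_cus` are illustrative inputs, not queried values.
// Only the exact value "1" makes the cooperative attribute differ; any other
// value, or an unset variable, makes both attributes report the same count.
inline ReportedCuCounts reported_cu_counts(const char* hsa_coop_cu_count,
                                           unsigned total_cus,
                                           unsigned cooperative_cus) {
    const bool opt_in = hsa_coop_cu_count != nullptr &&
                        std::strcmp(hsa_coop_cu_count, "1") == 0;
    return ReportedCuCounts{total_cus, opt_in ? cooperative_cus : total_cus};
}
```

With the variable unset, the model returns the same number for both fields, matching the default described above. With the value `1`, the cooperative field carries the smaller cooperative-dispatch count.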
|
||||
|
||||
## ROCm Math and Communication Libraries
|
||||
### ROCm Math and Communication Libraries
|
||||
|
||||
| **Library** | **Changes** |
|
||||
| --- | --- |
|
||||
| **rocBLAS** | **Added** <ul><li>Added rocblas\_get\_version\_string\_size convenience function</li><li>Added rocblas\_xtrmm\_outofplace, an out-of-place version of rocblas\_xtrmm</li><li>Added hpl and trig initialization for gemm\_ex to rocblas-bench</li><li>Added source code gemm. It can be used as an alternative to Tensile for debugging and development</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use hipFloatComplex and hipDoubleComplex</li></ul> **Optimizations** <ul><li>Improved performance of non-batched and batched single-precision GER for size m > 1024. Performance enhanced by 5-10% measured on a MI100 (gfx908) GPU.</li><li>Improved performance of non-batched and batched HER for all sizes and data types. Performance enhanced by 2-17% measured on a MI100 (gfx908) GPU.</li></ul> **Changed** <ul><li>Instantiate templated rocBLAS functions to reduce size of librocblas.so</li><li>Removed static library dependency on msgpack</li><li>Removed boost dependencies for clients</li></ul> **Fixed** <ul><li>Option to install script to build only rocBLAS clients with a pre-built rocBLAS library</li><li>Correctly set output of nrm2\_batched\_ex and nrm2\_strided\_batched\_ex when given bad input</li><li>Fix for dgmm with side == rocblas\_side\_left and a negative incx</li><li>Fixed out-of-bounds read for small trsm</li><li>Fixed numerical checking for tbmv\_strided\_batched</li></ul> |
|
||||
| | |
|
||||
| **hipBLAS** | **Added** <ul><li>Added rocSOLVER functions to hipblas-bench</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use hipFloatComplex and hipDoubleComplex</li><li>Added compilation warning for future trmm changes</li><li>Added documentation to hipblas.h</li><li>Added option to forgo pivoting for getrf and getri when ipiv is nullptr</li><li>Added code coverage option</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source.</li><li>Fixed windows packaging</li><li>Allowing negative increments in hipblas-bench</li><li>Removed boost dependency</li></ul> |
|
||||
| | |
|
||||
| **rocFFT** | **Changed** <ul><li>Enabled runtime compilation of single FFT kernels > length 1024.</li><li>Re-aligned split device library into 4 roughly equal libraries.</li><li>Implemented the FuseShim framework to replace the original OptimizePlan</li><li>Implemented the generic buffer-assignment framework. The buffer assignment is no longer performed by each node. A generic algorithm is designed to test and pick the best assignment path. With the help of FuseShim, more kernel-fusions are achieved.</li><li>Do not read the imaginary part of the DC and Nyquist modes for even-length complex-to-real transforms.</li></ul> **Optimizations** <ul><li>Optimized twiddle-conjugation; complex-to-complex inverse transforms have similar performance to foward transforms now.</li><li>Improved performance of single-kernel small 2D transforms.</li></ul> |
|
||||
| | |
|
||||
| **hipFFT** | **Fixed** <ul><li>Fixed incorrect reporting of rocFFT version.</li></ul> **Changed** <ul><li>Unconditionally enabled callback functionality. On the CUDA backend, callbacks only run correctly when hipFFT is built as a static library, and is linked against the static cuFFT library.</li></ul> |
|
||||
| | |
|
||||
| **rocSPARSE** | **Added** <ul><li>csrmv, coomv, ellmv, hybmv for (conjugate) transposed matricescsrmv for symmetric matrices</li></ul> **Changed** <ul><li>spmm\_ex is now deprecated and will be removed in the next major release</li></ul> **Improved** <ul><li>Optimization for gtsv</li></ul> |
|
||||
| | |
|
||||
| **hipSPARSE** | **Added** <ul><li>Added (conjugate) transpose support for csrmv, hybmv and spmv routines</li></ul> |
|
||||
| | |
|
||||
| **Library** | **Changes** |
|
||||
|:--------------:|:----------------------------------------------------------------------------------------|
|
||||
| **rocBLAS** | **Added** <ul><li>Added `rocblas_get_version_string_size` convenience function</li><li>Added `rocblas_xtrmm_outofplace`, an out-of-place version of `rocblas_xtrmm`</li><li>Added hpl and trig initialization for `gemm_ex` to `rocblas-bench`</li><li>Added source code gemm. It can be used as an alternative to Tensile for debugging and development</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li></ul> **Optimizations** <ul><li>Improved performance of non-batched and batched single-precision GER for size m > 1024. Performance enhanced by 5-10% measured on a MI100 (gfx908) GPU.</li><li>Improved performance of non-batched and batched HER for all sizes and data types. Performance enhanced by 2-17% measured on a MI100 (gfx908) GPU.</li></ul> **Changed** <ul><li>Instantiate templated rocBLAS functions to reduce size of librocblas.so</li><li>Removed static library dependency on msgpack</li><li>Removed boost dependencies for clients</li></ul> **Fixed** <ul><li>Option to install script to build only rocBLAS clients with a pre-built rocBLAS library</li><li>Correctly set output of `nrm2_batched_ex` and `nrm2_strided_batched_ex` when given bad input</li><li>Fix for dgmm with side == `rocblas_side_left` and a negative incx</li><li>Fixed out-of-bounds read for small trsm</li><li>Fixed numerical checking for `tbmv_strided_batched`</li></ul> |
|
||||
| | |
|
||||
| **hipBLAS** | **Added** <ul><li>Added rocSOLVER functions to hipblas-bench</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li><li>Added compilation warning for future trmm changes</li><li>Added documentation to `hipblas.h`</li><li>Added option to forgo pivoting for getrf and getri when ipiv is `nullptr`</li><li>Added code coverage option</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source.</li><li>Fixed windows packaging</li><li>Allowing negative increments in hipblas-bench</li><li>Removed boost dependency</li></ul> |
|
||||
| | |
|
||||
| **rocFFT** | **Changed** <ul><li>Enabled runtime compilation of single FFT kernels > length 1024.</li><li>Re-aligned split device library into 4 roughly equal libraries.</li><li>Implemented the FuseShim framework to replace the original OptimizePlan</li><li>Implemented the generic buffer-assignment framework. The buffer assignment is no longer performed by each node. A generic algorithm is designed to test and pick the best assignment path. With the help of FuseShim, more kernel-fusions are achieved.</li><li>Do not read the imaginary part of the DC and Nyquist modes for even-length complex-to-real transforms.</li></ul> **Optimizations** <ul><li>Optimized twiddle-conjugation; complex-to-complex inverse transforms have similar performance to forward transforms now.</li><li>Improved performance of single-kernel small 2D transforms.</li></ul> |
|
||||
| | |
|
||||
| **hipFFT** | **Fixed** <ul><li>Fixed incorrect reporting of rocFFT version.</li></ul> **Changed** <ul><li>Unconditionally enabled callback functionality. On the CUDA backend, callbacks only run correctly when hipFFT is built as a static library, and is linked against the static cuFFT library.</li></ul> |
|
||||
| | |
|
||||
| **rocSPARSE** | **Added** <ul><li>csrmv, coomv, ellmv, hybmv for (conjugate) transposed matrices</li><li>csrmv for symmetric matrices</li></ul> **Changed** <ul><li>`spmm_ex` is now deprecated and will be removed in the next major release</li></ul> **Improved** <ul><li>Optimization for gtsv</li></ul> |
|
||||
| | |
|
||||
| **hipSPARSE** | **Added** <ul><li>Added (conjugate) transpose support for csrmv, hybmv and spmv routines</li></ul> |
|
||||
| | |
|
||||
| **rocALUTION** | **Changed** <ul><li>Removed deprecated GlobalPairwiseAMG class, please use PairwiseAMG instead.</li></ul> **Improved** <ul><li>Improved documentation</li></ul> |
|
||||
| | |
|
||||
| **rocThrust** | **Updates** <ul><li>Updated to match upstream Thrust 1.13.0</li><li>Updated to match upstream Thrust 1.14.0</li><li>Added async scan</li></ul> **Changed** <ul><li>Scan algorithms: inclusive\_scan now uses the input-type as accumulator-type, exclusive\_scan uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. short input, int output) and low-res input with high-res output (e.g. float input, double output)</li></ul> |
|
||||
| | |
|
||||
| **rocSOLVER** | **Added** <ul><li>Symmetric matrix factorizations: <ul><li>LASYF</li><li>SYTF2, SYTRF (with batched and strided\_batched versions)</li></ul><li>Added rocsolver\_get\_version\_string\_size to help with version string queries</li><li>Added rocblas\_layer\_mode\_ex and the ability to print kernel calls in the trace and profile logs</li><li>Expanded batched and strided\_batched sample programs.</li></ul> **Optimizations** <ul><li>Improved general performance of LU factorization</li><li>Increased parallelism of specialized kernels when compiling from source, reducing build times on multi-core systems.</li></ul> **Changed** <ul><li>The rocsolver-test client now prints the rocSOLVER version used to run the tests, rather than the version used to build them</li><li>The rocsolver-bench client now prints the rocSOLVER version used in the benchmark</li></ul> **Fixed** <ul><li>Added missing stdint.h include to rocsolver.h</li></ul> |
|
||||
| | |
|
||||
| **hipSOLVER** | **Added** <ul><li>Added SYTRF functions: hipsolverSsytrf\_bufferSize, hipsolverDsytrf\_bufferSize, hipsolverCsytrf\_bufferSize, hipsolverZsytrf\_bufferSize, hipsolverSsytrf, hipsolverDsytrf, hipsolverCsytrf, hipsolverZsytrf</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source</li></ul> |
|
||||
| | |
|
||||
| **RCCL** | **Added** <ul><li>Compatibility with NCCL 2.10.3</li></ul> **Known issues** <ul><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
|
||||
| | |
|
||||
| **hipCUB** | **Fixed** <ul><li>Added missing includes to hipcub.hpp</li></ul> **Added** <ul><li>Bfloat16 support to test cases (device\_reduce & device\_radix\_sort)</li><li>Device merge sort</li><li>Block merge sort</li><li>API update to CUB 1.14.0</li></ul> **Changed** <ul><li>The SetupNVCC.cmake automatic target selector selects all of the capabilities of all available cards for the NVIDIA backend.</li></ul> |
|
||||
| | |
|
||||
| **rocPRIM** | **Fixed** <ul><li>Enable bfloat16 tests and reduce threshold for bfloat16</li><li>Fix device scan limit\_size feature</li><li>Non-optimized builds no longer trigger local memory limit errors</li></ul> **Added** <ul><li>Scan size limit feature</li><li>Reduce size limit feature</li><li>Transform size limit feature</li><li>Add block\_load\_striped and block\_store\_striped</li><li>Add gather\_to\_blocked to gather values from other threads into a blocked arrangement</li><li>The block sizes for device merge sort's initial block sort and its merge steps are now separate in its kernel config (the block sort step supports multiple items per thread)</li></ul> **Changed** <ul><li>size\_limit for scan, reduce and transform can now be set in the config struct instead of a parameter</li><li>device\_scan and device\_segmented\_scan: inclusive\_scan now uses the input-type as accumulator-type, exclusive\_scan uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. short input, int output) and low-res input with high-res output (e.g. float input, double output)</li><li>Revert old Fiji workaround, because the issue was solved at compiler side</li><li>Update README cmake minimum version number</li><li>Block sort supports multiple items per thread. Currently only powers of two block sizes and items per thread are supported, and only for full blocks</li><li>Bumped the minimum required version of CMake to 3.16</li></ul> **Known issues** <ul><li>Unit tests may soft hang on MI200 when running in hipMallocManaged mode.</li><li>device\_segmented\_radix\_sort, device\_scan unit tests failing for HIP on Windows</li><li>ReduceEmptyInput causes random failure with bfloat16</li><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
|
||||
| | |
|
||||
| **rocTHRUST** | **Updates** <ul><li>Updated to match upstream Thrust 1.13.0</li><li>Updated to match upstream Thrust 1.14.0</li><li>Added async scan</li></ul> **Changed** <ul><li>Scan algorithms: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output). And low-res input with high-res output (e.g. float input, double output)</li></ul> |
|
||||
| | |
|
||||
| **rocSOLVER** | **Added** <ul><li>Symmetric matrix factorizations: <ul><li>LASYF</li><li>SYTF2, SYTRF (with `batched` and `strided_batched` versions)</li></ul><li>Added `rocsolver_get_version_string_size` to help with version string queries</li><li>Added `rocblas_layer_mode_ex` and the ability to print kernel calls in the trace and profile logs</li><li>Expanded batched and `strided_batched` sample programs.</li></ul> **Optimizations** <ul><li>Improved general performance of LU factorization</li><li>Increased parallelism of specialized kernels when compiling from source, reducing build times on multi-core systems.</li></ul> **Changed** <ul><li>The rocsolver-test client now prints the rocSOLVER version used to run the tests, rather than the version used to build them</li><li>The rocsolver-bench client now prints the rocSOLVER version used in the benchmark</li></ul> **Fixed** <ul><li>Added missing `stdint.h` include to `rocsolver.h`</li></ul> |
|
||||
| | |
|
||||
| **hipSOLVER** | **Added** <ul><li>Added SYTRF functions: `hipsolverSsytrf_bufferSize`, `hipsolverDsytrf_bufferSize`, `hipsolverCsytrf_bufferSize`, `hipsolverZsytrf_bufferSize`, `hipsolverSsytrf`, `hipsolverDsytrf`, `hipsolverCsytrf`, `hipsolverZsytrf`</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source</li></ul> |
|
||||
| | |
|
||||
| **RCCL** | **Added** <ul><li>Compatibility with NCCL 2.10.3</li></ul> **Known issues** <ul><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
|
||||
| | |
|
||||
| **hipCUB** | **Fixed** <ul><li>Added missing includes to `hipcub.hpp`</li></ul> **Added** <ul><li>Bfloat16 support to test cases (`device_reduce` & `device_radix_sort`)</li><li>Device merge sort</li><li>Block merge sort</li><li>API update to CUB 1.14.0</li></ul> **Changed** <ul><li>The `SetupNVCC.cmake` automatic target selector selects all of the capabilities of all available cards for the NVIDIA backend.</li></ul> |
|
||||
| | |
|
||||
| **rocPRIM** | **Fixed** <ul><li>Enable `bfloat16` tests and reduce threshold for `bfloat16`</li><li>Fix device scan `limit_size` feature</li><li>Non-optimized builds no longer trigger local memory limit errors</li></ul> **Added** <ul><li>Scan size limit feature</li><li>Reduce size limit feature</li><li>Transform size limit feature</li><li>Add `block_load_striped` and `block_store_striped`</li><li>Add `gather_to_blocked` to gather values from other threads into a blocked arrangement</li><li>The block sizes for device merge sort's initial block sort and its merge steps are now separate in its kernel config (the block sort step supports multiple items per thread)</li></ul> **Changed** <ul><li>`size_limit` for scan, reduce and transform can now be set in the config struct instead of a parameter</li><li>`device_scan` and `device_segmented_scan`: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)</li><li>Revert old Fiji workaround, because the issue was solved at compiler side</li><li>Update `README` cmake minimum version number</li><li>Block sort supports multiple items per thread. Currently only powers of two block sizes and items per thread are supported, and only for full blocks</li><li>Bumped the minimum required version of CMake to 3.16</li></ul> **Known issues** <ul><li>Unit tests may soft hang on MI200 when running in `hipMallocManaged` mode.</li><li>`device_segmented_radix_sort`, `device_scan` unit tests failing for HIP on Windows</li><li>`ReduceEmptyInput` causes random failure with `bfloat16`</li><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
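
The accumulator-type change described for `inclusive_scan` in the table above matters whenever the input type is narrower than the output type. The following host-only C++ sketch is not rocPRIM or rocThrust code. The helper `inclusive_scan_with` and its two wrappers are hypothetical names used only to contrast the two accumulator choices for 16-bit input written to 32-bit output.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a rocPRIM/rocThrust API): a sequential inclusive
// scan whose running sum is kept in the caller-chosen type Acc.
template <typename Acc, typename In, typename Out>
void inclusive_scan_with(const std::vector<In>& in, std::vector<Out>& out) {
    out.resize(in.size());
    Acc running = Acc{};
    for (std::size_t i = 0; i < in.size(); ++i) {
        // The sum is narrowed back to Acc on every step.
        running = static_cast<Acc>(running + in[i]);
        out[i] = static_cast<Out>(running);
    }
}

// Accumulate in the input type (the behaviour described for inclusive_scan).
inline std::vector<std::int32_t> scan_input_type_acc(const std::vector<std::int16_t>& in) {
    std::vector<std::int32_t> out;
    inclusive_scan_with<std::int16_t>(in, out);
    return out;
}

// Accumulate in the wider output type, for comparison.
inline std::vector<std::int32_t> scan_output_type_acc(const std::vector<std::int16_t>& in) {
    std::vector<std::int32_t> out;
    inclusive_scan_with<std::int32_t>(in, out);
    return out;
}
```

For the input `{30000, 30000}`, the 32-bit accumulator yields 60000 for the second element, while the 16-bit accumulator cannot represent that value. Code that relied on the wider accumulation may need to convert its input to the wider type before scanning.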
|
||||
|
||||
## System Management Interface
|
||||
### System Management Interface
|
||||
|
||||
### Clock Throttling for GPU Events
|
||||
#### Clock Throttling for GPU Events
|
||||
|
||||
This feature lists GPU events as they occur in real-time and can be used with _kfdtest_ to produce _vm\_fault_ events for testing.
|
||||
This feature lists GPU events as they occur in real-time and can be used with
|
||||
`kfdtest` to produce `vm_fault` events for testing.
|
||||
|
||||
The command can be called with either `-e` or `--showevents` like this:
|
||||
|
||||
-e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]] Show event list
|
||||
|
||||
Where `EVENT` is any list combination of `VM_FAULT`, `THERMAL_THROTTLE`, or `GPU_RESET` and is NOT case sensitive.
|
||||
|
||||
**Note:** If no event arguments are passed, all events will be watched by default.
|
||||
|
||||
#### CLI Commands
|
||||
The command can be called with either `-e` or `--showevents` like this:
|
||||
|
||||
```bash
|
||||
-e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]] Show event list
|
||||
```
|
||||
|
||||
Where `EVENT` is any list combination of `VM_FAULT`, `THERMAL_THROTTLE`, or
|
||||
`GPU_RESET` and is **NOT** case sensitive.
|
||||
|
||||
**Note:** If no event arguments are passed, all events will be watched by
|
||||
default.
|
||||
|
||||
##### CLI Commands
|
||||
|
||||
```bash
|
||||
$ rocm-smi --showevents vm_fault thermal_throttle gpu_reset
|
||||
|
||||
======================= ROCm System Management Interface =======================
|
||||
================================= Show Events ==================================
|
||||
press 'q' or 'ctrl + c' to quit
|
||||
DEVICE TIME TYPE DESCRIPTION
|
||||
DEVICE TIME TYPE DESCRIPTION
|
||||
|
||||
============================= End of ROCm SMI Log ==============================
|
||||
```
|
||||
|
||||
(Run kfdtest in another window to test for vm\_fault events.)
|
||||
(Run `kfdtest` in another window to test for `vm_fault` events.)
|
||||
|
||||
**Note:** Unlike other rocm-smi CLI commands, this command does not quit unless specified by the user. Users may press either `q` or `ctrl + c` to quit.
|
||||
**Note:** Unlike other rocm-smi CLI commands, this command does not quit unless
|
||||
specified by the user. Users may press either `q` or `ctrl + c` to quit.
|
||||
|
||||
### Display XGMI Bandwidth Between Nodes
|
||||
#### Display XGMI Bandwidth Between Nodes
|
||||
|
||||
The _rsmi\_minmax\_bandwidth\_get_ API reads the HW Topology file and displays bandwidth (min-max) between any two NUMA nodes in a matrix format.
|
||||
The `rsmi_minmax_bandwidth_get` API reads the HW Topology file and displays
|
||||
bandwidth (min-max) between any two NUMA nodes in a matrix format.
|
||||
|
||||
The Command Line Interface (CLI) command can be called as follows:
|
||||
|
||||
```
|
||||
```bash
|
||||
$ rocm-smi --shownodesbw
|
||||
|
||||
======================= ROCm System Management Interface =======================
|
||||
@@ -381,21 +430,26 @@ Format: min-max; Units: mps
|
||||
============================= End of ROCm SMI Log ==============================
|
||||
```
|
||||
|
||||
The sample output above shows the maximum theoretical XGMI bandwidth between 2 NUMA nodes.
|
||||
The sample output above shows the maximum theoretical XGMI bandwidth between 2
|
||||
NUMA nodes.
|
||||
|
||||
**Note:** "0-0" min-max bandwidth indicates devices are not connected directly.
|
||||
|
||||
### P2P Connection Status
|
||||
#### P2P Connection Status
|
||||
|
||||
The _rsmi\_is\_p2p\_accessible_ API returns "True" if P2P can be implemented between two nodes, and returns "False" if P2P cannot be implemented between the two nodes.
|
||||
The `rsmi_is_p2p_accessible` API returns `True` if P2P can be implemented
|
||||
between two nodes, and returns `False` if P2P cannot be implemented between the
|
||||
two nodes.
|
||||
|
||||
The Command Line Interface command can be called as follows:
|
||||
|
||||
rocm-smi --showtopoaccess
|
||||
```bash
|
||||
rocm-smi --showtopoaccess
|
||||
```
|
||||
|
||||
Sample Output:
|
||||
|
||||
```
|
||||
```bash
|
||||
$ rocm-smi --showtopoaccess
|
||||
======================= ROCm System Management Interface =======================
|
||||
===================== Link accessibility between two GPUs ======================
|
||||
@@ -405,13 +459,17 @@ GPU1 True True
|
||||
============================= End of ROCm SMI Log ==============================
|
||||
```
|
||||
|
||||
# Breaking Changes
|
||||
## Breaking Changes
|
||||
|
||||
## Runtime Breaking Change
|
||||
### Runtime Breaking Change
|
||||
|
||||
Re-ordering of the enumerated type in hip\_runtime\_api.h to better match NV. See below for the difference in enumerated types.
|
||||
Re-ordering of the enumerated type in `hip_runtime_api.h` to better match CUDA.
|
||||
See below for the difference in enumerated types.
|
||||
|
||||
ROCm software will be affected if any of the defined enums listed below are used in the code. Applications built with ROCm v5.0 enumerated types will work with a ROCm 4.5.2 driver. However, an undefined behavior error will occur with a ROCm v4.5.2 application that uses these enumerated types with a ROCm 5.0 runtime.
|
||||
ROCm software will be affected if any of the defined enums listed below are used
|
||||
in the code. Applications built with ROCm v5.0 enumerated types will work with a
|
||||
ROCm 4.5.2 driver. However, an undefined behavior error will occur with a ROCm
|
||||
v4.5.2 application that uses these enumerated types with a ROCm 5.0 runtime.
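
A small sketch shows why reordering an enumerated type breaks binaries that are already built. The two enums below are hypothetical stand-ins, not the real `hipDeviceAttribute_t` values. They only illustrate that a compiled application passes the integer it was built with, and a runtime built against a reordered header then interprets that integer differently.

```cpp
#include <string>

// Hypothetical ordering seen by an application at build time.
enum OldAttribute { kOldMaxThreads = 0, kOldWarpSize = 1, kOldClockRate = 2 };

// Hypothetical reordered definition used by the newer runtime.
enum NewAttribute { kNewWarpSize = 0, kNewClockRate = 1, kNewMaxThreads = 2 };

// Stand-in for a runtime entry point: it only ever sees the raw integer.
inline std::string runtime_interprets(int raw) {
    switch (static_cast<NewAttribute>(raw)) {
        case kNewWarpSize:   return "warp size";
        case kNewClockRate:  return "clock rate";
        case kNewMaxThreads: return "max threads";
    }
    return "unknown";
}
```

In this sketch, an application built with the old header that asks for `kOldWarpSize` sends the value 1, which the newer runtime reads as the clock rate. Rebuilding the application against the new header removes the mismatch.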
|
||||
|
||||
```c
|
||||
typedef enum hipDeviceAttribute_t {
|
||||
@@ -605,74 +663,93 @@ typedef enum hipDeviceAttribute_t {
|
||||
} hipDeviceAttribute_t;
|
||||
```
|
||||
|
||||
# Known Issues in This Release
|
||||
## Known Issues in This Release
|
||||
|
||||
## Incorrect dGPU Behavior When Using AMDVBFlash Tool
|
||||
### Incorrect dGPU Behavior When Using AMDVBFlash Tool
|
||||
|
||||
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not communicate with the ROM Controller specifically when the driver is present. This is because the driver, as part of its runtime power management feature, puts the dGPU to a sleep state.
|
||||
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not
|
||||
communicate with the ROM Controller specifically when the driver is present.
|
||||
This is because the driver, as part of its runtime power management feature,
|
||||
puts the dGPU to a sleep state.
|
||||
|
||||
As a workaround, users can run `amdgpu.runpm=0`, which temporarily disables the runtime power management feature from the driver and dynamically changes some power control-related sysfs files.
|
||||
As a workaround, users can run `amdgpu.runpm=0`, which temporarily disables the
|
||||
runtime power management feature from the driver and dynamically changes some
|
||||
power control-related sysfs files.
|
||||
|
||||
## Issue with START Timestamp in ROCProfiler
|
||||
### Issue with START Timestamp in ROCProfiler
|
||||
|
||||
Users may encounter an issue with the enabled timestamp functionality for monitoring one or multiple counters. ROCProfiler outputs the following four timestamps for each kernel:
|
||||
Users may encounter an issue with the enabled timestamp functionality for
|
||||
monitoring one or multiple counters. ROCProfiler outputs the following four
|
||||
timestamps for each kernel:
|
||||
|
||||
- Dispatch
|
||||
- Start
|
||||
- End
|
||||
- Complete
|
||||
|
||||
**Issue**
|
||||
#### Issue
|
||||
|
||||
This defect is related to the Start timestamp functionality, which incorrectly shows an earlier time than the Dispatch timestamp.
|
||||
This defect is related to the Start timestamp functionality, which incorrectly
|
||||
shows an earlier time than the Dispatch timestamp.
|
||||
|
||||
To reproduce the issue,
|
||||
|
||||
1. Enable timing using the `--timestamp on` flag.
|
||||
2. Use the `-i` option with the input filename that contains the name of the counter(s) to monitor.
|
||||
2. Use the `-i` option with the input filename that contains the name of the
|
||||
counter(s) to monitor.
|
||||
3. Run the program.
|
||||
4. Check the output result file.
|
||||
|
||||
**Current behavior**
|
||||
##### Current behavior
|
||||
|
||||
BeginNS is lower than DispatchNS, which is incorrect.
|
||||
`BeginNS` is lower than `DispatchNS`, which is incorrect.
|
||||
|
||||
**Expected behavior**
|
||||
##### Expected behavior
|
||||
|
||||
The correct order is:
|
||||
|
||||
_Dispatch < Start < End < Complete_
|
||||
`Dispatch < Start < End < Complete`
|
||||
|
||||
Users cannot use ROCProfiler to measure the time spent on each kernel because of the incorrect timestamp with counter collection enabled.
|
||||
Users cannot use ROCProfiler to measure the time spent on each kernel because of
|
||||
the incorrect timestamp with counter collection enabled.
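
The expected ordering can be checked mechanically once the four timestamps of a kernel record have been read from the result file. The sketch below assumes the values are already parsed into integers. The struct and function names are illustrative and not part of ROCProfiler.

```cpp
#include <cstdint>

// Field names are illustrative; they mirror the four timestamps listed above.
struct KernelTimestamps {
    std::uint64_t dispatch_ns;
    std::uint64_t begin_ns;
    std::uint64_t end_ns;
    std::uint64_t complete_ns;
};

// True when Dispatch < Start < End < Complete holds for one kernel record.
inline bool timestamps_ordered(const KernelTimestamps& t) {
    return t.dispatch_ns < t.begin_ns &&
           t.begin_ns < t.end_ns &&
           t.end_ns < t.complete_ns;
}

// Kernel execution time, meaningful only when the record is ordered.
inline std::uint64_t kernel_duration_ns(const KernelTimestamps& t) {
    return t.end_ns - t.begin_ns;
}
```

A record affected by this issue would fail `timestamps_ordered` because its start value is lower than its dispatch value.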
|
||||
|
||||
**Recommended Workaround**
|
||||
##### Recommended Workaround
|
||||
|
||||
Users are recommended to collect kernel execution timestamps without monitoring counters, as follows:
|
||||
Users are recommended to collect kernel execution timestamps without monitoring
|
||||
counters, as follows:
|
||||
|
||||
1. Enable timing using the `--timestamp on` flag, and run the application.
|
||||
2. Rerun the application using the `-i` option with the input filename that contains the name of the counter(s) to monitor, and save this to a different output file using the `-o` flag.
|
||||
2. Rerun the application using the `-i` option with the input filename that
|
||||
contains the name of the counter(s) to monitor, and save this to a different
|
||||
output file using the `-o` flag.
|
||||
3. Check the output result file from step 1.
|
||||
4. The order of timestamps correctly displays as:
|
||||
|
||||
_DispatchNS < BeginNS < EndNS < CompleteNS_
|
||||
`DispatchNS < BeginNS < EndNS < CompleteNS`
|
||||
|
||||
1. Users can find the values of the collected counters in the output file generated in step 2.
|
||||
|
||||
## Radeon Pro V620 and W6800 Workstation GPUs
|
||||
1. Users can find the values of the collected counters in the output file
|
||||
generated in step 2.
|
||||
|
||||
### No Support for SMI and ROCDebugger on SRIOV
|
||||
|
||||
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV environment on any GPU. For more information, refer to the Systems Management Interface documentation.
|
||||
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV
|
||||
environment on any GPU, including the
|
||||
**Radeon Pro V620 and W6800 Workstation GPUs**. For more information, refer to
|
||||
the Systems Management Interface documentation.
|
||||
|
||||
# Deprecations and Warnings in This Release
|
||||
## Deprecations and Warnings in This Release
|
||||
|
||||
## ROCm Libraries Changes – Deprecations and Deprecation Removal
|
||||
### ROCm Libraries Changes – Deprecations and Deprecation Removal
|
||||
|
||||
- The hipfft.h header is now provided only by the hipfft package. Up to ROCm 5.0, users would get hipfft.h in the rocfft package too.
|
||||
- The GlobalPairwiseAMG class is now entirely removed, users should use the PairwiseAMG class instead.
|
||||
- The rocsparse\_spmm signature in 5.0 was changed to match that of rocsparse\_spmm\_ex. In 5.0, rocsparse\_spmm\_ex is still present, but deprecated. Signature diff for rocsparse\_spmm
|
||||
- The `hipfft.h` header is now provided only by the `hipfft` package. Up to ROCm
|
||||
5.0, users would get `hipfft.h` in the rocfft package too.
|
||||
- The GlobalPairwiseAMG class is now entirely removed, users should use the
|
||||
PairwiseAMG class instead.
|
||||
- The `rocsparse_spmm` signature in 5.0 was changed to match that of
|
||||
`rocsparse_spmm_ex`. In 5.0, `rocsparse_spmm_ex` is still present, but
|
||||
deprecated. Signature diff for `rocsparse_spmm`
|
||||
|
||||
### _rocsparse\_spmm_ in 5.0
|
||||
#### `rocsparse_spmm` in 5.0
|
||||
|
||||
```c
|
||||
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
|
||||
@@ -690,7 +767,7 @@ rocsparse_status rocsparse_spmm(rocsparse_handle handle,
|
||||
void* temp_buffer);
|
||||
```
|
||||
|
||||
### _rocsparse\_spmm_ in 4.0
|
||||
#### `rocsparse_spmm` in 4.0
|
||||
|
||||
```c
|
||||
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
|
||||
@@ -707,55 +784,99 @@ rocsparse_status rocsparse_spmm(rocsparse_handle handle,
|
||||
void* temp_buffer);
|
||||
```
|
||||
|
||||
## HIP API Deprecations and Warnings
|
||||
### HIP API Deprecations and Warnings
|
||||
|
||||
### Warning - Arithmetic Operators of HIP Complex and Vector Types
|
||||
#### Warning - Arithmetic Operators of HIP Complex and Vector Types
|
||||
|
||||
In this release, arithmetic operators of HIP complex and vector types are deprecated.
|
||||
In this release, arithmetic operators of HIP complex and vector types are
|
||||
deprecated.
|
||||
|
||||
- As alternatives to arithmetic operators of HIP complex types, users can use arithmetic operators of std::complex types.
|
||||
- As alternatives to arithmetic operators of HIP vector types, users can use the operators of the native clang vector type associated with the data member of HIP vector types.
|
||||
- As alternatives to arithmetic operators of HIP complex types, users can use
|
||||
arithmetic operators of `std::complex` types.
|
||||
- As alternatives to arithmetic operators of HIP vector types, users can use the
|
||||
operators of the native clang vector type associated with the data member of
|
||||
HIP vector types.
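
As a minimal illustration of the first alternative, the sketch below performs a complex multiply-add with `std::complex` operators. It is host-only code that includes no HIP header, and `complex_fma` is a hypothetical helper name. Whether the same expression is usable inside device code depends on the toolchain and is not shown here.

```cpp
#include <complex>

// Host-side sketch: multiply-add on complex values using std::complex
// operators instead of the deprecated HIP complex operators.
inline std::complex<float> complex_fma(std::complex<float> a,
                                       std::complex<float> b,
                                       std::complex<float> c) {
    return a * b + c;
}
```

For example, `complex_fma` applied to `1 + 2i`, `3 + 4i` and `1 + 1i` evaluates `(1 + 2i)(3 + 4i) + (1 + 1i)`.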
|
||||
|
||||
During the deprecation, two macros `__HIP_ENABLE_COMPLEX_OPERATORS` and `__HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally enable arithmetic operators of HIP complex or vector types.
|
||||
During the deprecation, two macros `__HIP_ENABLE_COMPLEX_OPERATORS` and
|
||||
`__HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally
|
||||
enable arithmetic operators of HIP complex or vector types.
|
||||
|
||||
Note, the two macros are mutually exclusive and, by default, set to off.
|
||||
|
||||
The arithmetic operators of HIP complex and vector types will be removed in a future release.
|
||||
The arithmetic operators of HIP complex and vector types will be removed in a
|
||||
future release.
|
||||
|
||||
Refer to the HIP API Guide for more information.
|
||||
|
||||
### Refactor of HIPCC/HIPCONFIG
|
||||
#### HIPCC/HIPCONFIG Refactoring
|
||||
|
||||
In prior ROCm releases, by default, the hipcc/hipconfig Perl scripts were used to identify and set target compiler options, target platform, compiler, and runtime appropriately.
|
||||
In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
|
||||
used to identify and set target compiler options, target platform, compiler, and
|
||||
runtime appropriately.
|
||||
|
||||
In ROCm v5.0, hipcc.bin and hipconfig.bin have been added as the compiled binary implementations of hipcc and hipconfig. These new binaries are currently a work in progress and are marked as experimental. ROCm plans to fully transition to hipcc.bin and hipconfig.bin in a future ROCm release. The existing hipcc and hipconfig Perl scripts are renamed to hipcc.pl and hipconfig.pl respectively. New top-level hipcc and hipconfig Perl scripts are created, which can switch between the Perl script or the compiled binary based on the environment variable `HIPCC_USE_PERL_SCRIPT`.
|
||||
In ROCm v5.0, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
|
||||
binary implementations of the `hipcc` and `hipconfig`. These new binaries are
|
||||
currently a work in progress and are marked as experimental. ROCm plans
|
||||
to fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm
|
||||
release. The existing `hipcc` and `hipconfig` Perl scripts are renamed to
|
||||
`hipcc.pl` and `hipconfig.pl` respectively. New top-level `hipcc` and
|
||||
`hipconfig` Perl scripts are created, which can switch between the Perl script
|
||||
or the compiled binary based on the environment variable
|
||||
`HIPCC_USE_PERL_SCRIPT`.
|
||||
|
||||
In ROCm 5.0, by default, this environment variable is set to use hipcc and hipconfig through the Perl scripts.
|
||||
In ROCm 5.0, by default, this environment variable is set to use `hipcc` and
|
||||
`hipconfig` through the Perl scripts.
|
||||
|
||||
Subsequently, Perl scripts will no longer be available in ROCm in a future release.
|
||||
Subsequently, Perl scripts will no longer be available in ROCm in a future
|
||||
release.
|
||||
|
||||
## Warning - Compiler-Generated Code Object Version 4 Deprecation
|
||||
### Warning - Compiler-Generated Code Object Version 4 Deprecation
|
||||
|
||||
Support for loading compiler-generated code object version 4 will be deprecated in a future release with no release announcement and replaced with code object 5 as the default version.
|
||||
Support for loading compiler-generated code object version 4 will be deprecated
|
||||
in a future release with no release announcement and replaced with code object 5
|
||||
as the default version.
|
||||
|
||||
The current default is code object version 4.
|
||||
|
||||
## Warning - MIOpenTensile Deprecation
|
||||
### Warning - MIOpenTensile Deprecation
|
||||
|
||||
MIOpenTensile will be deprecated in a future release.
|
||||
|
||||
## Archived Documentation
|
||||
|
||||
Older ROCm documentation is archived at <https://rocmdocs.amd.com>.
|
||||
|
||||
## Disclaimer
|
||||
|
||||
The information presented in this document is for informational purposes only
|
||||
and may contain technical inaccuracies, omissions, and typographical errors.
|
||||
The information contained herein is subject to change and may be rendered
|
||||
inaccurate for many reasons, including but not limited to product and roadmap
|
||||
changes, component and motherboard version changes, new model and/or product
|
||||
releases, product differences between differing manufacturers, software changes,
|
||||
BIOS flashes, firmware upgrades, or the like. Any computer system has risks of
|
||||
security vulnerabilities that cannot be completely prevented or mitigated.
|
||||
AMD assumes no obligation to update or otherwise correct or revise this
|
||||
information. However, AMD reserves the right to revise this information and to
|
||||
make changes from time to time to the content hereof without obligation of AMD
|
||||
to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED
|
||||
"AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS
|
||||
HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS
|
||||
THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED
|
||||
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR
|
||||
PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT,
|
||||
INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY
|
||||
INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof
|
||||
are trademarks of Advanced Micro Devices, Inc. Other product names used in this
|
||||
publication are for identification purposes only and may be trademarks of their
|
||||
respective companies. © 2021 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
Archived Documentation
|
||||
----------------------
|
||||
Older ROCm documentation is archived at https://rocmdocs.amd.com.
|
||||
### Third-party Disclaimer
|
||||
|
||||
# Disclaimer
|
||||
|
||||
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
|
||||
© 2021 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
## Third-party Disclaimer
|
||||
Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
|
||||
Third-party content is licensed to you directly by the third party that owns the
|
||||
content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS
|
||||
PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT
|
||||
IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO
|
||||
YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE
|
||||
FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
|
||||
|
||||
2
LICENSE
@@ -1,6 +1,6 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
|
||||
Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
|
||||
127
README.md
@@ -1,52 +1,74 @@
|
||||
# ROCm™ Repository Updates
|
||||
This repository contains the manifest file for ROCm™ releases, changelogs, and release information. The file default.xml contains information for all repositories and the associated commit used to build the current ROCm release.
|
||||
|
||||
This repository contains the manifest file for ROCm™ releases, changelogs, and
|
||||
release information. The file default.xml contains information for all
|
||||
repositories and the associated commit used to build the current ROCm release.
|
||||
|
||||
The default.xml file uses the repo Manifest format.
|
||||
|
||||
# ROCm v5.4.3 Release Notes
|
||||
ROCm v5.4.3 is now released. For ROCm v5.4.3 documentation, refer to https://docs.amd.com.
|
||||
## ROCm v5.4.3 Release Notes
|
||||
|
||||
# ROCm v5.4.2 Release Notes
|
||||
ROCm v5.4.2 is now released. For ROCm v5.4.2 documentation, refer to https://docs.amd.com.
|
||||
ROCm v5.4.3 is now released. For ROCm v5.4.3 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
# ROCm v5.4.1 Release Notes
|
||||
ROCm v5.4.1 is now released. For ROCm v5.4.1 documentation, refer to https://docs.amd.com.
|
||||
## ROCm v5.4.2 Release Notes
|
||||
|
||||
# ROCm v5.4 Release Notes
|
||||
ROCm v5.4 is now released. For ROCm v5.4 documentation, refer to https://docs.amd.com.
|
||||
ROCm v5.4.2 is now released. For ROCm v5.4.2 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
# ROCm v5.3.3 Release Notes
|
||||
ROCm v5.3.3 is now released. For ROCm v5.3.3 documentation, refer to https://docs.amd.com.
|
||||
## ROCm v5.4.1 Release Notes
|
||||
|
||||
# ROCm v5.3.2 Release Notes
|
||||
ROCm v5.3.2 is now released. For ROCm v5.3.2 documentation, refer to https://docs.amd.com.
|
||||
ROCm v5.4.1 is now released. For ROCm v5.4.1 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
# ROCm v5.3 Release Notes
|
||||
ROCm v5.3 is now released. For ROCm v5.3 documentation, refer to https://docs.amd.com.
|
||||
## ROCm v5.4 Release Notes
|
||||
|
||||
# ROCm v5.2.3 Release Notes
|
||||
The ROCm v5.2.3 patch release is now available. The details are listed below. Highlights of this release include enhancements in RCCL version compatibility and minor bug fixes in the HIP Runtime.
|
||||
ROCm v5.4 is now released. For ROCm v5.4 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
Additionally, ROCm releases will return to use of the [ROCm](https://github.com/RadeonOpenCompute/ROCm) repository for version-controlled release notes henceforth.
|
||||
## ROCm v5.3.3 Release Notes
|
||||
|
||||
ROCm v5.3.3 is now released. For ROCm v5.3.3 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
## ROCm v5.3.2 Release Notes
|
||||
|
||||
ROCm v5.3.2 is now released. For ROCm v5.3.2 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
## ROCm v5.3 Release Notes
|
||||
|
||||
ROCm v5.3 is now released. For ROCm v5.3 documentation, refer to
|
||||
<https://docs.amd.com>.
|
||||
|
||||
## ROCm v5.2.3 Release Notes
|
||||
|
||||
The ROCm v5.2.3 patch release is now available. The details are listed below.
|
||||
Highlights of this release include enhancements in RCCL version compatibility
|
||||
and minor bug fixes in the HIP Runtime.
|
||||
|
||||
Additionally, ROCm releases will return to use of the
|
||||
[ROCm](https://github.com/RadeonOpenCompute/ROCm) repository for
|
||||
version-controlled release notes henceforth.
|
||||
|
||||
**NOTE**: This release of ROCm is validated with the AMDGPU release v22.20.1.
|
||||
|
||||
All users of the ROCm v5.2.1 release and below are encouraged to upgrade. Refer to https://docs.amd.com for documentation associated with this release.
|
||||
|
||||
All users of the ROCm v5.2.1 release and below are encouraged to upgrade. Refer
|
||||
to <https://docs.amd.com> for documentation associated with this release.
|
||||
|
||||
## Introducing Preview Support for Ubuntu 20.04.5 HWE
|
||||
|
||||
Refer to the following article for information on the preview support for Ubuntu 20.04.5 HWE.
|
||||
|
||||
https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-22-20
|
||||
Refer to the following article for information on the preview support for
|
||||
Ubuntu 20.04.5 HWE.
|
||||
|
||||
<https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-22-20>
|
||||
|
||||
## Changes in This Release
|
||||
|
||||
### Ubuntu 18.04 End of Life
|
||||
|
||||
Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not provide prebuilt packages for Ubuntu 18.04.
|
||||
|
||||
Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not
|
||||
provide prebuilt packages for Ubuntu 18.04.
|
||||
|
||||
### HIP and Other Runtimes
|
||||
|
||||
@@ -54,46 +76,63 @@ Support for Ubuntu 18.04 ends in this release. Future releases of ROCm will not
|
||||
|
||||
##### Fixes
|
||||
|
||||
- A bug was discovered in the HIP graph capture implementation in the ROCm v5.2.0 release. If the same kernel is called twice (with different argument values) in a graph capture, the implementation only kept the argument values for the second kernel call.
|
||||
|
||||
- A bug was introduced in the hiprtc implementation in the ROCm v5.2.0 release. This bug caused the *hiprtcGetLoweredName* call to fail for named expressions with whitespace in it.
|
||||
|
||||
**Example:** The named expression ```my_sqrt<complex<double>>``` passed but ```my_sqrt<complex<double>>``` failed.
|
||||
- A bug was discovered in the HIP graph capture implementation in the ROCm
|
||||
v5.2.0 release. If the same kernel is called twice (with different argument
|
||||
values) in a graph capture, the implementation only kept the argument values
|
||||
for the second kernel call.
|
||||
- A bug was introduced in the `hiprtc` implementation in the ROCm v5.2.0
|
||||
release. This bug caused the `hiprtcGetLoweredName` call to fail for named
|
||||
expressions that contain whitespace.
|
||||
|
||||
**Example:** The named expression `my_sqrt<complex<double>>` passed, but the
same expression written with whitespace inside it failed.
|
||||
|
||||
### ROCm Libraries
|
||||
|
||||
#### RCCL
|
||||
|
||||
##### Added
|
||||
|
||||
- Compatibility with NCCL 2.12.10
|
||||
- Packages for test and benchmark executables on all supported OSes using CPack
|
||||
- Adding custom signal handler - opt-in with RCCL_ENABLE_SIGNALHANDLER=1
|
||||
- Additional details provided if Binary File Descriptor library (BFD) is pre-installed.
|
||||
- Adding custom signal handler - opt-in with `RCCL_ENABLE_SIGNALHANDLER=1`
|
||||
- Additional details provided if Binary File Descriptor library (BFD) is
|
||||
pre-installed.
|
||||
- Adding experimental support for using multiple ranks per device
|
||||
- Requires using a new interface to create communicator (ncclCommInitRankMulti),
|
||||
refer to the interface documentation for details.
|
||||
- To avoid potential deadlocks, user might have to set an environment variables increasing
|
||||
the number of hardware queues. For example,
|
||||
- Requires using a new interface to create communicator
|
||||
(`ncclCommInitRankMulti`), refer to the interface documentation for
|
||||
details.
|
||||
- To avoid potential deadlocks, users might have to set an environment
variable that increases the number of hardware queues. For example,
|
||||
|
||||
```cpp
|
||||
export GPU_MAX_HW_QUEUES=16
|
||||
```
|
||||
export GPU_MAX_HW_QUEUES=16
|
||||
|
||||
```
|
||||
- Adding support for reusing ports in NET/IB channels
|
||||
- Opt-in with NCCL_IB_SOCK_CLIENT_PORT_REUSE=1 and NCCL_IB_SOCK_SERVER_PORT_REUSE=1
|
||||
- When "Call to bind failed: Address already in use" error happens in large-scale AlltoAll
|
||||
(for example, >=64 MI200 nodes), users are suggested to opt-in either one or both of the options to resolve the massive port usage issue
|
||||
- Avoid using NCCL_IB_SOCK_SERVER_PORT_REUSE when NCCL_NCHANNELS_PER_NET_PEER is tuned >1
|
||||
- Opt-in with `NCCL_IB_SOCK_CLIENT_PORT_REUSE=1` and
|
||||
`NCCL_IB_SOCK_SERVER_PORT_REUSE=1`
|
||||
- When the "`Call to bind failed: Address already in use`" error occurs in
large-scale AlltoAll (for example, >=64 MI200 nodes), users are advised
to opt in to either one or both of the options to resolve the massive port
usage issue
|
||||
- Avoid using `NCCL_IB_SOCK_SERVER_PORT_REUSE` when
|
||||
`NCCL_NCHANNELS_PER_NET_PEER` is tuned >1
|
||||
|
||||
##### Removed
|
||||
|
||||
- Removed experimental clique-based kernels
|
||||
|
||||
### Development Tools
|
||||
No notable changes in this release for development tools, including the compiler, profiler, and debugger.
|
||||
|
||||
No notable changes in this release for development tools, including the
|
||||
compiler, profiler, and debugger.
|
||||
|
||||
### Deployment and Management Tools
|
||||
|
||||
No notable changes in this release for deployment and management tools.
|
||||
|
||||
## Older ROCm™ Releases
|
||||
For release information for older ROCm™ releases, refer to [CHANGELOG](CHANGELOG.md).
|
||||
|
||||
For release information for older ROCm™ releases, refer to
|
||||
[CHANGELOG](CHANGELOG.md).
|
||||
|
||||
1
RELEASE.md
Normal file
@@ -0,0 +1 @@
|
||||
# Release Notes
|
||||
882
docs/sphinx/CHANGELOG.md
Normal file
@@ -0,0 +1,882 @@
|
||||
# Changelog
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
## AMD ROCm™ Releases
|
||||
|
||||
### AMD ROCm™ V5.2 Release
|
||||
|
||||
AMD ROCm v5.2 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ V5.1.3 Release
|
||||
|
||||
AMD ROCm v5.1.3 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ V5.1.1 Release
|
||||
|
||||
AMD ROCm v5.1.1 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ V5.1 Release
|
||||
|
||||
AMD ROCm v5.1 is now released. The release documentation is available at
|
||||
<https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ v5.0.2 Release Notes
|
||||
|
||||
#### Fixed Defects in This Release
|
||||
|
||||
The following defects are fixed in the ROCm v5.0.2 release.
|
||||
|
||||
##### Issue with hostcall Facility in HIP Runtime
|
||||
|
||||
In ROCm v5.0, when using the `assert()` call in a HIP kernel, the compiler may
|
||||
sometimes fail to emit kernel metadata related to the hostcall facility, which
|
||||
results in incomplete initialization of the hostcall facility in the HIP
|
||||
runtime. This can cause the HIP kernel to crash when it attempts to execute the
|
||||
`assert()` call. The root cause was an incorrect check in the compiler to
|
||||
determine whether the hostcall facility is required by the kernel. This is fixed
|
||||
in the ROCm v5.0.2 release. The resolution includes a compiler change, which
|
||||
emits the required metadata by default, unless the compiler can prove that the
|
||||
hostcall facility is not required by the kernel. This ensures that the
|
||||
`assert()` call never fails.
|
||||
|
||||
**Note**: This fix may lead to breakage in some OpenMP offload use cases, which
|
||||
use print inside a target region and result in an abort in device code.
|
||||
The issue will be fixed in a future release.
|
||||
|
||||
##### Compatibility Matrix Updates to ROCm Deep Learning Guide
|
||||
|
||||
The compatibility matrix in the AMD Deep Learning Guide is updated for ROCm
|
||||
v5.0.2.
|
||||
|
||||
For more information and documentation updates, refer to <https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ v5.0.1 Release Notes
|
||||
|
||||
#### Deprecations and Warnings
|
||||
|
||||
##### Refactor of HIPCC/HIPCONFIG
|
||||
|
||||
In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
|
||||
used to identify and set target compiler options, target platform, compiler, and
|
||||
runtime appropriately.
|
||||
|
||||
In ROCm v5.0.1, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
|
||||
binary implementations of the `hipcc` and `hipconfig`. These new binaries are
|
||||
currently a work in progress and are marked as experimental. ROCm plans
to fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm
|
||||
release. The existing `hipcc` and `hipconfig` Perl scripts are renamed to
|
||||
`hipcc.pl` and `hipconfig.pl` respectively. New top-level `hipcc` and
|
||||
`hipconfig` Perl scripts are created, which can switch between the Perl script
|
||||
or the compiled binary based on the environment variable
|
||||
`HIPCC_USE_PERL_SCRIPT`.
|
||||
|
||||
In ROCm 5.0.1, by default, this environment variable is set to use `hipcc` and
|
||||
`hipconfig` through the Perl scripts. Subsequently, Perl scripts will no longer
|
||||
be available in ROCm in a future release.
|
||||
|
||||
#### ROCm Documentation Updates for ROCm 5.0.1
|
||||
|
||||
- ROCm Downloads Guide
|
||||
|
||||
- ROCm Installation Guide
|
||||
|
||||
- ROCm Release Notes
|
||||
|
||||
For more information, see <https://docs.amd.com>.
|
||||
|
||||
### AMD ROCm™ v5.0 Release Notes
|
||||
|
||||
## ROCm Installation Updates
|
||||
|
||||
This document describes the features, fixed issues, and information about
|
||||
downloading and installing the AMD ROCm™ software.
|
||||
|
||||
It also covers known issues and deprecations in this release.
|
||||
|
||||
## Notice for Open-source and Closed-source ROCm Repositories in Future Releases
|
||||
|
||||
To make a distinction between open-source and closed-source components, all ROCm
|
||||
repositories will consist of sub-folders in future releases.
|
||||
|
||||
- All open-source components will be placed in the `base-url/<rocm-ver>/main`
|
||||
sub-folder
|
||||
- All closed-source components will reside in the
|
||||
`base-url/<rocm-ver>/proprietary` sub-folder
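
The sub-folder layout described above can be sketched as a small helper. This is an illustrative sketch only: the function name `component_repo_url` is hypothetical, and the base URL used with it is a placeholder, because the notice does not state the actual repository base URL.

```cpp
#include <string>

// Hypothetical helper: composes the sub-folder path described in the notice.
// Open-source components live under <base-url>/<rocm-ver>/main and
// closed-source components under <base-url>/<rocm-ver>/proprietary.
std::string component_repo_url(const std::string& base_url,
                               const std::string& rocm_ver,
                               bool closed_source) {
    return base_url + "/" + rocm_ver + "/" +
           (closed_source ? "proprietary" : "main");
}
```

For example, calling the helper with a placeholder base URL, version `5.0`, and `closed_source = false` yields a path ending in `/5.0/main`.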
|
||||
|
||||
### List of Supported Operating Systems
|
||||
|
||||
The AMD ROCm platform supports the following operating systems:
|
||||
|
||||
| **OS-Version (64-bit)** | **Kernel Versions** |
|
||||
|:-------------------------------:|:-----------------------------:|
|
||||
| CentOS 8.3 | `4.18.0-193.el8` |
|
||||
| CentOS 7.9 | `3.10.0-1127` |
|
||||
| RHEL 8.5 | `4.18.0-348.7.1.el8_5.x86_64` |
|
||||
| RHEL 8.4 | `4.18.0-305.el8.x86_64` |
|
||||
| RHEL 7.9 | `3.10.0-1160.6.1.el7` |
|
||||
| SLES 15 SP3 | `5.3.18-59.16-default` |
|
||||
| Ubuntu 20.04.3 | `5.8.0 LTS / 5.11 HWE` |
|
||||
| Ubuntu 18.04.5 [5.4 HWE kernel] | `5.4.0-71-generic` |
|
||||
|
||||
#### Support for RHEL v8.5
|
||||
|
||||
This release extends support for RHEL v8.5.
|
||||
|
||||
#### Supported GPUs
|
||||
|
||||
##### Radeon Pro V620 and W6800 Workstation GPUs
|
||||
|
||||
This release extends ROCm support for Radeon Pro V620 and W6800 Workstation
|
||||
GPUs.
|
||||
|
||||
- SRIOV virtualization support for Radeon Pro V620
|
||||
- KVM Hypervisor (1VF support only) on Ubuntu Host OS with Ubuntu, CentOS, and
|
||||
RHEL Guest
|
||||
- Support for ROCm-SMI in an SRIOV environment. For more details, refer to the
|
||||
ROCm SMI API documentation.
|
||||
|
||||
**Note:** Radeon Pro V620 is not supported on SLES.
|
||||
|
||||
### ROCm Installation Updates for ROCm v5.0
|
||||
|
||||
This release has the following ROCm installation enhancements.
|
||||
|
||||
#### Support for Kernel Mode Driver
|
||||
|
||||
In this release, users can install the kernel-mode driver using the Installer
|
||||
method. Some of the ROCm-specific use cases that the installer currently
|
||||
supports are:
|
||||
|
||||
- OpenCL (ROCr/KFD based) runtime
|
||||
- HIP runtimes
|
||||
- ROCm libraries and applications
|
||||
- ROCm Compiler and device libraries
|
||||
- ROCr runtime and thunk
|
||||
- Kernel-mode driver
|
||||
|
||||
#### Support for Multi-version ROCm Installation and Uninstallation
|
||||
|
||||
Users can now install multiple ROCm releases simultaneously on a system using
|
||||
the newly introduced installer script and package manager install mechanism.
|
||||
|
||||
Users can also uninstall multi-version ROCm releases using the
|
||||
`amdgpu-uninstall` script and package manager.
|
||||
|
||||
#### Support for Updating Information on Local Repositories
|
||||
|
||||
In this release, the `amdgpu-install` script automates the process of updating
|
||||
local repository information before proceeding to ROCm installation.
|
||||
|
||||
#### Support for Release Upgrades
|
||||
|
||||
Users can now upgrade the existing ROCm installation to specific or latest ROCm
|
||||
releases.
|
||||
|
||||
For more details, refer to the AMD ROCm Installation Guide v5.0.
|
||||
|
||||
## AMD ROCm V5.0 Documentation Updates
|
||||
|
||||
### New AMD ROCm Information Portal – ROCm v4.5 and Above
|
||||
|
||||
Beginning with ROCm release v5.0, AMD ROCm documentation has a new portal at
|
||||
<https://docs.amd.com>. This portal consists of ROCm documentation v4.5 and
|
||||
above.
|
||||
|
||||
For documentation prior to ROCm v4.5, you may continue to access
|
||||
<https://rocmdocs.amd.com>.
|
||||
|
||||
### Documentation Updates for ROCm 5.0
|
||||
|
||||
#### Deployment Tools
|
||||
|
||||
##### ROCm Data Center Tool Documentation Updates
|
||||
|
||||
- ROCm Data Center Tool User Guide
|
||||
- ROCm Data Center Tool API Guide
|
||||
|
||||
##### ROCm System Management Interface Updates
|
||||
|
||||
- System Management Interface Guide
|
||||
- System Management Interface API Guide
|
||||
|
||||
##### ROCm Command Line Interface Updates
|
||||
|
||||
- Command Line Interface Guide
|
||||
|
||||
#### Machine Learning/AI Documentation Updates
|
||||
|
||||
- Deep Learning Guide
|
||||
- MIGraphX API Guide
|
||||
- MIOpen API Guide
|
||||
- MIVisionX API Guide
|
||||
|
||||
#### ROCm Libraries Documentation Updates
|
||||
|
||||
- hipSOLVER User Guide
|
||||
- RCCL User Guide
|
||||
- rocALUTION User Guide
|
||||
- rocBLAS User Guide
|
||||
- rocFFT User Guide
|
||||
- rocRAND User Guide
|
||||
- rocSOLVER User Guide
|
||||
- rocSPARSE User Guide
|
||||
- rocThrust User Guide
|
||||
|
||||
#### Compilers and Tools
|
||||
|
||||
##### ROCDebugger Documentation Updates
|
||||
|
||||
- ROCDebugger User Guide
|
||||
- ROCDebugger API Guide
|
||||
|
||||
##### ROCTracer
|
||||
|
||||
- ROCTracer User Guide
|
||||
- ROCTracer API Guide
|
||||
|
||||
##### Compilers
|
||||
|
||||
- AMD Instinct High Performance Computing and Tuning Guide
|
||||
- AMD Compiler Reference Guide
|
||||
|
||||
##### HIPify Documentation
|
||||
|
||||
- HIPify User Guide
|
||||
- HIP Supported CUDA API Reference Guide
|
||||
|
||||
##### ROCm Debug Agent
|
||||
|
||||
- ROCm Debug Agent Guide
|
||||
- System Level Debug Guide
|
||||
- ROCm Validation Suite
|
||||
|
||||
#### Programming Models Documentation
|
||||
|
||||
##### HIP Documentation
|
||||
|
||||
- HIP Programming Guide
|
||||
- HIP API Guide
|
||||
- HIP FAQ Guide
|
||||
|
||||
##### OpenMP Documentation
|
||||
|
||||
- OpenMP Support Guide
|
||||
|
||||
#### ROCm Glossary
|
||||
|
||||
- ROCm Glossary – Terms and Definitions
|
||||
|
||||
### AMD ROCm Legacy Documentation Links – ROCm v4.3 and Prior
|
||||
|
||||
- For AMD ROCm documentation, see <https://rocmdocs.amd.com/en/latest/>
|
||||
|
||||
- For installation instructions on supported platforms, see
|
||||
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html>
|
||||
|
||||
- For AMD ROCm binary structure, see
|
||||
<https://rocmdocs.amd.com/en/latest/Installation_Guide/Software-Stack-for-AMD-GPU.html>
|
||||
|
||||
- For AMD ROCm release history, see
|
||||
<https://rocmdocs.amd.com/en/latest/Current_Release_Notes/ROCm-Version-History.html>
|
||||
|
||||
## What's New in This Release
|
||||
|
||||
### HIP Enhancements
|
||||
|
||||
The ROCm v5.0 release consists of the following HIP enhancements.
|
||||
|
||||
#### HIP Installation Guide Updates
|
||||
|
||||
The HIP Installation Guide is updated to include building HIP from source on the
|
||||
NVIDIA platform.
|
||||
|
||||
Refer to the HIP Installation Guide v5.0 for more details.
|
||||
|
||||
#### Managed Memory Allocation
|
||||
|
||||
Managed memory, including the `__managed__` keyword, is now supported in the HIP
|
||||
combined host/device compilation. Through unified memory allocation, managed
|
||||
memory allows data to be shared and accessible to both the CPU and GPU using a
|
||||
single pointer. The allocation is managed by the AMD GPU driver using the Linux
|
||||
Heterogeneous Memory Management (HMM) mechanism. The user can call managed
|
||||
memory API `hipMallocManaged` to allocate a large chunk of HMM memory, execute
|
||||
kernels on a device, and fetch data between the host and device as needed.
|
||||
|
||||
**Note:** In a HIP application, it is recommended to do a capability check
|
||||
before calling the managed memory APIs. For example,
|
||||
|
||||
```cpp
|
||||
int managed_memory = 0;
|
||||
HIPCHECK(hipDeviceGetAttribute(&managed_memory, hipDeviceAttributeManagedMemory, p_gpuDevice));
|
||||
|
||||
if (!managed_memory) {
|
||||
printf ("info: managed memory access not supported on the device %d\n Skipped\n", p_gpuDevice);
|
||||
} else {
|
||||
HIPCHECK(hipSetDevice(p_gpuDevice));
|
||||
HIPCHECK(hipMallocManaged(&Hmm, N * sizeof(T)));
|
||||
. . .
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** The managed memory capability check may not be necessary; however, if
|
||||
HMM is not supported, managed `malloc` will fall back to using system memory.
|
||||
|
||||
Refer to the HIP API documentation for more details on managed memory APIs.
|
||||
|
||||
For the application, see
|
||||
[hipMallocManaged.cpp](https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp)
|
||||
|
||||
### New Environment Variable
|
||||
|
||||
The following new environment variable is added in this release:
|
||||
|
||||
| **Environment Variable** | **Value** | **Description** |
|
||||
|:------------------------:|:---------------------:|:--------------------------------------------------------|
|
||||
| `HSA_COOP_CU_COUNT` | 0 or 1 (default is 0) | Some processors support more compute units than can reliably be used in a cooperative dispatch. Setting the environment variable `HSA_COOP_CU_COUNT` to 1 will cause ROCr to return the correct CU count for cooperative groups through the `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` attribute of `hsa_agent_get_info()`. Setting `HSA_COOP_CU_COUNT` to other values, or leaving it unset, will cause ROCr to return the same CU count for the attributes `HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT` and `HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT`. Future ROCm releases will make `HSA_COOP_CU_COUNT = 1` the default. |
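
The selection rule in the table above can be modeled in a few lines. This is a minimal sketch of the documented behavior, not ROCr's implementation: the function names are hypothetical, and the two CU counts are supplied by the caller rather than queried from a device.

```cpp
#include <cstdlib>
#include <string>

// Models the documented rule only: when HSA_COOP_CU_COUNT is exactly "1",
// the cooperative query reports the cooperative-safe CU count; for any other
// value, or when the variable is unset, it reports the full CU count.
// Both counts are hypothetical inputs, not values read from hardware.
unsigned reported_cooperative_cu_count(const char* hsa_coop_cu_count,
                                       unsigned total_cus,
                                       unsigned cooperative_cus) {
    const bool opted_in = hsa_coop_cu_count != nullptr &&
                          std::string(hsa_coop_cu_count) == "1";
    return opted_in ? cooperative_cus : total_cus;
}

// Convenience wrapper that reads the variable from the process environment.
unsigned reported_cooperative_cu_count_from_env(unsigned total_cus,
                                                unsigned cooperative_cus) {
    return reported_cooperative_cu_count(std::getenv("HSA_COOP_CU_COUNT"),
                                         total_cus, cooperative_cus);
}
```

In a shell, opting in amounts to running `export HSA_COOP_CU_COUNT=1` before launching the application.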
|
||||
|
||||
### ROCm Math and Communication Libraries
|
||||
|
||||
| **Library** | **Changes** |
|
||||
|:--------------:|:----------------------------------------------------------------------------------------|
|
||||
| **rocBLAS** | **Added** <ul><li>Added `rocblas_get_version_string_size` convenience function</li><li>Added `rocblas_xtrmm_outofplace`, an out-of-place version of `rocblas_xtrmm`</li><li>Added hpl and trig initialization for `gemm_ex` to `rocblas-bench`</li><li>Added source code gemm. It can be used as an alternative to Tensile for debugging and development</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li></ul> **Optimizations** <ul><li>Improved performance of non-batched and batched single-precision GER for size m > 1024. Performance enhanced by 5-10% measured on a MI100 (gfx908) GPU.</li><li>Improved performance of non-batched and batched HER for all sizes and data types. Performance enhanced by 2-17% measured on a MI100 (gfx908) GPU.</li></ul> **Changed** <ul><li>Instantiate templated rocBLAS functions to reduce size of librocblas.so</li><li>Removed static library dependency on msgpack</li><li>Removed boost dependencies for clients</li></ul> **Fixed** <ul><li>Option to install script to build only rocBLAS clients with a pre-built rocBLAS library</li><li>Correctly set output of `nrm2_batched_ex` and `nrm2_strided_batched_ex` when given bad input</li><li>Fix for dgmm with side == `rocblas_side_left` and a negative incx</li><li>Fixed out-of-bounds read for small trsm</li><li>Fixed numerical checking for `tbmv_strided_batched`</li></ul> |
|
||||
| | |
|
||||
| **hipBLAS** | **Added** <ul><li>Added rocSOLVER functions to hipblas-bench</li><li>Added option `ROCM_MATHLIBS_API_USE_HIP_COMPLEX` to opt-in to use `hipFloatComplex` and `hipDoubleComplex`</li><li>Added compilation warning for future trmm changes</li><li>Added documentation to `hipblas.h`</li><li>Added option to forgo pivoting for getrf and getri when ipiv is `nullptr`</li><li>Added code coverage option</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source.</li><li>Fixed windows packaging</li><li>Allowing negative increments in hipblas-bench</li><li>Removed boost dependency</li></ul> |
|
||||
| | |
|
||||
| **rocFFT** | **Changed** <ul><li>Enabled runtime compilation of single FFT kernels > length 1024.</li><li>Re-aligned split device library into 4 roughly equal libraries.</li><li>Implemented the FuseShim framework to replace the original OptimizePlan</li><li>Implemented the generic buffer-assignment framework. The buffer assignment is no longer performed by each node. A generic algorithm is designed to test and pick the best assignment path. With the help of FuseShim, more kernel-fusions are achieved.</li><li>Do not read the imaginary part of the DC and Nyquist modes for even-length complex-to-real transforms.</li></ul> **Optimizations** <ul><li>Optimized twiddle-conjugation; complex-to-complex inverse transforms have similar performance to forward transforms now.</li><li>Improved performance of single-kernel small 2D transforms.</li></ul> |
|
||||
| | |
|
||||
| **hipFFT** | **Fixed** <ul><li>Fixed incorrect reporting of rocFFT version.</li></ul> **Changed** <ul><li>Unconditionally enabled callback functionality. On the CUDA backend, callbacks only run correctly when hipFFT is built as a static library, and is linked against the static cuFFT library.</li></ul> |
|
||||
| | |
|
||||
| **rocSPARSE** | **Added** <ul><li>csrmv, coomv, ellmv, hybmv for (conjugate) transposed matrices</li><li>csrmv for symmetric matrices</li></ul> **Changed** <ul><li>`spmm_ex` is now deprecated and will be removed in the next major release</li></ul> **Improved** <ul><li>Optimization for gtsv</li></ul> |
|
||||
| | |
|
||||
| **hipSPARSE** | **Added** <ul><li>Added (conjugate) transpose support for csrmv, hybmv and spmv routines</li></ul> |
|
||||
| | |
|
||||
| **rocALUTION** | **Changed** <ul><li>Removed deprecated GlobalPairwiseAMG class, please use PairwiseAMG instead.</li></ul> **Improved** <ul><li>Improved documentation</li></ul> |
|
||||
| | |
|
||||
| **rocTHRUST** | **Updates** <ul><li>Updated to match upstream Thrust 1.13.0</li><li>Updated to match upstream Thrust 1.14.0</li><li>Added async scan</li></ul> **Changed** <ul><li>Scan algorithms: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)</li></ul> |
|
||||
| | |
|
||||
| **rocSOLVER** | **Added** <ul><li>Symmetric matrix factorizations: <ul><li>LASYF</li><li>SYTF2, SYTRF (with `batched` and `strided_batched` versions)</li></ul><li>Added `rocsolver_get_version_string_size` to help with version string queries</li><li>Added `rocblas_layer_mode_ex` and the ability to print kernel calls in the trace and profile logs</li><li>Expanded batched and `strided_batched` sample programs.</li></ul> **Optimizations** <ul><li>Improved general performance of LU factorization</li><li>Increased parallelism of specialized kernels when compiling from source, reducing build times on multi-core systems.</li></ul> **Changed** <ul><li>The rocsolver-test client now prints the rocSOLVER version used to run the tests, rather than the version used to build them</li><li>The rocsolver-bench client now prints the rocSOLVER version used in the benchmark</li></ul> **Fixed** <ul><li>Added missing `stdint.h` include to `rocsolver.h`</li></ul> |
|
||||
| | |
|
||||
| **hipSOLVER** | **Added** <ul><li>Added SYTRF functions: `hipsolverSsytrf_bufferSize`, `hipsolverDsytrf_bufferSize`, `hipsolverCsytrf_bufferSize`, `hipsolverZsytrf_bufferSize`, `hipsolverSsytrf`, `hipsolverDsytrf`, `hipsolverCsytrf`, `hipsolverZsytrf`</li></ul> **Fixed** <ul><li>Fixed use of incorrect `HIP_PATH` when building from source</li></ul> |
|
||||
| | |
|
||||
| **RCCL** | **Added** <ul><li>Compatibility with NCCL 2.10.3</li></ul> **Known issues** <ul><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
|
||||
| | |
|
||||
| **hipCUB** | **Fixed** <ul><li>Added missing includes to `hipcub.hpp`</li></ul> **Added** <ul><li>Bfloat16 support to test cases (`device_reduce` & `device_radix_sort`)</li><li>Device merge sort</li><li>Block merge sort</li><li>API update to CUB 1.14.0</li></ul> **Changed** <ul><li>The `SetupNVCC.cmake` automatic target selector selects all of the capabilities of all available cards for the NVIDIA backend.</li></ul> |
|
||||
| | |
|
||||
| **rocPRIM** | **Fixed** <ul><li>Enable `bfloat16` tests and reduce threshold for `bfloat16`</li><li>Fix device scan `limit_size` feature</li><li>Non-optimized builds no longer trigger local memory limit errors</li></ul> **Added** <ul><li>Scan size limit feature</li><li>Reduce size limit feature</li><li>Transform size limit feature</li><li>Add `block_load_striped` and `block_store_striped`</li><li>Add `gather_to_blocked` to gather values from other threads into a blocked arrangement</li><li>The block sizes for device merge sort's initial block sort and its merge steps are now separate in its kernel config (the block sort step supports multiple items per thread)</li></ul> **Changed** <ul><li>`size_limit` for scan, reduce and transform can now be set in the config struct instead of a parameter</li><li>`device_scan` and `device_segmented_scan`: `inclusive_scan` now uses the input-type as accumulator-type, `exclusive_scan` uses initial-value-type. This particularly changes behaviour of small-size input types with large-size output types (e.g. `short` input, `int` output) and low-res input with high-res output (e.g. `float` input, `double` output)</li><li>Revert old Fiji workaround, because the issue was solved at compiler side</li><li>Update `README` cmake minimum version number</li><li>Block sort supports multiple items per thread. Currently, only power-of-two block sizes and items per thread are supported, and only for full blocks</li><li>Bumped the minimum required version of CMake to 3.16</li></ul> **Known issues** <ul><li>Unit tests may soft hang on MI200 when running in `hipMallocManaged` mode.</li><li>`device_segmented_radix_sort`, `device_scan` unit tests failing for HIP on Windows</li><li>`ReduceEmptyInput` causes random failure with `bfloat16`</li><li>Managed memory is not currently supported for clique-based kernels</li></ul> |
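
The accumulator-type change noted for the scan algorithms in the rocTHRUST and rocPRIM rows can be illustrated on the host. The sketch below is a hand-written inclusive scan, not the rocPRIM or rocThrust implementation, and the name `inclusive_scan_with` is hypothetical. It uses unsigned 8-bit input instead of the `short`/`int` pair mentioned in the table so that the wrap-around is well defined in C++.

```cpp
#include <cstdint>
#include <vector>

// Hand-written inclusive scan in which Acc is the type that carries the
// running sum. Using the input type as Acc mirrors the documented behavior;
// using the wider output type as Acc mirrors the older expectation.
template <typename Acc, typename In, typename Out>
std::vector<Out> inclusive_scan_with(const std::vector<In>& input) {
    std::vector<Out> output;
    output.reserve(input.size());
    Acc running{};  // value-initialized to zero
    for (const In& value : input) {
        // The sum is narrowed back to Acc on every step, so a small Acc wraps.
        running = static_cast<Acc>(running + value);
        output.push_back(static_cast<Out>(running));
    }
    return output;
}
```

With input `{200, 100}` stored as `std::uint8_t` and a 32-bit output type, an 8-bit accumulator wraps the second partial sum to 44, while a 32-bit accumulator yields 300. Code that relied on the wider accumulator may therefore need to widen its input type explicitly.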
|
||||
|
||||
### System Management Interface
|
||||
|
||||
#### Clock Throttling for GPU Events
|
||||
|
||||
This feature lists GPU events as they occur in real-time and can be used with
|
||||
`kfdtest` to produce `vm_fault` events for testing.
|
||||
|
||||
The command can be called with either `-e` or `--showevents` like this:
|
||||
|
||||
```bash
|
||||
-e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]] Show event list
|
||||
```
|
||||
|
||||
Here, `EVENT` is any combination of `VM_FAULT`, `THERMAL_THROTTLE`, or
`GPU_RESET`, and is **not** case sensitive.
|
||||
|
||||
**Note:** If no event arguments are passed, all events will be watched by
|
||||
default.
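
The argument rules above (case-insensitive names, and all events watched when none are given) can be sketched as follows. This models only the documented semantics and is not the `rocm-smi` source; the function names are hypothetical, and silently dropping unknown names is an assumption made for the sketch. It requires C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// The three event names listed in the release notes.
const std::array<std::string, 3> kKnownEvents = {
    "VM_FAULT", "THERMAL_THROTTLE", "GPU_RESET"};

// Upper-cases the name and returns it if it is a known event.
std::optional<std::string> normalize_event(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    if (std::find(kKnownEvents.begin(), kKnownEvents.end(), name) !=
        kKnownEvents.end()) {
        return name;
    }
    return std::nullopt;
}

// An empty request means "watch everything"; unknown names are dropped here,
// which is an assumption of this sketch rather than documented behavior.
std::vector<std::string> events_to_watch(
    const std::vector<std::string>& requested) {
    if (requested.empty()) {
        return std::vector<std::string>(kKnownEvents.begin(),
                                        kKnownEvents.end());
    }
    std::vector<std::string> result;
    for (const auto& name : requested) {
        if (auto event = normalize_event(name)) {
            result.push_back(*event);
        }
    }
    return result;
}
```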
|
||||
|
||||
##### CLI Commands
|
||||
|
||||
```bash
|
||||
$ rocm-smi --showevents vm_fault thermal_throttle gpu_reset
|
||||
|
||||
======================= ROCm System Management Interface =======================
|
||||
================================= Show Events ==================================
|
||||
press 'q' or 'ctrl + c' to quit
|
||||
DEVICE TIME TYPE DESCRIPTION
|
||||
|
||||
============================= End of ROCm SMI Log ==============================
|
||||
```
|
||||
|
||||
(Run `kfdtest` in another window to test for `vm_fault` events.)
|
||||
|
||||
**Note:** Unlike other rocm-smi CLI commands, this command does not quit unless
|
||||
specified by the user. Users may press either `q` or `ctrl + c` to quit.
|
||||
|
||||
#### Display XGMI Bandwidth Between Nodes
|
||||
|
||||
The `rsmi_minmax_bandwidth_get` API reads the HW Topology file and displays
|
||||
bandwidth (min-max) between any two NUMA nodes in a matrix format.
|
||||
|
||||
The Command Line Interface (CLI) command can be called as follows:
|
||||
|
||||
```bash
|
||||
$ rocm-smi --shownodesbw
|
||||
|
||||
======================= ROCm System Management Interface =======================
|
||||
================================== Bandwidth ===================================
|
||||
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
|
||||
GPU0 N/A 50000-200000 50000-50000 0-0 0-0 0-0 50000-100000 0-0
|
||||
GPU1 50000-200000 N/A 0-0 50000-50000 0-0 50000-50000 0-0 0-0
|
||||
GPU2 50000-50000 0-0 N/A 50000-200000 50000-100000 0-0 0-0 0-0
|
||||
GPU3 0-0 50000-50000 50000-200000 N/A 0-0 0-0 0-0 50000-50000
|
||||
GPU4 0-0 0-0 50000-100000 0-0 N/A 50000-200000 50000-50000 0-0
|
||||
GPU5 0-0 50000-50000 0-0 0-0 50000-200000 N/A 0-0 50000-50000
|
||||
GPU6 50000-100000 0-0 0-0 0-0 50000-50000 0-0 N/A 50000-200000
|
||||
GPU7 0-0 0-0 0-0 50000-50000 0-0 50000-50000 50000-200000 N/A
|
||||
Format: min-max; Units: mps
|
||||
============================= End of ROCm SMI Log ==============================
|
||||
```
|
||||
|
||||
The sample output above shows the maximum theoretical XGMI bandwidth between two
NUMA nodes.
|
||||
|
||||
**Note:** "0-0" min-max bandwidth indicates devices are not connected directly.
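
A script that post-processes this matrix might interpret each cell as sketched below. This is an illustrative parser only, assuming cells are either `N/A` or of the form `min-max` as in the sample output; `BandwidthRange`, `parseCell`, and `isDirectlyConnected` are hypothetical names and not part of the ROCm SMI library.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// One cell of the bandwidth matrix, for example "50000-200000".
struct BandwidthRange {
    long min;
    long max;
    bool valid;  // false for "N/A" or anything that is not "<min>-<max>"
};

// Hypothetical helper: splits a cell at the dash and parses both halves.
BandwidthRange parseCell(const std::string& cell) {
    BandwidthRange r{0, 0, false};
    const std::size_t dash = cell.find('-');
    if (dash == std::string::npos || dash == 0 || dash + 1 >= cell.size()) {
        return r;  // No dash, or nothing on one side of it.
    }
    const std::string lo = cell.substr(0, dash);
    const std::string hi = cell.substr(dash + 1);
    try {
        std::size_t usedLo = 0;
        std::size_t usedHi = 0;
        r.min = std::stol(lo, &usedLo);
        r.max = std::stol(hi, &usedHi);
        // Accept the cell only if both halves were consumed completely.
        r.valid = (usedLo == lo.size() && usedHi == hi.size());
    } catch (const std::exception&) {
        r.valid = false;  // Non-numeric or out-of-range input.
    }
    return r;
}

// A valid "0-0" range means the two devices are not connected directly.
bool isDirectlyConnected(const BandwidthRange& r) {
    return r.valid && !(r.min == 0 && r.max == 0);
}
```

Here `parseCell("0-0")` is valid but not directly connected, and `parseCell("N/A")` is marked invalid, so the diagonal entries are never mistaken for links.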
|
||||
|
||||
#### P2P Connection Status
|
||||
|
||||
The `rsmi_is_p2p_accessible` API returns `True` if P2P can be implemented
|
||||
between two nodes, and returns `False` if P2P cannot be implemented between the
|
||||
two nodes.
|
||||
|
||||
The Command Line Interface command can be called as follows:
|
||||
|
||||
```bash
|
||||
rocm-smi --showtopoaccess
|
||||
```
|
||||
|
||||
Sample Output:
|
||||
|
||||
```bash
|
||||
$ rocm-smi --showtopoaccess
|
||||
======================= ROCm System Management Interface =======================
|
||||
===================== Link accessibility between two GPUs ======================
|
||||
GPU0 GPU1
|
||||
GPU0 True True
|
||||
GPU1 True True
|
||||
============================= End of ROCm SMI Log ==============================
|
||||
```
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
### Runtime Breaking Change
|
||||
|
||||
Re-ordering of the enumerated type in `hip_runtime_api.h` to better match CUDA.
|
||||
See below for the difference in enumerated types.
|
||||
|
||||
Any ROCm software that uses the enumerators listed below is affected.
Applications built with the ROCm v5.0 enumerated types work with a ROCm v4.5.2
driver. However, an application built with ROCm v4.5.2 that uses these
enumerated types exhibits undefined behavior when run with a ROCm v5.0 runtime.
|
||||
|
||||
```c
|
||||
typedef enum hipDeviceAttribute_t {
|
||||
hipDeviceAttributeMaxThreadsPerBlock, // Maximum number of threads per block.
|
||||
hipDeviceAttributeMaxBlockDimX, // Maximum x-dimension of a block.
|
||||
hipDeviceAttributeMaxBlockDimY, // Maximum y-dimension of a block.
|
||||
hipDeviceAttributeMaxBlockDimZ, // Maximum z-dimension of a block.
|
||||
hipDeviceAttributeMaxGridDimX, // Maximum x-dimension of a grid.
|
||||
hipDeviceAttributeMaxGridDimY, // Maximum y-dimension of a grid.
|
||||
hipDeviceAttributeMaxGridDimZ, // Maximum z-dimension of a grid.
|
||||
hipDeviceAttributeMaxSharedMemoryPerBlock, // Maximum shared memory available per block in bytes.
|
||||
hipDeviceAttributeTotalConstantMemory, // Constant memory size in bytes.
|
||||
hipDeviceAttributeWarpSize, // Warp size in threads.
|
||||
hipDeviceAttributeMaxRegistersPerBlock, // Maximum number of 32-bit registers available to a
|
||||
// thread block. This number is shared by all thread
|
||||
// blocks simultaneously resident on a
|
||||
// multiprocessor.
|
||||
hipDeviceAttributeClockRate, // Peak clock frequency in kilohertz.
|
||||
hipDeviceAttributeMemoryClockRate, // Peak memory clock frequency in kilohertz.
|
||||
hipDeviceAttributeMemoryBusWidth, // Global memory bus width in bits.
|
||||
hipDeviceAttributeMultiprocessorCount, // Number of multiprocessors on the device.
|
||||
hipDeviceAttributeComputeMode, // Compute mode that device is currently in.
|
||||
hipDeviceAttributeL2CacheSize, // Size of L2 cache in bytes. 0 if the device doesn't have L2
|
||||
// cache.
|
||||
hipDeviceAttributeMaxThreadsPerMultiProcessor, // Maximum resident threads per
|
||||
// multiprocessor.
|
||||
hipDeviceAttributeComputeCapabilityMajor, // Major compute capability version number.
|
||||
hipDeviceAttributeComputeCapabilityMinor, // Minor compute capability version number.
|
||||
hipDeviceAttributeConcurrentKernels, // Device can possibly execute multiple kernels
|
||||
// concurrently.
|
||||
hipDeviceAttributePciBusId, // PCI Bus ID.
|
||||
hipDeviceAttributePciDeviceId, // PCI Device ID.
|
||||
hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, // Maximum Shared Memory Per
|
||||
// Multiprocessor.
|
||||
hipDeviceAttributeIsMultiGpuBoard, // Multiple GPU devices.
|
||||
hipDeviceAttributeIntegrated, // iGPU
|
||||
hipDeviceAttributeCooperativeLaunch, // Support cooperative launch
|
||||
hipDeviceAttributeCooperativeMultiDeviceLaunch, // Support cooperative launch on multiple devices
|
||||
hipDeviceAttributeMaxTexture1DWidth, // Maximum number of elements in 1D images
|
||||
hipDeviceAttributeMaxTexture2DWidth, // Maximum dimension width of 2D images in image elements
|
||||
hipDeviceAttributeMaxTexture2DHeight, // Maximum dimension height of 2D images in image elements
|
||||
hipDeviceAttributeMaxTexture3DWidth, // Maximum dimension width of 3D images in image elements
|
||||
hipDeviceAttributeMaxTexture3DHeight, // Maximum dimensions height of 3D images in image elements
|
||||
hipDeviceAttributeMaxTexture3DDepth, // Maximum dimensions depth of 3D images in image elements
|
||||
hipDeviceAttributeCudaCompatibleBegin = 0,
|
||||
hipDeviceAttributeHdpMemFlushCntl, // Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
|
||||
hipDeviceAttributeHdpRegFlushCntl, // Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
|
||||
hipDeviceAttributeEccEnabled = hipDeviceAttributeCudaCompatibleBegin, // Whether ECC support is enabled.
|
||||
hipDeviceAttributeAccessPolicyMaxWindowSize, // Cuda only. The maximum size of the window policy in bytes.
|
||||
hipDeviceAttributeAsyncEngineCount, // Cuda only. Asynchronous engines number.
|
||||
hipDeviceAttributeCanMapHostMemory, // Whether host memory can be mapped into device address space
|
||||
hipDeviceAttributeCanUseHostPointerForRegisteredMem, // Cuda only. Device can access host registered memory
|
||||
// at the same virtual address as the CPU
|
||||
hipDeviceAttributeClockRate, // Peak clock frequency in kilohertz.
|
||||
hipDeviceAttributeComputeMode, // Compute mode that device is currently in.
|
||||
hipDeviceAttributeComputePreemptionSupported, // Cuda only. Device supports Compute Preemption.
|
||||
hipDeviceAttributeConcurrentKernels, // Device can possibly execute multiple kernels concurrently.
|
||||
hipDeviceAttributeConcurrentManagedAccess, // Device can coherently access managed memory concurrently with the CPU
|
||||
hipDeviceAttributeCooperativeLaunch, // Support cooperative launch
|
||||
hipDeviceAttributeCooperativeMultiDeviceLaunch, // Support cooperative launch on multiple devices
|
||||
hipDeviceAttributeDeviceOverlap, // Cuda only. Device can concurrently copy memory and execute a kernel.
|
||||
// Deprecated. Use instead asyncEngineCount.
|
||||
hipDeviceAttributeDirectManagedMemAccessFromHost, // Host can directly access managed memory on
|
||||
// the device without migration
|
||||
hipDeviceAttributeGlobalL1CacheSupported, // Cuda only. Device supports caching globals in L1
|
||||
hipDeviceAttributeHostNativeAtomicSupported, // Cuda only. Link between the device and the host supports native atomic operations
|
||||
hipDeviceAttributeIntegrated, // Device is integrated GPU
|
||||
hipDeviceAttributeIsMultiGpuBoard, // Multiple GPU devices.
|
||||
hipDeviceAttributeKernelExecTimeout, // Run time limit for kernels executed on the device
|
||||
hipDeviceAttributeL2CacheSize, // Size of L2 cache in bytes. 0 if the device doesn't have L2 cache.
|
||||
hipDeviceAttributeLocalL1CacheSupported, // caching locals in L1 is supported
|
||||
hipDeviceAttributeLuid, // Cuda only. 8-byte locally unique identifier in 8 bytes. Undefined on TCC and non-Windows platforms
|
||||
hipDeviceAttributeLuidDeviceNodeMask, // Cuda only. Luid device node mask. Undefined on TCC and non-Windows platforms
|
||||
hipDeviceAttributeComputeCapabilityMajor, // Major compute capability version number.
|
||||
hipDeviceAttributeManagedMemory, // Device supports allocating managed memory on this system
|
||||
hipDeviceAttributeMaxBlocksPerMultiProcessor, // Cuda only. Max block size per multiprocessor
|
||||
hipDeviceAttributeMaxBlockDimX, // Max block size in width.
|
||||
hipDeviceAttributeMaxBlockDimY, // Max block size in height.
|
||||
hipDeviceAttributeMaxBlockDimZ, // Max block size in depth.
|
||||
hipDeviceAttributeMaxGridDimX, // Max grid size in width.
|
||||
hipDeviceAttributeMaxGridDimY, // Max grid size in height.
|
||||
hipDeviceAttributeMaxGridDimZ, // Max grid size in depth.
|
||||
hipDeviceAttributeMaxSurface1D, // Maximum size of 1D surface.
|
||||
hipDeviceAttributeMaxSurface1DLayered, // Cuda only. Maximum dimensions of 1D layered surface.
|
||||
hipDeviceAttributeMaxSurface2D, // Maximum dimension (width, height) of 2D surface.
|
||||
hipDeviceAttributeMaxSurface2DLayered, // Cuda only. Maximum dimensions of 2D layered surface.
|
||||
hipDeviceAttributeMaxSurface3D, // Maximum dimension (width, height, depth) of 3D surface.
|
||||
hipDeviceAttributeMaxSurfaceCubemap, // Cuda only. Maximum dimensions of Cubemap surface.
|
||||
hipDeviceAttributeMaxSurfaceCubemapLayered, // Cuda only. Maximum dimension of Cubemap layered surface.
|
||||
hipDeviceAttributeMaxTexture1DWidth, // Maximum size of 1D texture.
|
||||
hipDeviceAttributeMaxTexture1DLayered, // Cuda only. Maximum dimensions of 1D layered texture.
|
||||
hipDeviceAttributeMaxTexture1DLinear, // Maximum number of elements allocatable in a 1D linear texture.
|
||||
// Use cudaDeviceGetTexture1DLinearMaxWidth() instead on Cuda.
|
||||
hipDeviceAttributeMaxTexture1DMipmap, // Cuda only. Maximum size of 1D mipmapped texture.
|
||||
hipDeviceAttributeMaxTexture2DWidth, // Maximum dimension width of 2D texture.
|
||||
hipDeviceAttributeMaxTexture2DHeight, // Maximum dimension height of 2D texture.
|
||||
hipDeviceAttributeMaxTexture2DGather, // Cuda only. Maximum dimensions of 2D texture if gather operations performed.
|
||||
hipDeviceAttributeMaxTexture2DLayered, // Cuda only. Maximum dimensions of 2D layered texture.
|
||||
hipDeviceAttributeMaxTexture2DLinear, // Cuda only. Maximum dimensions (width, height, pitch) of 2D textures bound to pitched memory.
|
||||
hipDeviceAttributeMaxTexture2DMipmap, // Cuda only. Maximum dimensions of 2D mipmapped texture.
|
||||
hipDeviceAttributeMaxTexture3DWidth, // Maximum dimension width of 3D texture.
|
||||
hipDeviceAttributeMaxTexture3DHeight, // Maximum dimension height of 3D texture.
|
||||
hipDeviceAttributeMaxTexture3DDepth, // Maximum dimension depth of 3D texture.
|
||||
hipDeviceAttributeMaxTexture3DAlt, // Cuda only. Maximum dimensions of alternate 3D texture.
|
||||
hipDeviceAttributeMaxTextureCubemap, // Cuda only. Maximum dimensions of Cubemap texture
|
||||
hipDeviceAttributeMaxTextureCubemapLayered, // Cuda only. Maximum dimensions of Cubemap layered texture.
|
||||
hipDeviceAttributeMaxThreadsDim, // Maximum dimension of a block
|
||||
hipDeviceAttributeMaxThreadsPerBlock, // Maximum number of threads per block.
|
||||
hipDeviceAttributeMaxThreadsPerMultiProcessor, // Maximum resident threads per multiprocessor.
|
||||
hipDeviceAttributeMaxPitch, // Maximum pitch in bytes allowed by memory copies
|
||||
hipDeviceAttributeMemoryBusWidth, // Global memory bus width in bits.
|
||||
hipDeviceAttributeMemoryClockRate, // Peak memory clock frequency in kilohertz.
|
||||
hipDeviceAttributeComputeCapabilityMinor, // Minor compute capability version number.
|
||||
hipDeviceAttributeMultiGpuBoardGroupID, // Cuda only. Unique ID of device group on the same multi-GPU board
|
||||
hipDeviceAttributeMultiprocessorCount, // Number of multiprocessors on the device.
|
||||
hipDeviceAttributeName, // Device name.
|
||||
hipDeviceAttributePageableMemoryAccess, // Device supports coherently accessing pageable memory
|
||||
// without calling hipHostRegister on it
|
||||
hipDeviceAttributePageableMemoryAccessUsesHostPageTables, // Device accesses pageable memory via the host's page tables
|
||||
hipDeviceAttributePciBusId, // PCI Bus ID.
|
||||
hipDeviceAttributePciDeviceId, // PCI Device ID.
|
||||
hipDeviceAttributePciDomainID, // PCI Domain ID.
|
||||
hipDeviceAttributePersistingL2CacheMaxSize, // Cuda11 only. Maximum l2 persisting lines capacity in bytes
|
||||
hipDeviceAttributeMaxRegistersPerBlock, // 32-bit registers available to a thread block. This number is shared
|
||||
// by all thread blocks simultaneously resident on a multiprocessor.
|
||||
hipDeviceAttributeMaxRegistersPerMultiprocessor, // 32-bit registers available per block.
|
||||
hipDeviceAttributeReservedSharedMemPerBlock, // Cuda11 only. Shared memory reserved by CUDA driver per block.
|
||||
hipDeviceAttributeMaxSharedMemoryPerBlock, // Maximum shared memory available per block in bytes.
|
||||
hipDeviceAttributeSharedMemPerBlockOptin, // Cuda only. Maximum shared memory per block usable by special opt in.
|
||||
hipDeviceAttributeSharedMemPerMultiprocessor, // Cuda only. Shared memory available per multiprocessor.
|
||||
hipDeviceAttributeSingleToDoublePrecisionPerfRatio, // Cuda only. Performance ratio of single precision to double precision.
|
||||
hipDeviceAttributeStreamPrioritiesSupported, // Cuda only. Whether to support stream priorities.
|
||||
hipDeviceAttributeSurfaceAlignment, // Cuda only. Alignment requirement for surfaces
|
||||
hipDeviceAttributeTccDriver, // Cuda only. Whether device is a Tesla device using TCC driver
|
||||
hipDeviceAttributeTextureAlignment, // Alignment requirement for textures
|
||||
hipDeviceAttributeTexturePitchAlignment, // Pitch alignment requirement for 2D texture references bound to pitched memory;
|
||||
hipDeviceAttributeTotalConstantMemory, // Constant memory size in bytes.
|
||||
hipDeviceAttributeTotalGlobalMem, // Global memory available on device.
|
||||
hipDeviceAttributeUnifiedAddressing, // Cuda only. An unified address space shared with the host.
|
||||
hipDeviceAttributeUuid, // Cuda only. Unique ID in 16 byte.
|
||||
hipDeviceAttributeWarpSize, // Warp size in threads.
|
||||
hipDeviceAttributeMaxPitch, // Maximum pitch in bytes allowed by memory copies
|
||||
hipDeviceAttributeTextureAlignment, //Alignment requirement for textures
|
||||
hipDeviceAttributeTexturePitchAlignment, //Pitch alignment requirement for 2D texture references bound to pitched memory;
|
||||
hipDeviceAttributeKernelExecTimeout, //Run time limit for kernels executed on the device
|
||||
hipDeviceAttributeCanMapHostMemory, //Device can map host memory into device address space
|
||||
hipDeviceAttributeEccEnabled, //Device has ECC support enabled
|
||||
hipDeviceAttributeCudaCompatibleEnd = 9999,
|
||||
hipDeviceAttributeAmdSpecificBegin = 10000,
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, // Supports cooperative launch on multiple
|
||||
// devices with unmatched functions
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, // Supports cooperative launch on multiple
|
||||
// devices with unmatched grid dimensions
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, // Supports cooperative launch on multiple
|
||||
// devices with unmatched block dimensions
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, // Supports cooperative launch on multiple
|
||||
// devices with unmatched shared memories
|
||||
hipDeviceAttributeAsicRevision, // Revision of the GPU in this device
|
||||
hipDeviceAttributeManagedMemory, // Device supports allocating managed memory on this system
|
||||
hipDeviceAttributeDirectManagedMemAccessFromHost, // Host can directly access managed memory on
|
||||
// the device without migration
|
||||
hipDeviceAttributeConcurrentManagedAccess, // Device can coherently access managed memory
|
||||
// concurrently with the CPU
|
||||
hipDeviceAttributePageableMemoryAccess, // Device supports coherently accessing pageable memory
|
||||
// without calling hipHostRegister on it
|
||||
hipDeviceAttributePageableMemoryAccessUsesHostPageTables, // Device accesses pageable memory via
|
||||
// the host's page tables
|
||||
hipDeviceAttributeCanUseStreamWaitValue // '1' if Device supports hipStreamWaitValue32() and
|
||||
// hipStreamWaitValue64(), '0' otherwise.
|
||||
hipDeviceAttributeClockInstructionRate = hipDeviceAttributeAmdSpecificBegin, // Frequency in khz of the timer used by the device-side "clock"
|
||||
hipDeviceAttributeArch, // Device architecture
|
||||
hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, // Maximum Shared Memory PerMultiprocessor.
|
||||
hipDeviceAttributeGcnArch, // Device gcn architecture
|
||||
hipDeviceAttributeGcnArchName, // Device gcnArch name in 256 bytes
|
||||
hipDeviceAttributeHdpMemFlushCntl, // Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
|
||||
hipDeviceAttributeHdpRegFlushCntl, // Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, // Supports cooperative launch on multiple
|
||||
// devices with unmatched functions
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, // Supports cooperative launch on multiple
|
||||
// devices with unmatched grid dimensions
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, // Supports cooperative launch on multiple
|
||||
// devices with unmatched block dimensions
|
||||
hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, // Supports cooperative launch on multiple
|
||||
// devices with unmatched shared memories
|
||||
hipDeviceAttributeIsLargeBar, // Whether it is LargeBar
|
||||
hipDeviceAttributeAsicRevision, // Revision of the GPU in this device
|
||||
hipDeviceAttributeCanUseStreamWaitValue, // '1' if Device supports hipStreamWaitValue32() and
|
||||
// hipStreamWaitValue64() , '0' otherwise.
|
||||
hipDeviceAttributeAmdSpecificEnd = 19999,
|
||||
hipDeviceAttributeVendorSpecificBegin = 20000, // Extended attributes for vendors
|
||||
} hipDeviceAttribute_t;
|
||||
```
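
Why the re-ordering matters: enumerators without explicit values are numbered by position, so an application compiled against one ordering carries plain integers that a runtime built against a different ordering interprets differently. The sketch below illustrates this with two small hypothetical enums (`OldAttribute`, `NewAttribute`) and hypothetical helper functions; none of these names come from `hip_runtime_api.h`, and the real enumeration is far larger.

```cpp
#include <string>

// Hypothetical "old" layout: values are assigned implicitly, in order.
enum OldAttribute : int {
    OldMaxThreadsPerBlock,  // 0
    OldMaxBlockDimX,        // 1
    OldWarpSize             // 2
};

// Hypothetical "new" layout: the same attributes, re-ordered.
enum NewAttribute : int {
    NewWarpSize,            // 0
    NewMaxThreadsPerBlock,  // 1
    NewMaxBlockDimX         // 2
};

// Stand-in for a runtime built against the new layout.
std::string describeWithNewRuntime(int rawValue) {
    switch (static_cast<NewAttribute>(rawValue)) {
        case NewWarpSize:           return "WarpSize";
        case NewMaxThreadsPerBlock: return "MaxThreadsPerBlock";
        case NewMaxBlockDimX:       return "MaxBlockDimX";
    }
    return "unknown";
}

// Stand-in for an application compiled against the old header: the
// enumerator is baked into the binary as a plain integer.
int valueCompiledIntoOldApp() {
    return static_cast<int>(OldWarpSize);  // 2 in the old layout
}
```

In this sketch the old application asks for the warp size (value 2), but the new runtime reads value 2 as `MaxBlockDimX`. Nothing fails at compile or link time, which is why rebuilding against the matching headers is the safe course.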
|
||||
|
||||
## Known Issues in This Release
|
||||
|
||||
### Incorrect dGPU Behavior When Using AMDVBFlash Tool
|
||||
|
||||
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not
|
||||
communicate with the ROM Controller specifically when the driver is present.
|
||||
This is because the driver, as part of its runtime power management feature,
|
||||
puts the dGPU to a sleep state.
|
||||
|
||||
As a workaround, users can set the `amdgpu.runpm=0` kernel module parameter,
which temporarily disables the runtime power management feature of the driver
and dynamically changes some power control-related sysfs files.
|
||||
|
||||
### Issue with START Timestamp in ROCProfiler
|
||||
|
||||
Users may encounter an issue with the enabled timestamp functionality for
|
||||
monitoring one or multiple counters. ROCProfiler outputs the following four
|
||||
timestamps for each kernel:
|
||||
|
||||
- Dispatch
|
||||
- Start
|
||||
- End
|
||||
- Complete
|
||||
|
||||
#### Issue
|
||||
|
||||
This defect is related to the Start timestamp functionality, which incorrectly
|
||||
shows an earlier time than the Dispatch timestamp.
|
||||
|
||||
To reproduce the issue:
|
||||
|
||||
1. Enable timing using the `--timestamp on` flag.
|
||||
2. Use the `-i` option with the input filename that contains the name of the
|
||||
counter(s) to monitor.
|
||||
3. Run the program.
|
||||
4. Check the output result file.
|
||||
|
||||
##### Current behavior
|
||||
|
||||
`BeginNS` is lower than `DispatchNS`, which is incorrect.
|
||||
|
||||
##### Expected behavior
|
||||
|
||||
The correct order is:
|
||||
|
||||
`Dispatch < Start < End < Complete`
|
||||
|
||||
Users cannot use ROCProfiler to measure the time spent on each kernel because of
|
||||
the incorrect timestamp with counter collection enabled.
|
||||
|
||||
##### Recommended Workaround
|
||||
|
||||
Users should collect kernel execution timestamps without monitoring
|
||||
counters, as follows:
|
||||
|
||||
1. Enable timing using the `--timestamp on` flag, and run the application.
|
||||
2. Rerun the application using the `-i` option with the input filename that
|
||||
contains the name of the counter(s) to monitor, and save this to a different
|
||||
output file using the `-o` flag.
|
||||
3. Check the output result file from step 1.
|
||||
4. The order of timestamps correctly displays as:
|
||||
|
||||
`DispatchNS < BeginNS < EndNS < CompleteNS`
|
||||
|
||||
5. Users can find the values of the collected counters in the output file
|
||||
generated in step 2.
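
When post-processing the timestamps-only output from step 1, the expected ordering can be checked before any durations are computed. The following is a minimal sketch, assuming the four values have already been read into a struct; `KernelTimestamps`, `isOrdered`, and `kernelDurationNs` are hypothetical names, and treating `EndNS - BeginNS` as the kernel execution time is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>

// Field names follow the timestamp columns mentioned above.
struct KernelTimestamps {
    std::uint64_t dispatchNs;
    std::uint64_t beginNs;
    std::uint64_t endNs;
    std::uint64_t completeNs;
};

// True if Dispatch < Begin < End < Complete.
bool isOrdered(const KernelTimestamps& t) {
    return t.dispatchNs < t.beginNs && t.beginNs < t.endNs &&
           t.endNs < t.completeNs;
}

// Time between Begin and End. Only meaningful for correctly ordered
// timestamps, so the precondition is asserted.
std::uint64_t kernelDurationNs(const KernelTimestamps& t) {
    assert(isOrdered(t));
    return t.endNs - t.beginNs;
}
```

A record whose Begin value is earlier than its Dispatch value, as in the defect described above, fails `isOrdered` and should be excluded from timing calculations.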
|
||||
|
||||
### No Support for SMI and ROCDebugger on SRIOV
|
||||
|
||||
System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV
|
||||
environment on any GPU, including the
|
||||
**Radeon Pro V620 and W6800 Workstation GPUs**. For more information, refer to
|
||||
the Systems Management Interface documentation.
|
||||
|
||||
## Deprecations and Warnings in This Release
|
||||
|
||||
### ROCm Libraries Changes – Deprecations and Deprecation Removal
|
||||
|
||||
- The `hipfft.h` header is now provided only by the `hipfft` package. Up to ROCm
5.0, the `rocfft` package also provided `hipfft.h`.
|
||||
- The GlobalPairwiseAMG class is now entirely removed. Use the
PairwiseAMG class instead.
|
||||
- The `rocsparse_spmm` signature in 5.0 was changed to match that of
|
||||
`rocsparse_spmm_ex`. In 5.0, `rocsparse_spmm_ex` is still present, but
|
||||
deprecated. The signature of `rocsparse_spmm` changed as follows:
|
||||
|
||||
#### `rocsparse_spmm` in 5.0
|
||||
|
||||
```c
|
||||
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
|
||||
rocsparse_operation trans_A,
|
||||
rocsparse_operation trans_B,
|
||||
const void* alpha,
|
||||
const rocsparse_spmat_descr mat_A,
|
||||
const rocsparse_dnmat_descr mat_B,
|
||||
const void* beta,
|
||||
const rocsparse_dnmat_descr mat_C,
|
||||
rocsparse_datatype compute_type,
|
||||
rocsparse_spmm_alg alg,
|
||||
rocsparse_spmm_stage stage,
|
||||
size_t* buffer_size,
|
||||
void* temp_buffer);
|
||||
```
|
||||
|
||||
#### `rocsparse_spmm` in 4.0
|
||||
|
||||
```c
|
||||
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
|
||||
rocsparse_operation trans_A,
|
||||
rocsparse_operation trans_B,
|
||||
const void* alpha,
|
||||
const rocsparse_spmat_descr mat_A,
|
||||
const rocsparse_dnmat_descr mat_B,
|
||||
const void* beta,
|
||||
const rocsparse_dnmat_descr mat_C,
|
||||
rocsparse_datatype compute_type,
|
||||
rocsparse_spmm_alg alg,
|
||||
size_t* buffer_size,
|
||||
void* temp_buffer);
|
||||
```
|
||||
|
||||
### HIP API Deprecations and Warnings
|
||||
|
||||
#### Warning - Arithmetic Operators of HIP Complex and Vector Types
|
||||
|
||||
In this release, arithmetic operators of HIP complex and vector types are
|
||||
deprecated.
|
||||
|
||||
- As alternatives to arithmetic operators of HIP complex types, users can use
|
||||
arithmetic operators of `std::complex` types.
|
||||
- As alternatives to arithmetic operators of HIP vector types, users can use the
|
||||
operators of the native clang vector type associated with the data member of
|
||||
HIP vector types.
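
As an illustration of the first alternative, the sketch below uses the arithmetic operators of `std::complex` in ordinary host code. It relies only on the C++ standard library; whether the same operators are usable in device code depends on the compiler and is not covered here. The function name `multiplyAdd` is hypothetical.

```cpp
#include <complex>

// Hypothetical helper: computes a * b + c with the standard library's
// complex operators instead of the deprecated HIP complex operators.
std::complex<float> multiplyAdd(std::complex<float> a,
                                std::complex<float> b,
                                std::complex<float> c) {
    return a * b + c;
}
```

For example, with `a = 1 + 2i`, `b = 3 + 4i`, and `c = 1 + 1i`, the product `a * b` is `-5 + 10i`, so the result is `-4 + 11i`.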
|
||||
|
||||
During the deprecation, two macros `__HIP_ENABLE_COMPLEX_OPERATORS` and
|
||||
`__HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally
|
||||
enable arithmetic operators of HIP complex or vector types.
|
||||
|
||||
**Note:** The two macros are mutually exclusive and are off by default.
|
||||
|
||||
The arithmetic operators of HIP complex and vector types will be removed in a
|
||||
future release.
|
||||
|
||||
Refer to the HIP API Guide for more information.
|
||||
|
||||
#### HIPCC/HIPCONFIG Refactoring
|
||||
|
||||
In prior ROCm releases, by default, the `hipcc`/`hipconfig` Perl scripts were
|
||||
used to identify and set target compiler options, target platform, compiler, and
|
||||
runtime appropriately.
|
||||
|
||||
In ROCm v5.0, `hipcc.bin` and `hipconfig.bin` have been added as the compiled
|
||||
binary implementations of `hipcc` and `hipconfig`. These new binaries are
currently a work in progress and are marked as experimental. ROCm plans
to fully transition to `hipcc.bin` and `hipconfig.bin` in a future ROCm
|
||||
release. The existing `hipcc` and `hipconfig` Perl scripts are renamed to
|
||||
`hipcc.pl` and `hipconfig.pl` respectively. New top-level `hipcc` and
|
||||
`hipconfig` Perl scripts are created, which can switch between the Perl script
|
||||
or the compiled binary based on the environment variable
|
||||
`HIPCC_USE_PERL_SCRIPT`.
|
||||
|
||||
In ROCm 5.0, by default, this environment variable is set to use `hipcc` and
|
||||
`hipconfig` through the Perl scripts.
|
||||
|
||||
Subsequently, Perl scripts will no longer be available in ROCm in a future
|
||||
release.
|
||||
|
||||
### Warning - Compiler-Generated Code Object Version 4 Deprecation
|
||||
|
||||
Support for loading compiler-generated code object version 4 will be deprecated
|
||||
in a future release with no release announcement and replaced with code object 5
|
||||
as the default version.
|
||||
|
||||
The current default is code object version 4.
|
||||
|
||||
### Warning - MIOpenTensile Deprecation
|
||||
|
||||
MIOpenTensile will be deprecated in a future release.
|
||||
|
||||
## Archived Documentation
|
||||
|
||||
Older ROCm documentation is archived at <https://rocmdocs.amd.com>.
|
||||
|
||||
## Disclaimer
|
||||
|
||||
The information presented in this document is for informational purposes only
|
||||
and may contain technical inaccuracies, omissions, and typographical errors.
|
||||
The information contained herein is subject to change and may be rendered
|
||||
inaccurate for many reasons, including but not limited to product and roadmap
|
||||
changes, component and motherboard version changes, new model and/or product
|
||||
releases, product differences between differing manufacturers, software changes,
|
||||
BIOS flashes, firmware upgrades, or the like. Any computer system has risks of
|
||||
security vulnerabilities that cannot be completely prevented or mitigated.
|
||||
AMD assumes no obligation to update or otherwise correct or revise this
|
||||
information. However, AMD reserves the right to revise this information and to
|
||||
make changes from time to time to the content hereof without obligation of AMD
|
||||
to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED
|
||||
"AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS
|
||||
HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS
|
||||
THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED
|
||||
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR
|
||||
PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT,
|
||||
INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY
|
||||
INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof
|
||||
are trademarks of Advanced Micro Devices, Inc. Other product names used in this
|
||||
publication are for identification purposes only and may be trademarks of their
|
||||
respective companies. ©[2021] Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
### Third-party Disclaimer
|
||||
|
||||
Third-party content is licensed to you directly by the third party that owns the
|
||||
content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS
|
||||
PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT
|
||||
IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO
|
||||
YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE
|
||||
FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
|
||||
8
docs/sphinx/README.md
Normal file
@@ -0,0 +1,8 @@
|
||||
# How to build documentation via Sphinx
|
||||
|
||||
```bash
|
||||
pip3 install -r requirements.txt
|
||||
|
||||
python -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
|
||||
```
|
||||
|
||||
1
docs/sphinx/RELEASE.md
Normal file
@@ -0,0 +1 @@
|
||||
# Release Notes
|
||||
1
docs/sphinx/_images/amd-header-logo.svg
Normal file
@@ -0,0 +1 @@
|
||||
<svg id="Layer_1" data-name="Layer 1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 139.72 33.32"><defs><style>.cls-1{fill:#fff;}</style></defs><title>AMD-logo-white-v2</title><path class="cls-1" d="M33,31.14H25.21l-2.37-5.72H9.92L7.76,31.14H.14L11.78,2.26h8.34Zm-16.89-22L11.83,20.39h8.89Z" transform="translate(-0.14 -0.03)"/><path class="cls-1" d="M61.1,2.26h6.27V31.14h-7.2v-18l-7.79,9.06h-1.1L43.49,13.1v18h-7.2V2.26h6.27L51.83,13Z" transform="translate(-0.14 -0.03)"/><path class="cls-1" d="M85.61,2.26c10.54,0,16,6.56,16,14.48,0,8.3-5.25,14.4-16.77,14.4H72.86V2.26ZM80.06,25.85h4.7c7.24,0,9.4-4.91,9.4-9.15,0-5-2.67-9.15-9.48-9.15H80.06Z" transform="translate(-0.14 -0.03)"/><polygon class="cls-1" points="130.64 9.08 115.75 9.08 106.68 0 139.72 0 139.72 33.05 130.64 23.97 130.64 9.08"/><polygon class="cls-1" points="115.74 23.98 115.74 10.9 106.4 20.24 106.4 33.33 119.48 33.33 128.82 23.98 115.74 23.98"/></svg>
|
||||
|
9
docs/sphinx/_images/rdc-watermark.svg
Normal file
@@ -0,0 +1,9 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" version="1.0" preserveAspectRatio="xMidYMid meet" viewBox="1.16 -0.07 462.14 198.07">
|
||||
|
||||
<g transform="translate(0.000000,480.000000) scale(0.100000,-0.100000)" fill="#000000" stroke="none">
|
||||
<path d="M15 4788 c-3 -7 -4 -452 -3 -988 l3 -975 379 -3 c505 -3 622 11 710 88 77 68 105 188 106 456 0 273 -19 354 -106 433 -36 33 -74 56 -119 72 l-65 23 60 22 c161 58 198 133 188 387 -9 216 -55 324 -176 408 -99 69 -251 89 -683 89 -223 0 -291 -3 -294 -12z m645 -363 c24 -24 25 -31 28 -159 5 -206 -9 -236 -114 -236 l-44 0 0 216 0 217 52 -6 c37 -5 59 -14 78 -32z m-12 -726 c12 -5 27 -20 32 -34 13 -34 13 -406 0 -439 -12 -33 -45 -53 -102 -61 l-48 -7 0 281 0 282 48 -7 c26 -4 57 -11 70 -15z"/>
|
||||
<path d="M1395 4788 c-3 -7 -4 -452 -3 -988 l3 -975 445 0 445 0 3 198 2 197 -190 0 -190 0 0 220 0 220 160 0 160 0 0 179 c0 154 -2 180 -16 185 -9 3 -81 6 -160 6 l-144 0 0 185 0 185 170 0 170 0 0 200 0 200 -425 0 c-331 0 -427 -3 -430 -12z"/>
|
||||
<path d="M2317 4793 c-4 -3 -7 -93 -7 -199 l0 -193 148 -3 147 -3 5 -785 5 -785 258 -3 257 -2 0 790 0 789 153 3 152 3 3 175 c1 96 0 185 -3 198 l-5 22 -554 0 c-304 0 -556 -3 -559 -7z"/>
|
||||
<path d="M3595 4788 c-2 -7 -9 -51 -15 -98 -6 -47 -17 -134 -25 -195 -8 -60 -21 -162 -29 -225 -14 -108 -23 -172 -61 -460 -9 -63 -22 -164 -30 -225 -8 -60 -30 -227 -49 -370 -19 -143 -37 -290 -41 -328 l-7 -67 263 2 264 3 13 145 c7 80 15 160 18 178 l5 32 84 0 c56 0 86 -4 92 -12 4 -7 12 -87 18 -178 l10 -165 264 -3 264 -2 -6 42 c-4 24 -20 135 -37 248 -29 192 -78 522 -100 670 -5 36 -23 157 -40 270 -17 113 -42 279 -55 370 -48 322 -54 361 -60 370 -10 16 -734 13 -740 -2z m411 -630 c7 -90 20 -235 29 -323 9 -88 18 -193 22 -233 l6 -73 -84 3 -84 3 3 60 c2 56 33 337 52 485 27 210 33 250 37 246 3 -3 11 -78 19 -168z"/>
|
||||
</g>
|
||||
</svg>
|
||||
|
BIN
docs/sphinx/_images/rocm-on.png
Normal file
|
22
docs/sphinx/_static/code_word_breaks.js
Normal file
@@ -0,0 +1,22 @@
|
||||
$(document).ready(() => {
|
||||
const copy = async(event) => {
|
||||
return await navigator.clipboard.writeText($(event.target).attr('copydata'));
|
||||
}
|
||||
|
||||
$('.table td code').each( function () {
|
||||
var text = $(this).text()
|
||||
$(this).addClass('hovertext')
|
||||
$(this).attr('copydata', text)
|
||||
$(this).attr('data-hover', "Click to copy.")
|
||||
var new_text = text.replaceAll(/_([^\u200B])/g, '_\u200B$1').replaceAll(/([a-z])([A-Z])/g, '$1\u200B$2')
|
||||
$(this).text(new_text)
|
||||
$(this).click((event) => {
|
||||
copy(event)
|
||||
$(event.target).attr('data-hover', "Copied!")
|
||||
$(event.target).on("mouseleave", () => {
|
||||
$(event.target).attr('data-hover', "Click to copy.")
|
||||
$(event.target).off("mouseleave")
|
||||
})
|
||||
})
|
||||
})
|
||||
})
|
||||
72
docs/sphinx/_static/custom.css
Normal file
@@ -0,0 +1,72 @@
|
||||
@import url("theme.css");
|
||||
|
||||
:root {
|
||||
--pst-font-size-base: 11px;
|
||||
}
|
||||
|
||||
div#site-navigation {
|
||||
height: fit-content;
|
||||
min-height: calc(100vh - 190px);
|
||||
}
|
||||
|
||||
div.content-container {
|
||||
overflow-y: clip;
|
||||
}
|
||||
|
||||
.hovertext {
|
||||
position: relative;
|
||||
/* border-bottom: 1px dotted black; */
|
||||
}
|
||||
|
||||
.hovertext:before {
|
||||
content: attr(data-hover);
|
||||
visibility: hidden;
|
||||
opacity: 0;
|
||||
width: 140px;
|
||||
background-color: black;
|
||||
color: #fff;
|
||||
text-align: center;
|
||||
border-radius: 5px;
|
||||
padding: 5px 0;
|
||||
transition: opacity 0.5s ease-in-out;
|
||||
|
||||
position: absolute;
|
||||
z-index: 1;
|
||||
left: 0;
|
||||
top: 110%;
|
||||
}
|
||||
|
||||
.hovertext:hover:before {
|
||||
opacity: 1;
|
||||
visibility: visible;
|
||||
}
|
||||
|
||||
div#rdc-watermark-container {
|
||||
pointer-events: none;
|
||||
position: fixed;
|
||||
height: 100vh;
|
||||
width: 100vw;
|
||||
top: 0;
|
||||
left: 0;
|
||||
z-index: 10000;
|
||||
}
|
||||
|
||||
img#rdc-watermark {
|
||||
pointer-events: none;
|
||||
position: absolute;
|
||||
top: 50%;
|
||||
left: 50%;
|
||||
transform-origin: center;
|
||||
transform: translate(-50%, -50%) rotate(-45deg);
|
||||
opacity: 10%;
|
||||
z-index: 10000;
|
||||
max-width: 100%;
|
||||
max-height: calc(100% - 200px);
|
||||
object-fit: contain;
|
||||
width: 45%;
|
||||
}
|
||||
|
||||
ul.bd-breadcrumbs {
|
||||
margin-bottom: 0;
|
||||
margin-top: 1px;
|
||||
}
|
||||
58
docs/sphinx/_static/rocm_footer.css
Normal file
@@ -0,0 +1,58 @@
|
||||
.rocm-footer {
|
||||
background-color: black;
|
||||
color: white;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
border-top: 1px solid hsla(216,3%,63%,.5);
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
text-align: center;
|
||||
width: 100%;
|
||||
padding-top: 5px;
|
||||
line-height: 20px;
|
||||
height: 120px;
|
||||
}
|
||||
|
||||
.rocm-footer a, .rocm-footer p {
|
||||
color: white;
|
||||
}
|
||||
|
||||
.rocm-footer>ul {
|
||||
border-bottom: 1px solid hsla(216,3%,63%,.5);
|
||||
justify-content: flex-end;
|
||||
margin-top:15px;
|
||||
}
|
||||
|
||||
.rocm-footer ul {
|
||||
display: flex;
|
||||
flex-direction: row;
|
||||
flex-wrap: wrap;
|
||||
font-size: 12px;
|
||||
padding: 0;
|
||||
padding-bottom: 12px;
|
||||
width:98vw;
|
||||
list-style: none inside none;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.rocm-footer div {
|
||||
width: 98vw;
|
||||
}
|
||||
|
||||
.rocm-footer div {
|
||||
text-align: start;
|
||||
}
|
||||
|
||||
.rocm-footer a:hover {
|
||||
color: #e9ecef;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
.rocm-footer ul li {
|
||||
margin-right: 5px;
|
||||
}
|
||||
|
||||
.rocm-footer ul li+li {
|
||||
margin-left: 10px;
|
||||
padding-left: 8px;
|
||||
}
|
||||
108
docs/sphinx/_static/rocm_header.css
Normal file
@@ -0,0 +1,108 @@
|
||||
.rocm-header {
|
||||
background-color: black;
|
||||
position: -webkit-sticky; /* Safari */
|
||||
position: sticky;
|
||||
top: 0;
|
||||
width: 100%;
|
||||
min-height: 50px;
|
||||
overflow: hidden;
|
||||
font-family: 'Noto Sans', sans-serif;
|
||||
font-size: 16px;
|
||||
text-align: left;
|
||||
height:70px;
|
||||
}
|
||||
|
||||
.rocm-header a {
|
||||
color: white;
|
||||
text-decoration: none;
|
||||
}
|
||||
|
||||
.rocm-header-link p {
|
||||
margin-top: 1em;
|
||||
margin-bottom: 1em;
|
||||
}
|
||||
|
||||
.rocm-header img#amd-logo{
|
||||
margin: 1.5em;
|
||||
width: 8.25rem;
|
||||
}
|
||||
|
||||
.rocm-header img#rocm-logo{
|
||||
margin: 0;
|
||||
max-height: 100%;
|
||||
}
|
||||
|
||||
.rocm-header-buttons {
|
||||
display: inline-block;
|
||||
height: fit-content;
|
||||
max-width: 100%;
|
||||
width: fit-content;
|
||||
vertical-align: middle;
|
||||
}
|
||||
|
||||
.rocm-header-link:first-child {
|
||||
margin-left: 4em;
|
||||
}
|
||||
|
||||
.rocm-header-link {
|
||||
position: relative;
|
||||
display: inline-block;
|
||||
height: fit-content;
|
||||
text-align: center;
|
||||
vertical-align: middle;
|
||||
}
|
||||
|
||||
.rocm-header-link.rocm-header-last {
|
||||
position: absolute;
|
||||
right: 4em;
|
||||
top: 50%;
|
||||
transform: translate(0, -50%);
|
||||
height: 100%;
|
||||
}
|
||||
|
||||
.rocm-header-link .rocm-link-box, .rocm-header-link p {
|
||||
vertical-align: middle;
|
||||
color: white;
|
||||
}
|
||||
|
||||
.rocm-header-link .rocm-link-box {
|
||||
font-size: x-large;
|
||||
}
|
||||
|
||||
.rocm-header-link p {
|
||||
font-size: 16px;
|
||||
}
|
||||
|
||||
.rocm-header-link img, .rocm-header-link .rocm-link-box {
|
||||
max-height: 50px;
|
||||
margin-left: 2em;
|
||||
margin-right: 2em;
|
||||
}
|
||||
|
||||
.rocm-header-link .glow-wrap{
|
||||
overflow: hidden;
|
||||
position: absolute;
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
top: 0;
|
||||
}
|
||||
|
||||
.rocm-header-link .glow{
|
||||
display: block;
|
||||
position:absolute;
|
||||
width: 20%;
|
||||
height: 100%;
|
||||
background: rgba(255,255,255,.2);
|
||||
top: 0;
|
||||
left: 0;
|
||||
transform-origin: right top;
|
||||
transform: translate(-100%, 0) skew(-45deg);
|
||||
filter: blur(2px);
|
||||
transition: all .5s cubic-bezier(0.645, 0.045, 0.355, 1);
|
||||
}
|
||||
|
||||
.rocm-header-link:hover .glow{
|
||||
transform-origin: left bottom;
|
||||
transform: translate(1000%, 0) skew(-45deg);
|
||||
transition: all .5s cubic-bezier(0.645, 0.045, 0.355, 1);
|
||||
}
|
||||
11
docs/sphinx/_templates/components/copyright.html
Normal file
@@ -0,0 +1,11 @@
|
||||
{% if show_copyright and copyright %}
|
||||
<div class="copyright">
|
||||
{% if hasdoc('copyright') %}
|
||||
{% trans path=pathto('copyright'), copyright=copyright|e %}© <a href="{{ path }}">Copyright</a> {{ copyright }}.{% endtrans %}
|
||||
<br/>
|
||||
{% else %}
|
||||
{% trans copyright=copyright|e %}© Copyright {{ copyright }}.{% endtrans %}
|
||||
<br/>
|
||||
{% endif %}
|
||||
</div>
|
||||
{% endif %}
|
||||
43
docs/sphinx/_templates/components/social-links.html
Normal file
@@ -0,0 +1,43 @@
|
||||
<!-- Copied from pydata-sphinx-theme -->
|
||||
|
||||
{%- macro icon_link_nav_item(url, icon, name, type, attributes='') -%}
|
||||
{%- if url | length > 2 %}
|
||||
<li class="nav-item">
|
||||
{%- set attributesDefault = { "href": url, "title": name, "class": "nav-link", "rel": "noopener", "target": "_blank", "data-bs-toggle": "tooltip", "data-bs-placement": "bottom"} %}
|
||||
{%- if attributes %}{% for key, val in attributes.items() %}
|
||||
{% set _ = attributesDefault.update(attributes) %}
|
||||
{% endfor %}{% endif -%}
|
||||
{% set attributeString = [] %}
|
||||
{% for key, val in attributesDefault.items() %}
|
||||
{%- set _ = attributeString.append('%s="%s"' % (key, val)) %}
|
||||
{% endfor %}
|
||||
{% set attributeString = attributeString | join(" ") -%}
|
||||
<a {{ attributeString }}>
|
||||
{%- if type == "fontawesome" -%}
|
||||
<span><i class="{{ icon }}"></i></span>
|
||||
<label class="sr-only">{{ _(name) }}</label>
|
||||
{%- elif type == "local" -%}
|
||||
<img src="{{ pathto(icon, 1) }}" class="icon-link-image" alt="{{ _(name) }}"/>
|
||||
{%- elif type == "url" -%}
|
||||
<img src="{{ icon }}" class="icon-link-image" alt="{{ _(name) }}"/>
|
||||
{%- else %}
|
||||
<span>Incorrectly configured icon link. Type must be `fontawesome`, `url` or `local`.</span>
|
||||
{%- endif -%}
|
||||
</a>
|
||||
</li>
|
||||
{%- endif -%}
|
||||
{%- endmacro -%}
|
||||
<ul id="navbar-icon-links"
|
||||
class="navbar-nav"
|
||||
aria-label="{{ _(theme_icon_links_label) }}">
|
||||
{%- block icon_link_shortcuts -%}
|
||||
{{ icon_link_nav_item("http://www.github.com/AMD", "fab fa-github", "GitHub", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("http://www.facebook.com/amd", "fab fa-facebook-f", "Facebook", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("http://www.twitter.com/amd", "fab fa-twitter", "Twitter", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("http://www.instagram.com/amd", "fab fa-instagram", "Instagram", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("http://www.linkedin.com/company/amd", "fab fa-linkedin", "LinkedIn", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("https://www.amd.com/en/corporate/subscriptions", "fa fa-envelope", "Mail", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("https://www.youtube.com/user/amd?sub_confirmation=1", "fab fa-youtube", "Youtube", "fontawesome") -}}
|
||||
{{ icon_link_nav_item("https://www.twitch.tv/amd", "fab fa-twitch", "Twitch", "fontawesome") -}}
|
||||
{% endblock icon_link_shortcuts -%}
|
||||
</ul>
|
||||
5
docs/sphinx/_templates/layout.html
Normal file
@@ -0,0 +1,5 @@
|
||||
{% extends "!layout.html" %}
|
||||
|
||||
{%- block footer %}
|
||||
{%- include "sections/footer.html" %}
|
||||
{%- endblock %}
|
||||
10
docs/sphinx/_templates/sections/footer-content.html
Normal file
@@ -0,0 +1,10 @@
|
||||
<p>
|
||||
{%- if last_updated %}
|
||||
{% trans prefix=translate('Last updated on'), last_updated=last_updated|e %}{{ prefix }} {{ last_updated }}.{% endtrans %}<br/>
|
||||
{%- endif %}
|
||||
{%- if theme_extra_footer %}
|
||||
<div class="extra_footer">
|
||||
{{ theme_extra_footer }}
|
||||
</div>
|
||||
{%- endif %}
|
||||
</p>
|
||||
20
docs/sphinx/_templates/sections/footer.html
Normal file
@@ -0,0 +1,20 @@
|
||||
<div class="rocm-footer">
|
||||
{%- include "components/social-links.html" with context -%}
|
||||
{% include 'components/copyright.html' %}
|
||||
<div class="rocm-footer-links">
|
||||
<ul>
|
||||
<li><a href="https://www.amd.com/en/corporate/copyright">Terms and Conditions (AMD)</a></li>
|
||||
<li><a href="#">Terms and Conditions (ROCm)</a></li>
|
||||
<li><a href="https://www.amd.com/en/corporate/privacy">Privacy</a></li>
|
||||
<li><a href="https://www.amd.com/en/corporate/cookies">Cookie Policy</a></li>
|
||||
<li><a href="https://www.amd.com/en/corporate/trademarks">Trademarks</a></li>
|
||||
<li><a href="https://www.amd.com/system/files/documents/statement-human-trafficking-forced-labor.pdf">Statement on Forced Labor</a></li>
|
||||
<li><a href="https://www.amd.com/en/corporate/competition">Fair and Open Competition</a></li>
|
||||
<li><a href="https://www.amd.com/system/files/documents/amd-uk-tax-strategy.pdf">UK Tax Strategy</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div id="rdc-watermark-container">
|
||||
<img id="rdc-watermark" src="{{ pathto('rdc-watermark.svg',1) }}" alt="DRAFT watermark"/>
|
||||
</div>
|
||||
48
docs/sphinx/_templates/sections/header.html
Normal file
@@ -0,0 +1,48 @@
|
||||
<div class="rocm-header">
|
||||
<div class="rocm-header-buttons">
|
||||
<a href="https://www.amd.com" class="rocm-header-link">
|
||||
<img id="amd-logo" alt="Advanced Micro Devices, Inc." src="{{ pathto('amd-header-logo.svg',1) }}"></img>
|
||||
<div class="glow-wrap">
|
||||
<i class="glow"></i>
|
||||
</div>
|
||||
</a>
|
||||
<a href="{{ theme_repository_url }}" class="rocm-header-link">
|
||||
<div class="rocm-link-box">
|
||||
<p>GitHub</p>
|
||||
</div>
|
||||
<div class="glow-wrap">
|
||||
<i class="glow"></i>
|
||||
</div>
|
||||
</a>
|
||||
<a href="https://github.com/RadeonOpenCompute/ROCm/discussions" class="rocm-header-link">
|
||||
<div class="rocm-link-box">
|
||||
<p>Community</p>
|
||||
</div>
|
||||
<div class="glow-wrap">
|
||||
<i class="glow"></i>
|
||||
</div>
|
||||
</a>
|
||||
<a href="https://github.com/RadeonOpenCompute/ROCm/issues/new" class="rocm-header-link">
|
||||
<div class="rocm-link-box">
|
||||
<p>Support</p>
|
||||
</div>
|
||||
<div class="glow-wrap">
|
||||
<i class="glow"></i>
|
||||
</div>
|
||||
</a>
|
||||
<a href="https://www.amd.com/en/technologies/infinity-hub" class="rocm-header-link">
|
||||
<div class="rocm-link-box">
|
||||
<p>Infinity Hub</p>
|
||||
</div>
|
||||
<div class="glow-wrap">
|
||||
<i class="glow"></i>
|
||||
</div>
|
||||
</a>
|
||||
<a href="https://rocm.amd.com" class="rocm-header-link rocm-header-last" id="rocm-link">
|
||||
<img id="rocm-logo" alt="ROCm logo" src="{{ pathto('rocm-on.png',1) }}"></img>
|
||||
<div class="glow-wrap">
|
||||
<i class="glow"></i>
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
165
docs/sphinx/_toc.yml
Normal file
@@ -0,0 +1,165 @@
|
||||
defaults:
|
||||
numbered: False
|
||||
maxdepth: 6
|
||||
root: index
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: release
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: release/gpu_os_support
|
||||
- file: release/licensing
|
||||
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
|
||||
title: Known Issues
|
||||
- file: release/compatibility
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/framework_compatiblity/framework_compatiblity
|
||||
- file: reference/kernel_userspace_compatibility/kernel_userspace_comp
|
||||
|
||||
- entries:
|
||||
- file: deploy
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: quick_start
|
||||
- file: hip_sdk_install_win/hip_sdk_install_win
|
||||
- file: deploy/docker
|
||||
- file: deploy/install
|
||||
- file: deploy/multi
|
||||
- file: deploy/spack
|
||||
- file: deploy/build_source
|
||||
|
||||
|
||||
- caption: APIs and Reference
|
||||
entries:
|
||||
- file: reference/hip
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: HIP Runtime API
|
||||
url: https://advanced-micro-devices-hip-saad.readthedocs-hosted.com/en/wip-sphinx/
|
||||
- title: HIPify - Port Your Code
|
||||
url: https://advanced-micro-devices-demo--737.com.readthedocs.build/projects/HIPIFY/en/737/
|
||||
- file: reference/gpu_libraries/math
|
||||
title: Math Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/gpu_libraries/blas
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocBLAS
|
||||
url: https://rocmdocs.amd.com/projects/rocBLAS/en/master/
|
||||
- title: hipBLAS
|
||||
url: https://rocmdocs.amd.com/projects/hipBLAS/en/master/
|
||||
- title: rocWMMA
|
||||
url: https://rocmdocs.amd.com/projects/rocWMMA/en/master/
|
||||
- file: reference/gpu_libraries/fft
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocFFT
|
||||
url: https://rocmdocs.amd.com/projects/rocFFT/en/master/
|
||||
- title: hipFFT
|
||||
url: https://rocmdocs.amd.com/projects/hipFFT/en/master/
|
||||
- file: reference/gpu_libraries/rand
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocRAND
|
||||
url: https://rocmdocs.amd.com/projects/rocRAND/en/master/
|
||||
- title: hipRAND
|
||||
url: https://rocmdocs.amd.com/projects/hipRAND/en/master/
|
||||
- file: reference/gpu_libraries/solver
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocSOLVER
|
||||
url: https://rocmdocs.amd.com/projects/rocSOLVER/en/master/
|
||||
- title: hipSOLVER
|
||||
url: https://rocmdocs.amd.com/projects/hipSOLVER/en/master/
|
||||
- file: reference/gpu_libraries/sparse
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocSPARSE
|
||||
url: https://rocmdocs.amd.com/projects/rocSPARSE/en/master/
|
||||
- title: hipSPARSE
|
||||
url: https://rocmdocs.amd.com/projects/hipSPARSE/en/master/
|
||||
- file: reference/gpu_libraries/c++_primitives
|
||||
title: C++ Primitives
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/rocPRIM/en/master/
|
||||
title: rocPRIM
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/hipCUB/en/master/
|
||||
title: hipCUB
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/rocThrust/en/master/
|
||||
title: rocThrust
|
||||
- file: reference/gpu_libraries/communication
|
||||
title: Communication Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/RCCL/en/master/
|
||||
title: RCCL
|
||||
- url: https://rocmsoftwareplatform.github.io/MIOpen/doc/html/releasenotes.html
|
||||
title: MIOpen - Machine Intelligence
|
||||
- url: https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
|
||||
title: MIGraphX - Graph Optimization
|
||||
- file: reference/computer_vision
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/MIVisionX/en/master/
|
||||
title: MIVisionX
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/rocAL/en/master/
|
||||
title: rocAL
|
||||
- file: reference/openmp/openmp
|
||||
title: OpenMP
|
||||
- file: reference/compilers
|
||||
title: Compilers and Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/rocmcc/rocmcc
|
||||
title: ROCmCC
|
||||
- url: http://profiler
|
||||
title: ROCGDB
|
||||
- url: http://profiler
|
||||
title: rocprof
|
||||
- url: http://profiler
|
||||
title: roctracer
|
||||
- url: http://profiler
|
||||
title: ROCdbgapi
|
||||
- file: reference/management_tools
|
||||
title: Management Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: http://smi
|
||||
title: rocmsmi
|
||||
- file: reference/gpu_arch
|
||||
- caption: Understand ROCm
|
||||
entries:
|
||||
- title: Compiler Disambiguation
|
||||
file: understand/compiler_disabiguation
|
||||
- file: isv_deployment_win
|
||||
- file: understand/deep_learning/deep_learning
|
||||
- file: understand/cmake_packages
|
||||
|
||||
- caption: How to Guides
|
||||
entries:
|
||||
- file: how_to/docker_gpu_isolation
|
||||
- file: how_to/magma_install/magma_install
|
||||
- file: how_to/pytorch_install/pytorch_install
|
||||
- file: how_to/tensorflow_install/tensorflow_install
|
||||
- file: how_to/system_debugging
|
||||
|
||||
- caption: Examples
|
||||
entries:
|
||||
- title: rocm-examples
|
||||
url: https://github.com/
|
||||
- file: examples/ai_ml_inferencing
|
||||
title: AI/ML/Inferencing
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: examples/inception_casestudy/inception_casestudy
|
||||
- file: examples/inception_casestudy_migraphx/inception_casestudy_migraphx
|
||||
|
||||
- caption: About
|
||||
entries:
|
||||
- file: about
|
||||
170
docs/sphinx/_toc.yml.in
Normal file
@@ -0,0 +1,170 @@
|
||||
# Anywhere {branch} is used, the branch name will be substituted.
|
||||
# These comments will also be removed.
|
||||
defaults:
|
||||
numbered: False
|
||||
maxdepth: 6
|
||||
root: index
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: release
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: release/gpu_os_support
|
||||
- file: release/licensing
|
||||
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
|
||||
title: Known Issues
|
||||
- file: release/compatibility
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/framework_compatiblity/framework_compatiblity
|
||||
- file: reference/kernel_userspace_compatibility/kernel_userspace_comp
|
||||
|
||||
- entries:
|
||||
- file: deploy
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: quick_start
|
||||
- file: hip_sdk_install_win/hip_sdk_install_win
|
||||
- file: deploy/docker
|
||||
- file: deploy/install
|
||||
- file: deploy/multi
|
||||
- file: deploy/spack
|
||||
- file: deploy/build_source
|
||||
|
||||
|
||||
- caption: APIs and Reference
|
||||
entries:
|
||||
- file: reference/hip
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: HIP Runtime API
|
||||
url: https://advanced-micro-devices-hip-saad.readthedocs-hosted.com/en/wip-sphinx/
|
||||
- title: HIPify - Port Your Code
|
||||
url: https://advanced-micro-devices-demo--737.com.readthedocs.build/projects/HIPIFY/en/737/
|
||||
- file: reference/gpu_libraries/math
|
||||
title: Math Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/gpu_libraries/blas
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocBLAS
|
||||
url: https://rocmdocs.amd.com/projects/rocBLAS/en/{branch}/
|
||||
- title: hipBLAS
|
||||
url: https://rocmdocs.amd.com/projects/hipBLAS/en/{branch}/
|
||||
- title: rocWMMA
|
||||
url: https://rocmdocs.amd.com/projects/rocWMMA/en/{branch}/
|
||||
- file: reference/gpu_libraries/fft
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocFFT
|
||||
url: https://rocmdocs.amd.com/projects/rocFFT/en/{branch}/
|
||||
- title: hipFFT
|
||||
url: https://rocmdocs.amd.com/projects/hipFFT/en/{branch}/
|
||||
- file: reference/gpu_libraries/rand
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocRAND
|
||||
url: https://rocmdocs.amd.com/projects/rocRAND/en/{branch}/
|
||||
- title: hipRAND
|
||||
url: https://rocmdocs.amd.com/projects/hipRAND/en/{branch}/
|
||||
- file: reference/gpu_libraries/solver
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocSOLVER
|
||||
url: https://rocmdocs.amd.com/projects/rocSOLVER/en/{branch}/
|
||||
- title: hipSOLVER
|
||||
url: https://rocmdocs.amd.com/projects/hipSOLVER/en/{branch}/
|
||||
- file: reference/gpu_libraries/sparse
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocSPARSE
|
||||
url: https://rocmdocs.amd.com/projects/rocSPARSE/en/{branch}/
|
||||
- title: hipSPARSE
|
||||
url: https://rocmdocs.amd.com/projects/hipSPARSE/en/{branch}/
|
||||
- file: reference/gpu_libraries/c++_primitives
|
||||
title: C++ Primitives
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/rocPRIM/en/{branch}/
|
||||
title: rocPRIM
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/hipCUB/en/{branch}/
|
||||
title: hipCUB
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/rocThrust/en/{branch}/
|
||||
title: rocThrust
|
||||
- file: reference/gpu_libraries/communication
|
||||
title: Communication Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/RCCL/en/{branch}/
|
||||
title: RCCL
|
||||
- url: https://rocmsoftwareplatform.github.io/MIOpen/doc/html/releasenotes.html
|
||||
title: MIOpen - Machine Intelligence
|
||||
- url: https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
|
||||
title: MIGraphX - Graph Optimization
|
||||
- file: reference/computer_vision
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/MIVisionX/en/{branch}/
|
||||
title: MIVisionX
|
||||
- entries:
|
||||
- url: https://rocmdocs.amd.com/projects/rocAL/en/{branch}/
|
||||
title: rocAL
|
||||
- file: reference/openmp/openmp
|
||||
title: OpenMP
|
||||
- file: reference/compilers
|
||||
title: Compilers and Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/rocmcc/rocmcc
|
||||
title: ROCmCC
|
||||
- url: http://profiler
|
||||
title: ROCGDB
|
||||
- url: http://profiler
|
||||
title: rocprof
|
||||
- url: http://profiler
|
||||
title: roctracer
|
||||
- url: http://profiler
|
||||
title: ROCdbgapi
|
||||
- file: reference/management_tools
|
||||
title: Management Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: http://smi
|
||||
title: rocmsmi
|
||||
- file: reference/gpu_arch
|
||||
- caption: Understand ROCm
|
||||
entries:
|
||||
- title: Compiler Disambiguation
|
||||
file: understand/compiler_disabiguation
|
||||
- file: isv_deployment_win
|
||||
- file: understand/deep_learning/deep_learning
|
||||
- file: understand/cmake_packages
|
||||
|
||||
- caption: How to Guides
|
||||
entries:
|
||||
- file: how_to/docker_gpu_isolation
|
||||
- file: how_to/deep_learning_rocm
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: how_to/magma_install/magma_install
|
||||
- file: how_to/pytorch_install/pytorch_install
|
||||
- file: how_to/tensorflow_install/tensorflow_install
|
||||
- file: how_to/system_debugging
|
||||
|
||||
- caption: Examples
|
||||
entries:
|
||||
- title: rocm-examples
|
||||
url: https://github.com/
|
||||
- file: examples/ai_ml_inferencing
|
||||
title: AI/ML/Inferencing
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: examples/inception_casestudy/inception_casestudy
|
||||
- file: examples/inception_casestudy_migraphx/inception_casestudy_migraphx
|
||||
|
||||
- caption: About
|
||||
entries:
|
||||
- file: about
|
||||
71
docs/sphinx/about.md
Normal file
@@ -0,0 +1,71 @@
|
||||
# About ROCm Documentation
|
||||
|
||||
ROCm documentation is made available under open source [licenses](licensing.md).
|
||||
Documentation is built using open source toolchains. Contributions to our
|
||||
documentation are encouraged and welcome. As a contributor, please familiarize
|
||||
yourself with our documentation toolchain.
|
||||
|
||||
## ReadTheDocs
|
||||
|
||||
[ReadTheDocs](https://docs.readthedocs.io/en/stable/) is our frontend for the
|
||||
documentation. The frontend is the tool that serves our HTML based
|
||||
documentation to our end users. We are using a paid ReadTheDocs plan. Many
|
||||
projects were using the free readthedocs plan. All projects should transition to
|
||||
the paid readthedocs site as it is ad free. The paid site has additional
|
||||
functionality including longer build times, better user monitoring and the
|
||||
[rocmdoc.amd.com](https://rocmdoc.amd.com) URL. Please contact the documentation
|
||||
team or devops for readthedocs access.
|
||||
|
||||
## Doxygen
|
||||
|
||||
[Doxygen](https://www.doxygen.nl/) is the most common inline code documentation
|
||||
standard. ROCm projects use Doxygen for public API documentation (unless the
|
||||
upstream project is using a different tool).
|
||||
|
||||
## Sphinx
|
||||
|
||||
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
|
||||
originally used for Python. It is now widely used in the open source community.
|
||||
Originally, Sphinx supported rst based documentation. Markdown support is now
|
||||
available. ROCm documentation plans to default to markdown for new projects.
|
||||
Existing projects using rst are under no obligation to convert to markdown. New projects
|
||||
that believe markdown is not suitable should contact the documentation team
|
||||
prior to selecting rst.
|
||||
|
||||
### Sphinx Theme
|
||||
|
||||
ROCm is using the
|
||||
[Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/). This
|
||||
theme is used by Jupyter books. ROCm documentation applies some customization
|
||||
including a header and footer, on top of the Sphinx Book Theme. A future custom
|
||||
ROCm theme will be part of our documentation goals.
|
||||
|
||||
### Sphinx Design
|
||||
|
||||
Sphinx Design is an extension for Sphinx based websites that adds design
|
||||
functionality. Please see the documentation
|
||||
[here](https://sphinx-design.readthedocs.io/en/latest/index.html). ROCm
|
||||
documentation uses Sphinx Design for grids, cards, and synchronized tabs.
|
||||
Other features may be used in the future.
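
As a minimal sketch, a project's `conf.py` could enable Sphinx Design together
with markdown support roughly as shown below. This assumes the `sphinx_design`
and `myst_parser` packages are installed in the build environment; it is not
the configuration used by this repository, and projects built with
rocm-docs-core may already have these settings applied for them.

```python
# conf.py (illustrative fragment, not this repository's configuration)

# Assumption: both extensions are installed in the docs build environment.
extensions = [
    "myst_parser",    # markdown support for Sphinx
    "sphinx_design",  # grids, cards, and tabs
]

# Lets markdown files write directives with colon fences, e.g. ":::{grid}".
myst_enable_extensions = ["colon_fence"]
```

With the colon fence syntax enabled, a markdown page can open a grid with
`:::{grid}` and nest `:::{grid-item-card}` blocks inside it.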
|
||||
|
||||
### Sphinx External TOC
|
||||
|
||||
ROCm uses the
|
||||
[sphinx-external-toc](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
|
||||
extension for our navigation. This tool allows a yml file based left navigation menu. This
|
||||
tool was selected due to its flexibility that allows scripts to operate on the
|
||||
yml file. Please transition to this file for the project's navigation. You can
|
||||
see the `_toc.yml.in` file in this repository in the docs/sphinx folder for an
|
||||
example.
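
To illustrate why a script friendly yml file is useful, the sketch below shows
one way a template such as `_toc.yml.in` could be turned into a final
`_toc.yml`. It assumes leading `#` comment lines are dropped and `{branch}`
placeholders are filled in with the branch name. The helper name `render_toc`
and the branch name `develop` are hypothetical and chosen only for this
example.

```python
def render_toc(template_lines, branch):
    """Drop leading '#' comment lines, then fill in {branch} placeholders."""
    rendered = []
    at_start = True
    for line in template_lines:
        if at_start and line.startswith("#"):
            continue  # header comments are not copied to the output
        at_start = False
        rendered.append(line.format(branch=branch))
    return rendered


# Hypothetical template content, modeled on the entries in _toc.yml.in.
template = [
    "# Anywhere {branch} is used, the branch name will be substituted.\n",
    "root: index\n",
    "url: https://rocmdocs.amd.com/projects/rocBLAS/en/{branch}/\n",
]
result = render_toc(template, "develop")
print("".join(result), end="")
```

Because `str.format` treats every brace pair as a placeholder, any literal
braces in a template handled this way would need to be doubled.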
|
||||
|
||||
### Breathe
|
||||
|
||||
Sphinx uses [Breathe](https://www.breathe-doc.org/) to integrate doxygen
|
||||
content.
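
A minimal sketch of the Breathe settings in `conf.py` might look like the
fragment below. It assumes Doxygen has been run with XML output enabled; the
project name `my_project` and the path `../doxygen/xml` are hypothetical and
would differ per repository.

```python
# conf.py (illustrative fragment)

extensions = ["breathe"]

# Assumption: Doxygen was run with GENERATE_XML = YES and wrote its XML
# output to this hypothetical directory.
breathe_projects = {"my_project": "../doxygen/xml"}

# Project used when a Breathe directive does not name one explicitly.
breathe_default_project = "my_project"
```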
|
||||
|
||||
## rocm-docs-core pip package
|
||||
|
||||
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD
|
||||
maintained project that applies customization for our documentation. This
|
||||
project is the tool most ROCm repositories will use as part of the documentation
|
||||
build.
|
||||
0
docs/sphinx/all_deploy_options.md
Normal file
17
docs/sphinx/conf.py
Normal file
@@ -0,0 +1,17 @@
|
||||
# Configuration file for the Sphinx documentation builder.
|
||||
#
|
||||
# This file only contains a selection of the most common options. For a full
|
||||
# list see the documentation:
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html
|
||||
|
||||
import shutil
|
||||
shutil.copy2('../../CHANGELOG.md','./')
|
||||
shutil.copy2('../../RELEASE.md','./')
|
||||
|
||||
from rocm_docs import ROCmDocs
|
||||
|
||||
docs_core = ROCmDocs("ROCm Documentation")
|
||||
docs_core.setup()
|
||||
|
||||
for sphinx_var in ROCmDocs.SPHINX_VARS:
|
||||
globals()[sphinx_var] = getattr(docs_core, sphinx_var)
|
||||
44
docs/sphinx/deploy.md
Normal file
@@ -0,0 +1,44 @@
|
||||
# Deploy
|
||||
|
||||
Please follow the guides below to begin your ROCm journey. ROCm can be consumed
|
||||
via many mechanisms.
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
|
||||
::::{grid-item-card}
|
||||
:padding: 2
|
||||
Quick Start
|
||||
^^^
|
||||
|
||||
- [Linux](quick_start)
|
||||
- [Windows](hip_sdk_install_win/hip_sdk_install_win)
|
||||
|
||||
::::
|
||||
|
||||
::::{grid-item-card}
|
||||
:padding: 2
|
||||
Docker
|
||||
^^^
|
||||
|
||||
- [Guide](deploy/docker)
|
||||
- [Dockerhub](https://hub.docker.com/u/rocm/#!)
|
||||
|
||||
::::
|
||||
|
||||
::::{grid-item-card}
|
||||
:padding: 2
|
||||
[Advanced](deploy/advanced)
|
||||
^^^
|
||||
|
||||
- [Uninstall](deploy/advanced/uninstall)
|
||||
- [Multi-ROCm Installations](deploy/advanced/multi)
|
||||
- [spack](deploy/advanced/spack)
|
||||
- [Build from Source](deploy/advanced/build_source)
|
||||
|
||||
::::
|
||||
|
||||
:::::
|
||||
|
||||
## Related Information
|
||||
|
||||
[Release Information](release)
|
||||
1
docs/sphinx/deploy/build_source.md
Normal file
@@ -0,0 +1 @@
|
||||
# Build from Source
|
||||
1
docs/sphinx/deploy/docker.md
Normal file
@@ -0,0 +1 @@
|
||||
# Docker
|
||||
3
docs/sphinx/deploy/install.md
Normal file
@@ -0,0 +1,3 @@
|
||||
# Basic Installation Guide
|
||||
|
||||
This guide explains the basic installation of ROCm. This is the recommended starting point for all users of ROCm.
|
||||
1
docs/sphinx/deploy/multi.md
Normal file
@@ -0,0 +1 @@
|
||||
# Multi-ROCm Installation
|
||||
1
docs/sphinx/deploy/spack.md
Normal file
@@ -0,0 +1 @@
|
||||
# spack
|
||||
1
docs/sphinx/examples/ai_ml_inferencing.md
Normal file
@@ -0,0 +1 @@
|
||||
# AI/ML/Inferencing
|
||||
@@ -0,0 +1,5 @@
|
||||
# Inception V3 with PyTorch
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Deep_Learning_Training.html>.
|
||||
Ignore training description.
|
||||
42
docs/sphinx/format_toc.py
Normal file
@@ -0,0 +1,42 @@
|
||||
import os
from typing import Union
from git import Repo, Remote, RemoteReference
from pathlib import Path

def format_toc(repo_path: Union[str, os.PathLike, None] = None):
    pwd = Path(__file__).resolve().parent
    if repo_path is None:
        repo_path = pwd.parent
    at_start = True
    repo = Repo(repo_path, search_parent_directories=True)
    assert not repo.bare
    try:
        branch = repo.active_branch.name
    except TypeError as exc: # HEAD is detached commit
        checked_heads = []
        for head in repo.heads:
            checked_heads.append(head.name)
            if head.commit == repo.head.commit:
                branch = head.name
                break
        else: # loop fell through
            for remote in repo.remotes:
                remote: Remote
                for ref in remote.refs:
                    ref: RemoteReference
                    if ref.commit == repo.head.commit:
                        branch = ref.name.split('/')[-1]
                        break
                else: # no ref of this remote matched, try the next remote
                    continue
                break # a matching remote ref was found
            else: # loop fell through
                raise TypeError("A branch name could not be determined.\n(Checked heads: %s)" % ' '.join(checked_heads)) from exc
    with open(pwd / '_toc.yml.in', 'r', encoding='utf-8') as input:
        with open(pwd / '_toc.yml', 'w', encoding='utf-8') as output:
            for line in input.readlines():
                if line[0] == '#' and at_start:
                    continue
                at_start = False
                output.write(line.format(branch=branch))


if __name__ == '__main__':
    format_toc()
|
||||
1
docs/sphinx/gpu_libraries.md
Normal file
@@ -0,0 +1 @@
|
||||
# GPU Libraries
|
||||
179
docs/sphinx/hip_sdk_install_win/hip_sdk_install_win.md
Normal file
@@ -0,0 +1,179 @@
|
||||
# Quick Start (Windows)
|
||||
|
||||
The steps to install the HIP SDK for Windows are described in this document.
|
||||
|
||||
## System Requirements
|
||||
|
||||
The HIP SDK is supported on Windows 10 and 11. The HIP SDK may be installed on a
|
||||
system without AMD GPUs to use the build toolchains. To run HIP applications, a
|
||||
compatible GPU is required. Please see the supported GPU guide for more details.
|
||||
TODO: provide link to supported GPU guide.
|
||||
|
||||
## SDK Installation
|
||||
|
||||
Installation options are listed in Table 1.
|
||||
|
||||
| **Table 1. Components for Installation** | |
|
||||
|:------------------------:|:----------------:|:------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
|
||||
| **HIP Components** | **Install Type** | **Display Driver** | **Install Options** |
|
||||
| **HIP SDK Core** | **Full** | Adrenalin 22.40 | **Full:** Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking your performance on your graphics hardware. |
|
||||
| **HIP Libraries** | **Full** | | **Minimal:** Provides only the basic controls for AMD Software features and does not include advanced features such as performance tweaking or recording and capturing content.|
|
||||
| **HIP Runtime Compiler** | **Full** | | **Driver Only:** Provides no user interface for AMD Software features. |
|
||||
| **Ray Tracing** | **Full** | | **Do Not Install** |
|
||||
| **BitCode Profiler** | **Full** | | |
|
||||
|
||||
TODO: describe each installation option.
|
||||
|
||||
## HIP SDK Installer
|
||||
|
||||
The AMD HIP SDK Installer manages the installation and uninstallation process of
|
||||
HIP SDK for Windows. This includes system configuration checks, installing
|
||||
components, and installing the display driver.
|
||||
|
||||
To launch the AMD HIP SDK Installer, click the **Setup** icon shown in Figure 1.
|
||||
The installer will begin to load and detect your system's configuration and
|
||||
compatibility, as shown in Figure 2. A completely loaded AMD HIP SDK Installer
|
||||
window will appear, as shown in Figure 3.
|
||||
|
||||
|  |
|
||||
|:------------------------------:|
|
||||
| **Figure 1. Setup Icon** |
|
||||
|
||||
|  |
|
||||
|:-------------------------------------------:|
|
||||
| **Figure 2. AMD HIP SDK Loading Window** |
|
||||
|
||||
|  |
|
||||
|:-----------------------------------------------:|
|
||||
| **Figure 3. AMD HIP SDK Installer Window** |
|
||||
|
||||
### Installation Selections
|
||||
|
||||
By default, all components are selected for installation. Refer to Figure 3 for
|
||||
an instance when the Select All option is turned on.
|
||||
|
||||
**Note** The Select All option only applies to the installation of HIP
|
||||
components. To install the AMD Display Driver, manually select the install type.
|
||||
|
||||
**Note** To customize the install location on your system, click
|
||||
**Additional Options** under HIP SDK Core and AMD Radeon Vega 10 Graphics. Refer
|
||||
to the sections [HIP Components](#hip-components) and
|
||||
[AMD Display Driver](#amd-display-driver) for more information on each
|
||||
installation.
|
||||
|
||||
To make installation selections and install, follow these steps:
|
||||
|
||||
1. Scroll the window to AMD Display Driver and select the desired install type.
|
||||
Refer to the section [AMD Display Driver](#amd-display-driver) for more
|
||||
information on installation types.
|
||||
2. Once selected, click **Install** located in the lower right corner, and skip
|
||||
to [Installing Components](#installing-components).
|
||||
|
||||
#### Deselect All
|
||||
|
||||
To select individual components to install on your system, click **Deselect All**
|
||||
in the upper right corner of the installer window, as seen in Figure 3. Figure 4
|
||||
demonstrates the installer window once the installation components are all
|
||||
deselected.
|
||||
|
||||
|  |
|
||||
|:--------------------------------------:|
|
||||
| **Figure 4. Deselect All Selection** |
|
||||
|
||||
#### HIP Components
|
||||
|
||||
By default, each HIP component is selected for full installation.
|
||||
Figures 5 through 9 demonstrate the options available to you when you click
|
||||
**Additional Options** under each component.
|
||||
|
||||
| **Table 2. Custom Selections for Installation** | |
|
||||
|:------------------------------------------------------------------|:---------------------------------------------------- |
|
||||
| **If:** | **Then:** |
|
||||
| You intend to make custom selections for this installation | Skip to the section _Deselect All_. |
|
||||
| You do not intend to make custom selections for this installation | Continue to the section _AMD Display Driver_. |
|
||||
|
||||
**Note** You can manually select installation locations for the HIP SDK Core, as
|
||||
shown in Figure 5.
|
||||
|
||||
|  |
|
||||
|:---------------------------------------:|
|
||||
| **Figure 5. HIP SDK Core Option** |
|
||||
|
||||
|  |
|
||||
|:-----------------------------------------:|
|
||||
| **Figure 6. HIP Libraries Option** |
|
||||
|
||||
|  |
|
||||
|:-------------------------------------------------------:|
|
||||
| **Figure 7. HIP Runtime Compiler Option** |
|
||||
|
||||
|  |
|
||||
|:---------------------------------------------:|
|
||||
| **Figure 8. HIP Ray Tracing** |
|
||||
|
||||
|  |
|
||||
|:-----------------------------------------------:|
|
||||
| **Figure 9. BitCode Profiler** |
|
||||
|
||||
#### AMD Display Driver
|
||||
|
||||
The AMD Display Driver offers three install types:
|
||||
|
||||
- Full Install
|
||||
- Minimal Install
|
||||
- Driver Only
|
||||
|
||||
Table 3 describes the difference in each option shown in Figure 10.
|
||||
|
||||
**Note** You must perform a system restart for a complete installation of the
|
||||
Display Driver.
|
||||
|
||||
**Note** Unless you intend to factory reset your machine, leave the
|
||||
**Factory Reset (Optional)** box unchecked. A Factory Reset will remove all
|
||||
prior versions of AMD HIP SDK and drivers. You will not be able to roll back to
|
||||
previously installed drivers.
|
||||
|
||||
| **Table 3. Display Driver Install Options** | |
|
||||
|:-------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
|
||||
| **Install Option** | **Description** |
|
||||
| **Full Install** | Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking the performance on your graphics hardware. |
|
||||
| **Minimal Install** | Provides only the basic controls for AMD Software features and does not include advanced features such as performance tweaking or recording and capturing content. |
|
||||
| **Driver Only** | Provides no user interface for AMD Software features. |
|
||||
|
||||
|  |
|
||||
|:-----------------------------------------------:|
|
||||
| **Figure 10. AMD Display Driver Options** |
|
||||
|
||||
## Installing Components
|
||||
|
||||
Please wait for the installation to complete, as shown in Figure 11.
|
||||
|
||||
|  |
|
||||
|:-------------------------------------:|
|
||||
| **Figure 11. Active Installation** |
|
||||
|
||||
### Installation Complete
|
||||
|
||||
Once the installation is complete, the installer window may prompt you for a
|
||||
system restart. Click **Restart** at the lower right corner, shown in Figure 12.
|
||||
|
||||
|  |
|
||||
|:---------------------------------------------------------:|
|
||||
| **Figure 12. Installation Complete** |
|
||||
|
||||
## Uninstallation
|
||||
|
||||
All components, except the Visual Studio plug-in, should be uninstalled through
|
||||
Control Panel -> Add/Remove Programs. For Visual Studio extension uninstallation,
|
||||
please refer to
|
||||
<https://github.com/ROCm-Developer-Tools/HIP-VS/blob/master/README.md>. For the
|
||||
uninstallation of the HIP SDK Core and drivers repeat the steps in the sections
|
||||
[HIP SDK Installer](#hip-sdk-installer) and
|
||||
[Installing Components](#installing-components).
|
||||
|
||||
**Note** Selecting **Install** once ROCm has already been installed results in its
|
||||
uninstallation.
|
||||
|
||||
|  |
|
||||
|:----------------------------------------:|
|
||||
| **Figure 13. HIP SDK Uninstalling** |
|
||||
BIN
docs/sphinx/hip_sdk_install_win/image/AMD-Display-Driver.png
Normal file
|
After Width: | Height: | Size: 163 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/AMD-Logo.png
Normal file
|
After Width: | Height: | Size: 2.1 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/BitCode-Profiler.png
Normal file
|
After Width: | Height: | Size: 34 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/DeSelectAll.png
Normal file
|
After Width: | Height: | Size: 183 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/HIP-Libraries.png
Normal file
|
After Width: | Height: | Size: 40 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/HIP-Ray-Tracing.png
Normal file
|
After Width: | Height: | Size: 40 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/HIP-Runtime-Compiler.png
Normal file
|
After Width: | Height: | Size: 36 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/HIP-SDK-Core.png
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Installation-Complete.png
Normal file
|
After Width: | Height: | Size: 407 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Installation.png
Normal file
|
After Width: | Height: | Size: 465 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Installer-Window.png
Normal file
|
After Width: | Height: | Size: 207 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Loading-Window.png
Normal file
|
After Width: | Height: | Size: 461 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/LoadingWindow.png
Normal file
|
After Width: | Height: | Size: 461 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Setup-Icon.png
Normal file
|
After Width: | Height: | Size: 3.5 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Uninstallation.png
Normal file
|
After Width: | Height: | Size: 412 KiB |
BIN
docs/sphinx/hip_sdk_install_win/image/Windows-Security.png
Normal file
|
After Width: | Height: | Size: 68 KiB |
1
docs/sphinx/hip_sdk_install_win/image/image planning
Normal file
@@ -0,0 +1 @@
|
||||
|
||||
1
docs/sphinx/how_to/deep_learning_rocm.md
Normal file
@@ -0,0 +1 @@
|
||||
# Deep Learning Guide
|
||||
5
docs/sphinx/how_to/docker_gpu_isolation.md
Normal file
@@ -0,0 +1,5 @@
|
||||
# Docker GPU Isolation Techniques
|
||||
|
||||
## GPU Passthrough
|
||||
|
||||
## Environment Variable
|
||||
BIN
docs/sphinx/how_to/magma_install/figures/image.005.png
Normal file
|
After Width: | Height: | Size: 88 KiB |
BIN
docs/sphinx/how_to/magma_install/figures/image.006.png
Normal file
|
After Width: | Height: | Size: 32 KiB |
@@ -0,0 +1,4 @@
|
||||
[ZoneTransfer]
|
||||
ZoneId=3
|
||||
ReferrerUrl=https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html
|
||||
HostUrl=https://docs-be.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/image.006.png?_LANG=enus
|
||||
471
docs/sphinx/how_to/magma_install/magma_install.md
Normal file
@@ -0,0 +1,471 @@
|
||||
# Magma Installation for ROCm
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html>
|
||||
|
||||
The following sections cover the different framework installations for ROCm and
|
||||
Deep Learning applications. Figure 5 provides the sequential flow for the use of
|
||||
each framework. Refer to the ROCm Compatible Frameworks Release Notes for each
|
||||
framework's most current release notes at
|
||||
[/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html](/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html).
|
||||
|
||||
| |
|
||||
|:--:|
|
||||
| <b>Figure 5. ROCm Compatible Frameworks Flowchart</b>|
|
||||
|
||||
## PyTorch
|
||||
PyTorch is an open source Machine Learning Python library, primarily differentiated by Tensor computing with GPU acceleration and tape-based automatic differentiation. Other advanced features include:
|
||||
- Support for distributed training
|
||||
- Native ONNX support
|
||||
- C++ frontend
|
||||
- The ability to deploy at scale using TorchServe
|
||||
- A production-ready deployment mechanism through TorchScript
|
||||
### Installing PyTorch
|
||||
To install ROCm on bare metal, refer to the section [ROCm Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60). The recommended option to get a PyTorch environment is through Docker. However, installing the PyTorch wheels package on bare metal is also supported.
|
||||
#### Option 1 (Recommended): Use Docker Image with PyTorch Pre-installed
|
||||
Using Docker gives you portability and access to a prebuilt Docker container that has been rigorously tested within AMD. This might also save on the compilation time and should perform as it did when tested without facing potential installation issues.
|
||||
Follow these steps:
|
||||
1. Pull the latest public PyTorch Docker image.
|
||||
|
||||
```
|
||||
docker pull rocm/pytorch:latest
|
||||
```
|
||||
|
||||
Optionally, you may download a specific and supported configuration with different user-space ROCm versions, PyTorch versions, and supported operating systems. To download the PyTorch Docker image, refer to [https://hub.docker.com/r/rocm/pytorch](https://hub.docker.com/r/rocm/pytorch).
|
||||
|
||||
2. Start a Docker container using the downloaded image.
|
||||
|
||||
```
|
||||
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
|
||||
```
|
||||
|
||||
:::{note}
|
||||
This will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount any data directories from the host onto the container.
|
||||
:::
|
||||
|
||||
#### Option 2: Install PyTorch Using Wheels Package
|
||||
PyTorch supports the ROCm platform by providing tested wheels packages. To access this feature, refer to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/) and choose the "ROCm" compute platform. Figure 6 is a matrix from pytorch.org that illustrates the installation compatibility between ROCm and the PyTorch build.
|
||||
|
||||
|
||||
| |
|
||||
|:--:|
|
||||
| <b>Figure 6. Installation Matrix from Pytorch.org</b>|
|
||||
|
||||
|
||||
To install PyTorch using the wheels package, follow these installation steps:
|
||||
|
||||
1. Choose one of the following options:
|
||||
|
||||
a. Obtain a base Docker image with the correct user-space ROCm version installed from [https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04](https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04).
|
||||
|
||||
or
|
||||
|
||||
b. Download a base OS Docker image and install ROCm following the installation directions in the section [Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60). ROCm 5.2 is installed in this example, as supported by the installation matrix from pytorch.org.
|
||||
|
||||
or
|
||||
|
||||
c. Install on bare metal. Skip to Step 3.
|
||||
|
||||
```
|
||||
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest
|
||||
```
|
||||
3. Install any dependencies needed for installing the wheels package.
|
||||
|
||||
```
|
||||
sudo apt update
|
||||
sudo apt install libjpeg-dev python3-dev
|
||||
pip3 install wheel setuptools
|
||||
```
|
||||
|
||||
4. Install torch, torchvision, and torchaudio as specified by the installation matrix.
|
||||
|
||||
:::{note}
|
||||
ROCm 5.2 PyTorch wheel in the command below is shown for reference.
|
||||
:::
|
||||
|
||||
|
||||
```
|
||||
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.2/
|
||||
```
|
||||
|
||||
#### Option 3: Install PyTorch Using PyTorch ROCm Base Docker Image
|
||||
A prebuilt base Docker image is used to build PyTorch in this option. The base Docker has all dependencies installed, including:
|
||||
- ROCm
|
||||
- Torchvision
|
||||
- Conda packages
|
||||
- Compiler toolchain
|
||||
Additionally, a particular environment flag (BUILD_ENVIRONMENT) is set, and the build scripts utilize that to determine the build environment configuration.
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Obtain the Docker image.
|
||||
```
|
||||
docker pull rocm/pytorch:latest-base
|
||||
```
|
||||
The above will download the base container, which does not contain PyTorch.
|
||||
|
||||
2. Start a Docker container using the image.
|
||||
```
|
||||
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base
|
||||
```
|
||||
You can also pass the -v argument to mount any data directories from the host onto the container.
|
||||
|
||||
3. Clone the PyTorch repository.
|
||||
```
|
||||
cd ~
|
||||
git clone https://github.com/pytorch/pytorch.git
|
||||
cd pytorch
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
4. Build PyTorch for ROCm.
|
||||
:::{note}
|
||||
By default in the rocm/pytorch:latest-base, PyTorch builds for these architectures simultaneously:
|
||||
- gfx900
|
||||
- gfx906
|
||||
- gfx908
|
||||
- gfx90a
|
||||
- gfx1030
|
||||
:::
|
||||
|
||||
5. To determine your AMD uarch, run:
|
||||
```
|
||||
rocminfo | grep gfx
|
||||
```
|
||||
|
||||
6. In the event you want to compile only for your uarch, use:
|
||||
```
|
||||
export PYTORCH_ROCM_ARCH=<uarch>
|
||||
```
|
||||
\<uarch\> is the architecture reported by the rocminfo command.
|
||||
|
||||
7. Build PyTorch using the following command:
|
||||
```
|
||||
./.jenkins/pytorch/build.sh
|
||||
```
|
||||
This will first convert PyTorch sources for HIP compatibility and build the PyTorch framework.
|
||||
|
||||
8. Alternatively, build PyTorch by issuing the following commands:
|
||||
```
|
||||
python3 tools/amd_build/build_amd.py
|
||||
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
|
||||
```
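Steps 5 and 6 above can also be scripted. The following is a minimal sketch, not part of the official build tooling: it extracts the `gfx` names from text such as the output of `rocminfo` and joins them into a value for `PYTORCH_ROCM_ARCH`. The helper names and the sample text are hypothetical; the semicolon separator is assumed from the way the MAGMA build script later in this guide splits `PYTORCH_ROCM_ARCH`, and the `gfx000` exclusion mirrors that script.

```python
import re

# Illustrative excerpt only (an assumption); real `rocminfo` output has many more fields.
SAMPLE_ROCMINFO = """\
  Name:                    gfx90a
  Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx1030
"""


def parse_gfx_targets(rocminfo_text):
    """Return the unique gfx architecture names in first-seen order."""
    targets = []
    for match in re.finditer(r"\bgfx[0-9a-f]+\b", rocminfo_text):
        name = match.group(0)
        # gfx000 is skipped, as the MAGMA script in this guide also filters it out.
        if name != "gfx000" and name not in targets:
            targets.append(name)
    return targets


def rocm_arch_value(targets):
    """Join target names into a semicolon-separated PYTORCH_ROCM_ARCH value."""
    return ";".join(targets)


print(rocm_arch_value(parse_gfx_targets(SAMPLE_ROCMINFO)))
```

The printed value can then be exported as `PYTORCH_ROCM_ARCH` before running the build commands.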
|
||||
#### Option 4: Install Using PyTorch Upstream Docker File
|
||||
Instead of using a prebuilt base Docker image, you can build a custom base Docker image using scripts from the PyTorch repository. This will utilize a standard Docker image from operating system maintainers and install all the dependencies required to build PyTorch, including:
|
||||
- ROCm
|
||||
- Torchvision
|
||||
- Conda packages
|
||||
- Compiler toolchain
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Clone the PyTorch repository on the host.
|
||||
```
|
||||
cd ~
|
||||
git clone https://github.com/pytorch/pytorch.git
|
||||
cd pytorch
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
2. Build the PyTorch Docker image.
|
||||
```
|
||||
cd .circleci/docker
|
||||
./build.sh pytorch-linux-bionic-rocm<version>-py3.7
|
||||
# e.g. ./build.sh pytorch-linux-bionic-rocm3.10-py3.7
|
||||
```
|
||||
This should complete with the message "Successfully built <image_id>."
|
||||
|
||||
3. Start a Docker container using the image:
|
||||
```
|
||||
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G <image_id>
|
||||
```
|
||||
You can also pass the -v argument to mount any data directories from the host onto the container.
|
||||
|
||||
4. Clone the PyTorch repository.
|
||||
```
|
||||
cd ~
|
||||
git clone https://github.com/pytorch/pytorch.git
|
||||
cd pytorch
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
5. Build PyTorch for ROCm.
|
||||
|
||||
:::{note}
|
||||
By default in the rocm/pytorch:latest-base, PyTorch builds for these architectures simultaneously:
|
||||
- gfx900
|
||||
- gfx906
|
||||
- gfx908
|
||||
- gfx90a
|
||||
- gfx1030
|
||||
:::
|
||||
|
||||
6. To determine your AMD uarch, run:
|
||||
```
|
||||
rocminfo | grep gfx
|
||||
```
|
||||
|
||||
7. If you want to compile only for your uarch:
|
||||
```
|
||||
export PYTORCH_ROCM_ARCH=<uarch>
|
||||
```
|
||||
\<uarch\> is the architecture reported by the rocminfo command.
|
||||
|
||||
8. Build PyTorch using:
|
||||
```
|
||||
./.jenkins/pytorch/build.sh
|
||||
```
|
||||
This will first convert PyTorch sources to be HIP compatible and then build the PyTorch framework.
|
||||
|
||||
Alternatively, build PyTorch by issuing the following commands:
|
||||
```
|
||||
python3 tools/amd_build/build_amd.py
|
||||
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
|
||||
```
|
||||
|
||||
### Test the PyTorch Installation
|
||||
You can use PyTorch unit tests to validate a PyTorch installation. If using a prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official wheels package, these tests are already run on those configurations. Alternatively, you can manually run the unit tests to validate the PyTorch installation fully.
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Test if PyTorch is installed and accessible by importing the torch package in Python.
|
||||
:::{note}
|
||||
Do not run in the PyTorch git folder.
|
||||
:::
|
||||
```
|
||||
python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure'
|
||||
```
|
||||
|
||||
2. Test if the GPU is accessible from PyTorch. In the PyTorch framework, torch.cuda is a generic mechanism to access the GPU; it will access an AMD GPU only if available.
|
||||
```
|
||||
python3 -c 'import torch; print(torch.cuda.is_available())'
|
||||
```
|
||||
|
||||
3. Run the unit tests to validate the PyTorch installation fully. Run the following command from the PyTorch home directory:
|
||||
```
|
||||
BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT:-rocm} ./.jenkins/pytorch/test.sh
|
||||
```
|
||||
This ensures that even for wheel installs in a non-controlled environment, the required environment variable will be set to skip certain unit tests for ROCm.
|
||||
:::{note}
|
||||
Make sure the PyTorch source code corresponds to the PyTorch wheel or installation in the Docker image. Incompatible PyTorch source code might give errors when running the unit tests.
|
||||
:::
|
||||
This will first install some dependencies, such as a supported torchvision version for PyTorch. Torchvision is used in some PyTorch tests for loading models. Next, this will run all the unit tests.
|
||||
:::{note}
|
||||
Some tests may be skipped, as appropriate, based on your system configuration. Not all features of PyTorch are supported on ROCm, and the tests that evaluate unsupported features are skipped. In addition, depending on the host memory, or the number of available GPUs, other tests may be skipped. No test should fail if the compilation and installation are correct.
|
||||
:::
|
||||
|
||||
4. Run individual unit tests with the following command:
|
||||
```
|
||||
PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose
|
||||
```
|
||||
test_nn.py can be replaced with any other test set.
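The one-line checks in Steps 1 and 2 above only print a pass/fail result or a boolean. As a slightly more informative sketch, the script below reports whether PyTorch can be imported, whether `torch.cuda` sees a GPU, and the names of the visible devices. The function name `describe_torch_gpu` and the report layout are hypothetical; only standard `torch.cuda` calls are used, and the script degrades to an empty report when PyTorch is not installed.

```python
import importlib.util


def describe_torch_gpu():
    """Return a small report about PyTorch and GPU visibility."""
    report = {"torch_installed": False, "gpu_available": False, "devices": []}
    # Skip the import entirely when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return report
    import torch

    report["torch_installed"] = True
    # torch.cuda is the generic GPU interface; on ROCm builds it reports AMD GPUs.
    report["gpu_available"] = torch.cuda.is_available()
    if report["gpu_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(index)
            for index in range(torch.cuda.device_count())
        ]
    return report


print(describe_torch_gpu())
```

As with Step 1, run this outside the PyTorch git folder so that the installed package is imported rather than the source tree.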
|
||||
|
||||
### Run a Basic PyTorch Example
|
||||
The PyTorch examples repository provides basic examples that exercise the functionality of the framework. MNIST (Modified National Institute of Standards and Technology) database is a collection of handwritten digits that may be used to train a Convolutional Neural Network for handwriting recognition. Alternatively, ImageNet is a database of images used to train a network for visual object recognition.
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Clone the PyTorch examples repository.
|
||||
```
|
||||
git clone https://github.com/pytorch/examples.git
|
||||
```
|
||||
|
||||
2. Run the MNIST example.
|
||||
```
|
||||
cd examples/mnist
|
||||
```
|
||||
|
||||
3. Follow the instructions in the README file in this folder. In this case:
|
||||
```
|
||||
pip3 install -r requirements.txt
|
||||
python3 main.py
|
||||
```
|
||||
|
||||
4. Run the ImageNet example.
|
||||
```
|
||||
cd examples/imagenet
|
||||
```
|
||||
|
||||
5. Follow the instructions in the README file in this folder. In this case:
|
||||
```
|
||||
pip3 install -r requirements.txt
|
||||
python3 main.py
|
||||
```
|
||||
|
||||
## MAGMA for ROCm
|
||||
Matrix Algebra on GPU and Multicore Architectures, abbreviated as MAGMA, is a collection of next-generation dense linear algebra libraries that is designed for heterogeneous architectures, such as multiple GPUs and multi- or many-core CPUs.
|
||||
|
||||
MAGMA provides implementations for CUDA, HIP, Intel Xeon Phi, and OpenCL™. For more information, refer to [https://icl.utk.edu/magma/index.html](https://icl.utk.edu/magma/index.html).
|
||||
|
||||
### Using MAGMA for PyTorch
|
||||
Tensor is fundamental to Deep Learning techniques because it provides extensive representational functionalities and math operations. This data structure is represented as a multidimensional matrix. MAGMA accelerates tensor operations with a variety of solutions including driver routines, computational routines, BLAS routines, auxiliary routines, and utility routines.
|
||||
|
||||
### Build MAGMA from Source
|
||||
To build MAGMA from the source, follow these steps:
|
||||
|
||||
1. In the event you want to compile only for your uarch, use:
|
||||
```
|
||||
export PYTORCH_ROCM_ARCH=<uarch>
|
||||
```
|
||||
\<uarch\> is the architecture reported by the rocminfo command.
|
||||
|
||||
2. Use the following:
|
||||
```
|
||||
export PYTORCH_ROCM_ARCH=<uarch>
|
||||
|
||||
# "install" hipMAGMA into /opt/rocm/magma by copying after build
|
||||
git clone https://bitbucket.org/icl/magma.git
|
||||
pushd magma
|
||||
# Fixes memory leaks of magma found while executing linalg UTs
|
||||
git checkout 5959b8783e45f1809812ed96ae762f38ee701972
|
||||
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
|
||||
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
|
||||
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
|
||||
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
|
||||
export PATH="${PATH}:/opt/rocm/bin"
|
||||
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
|
||||
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
|
||||
else
|
||||
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
|
||||
fi
|
||||
for arch in $amdgpu_targets; do
|
||||
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
|
||||
done
|
||||
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
|
||||
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
|
||||
make -f make.gen.hipMAGMA -j $(nproc)
|
||||
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
|
||||
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
|
||||
popd
|
||||
mv magma /opt/rocm
|
||||
```
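The target-selection part of the script above may be easier to follow in isolation. The sketch below mirrors only the branch where `PYTORCH_ROCM_ARCH` is set: it replaces semicolons with spaces, splits on whitespace, and produces one `DEVCCFLAGS` line per architecture, as the `for arch in $amdgpu_targets` loop does. The function names are hypothetical, and the fallback to `rocm_agent_enumerator` is deliberately not reproduced.

```python
def amdgpu_targets(pytorch_rocm_arch):
    """Split a semicolon-separated PYTORCH_ROCM_ARCH value into target names."""
    # Same effect as `sed 's/;/ /g'` followed by shell word splitting.
    return pytorch_rocm_arch.replace(";", " ").split()


def devccflags_lines(pytorch_rocm_arch):
    """Build the lines the script appends to make.inc for each target."""
    return [
        f"DEVCCFLAGS += --amdgpu-target={arch}"
        for arch in amdgpu_targets(pytorch_rocm_arch)
    ]


for line in devccflags_lines("gfx906;gfx90a"):
    print(line)
```

For the example value `gfx906;gfx90a`, two lines are produced, one per architecture.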
|
||||
## TensorFlow
|
||||
TensorFlow is an open source library for solving Machine Learning, Deep Learning, and Artificial Intelligence problems. It can be used to solve many problems across different sectors and industries but primarily focuses on training and inference in neural networks. It is one of the most popular and in-demand frameworks and is very active in open source contribution and development.
|
||||
|
||||
### Installing TensorFlow
|
||||
The following sections contain options for installing TensorFlow.
|
||||
|
||||
#### Option 1: Install TensorFlow Using Docker Image
|
||||
To install ROCm on bare metal, follow the section [ROCm Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60). The recommended option to get a TensorFlow environment is through Docker.
|
||||
|
||||
Using Docker provides portability and access to a prebuilt Docker container that has been rigorously tested within AMD. This might also save compilation time and should perform as tested without facing potential installation issues.
|
||||
Follow these steps:
|
||||
|
||||
1. Pull the latest public TensorFlow Docker image.
|
||||
```
|
||||
docker pull rocm/tensorflow:latest
|
||||
```
|
||||
|
||||
2. Once you have pulled the image, run it by using the command below:
|
||||
```
|
||||
docker run -it --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/tensorflow:latest
|
||||
```
|
||||
|
||||
#### Option 2: Install TensorFlow Using Wheels Package
|
||||
To install TensorFlow using the wheels package, follow these steps:
|
||||
|
||||
1. Check the Python version.
|
||||
```
|
||||
python3 --version
|
||||
```
|
||||
| If: | Then: |
|
||||
| ----------- | ----------- |
|
||||
| The Python version is less than 3.7 | Upgrade Python. |
|
||||
| The Python version is 3.7 or later | Skip this step and go to Step 3. |
|
||||
:::{note}
|
||||
The supported Python versions are:
|
||||
- 3.7
|
||||
- 3.8
|
||||
- 3.9
|
||||
- 3.10
|
||||
:::
|
||||
```
|
||||
sudo apt-get install python3.7 # or python3.8 or python3.9 or python3.10
|
||||
```
|
||||
|
||||
2. Set up multiple Python versions using update-alternatives.
|
||||
```
|
||||
update-alternatives --query python3
|
||||
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python[version] [priority]
|
||||
```
|
||||
:::{note}
|
||||
Follow the instruction in Step 2 for incompatible Python versions.
|
||||
:::
|
||||
```
|
||||
sudo update-alternatives --config python3
|
||||
```
|
||||
|
||||
3. Follow the screen prompts, and select the Python version installed in Step 2.
|
||||
|
||||
4. Install or upgrade PIP.
|
||||
```
|
||||
sudo apt install python3-pip
|
||||
```
|
||||
To install PIP, use the following:
|
||||
```
|
||||
/usr/bin/python[version] -m pip install --upgrade pip
|
||||
```
|
||||
Upgrade PIP for the Python version installed in Step 2:
|
||||
```
|
||||
sudo pip3 install --upgrade pip
|
||||
```
|
||||
|
||||
5. Install TensorFlow for the Python version as indicated in Step 2.
|
||||
```
|
||||
/usr/bin/python[version] -m pip install --user tensorflow-rocm==[wheel-version] --upgrade
|
||||
```
|
||||
For a valid wheel version for a ROCm release, refer to the instruction below:
|
||||
```
|
||||
sudo apt install rocm-libs rccl
|
||||
```
|
||||
|
||||
6. Update protobuf to 3.19 or lower.
|
||||
```
|
||||
/usr/bin/python3.7 -m pip install protobuf==3.19.0
|
||||
sudo pip3 install tensorflow
|
||||
```
|
||||
|
||||
7. Set the environment variable PYTHONPATH.
|
||||
```
|
||||
export PYTHONPATH="./.local/lib/python[version]/site-packages:$PYTHONPATH" #Use same python version as in step 2
|
||||
```
|
||||
|
||||
8. Install libraries.
|
||||
```
|
||||
sudo apt install rocm-libs rccl
|
||||
```
|
||||
|
||||
9. Test installation.
|
||||
```
|
||||
python3 -c 'import tensorflow' 2> /dev/null && echo 'Success' || echo 'Failure'
|
||||
```
|
||||
:::{note}
|
||||
For details on tensorflow-rocm wheels and ROCm version compatibility, see: [https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md](https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-rocm-release.md)
|
||||
:::
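Two of the steps above depend on version constraints: the Python interpreter must be one of the supported versions listed in Step 1, and Step 6 asks for protobuf 3.19 or lower. The following is a minimal pre-flight sketch of those two checks. The helper names are hypothetical, and "3.19 or lower" is interpreted here as any 3.19.x release or earlier, which is an assumption rather than something this guide states.

```python
import sys

# Supported Python versions as listed in the note under Step 1.
SUPPORTED_PYTHON = [(3, 7), (3, 8), (3, 9), (3, 10)]
# Upper bound from Step 6 (assumed to include every 3.19.x release).
MAX_PROTOBUF = (3, 19)


def python_supported(version_info=sys.version_info):
    """True if the major.minor version is in the supported list."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def protobuf_ok(version_string):
    """True if a protobuf version string such as '3.19.0' is 3.19 or lower."""
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) <= MAX_PROTOBUF


print("Python supported:", python_supported())
print("protobuf 3.19.0 ok:", protobuf_ok("3.19.0"))
```

The version string passed to `protobuf_ok` can be taken from the output of `pip show protobuf`.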
|
||||
|
||||
### Test the TensorFlow Installation
|
||||
To test the installation of TensorFlow, run the container image as specified in the previous section Installing TensorFlow. Ensure you have access to the Python shell in the Docker container.
|
||||
```
|
||||
python3 -c 'import tensorflow' 2> /dev/null && echo 'Success' || echo 'Failure'
|
||||
```
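The one-line check above can be wrapped in a small reusable function. This is a sketch only: `check_import` is a hypothetical helper name, and the follow-up hint it prints is an assumption about likely causes, not an official diagnostic.

```bash
# check_import MODULE [PYTHON]
# Try to import MODULE with the given interpreter (default: python3)
# and report the result. Returns non-zero when the import fails.
check_import() {
    module="$1"
    python_bin="${2:-python3}"
    if "$python_bin" -c "import $module" 2>/dev/null; then
        echo "Success: $module"
    else
        echo "Failure: $module"
        return 1
    fi
}

# Check TensorFlow; on failure, print a hint instead of aborting the shell.
check_import tensorflow || echo "Re-check the wheel version and PYTHONPATH."
```

Passing the interpreter as the second argument, for example `check_import tensorflow /usr/bin/python3.7`, tests the exact Python version the wheel was installed for.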
|
||||
|
||||
### Run a Basic TensorFlow Example
|
||||
The TensorFlow examples repository provides basic examples that exercise the framework's functionality. The MNIST database is a collection of handwritten digits that may be used to train a Convolutional Neural Network for handwriting recognition.
|
||||
|
||||
Follow these steps:
|
||||
1. Clone the TensorFlow example repository.
|
||||
```
|
||||
cd ~
|
||||
git clone https://github.com/tensorflow/models.git
|
||||
```
|
||||
|
||||
2. Install the dependencies of the code, and run the code.
|
||||
```
|
||||
pip3 install -r requirement.txt
|
||||
python mnist_tf.py
|
||||
```
|
||||
6
docs/sphinx/how_to/pytorch_install/pytorch_install.md
Normal file
@@ -0,0 +1,6 @@
|
||||
# PyTorch Installation for ROCm
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html>
|
||||
|
||||
TEST
|
||||
3
docs/sphinx/how_to/system_debugging.md
Normal file
@@ -0,0 +1,3 @@
|
||||
# System Debugging Guide
|
||||
|
||||
Pull from https://docs.amd.com/bundle/ROCm-System-Level-Debug-Guide-v5.2/page/ROCm_System_Level_Debug_Information.html
|
||||
@@ -0,0 +1,4 @@
|
||||
# TensorFlow Installation for ROCm
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Frameworks_Installation.html>
|
||||
@@ -0,0 +1,4 @@
|
||||
# Inference Optimization Using MIGraphX
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Optimization.html>
|
||||
74
docs/sphinx/index.md
Normal file
@@ -0,0 +1,74 @@
|
||||
# AMD ROCm Documentation
|
||||
|
||||
Welcome to AMD ROCm's documentation!
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
::::{grid-item}
|
||||
:::{dropdown} [Release Info](release)
|
||||
|
||||
- Release Notes
|
||||
- [GPU and OS Support](gpu_os_support)
|
||||
- [Known Issues](https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue)
|
||||
- End Of Life and Support Policies
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
::::{grid-item}
|
||||
:::{dropdown} [Deploy ROCm](deploy)
|
||||
|
||||
- [Quick Start (Linux)](quick_start)
|
||||
- [Quick Start (Windows)](hip_sdk_install_win/hip_sdk_install_win)
|
||||
- [Advanced (Linux)](deploy/advanced)
|
||||
- [Docker](deploy/docker)
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
:::::
|
||||
|
||||
::::{grid} 1 2 2 2
|
||||
:class-container: rocm-doc-grid
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
[APIs and Reference](https://example.com)
|
||||
^^^
|
||||
|
||||
- [HIP](reference/hip)
|
||||
- [OpenMP](reference/openmp/openmp)
|
||||
- [Compilers and Tools](reference/compilers)
|
||||
- [Management Tools](reference/tools)
|
||||
- [GPU Architecture](reference/gpu_arch)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
Understand ROCm
|
||||
^^^
|
||||
|
||||
- What compiler should I choose?
|
||||
- All Articles
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
How to Guides
|
||||
^^^
|
||||
- [How to Isolate GPUs in Docker?](how_to/docker_gpu_isolation)
|
||||
- [Magma Installation for ROCm](how_to/magma_install/magma_install)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
Examples
|
||||
^^^
|
||||
|
||||
- [rocm-examples](https://github.com/amd/rocm-examples)
|
||||
|
||||
:::
|
||||
::::
|
||||
23
docs/sphinx/isv_deployment_win.md
Normal file
@@ -0,0 +1,23 @@
|
||||
# ISV Deployment Guide (Windows)
|
||||
|
||||
## Abstract
|
||||
|
||||
ISVs deploying applications that use the HIP SDK depend on the AMD GPU Drivers, the HIP Runtime Library and the HIP SDK Libraries. A compatibility matrix table provides details on AMD’s support model. AMD GPU Drivers are distributed with a HIP Runtime included. Each HIP Runtime is associated with a HIP compiler version. An application built with a particular HIP compiler should document the associated HIP Runtime version and AMD GPU Driver as minimum version requirements for its end users. Applications do not distribute the HIP Runtime. Instead, end users use the HIP Runtime provided by an AMD GPU Driver. AMD provides backward compatibility for applications dynamically linked to the HIP Runtime based on our Driver and HIP support policy. An ISV application that uses a HIP SDK Library, for example hipBLAS, should distribute that library as part of its installer package. It is recommended not to require end users to install the HIP SDK. AMD provides backward compatibility for the AMD Driver and HIP Runtime for the HIP SDK Libraries based on our support policy. AMD's support policy for Visual Studio and other third-party compilers is documented here.
|
||||
|
||||
## Introduction
|
||||
|
||||
This guide is intended for Independent Software Vendors (ISVs) and other software developers who want to build applications with the HIP SDK for Windows. The HIP SDK is intended for distribution to developers, in contrast to the AMD GPU driver, which is intended for all end users. The guide discusses how to use and distribute components from the HIP SDK. The HIP SDK is the collection of the AMD GPU Driver, HIP Runtime and the HIP Libraries. These three parts are distributed in the HIP SDK installer. The compatibility and versioning relation between these three parts is documented here. AMD’s support policies for the developer tools give ISVs the stability to plan their use of a tool chain.
|
||||
|
||||
## Recommended Library Distribution Model
|
||||
|
||||
The HIP SDK is distributed via a Windows installer. This distribution system is only intended for software developers and testers. AMD recommends that end users of a program built against HIP SDK components not be required to install the HIP SDK. Two types of ISV applications use the HIP SDK, as described below.
|
||||
|
||||
The first group of ISV applications depends on the HIP Runtime and select HIP header-only libraries (rocPRIM, hipCUB and rocThrust). These applications must require their end users to install an AMD GPU Driver. Each AMD GPU Driver has a HIP Runtime library bundled with it. The ISV application should ensure that the HIP Runtime library meets a minimum version. As the HIP Runtime library does not have semantic versioning, the ISV application cannot check for compatibility. However, AMD is committed to not breaking API/ABI compatibility unless the major version number of the HIP Runtime is incremented. ISV applications may run without a user warning if the HIP major version available in the driver is the same as the HIP major version associated with the compiler they were built with. The ISV may, at its discretion, issue a warning if the HIP major version is higher than the HIP major version associated with the compiler the application was built with.
|
||||
|
||||
The second group of ISV applications depends on the HIP Runtime and one or more dynamically linked HIP libraries, including the HIP RT library. ISV applications with this dependency need to ensure that the end user installs an AMD GPU Driver, and AMD recommends that they distribute the dynamically linked HIP libraries in the installer package of the application. This allows end users to avoid installing the HIP SDK. One benefit of this model is a smaller disk footprint, because the ISV application distributes only the binaries it requires. It also spares the end user from having to agree to licensing agreements for the entire HIP SDK. The version checks recommended for ISV applications that include dynamically linked HIP libraries follow the same requirements as for ISV applications that use only the HIP Runtime and header-only libraries. In addition, each dynamically linked HIP library also has a minimum HIP Runtime requirement. Checks for the minimum HIP version for each dynamically linked HIP library may be added at the ISV's discretion. Usually, the minimum HIP version check for the HIP Runtime is sufficient if the dynamically linked HIP libraries come from the same SDK package as the HIP compiler.
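The major-version policy described above can be sketched as a small decision function. This is an illustration of the logic only, written in shell for readability: `hip_major_status`, its status strings and the example version numbers are hypothetical, and how an application obtains the two version strings is not covered here. Treating a lower runtime major version as below the minimum is an assumption drawn from the minimum-version requirement.

```bash
# hip_major_status BUILT RUNTIME
# BUILT   - HIP version associated with the compiler used for the build
# RUNTIME - HIP version provided by the installed AMD GPU Driver
# Both are dotted version strings such as "5.4.3" (hypothetical values).
hip_major_status() {
    built_major="${1%%.*}"      # keep the text before the first dot
    runtime_major="${2%%.*}"
    if [ "$runtime_major" -eq "$built_major" ]; then
        echo "ok"               # same major version: run without warning
    elif [ "$runtime_major" -gt "$built_major" ]; then
        echo "warn"             # newer major version: warning is optional
    else
        echo "too-old"          # assumption: older major fails the minimum
    fi
}

hip_major_status "5.4.3" "5.5.0"   # prints "ok"
hip_major_status "5.4.3" "6.0.0"   # prints "warn"
```

Only the major component is compared because the text commits to API/ABI stability within a major version, not to semantic versioning of the minor and patch components.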
|
||||
|
||||
Please note AMD does not support static linking to any components distributed in the HIP SDK.
|
||||
|
||||
## Conclusion
|
||||
|
||||
This guide provides limited guidance for ISV application deployment. Please refer to the HIP API guides for the SDK and the HIP Optimization guides for more information.
|
||||
1
docs/sphinx/kernel_userspace.md
Normal file
@@ -0,0 +1 @@
|
||||
# Kernel and Userspace Compatibility
|
||||
0
docs/sphinx/packaging_guidelines.md
Normal file
236
docs/sphinx/quick_start.md
Normal file
@@ -0,0 +1,236 @@
|
||||
# Quick Start (Linux)
|
||||
|
||||
## Install Prerequisites
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: key5
|
||||
Install kernel headers and modules for the active kernel.
|
||||
|
||||
```bash
|
||||
sudo apt install linux-headers-`uname -r` linux-modules-extra-`uname -r`
|
||||
```
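To preview which packages the command above resolves to on the running kernel, the names can be built first and inspected before installing. This is a small sketch; the `kernel` and `packages` variable names are illustrative.

```bash
# Release string of the running kernel, for example "5.15.0-56-generic"
kernel="$(uname -r)"

# The two package names the install command expands to
packages="linux-headers-${kernel} linux-modules-extra-${kernel}"
echo "Would install: ${packages}"

# Uncomment to perform the installation (word splitting is intentional):
# sudo apt install ${packages}
```

The `$(...)` form is equivalent to the backticks used above and is easier to nest and quote.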
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: key6
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL8
|
||||
:sync: key1
|
||||
|
||||
Content 1
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL9
|
||||
:sync: key2
|
||||
|
||||
Content 2
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP3
|
||||
:sync: key3
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP4
|
||||
:sync: key4
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## Add Repositories
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: key5
|
||||
Add the ROCm GPG key and the repositories.
|
||||
|
||||
```bash
|
||||
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
|
||||
echo 'deb [arch=amd64] https://repo.radeon.com/amdgpu/latest/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ focal main' | sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt-get update
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: key6
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL8
|
||||
:sync: key1
|
||||
|
||||
Content 1
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL9
|
||||
:sync: key2
|
||||
|
||||
Content 2
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP3
|
||||
:sync: key3
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP4
|
||||
:sync: key4
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## Install Drivers
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: key5
|
||||
Install the amdgpu kernel module (the driver) on your system.
|
||||
|
||||
```bash
|
||||
sudo apt install amdgpu-dkms
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: key6
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL8
|
||||
:sync: key1
|
||||
|
||||
Content 1
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL9
|
||||
:sync: key2
|
||||
|
||||
Content 2
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP3
|
||||
:sync: key3
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP4
|
||||
:sync: key4
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## Install ROCm Runtimes
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: key5
|
||||
Install the rocm-hip-runtime metapackage. This contains dependencies for most
|
||||
common ROCm applications.
|
||||
|
||||
```bash
|
||||
sudo apt install rocm-hip-libraries
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: key6
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL8
|
||||
:sync: key1
|
||||
|
||||
Content 1
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL9
|
||||
:sync: key2
|
||||
|
||||
Content 2
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP3
|
||||
:sync: key3
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP4
|
||||
:sync: key4
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## Reboot the System
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: key5
|
||||
The driver requires a system reboot.
|
||||
|
||||
```bash
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: key6
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL8
|
||||
:sync: key1
|
||||
|
||||
Content 1
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL9
|
||||
:sync: key2
|
||||
|
||||
Content 2
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP3
|
||||
:sync: key3
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
:::{tab-item} SLES15 SP4
|
||||
:sync: key4
|
||||
|
||||
Content 3
|
||||
:::
|
||||
|
||||
::::
|
||||
0
docs/sphinx/reference/compilers.md
Normal file
1
docs/sphinx/reference/computer_vision.md
Normal file
@@ -0,0 +1 @@
|
||||
# Computer Vision
|
||||
1
docs/sphinx/reference/dev_tools.md
Normal file
@@ -0,0 +1 @@
|
||||
# Development Tools
|
||||
0
docs/sphinx/reference/docker.md
Normal file
@@ -0,0 +1,8 @@
|
||||
# Framework Compatibility
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Prerequisites.html>.
|
||||
Only the frameworks content. Link to kernel/userspace guide.
|
||||
|
||||
Also pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html>
|
||||
10
docs/sphinx/reference/gpu_arch.md
Normal file
@@ -0,0 +1,10 @@
|
||||
# GPU Architectures
|
||||
|
||||
## ISA Documentation
|
||||
|
||||
- [AMD Instinct MI200 Instruction Set Architecture Reference Guide](https://developer.amd.com/wp-content/resources/CDNA2_Shader_ISA_18November2021.pdf)
|
||||
|
||||
## Whitepapers
|
||||
|
||||
- [AMD CDNA Architecture Whitepaper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
|
||||
- [AMD CDNA™ 2 Architecture Whitepaper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
|
||||
27
docs/sphinx/reference/gpu_libraries/blas.md
Normal file
@@ -0,0 +1,27 @@
|
||||
# Matrix Multiplication
|
||||
|
||||
ROCm libraries for BLAS are as follows:
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} hipBLAS
|
||||
hipBLAS is a compatibility layer for GPU accelerated BLAS optimized for AMD GPUs
|
||||
via rocBLAS and rocSOLVER. hipBLAS allows for a common interface for other GPU
|
||||
BLAS libraries.
|
||||
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
|
||||
- [API Reference Manual](https://rocmdocs.amd.com/projects/hipBLAS/en/rtd/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} rocBLAS
|
||||
rocBLAS is an AMD GPU optimized library for BLAS.
|
||||
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
|
||||
- [API Reference Manual](https://rocmdocs.amd.com/projects/rocBLAS/en/rtd/)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
1
docs/sphinx/reference/gpu_libraries/c++_primitives.md
Normal file
@@ -0,0 +1 @@
|
||||
# C++ Primitives
|
||||
1
docs/sphinx/reference/gpu_libraries/communication.md
Normal file
@@ -0,0 +1 @@
|
||||
# Communication Libraries
|
||||
27
docs/sphinx/reference/gpu_libraries/fft.md
Normal file
@@ -0,0 +1,27 @@
|
||||
# Fast Fourier Transforms
|
||||
|
||||
ROCm libraries for FFT are as follows:
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} hipFFT
|
||||
hipFFT is a compatibility layer for GPU accelerated FFT optimized for AMD GPUs
|
||||
using rocFFT. hipFFT allows for a common interface for other non-AMD GPU
|
||||
FFT libraries.
|
||||
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
|
||||
- [API Reference Manual](https://rocmdocs.amd.com/projects/hipFFT/en/rtd/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} rocFFT
|
||||
rocFFT is an AMD GPU optimized library for FFT.
|
||||
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
|
||||
- [API Reference Manual](https://rocmdocs.amd.com/projects/rocFFT/en/rtd/)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocFFT)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
1
docs/sphinx/reference/gpu_libraries/math.md
Normal file
@@ -0,0 +1 @@
|
||||
# Math Libraries
|
||||
1
docs/sphinx/reference/gpu_libraries/rand.md
Normal file
@@ -0,0 +1 @@
|
||||
# Random Numbers
|
||||
1
docs/sphinx/reference/gpu_libraries/solver.md
Normal file
@@ -0,0 +1 @@
|
||||
# Linear Solvers
|
||||
1
docs/sphinx/reference/gpu_libraries/sparse.md
Normal file
@@ -0,0 +1 @@
|
||||
# Sparse Matrix Solvers
|
||||
2
docs/sphinx/reference/hip.md
Normal file
@@ -0,0 +1,2 @@
|
||||
# HIP
|
||||
|
||||
@@ -0,0 +1 @@
|
||||
# Kernel Userspace Compatibility Reference
|
||||
1
docs/sphinx/reference/management_tools.md
Normal file
@@ -0,0 +1 @@
|
||||
# Management Tools
|
||||
4
docs/sphinx/reference/openmp/openmp.md
Normal file
@@ -0,0 +1,4 @@
|
||||
# OpenMP Support in ROCm
|
||||
|
||||
Pull from
|
||||
<https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.4/page/Introduction_to_OpenMP_Support_Guide.html>
|
||||
23
docs/sphinx/reference/rocmcc/rocmcc.md
Normal file
@@ -0,0 +1,23 @@
|
||||
# Introduction to Compiler Reference Guide
|
||||
|
||||
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance computing on AMD GPUs and CPUs and supports various heterogeneous programming models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
ROCmCC is made available via two packages: rocm-llvm and rocm-llvm-alt. The differences are shown in this table:
|
||||
|
||||
**Table 1. rocm-llvm vs. rocm-llvm-alt**
|
||||
| rocm-llvm | rocm-llvm-alt |
|
||||
| ----------- | ----------- |
|
||||
| Installed by default when ROCm™ itself is installed | An optional package |
|
||||
| Provides an open-source compiler | Provides an additional closed-source compiler for users interested in additional CPU optimizations not available in rocm-llvm |
|
||||
|
||||
For more details, see the following table:
|
||||
|
||||
**Table 2. Details Table**
|
||||
| For | See |
|
||||
| ----------- | ----------- |
|
||||
| The latest usage information for AMD GPU |[https://llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) |
|
||||
| Usage information for a specific ROCm release | [https://llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) |
|
||||
| Source code for rocm-llvm | [https://github.com/RadeonOpenCompute/llvm-project](https://github.com/RadeonOpenCompute/llvm-project) |
|
||||
|
||||
|
||||
|
||||
1
docs/sphinx/release.md
Normal file
@@ -0,0 +1 @@
|
||||
# Release Notes
|
||||
1
docs/sphinx/release/compatibility.md
Normal file
@@ -0,0 +1 @@
|
||||
# Compatibility
|
||||
95
docs/sphinx/release/gpu_os_support.md
Normal file
@@ -0,0 +1,95 @@
|
||||
# GPU and OS Support
|
||||
|
||||
## OS Support
|
||||
|
||||
ROCm supports the operating systems listed below.
|
||||
| OS | Validated Kernel |
|
||||
|:------------------:|:------------------:|
|
||||
| RHEL 9.1 | `5.14` |
|
||||
| RHEL 8.6 to 8.7 | `4.18` |
|
||||
| SLES 15 SP4 | |
|
||||
| Ubuntu 20.04.5 LTS | `5.15` |
|
||||
| Ubuntu 22.04.1 LTS | `5.15`, OEM `5.17` |
|
||||
|
||||
## Virtualization Support
|
||||
|
||||
ROCm supports virtualization for select GPUs only as shown below.
|
||||
|
||||
| Hypervisor | Version | GPU | Validated Guest OS (validated kernel) |
|
||||
|:--------------:|:--------:|:-----:|:--------------------------------------------------------------------------------:|
|
||||
| VMware |ESXi 8 | MI250 | `Ubuntu 20.04 (5.15.0-56-generic)` |
|
||||
| VMware |ESXi 8 | MI210 | `Ubuntu 20.04 (5.15.0-56-generic)`, `SLES 15 SP4 (5.14.21-150400.24.18-default)` |
|
||||
| VMware |ESXi 7 | MI210 | `Ubuntu 20.04 (5.15.0-56-generic)`, `SLES 15 SP4 (5.14.21-150400.24.18-default)` |
|
||||
|
||||
## GPU Support Table
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Instinct™
|
||||
:sync: instinct
|
||||
Use Driver Shipped with ROCm
|
||||
|GPU |Architecture |Product|[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|
||||
|:----------------:|:--------------:|:----:|:--------------------------------------------------------------------:|:------------------------------------:|:-------:|
|
||||
|Instinct™ MI250X | CDNA2 |All |gfx90a |Supported |Unsupported |
|
||||
|Instinct™ MI250 | CDNA2 |All |gfx90a |Supported |Unsupported |
|
||||
|Instinct™ MI210 | CDNA2 |All |gfx90a |Supported |Unsupported |
|
||||
|Instinct™ MI100 | CDNA |All|gfx908 |Supported |Unsupported |
|
||||
|Instinct™ MI50 | Vega |All|gfx906 |Supported |Unsupported |
|
||||
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Radeon Pro™
|
||||
:sync: radeonpro
|
||||
|
||||
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|
||||
|GPU |Architecture |Product|[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|
||||
|:----------------:|:--------------:|:----:|:--------------------------------------------------------------------:|:------------------------------------:|:-------:|
|
||||
|Radeon™ Pro W6800 | RDNA2 |All |gfx1030 |Supported |Supported|
|
||||
|Radeon™ Pro V620 | RDNA2 |All|gfx1030 |Supported |Unsupported|
|
||||
|Radeon™ RX 6900 XT| RDNA2 |HIP SDK|gfx1030 |Supported |Supported|
|
||||
|Radeon™ RX 6600 | RDNA2 |HIP|gfx1032 |Supported|Supported|
|
||||
|Radeon™ R9 Fury | Fiji |All|gfx803 |Community |Unsupported|
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Radeon™
|
||||
:sync: radeon
|
||||
|
||||
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|
||||
|GPU |Architecture |Product|[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|
||||
|:----------------:|:--------------:|:----:|:--------------------------------------------------------------------:|:------------------------------------:|:-------:|
|
||||
|Radeon™ RX 6900 XT| RDNA2 |HIP SDK|gfx1030 |Supported |Supported|
|
||||
|Radeon™ RX 6600 | RDNA2 |HIP|gfx1032 |Supported|Supported|
|
||||
|Radeon™ R9 Fury | Fiji |All|gfx803 |Community |Unsupported|
|
||||
|
||||
:::
|
||||
|
||||
|
||||
|
||||
::::
|
||||
|
||||
|
||||
|
||||
### Products in ROCm
|
||||
ROCm software support varies by GPU type and operating system. The Product column in the tables above names one of three software stack enablement levels:
|
||||
|
||||
- All includes all software that is part of the ROCm ecosystem. Please see [article](link) for details of ROCm.
|
||||
- HIP SDK includes the HIP Runtime and a selection of GPU libraries for compute. Please see [article](link) for details of HIP SDK.
|
||||
- HIP enables the use of the HIP Runtime only.
|
||||
|
||||
|
||||
### GPU Support Levels
|
||||
|
||||
GPU support levels in ROCm:
|
||||
|
||||
- Supported - AMD enables these GPUs in our software distributions for the corresponding ROCm product.
|
||||
- Unsupported - This configuration is not enabled in our software distributions.
|
||||
- Deprecated - Support will be removed in a future release.
|
||||
- Community - AMD does not enable these GPUs in our software distributions but end users are free to enable these GPUs themselves.
|
||||
|
||||
|
||||
## CPU Support
|
||||
|
||||
ROCm requires CPUs that support PCIe™ Atomics. Modern CPUs after the release of
|
||||
1st generation AMD Zen CPU and Intel™ Haswell support PCIe Atomics.
|
||||
93
docs/sphinx/release/licensing.md
Normal file
@@ -0,0 +1,93 @@
|
||||
# Licensing Terms
|
||||
|
||||
ROCm™ is released by Advanced Micro Devices, Inc. under open source licenses
|
||||
via public GitHub repositories. The following table is a list of ROCm components
|
||||
with the links to the license terms. The list is ordered to follow ROCm's
|
||||
manifest file.
|
||||
|
||||
| Component | License |
|
||||
|:------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------------------:|
|
||||
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
|
||||
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
|
||||
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
|
||||
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
|
||||
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
|
||||
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
|
||||
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
|
||||
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
|
||||
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
|
||||
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
|
||||
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
|
||||
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
|
||||
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
|
||||
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
|
||||
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
|
||||
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
|
||||
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
|
||||
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
|
||||
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
|
||||
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
|
||||
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
|
||||
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
|
||||
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
|
||||
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
|
||||
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
|
||||
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
|
||||
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
|
||||
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
|
||||
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
|
||||
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
|
||||
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
|
||||
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
|
||||
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
|
||||
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
|
||||
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
|
||||
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
|
||||
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
|
||||
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
|
||||
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
|
||||
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
|
||||
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
|
||||
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
|
||||
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
|
||||
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
|
||||
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
|
||||
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
|
||||
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
|
||||
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
|
||||
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
|
||||
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
|
||||
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
|
||||
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
|
||||
|
||||
The additional terms and conditions below apply to your use of ROCm technical
|
||||
documentation.
|
||||
|
||||
©2022 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
The information presented in this document is for informational purposes only
|
||||
and may contain technical inaccuracies, omissions, and typographical errors. The
|
||||
information contained herein is subject to change and may be rendered inaccurate
|
||||
for many reasons, including but not limited to product and roadmap changes,
|
||||
component and motherboard version changes, new model and/or product releases,
|
||||
product differences between differing manufacturers, software changes, BIOS
|
||||
flashes, firmware upgrades, or the like. Any computer system has risks of
|
||||
security vulnerabilities that cannot be completely prevented or mitigated. AMD
|
||||
assumes no obligation to update or otherwise correct or revise this information.
|
||||
However, AMD reserves the right to revise this information and to make changes
|
||||
from time to time to the content hereof without obligation of AMD to notify any
|
||||
person of such revisions or changes.
|
||||
|
||||
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
|
||||
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
|
||||
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
|
||||
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
|
||||
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
|
||||
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
|
||||
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
|
||||
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|
||||
|
||||
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
|
||||
Advanced Micro Devices, Inc. Other product names used in this publication are
|
||||
for identification purposes only and may be trademarks of their respective
|
||||
companies.
|
||||
1
docs/sphinx/requirements.in
Normal file
@@ -0,0 +1 @@
|
||||
git+https://github.com/RadeonOpenCompute/rocm-docs-core.git
|
||||
275
docs/sphinx/requirements.txt
Normal file
@@ -0,0 +1,275 @@
|
||||
#
|
||||
# This file is autogenerated by pip-compile with Python 3.8
|
||||
# by the following command:
|
||||
#
|
||||
# pip-compile requirements.in
|
||||
#
|
||||
accessible-pygments==0.0.3
|
||||
# via pydata-sphinx-theme
|
||||
alabaster==0.7.13
|
||||
# via sphinx
|
||||
asttokens==2.2.1
|
||||
# via stack-data
|
||||
attrs==22.2.0
|
||||
# via
|
||||
# jsonschema
|
||||
# jupyter-cache
|
||||
babel==2.11.0
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
backcall==0.2.0
|
||||
# via ipython
|
||||
beautifulsoup4==4.11.2
|
||||
# via pydata-sphinx-theme
|
||||
breathe==4.34.0
|
||||
# via rocm-docs-core
|
||||
certifi==2022.12.7
|
||||
# via requests
|
||||
cffi==1.15.1
|
||||
# via pynacl
|
||||
charset-normalizer==2.1.1
|
||||
# via requests
|
||||
click==8.1.3
|
||||
# via
|
||||
# jupyter-cache
|
||||
# sphinx-external-toc
|
||||
comm==0.1.2
|
||||
# via ipykernel
|
||||
debugpy==1.6.6
|
||||
# via ipykernel
|
||||
decorator==5.1.1
|
||||
# via ipython
|
||||
deprecated==1.2.13
|
||||
# via pygithub
|
||||
docutils==0.16
|
||||
# via
|
||||
# breathe
|
||||
# myst-parser
|
||||
# pydata-sphinx-theme
|
||||
# rocm-docs-core
|
||||
# sphinx
|
||||
executing==1.2.0
|
||||
# via stack-data
|
||||
fastjsonschema==2.16.2
|
||||
# via nbformat
|
||||
gitdb==4.0.10
|
||||
# via gitpython
|
||||
gitpython==3.1.30
|
||||
# via rocm-docs-core
|
||||
greenlet==2.0.2
|
||||
# via sqlalchemy
|
||||
idna==3.4
|
||||
# via requests
|
||||
imagesize==1.4.1
|
||||
# via sphinx
|
||||
importlib-metadata==6.0.0
|
||||
# via
|
||||
# jupyter-cache
|
||||
# jupyter-client
|
||||
# myst-nb
|
||||
importlib-resources==5.10.2
|
||||
# via
|
||||
# jsonschema
|
||||
# rocm-docs-core
|
||||
ipykernel==6.21.2
|
||||
# via myst-nb
|
||||
ipython==8.10.0
|
||||
# via
|
||||
# ipykernel
|
||||
# myst-nb
|
||||
jedi==0.18.2
|
||||
# via ipython
|
||||
jinja2==3.1.2
|
||||
# via
|
||||
# myst-parser
|
||||
# sphinx
|
||||
jsonschema==4.17.3
|
||||
# via nbformat
|
||||
jupyter-cache==0.5.0
|
||||
# via myst-nb
|
||||
jupyter-client==8.0.3
|
||||
# via
|
||||
# ipykernel
|
||||
# nbclient
|
||||
jupyter-core==5.2.0
|
||||
# via
|
||||
# ipykernel
|
||||
# jupyter-client
|
||||
# nbformat
|
||||
linkify-it-py==1.0.3
|
||||
# via myst-parser
|
||||
markdown-it-py==2.2.0
|
||||
# via
|
||||
# mdit-py-plugins
|
||||
# myst-parser
|
||||
markupsafe==2.1.2
|
||||
# via jinja2
|
||||
matplotlib-inline==0.1.6
|
||||
# via
|
||||
# ipykernel
|
||||
# ipython
|
||||
mdit-py-plugins==0.3.4
|
||||
# via myst-parser
|
||||
mdurl==0.1.2
|
||||
# via markdown-it-py
|
||||
myst-nb==0.17.1
|
||||
# via rocm-docs-core
|
||||
myst-parser[linkify]==0.18.1
|
||||
# via
|
||||
# myst-nb
|
||||
# rocm-docs-core
|
||||
nbclient==0.5.13
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
nbformat==5.7.3
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
# nbclient
|
||||
nest-asyncio==1.5.6
|
||||
# via
|
||||
# ipykernel
|
||||
# nbclient
|
||||
packaging==23.0
|
||||
# via
|
||||
# ipykernel
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
parso==0.8.3
|
||||
# via jedi
|
||||
pexpect==4.8.0
|
||||
# via ipython
|
||||
pickleshare==0.7.5
|
||||
# via ipython
|
||||
pkgutil-resolve-name==1.3.10
|
||||
# via jsonschema
|
||||
platformdirs==3.0.0
|
||||
# via jupyter-core
|
||||
prompt-toolkit==3.0.37
|
||||
# via ipython
|
||||
psutil==5.9.4
|
||||
# via ipykernel
|
||||
ptyprocess==0.7.0
|
||||
# via pexpect
|
||||
pure-eval==0.2.2
|
||||
# via stack-data
|
||||
pycparser==2.21
|
||||
# via cffi
|
||||
pydata-sphinx-theme==0.13.0rc5
|
||||
# via sphinx-book-theme
|
||||
pygithub==1.57
|
||||
# via rocm-docs-core
|
||||
pygments==2.14.0
|
||||
# via
|
||||
# accessible-pygments
|
||||
# ipython
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
pyjwt==2.6.0
|
||||
# via pygithub
|
||||
pynacl==1.5.0
|
||||
# via pygithub
|
||||
pyrsistent==0.19.3
|
||||
# via jsonschema
|
||||
python-dateutil==2.8.2
|
||||
# via jupyter-client
|
||||
pytz==2022.7.1
|
||||
# via babel
|
||||
pyyaml==6.0
|
||||
# via
|
||||
# jupyter-cache
|
||||
# myst-nb
|
||||
# myst-parser
|
||||
# sphinx-external-toc
|
||||
pyzmq==25.0.0
|
||||
# via
|
||||
# ipykernel
|
||||
# jupyter-client
|
||||
requests==2.28.1
|
||||
# via
|
||||
# pygithub
|
||||
# sphinx
|
||||
rocm-docs-core @ git+https://github.com/RadeonOpenCompute/rocm-docs-core.git
|
||||
# via -r requirements.in
|
||||
six==1.16.0
|
||||
# via
|
||||
# asttokens
|
||||
# python-dateutil
|
||||
smmap==5.0.0
|
||||
# via gitdb
|
||||
snowballstemmer==2.2.0
|
||||
# via sphinx
|
||||
soupsieve==2.4
|
||||
# via beautifulsoup4
|
||||
sphinx==4.3.1
|
||||
# via
|
||||
# breathe
|
||||
# myst-nb
|
||||
# myst-parser
|
||||
# pydata-sphinx-theme
|
||||
# rocm-docs-core
|
||||
# sphinx-book-theme
|
||||
# sphinx-copybutton
|
||||
# sphinx-design
|
||||
# sphinx-external-toc
|
||||
sphinx-book-theme==1.0.0rc2
|
||||
# via rocm-docs-core
|
||||
sphinx-copybutton==0.5.1
|
||||
# via rocm-docs-core
|
||||
sphinx-design==0.3.0
|
||||
# via rocm-docs-core
|
||||
sphinx-external-toc==0.3.1
|
||||
# via rocm-docs-core
|
||||
sphinxcontrib-applehelp==1.0.4
|
||||
# via sphinx
|
||||
sphinxcontrib-devhelp==1.0.2
|
||||
# via sphinx
|
||||
sphinxcontrib-htmlhelp==2.0.1
|
||||
# via sphinx
|
||||
sphinxcontrib-jsmath==1.0.1
|
||||
# via sphinx
|
||||
sphinxcontrib-qthelp==1.0.3
|
||||
# via sphinx
|
||||
sphinxcontrib-serializinghtml==1.1.5
|
||||
# via sphinx
|
||||
sqlalchemy==1.4.46
|
||||
# via jupyter-cache
|
||||
stack-data==0.6.2
|
||||
# via ipython
|
||||
tabulate==0.9.0
|
||||
# via jupyter-cache
|
||||
tornado==6.2
|
||||
# via
|
||||
# ipykernel
|
||||
# jupyter-client
|
||||
traitlets==5.9.0
|
||||
# via
|
||||
# comm
|
||||
# ipykernel
|
||||
# ipython
|
||||
# jupyter-client
|
||||
# jupyter-core
|
||||
# matplotlib-inline
|
||||
# nbclient
|
||||
# nbformat
|
||||
typing-extensions==4.5.0
|
||||
# via
|
||||
# myst-nb
|
||||
# myst-parser
|
||||
uc-micro-py==1.0.1
|
||||
# via linkify-it-py
|
||||
urllib3==1.26.13
|
||||
# via requests
|
||||
wcwidth==0.2.6
|
||||
# via prompt-toolkit
|
||||
wrapt==1.14.1
|
||||
# via deprecated
|
||||
zipp==3.11.0
|
||||
# via
|
||||
# importlib-metadata
|
||||
# importlib-resources
|
||||
|
||||
# The following packages are considered to be unsafe in a requirements file:
|
||||
# setuptools
|
||||
32
docs/sphinx/rocm_stack.md
Normal file
@@ -0,0 +1,32 @@
|
||||
# The ROCm Stack
|
||||
|
||||
ROCm is the GPU computing stack for AMD GPUs. ROCm consists of the
|
||||
components described on this page. Kernel mo
|
||||
|
||||
## Kernel Module (Linux)
|
||||
|
||||
## HIP Runtime
|
||||
|
||||
## Compiler
|
||||
|
||||
### hipcc
|
||||
|
||||
### AMD Clang
|
||||
|
||||
## GPU Libraries
|
||||
|
||||
### Math Libraries
|
||||
|
||||
The Math libraries are grouped into libraries starting with a roc-prefix and
|
||||
hip-prefix. Libraries starting with a hip-prefix provide support for AMD GPUs
|
||||
and NVIDIA GPUs. Libraries beginning with the roc-prefix support AMD GPUs only.
|
||||
|
||||
### Compute Primitives
|
||||
|
||||
## Communication Libraries
|
||||
|
||||
## AI and ML (Linux only)
|
||||
|
||||
## Management Tools (Linux)
|
||||
|
||||
## Deployment Tools (Linux)
|
||||
165
docs/sphinx/understand/cmake_packages.rst
Normal file
@@ -0,0 +1,165 @@
|
||||
===========================
|
||||
Using CMake
|
||||
===========================
|
||||
|
||||
Most components in ROCm support CMake 3.5 or higher out-of-the-box and do not require any special Find modules. A Find module is often used by
|
||||
downstream projects to find files by guessing their locations with platform-specific hints. Typically, the Find module is required when the
|
||||
upstream is not built with CMake or the package configuration files are not available.
|
||||
|
||||
ROCm provides the respective *config-file* packages, and this enables ``find_package`` to be used directly. ROCm does not require any Find
|
||||
module as the *config-file* packages are shipped with the upstream projects.
|
||||
|
||||
Finding Dependencies
|
||||
--------------------
|
||||
|
||||
When dependencies are not found in standard locations such as */usr* or */usr/local*, then the ``CMAKE_PREFIX_PATH`` variable can be set to the
|
||||
installation prefixes. This can be set to multiple locations with a semicolon separating the entries.
|
||||
|
||||
There are two ways to set this variable:
|
||||
|
||||
- Pass the flag when configuring with ``-DCMAKE_PREFIX_PATH=...``. This approach is preferred when users install the components in custom
|
||||
locations.
|
||||
|
||||
- Append the variable in the CMakeLists.txt file. This is useful if the dependencies are found in a common location. For example, when
|
||||
the binaries provided on `repo.radeon.com <http://repo.radeon.com>`_ are installed to */opt/rocm*, you can add the following line to a CMakeLists.txt file
|
||||
|
||||
::
|
||||
|
||||
list (APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
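As a minimal sketch of the second approach, the prefix can be exposed as a cache variable so that a custom install location can still be supplied at configure time. ``MY_ROCM_PREFIX`` is a hypothetical variable name introduced for this illustration; it is not defined by ROCm or CMake::

    # MY_ROCM_PREFIX is a hypothetical cache variable used only in this sketch;
    # override it with -DMY_ROCM_PREFIX=<prefix> when ROCm is installed elsewhere.
    set(MY_ROCM_PREFIX "/opt/rocm" CACHE PATH "ROCm installation prefix")
    # CMAKE_PREFIX_PATH is a semicolon-separated list, so list(APPEND) adds entries.
    list(APPEND CMAKE_PREFIX_PATH "${MY_ROCM_PREFIX}/hip" "${MY_ROCM_PREFIX}")
    # Print the resulting search prefixes to make configuration problems easier to spot.
    message(STATUS "Searching for packages in: ${CMAKE_PREFIX_PATH}")
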
|
||||
|
||||
|
||||
|
||||
Using HIP in CMake
|
||||
--------------------
|
||||
|
||||
There are two ways to use HIP in CMake:
|
||||
|
||||
- Use the HIP API without compiling the GPU device code. As there is no GPU code, any C or C++ compiler can be used.
|
||||
Calling ``find_package(hip)`` provides the ``hip::host`` target to use HIP in this context.
|
||||
|
||||
::
|
||||
|
||||
# Search for rocm in common locations
|
||||
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
|
||||
# Find hip
|
||||
find_package(hip)
|
||||
# Create the library
|
||||
add_library(myLib ...)
|
||||
# Link with HIP
|
||||
target_link_libraries(myLib hip::host)
|
||||
|
||||
.. note::
|
||||
The ``hip::host`` target provides all the usage requirements needed to use HIP without compiling GPU device code.
|
||||
|
||||
- Use the HIP API and compile GPU device code. This requires using a
|
||||
device compiler. The compiler for CMake can be set using either the
|
||||
``CMAKE_C_COMPILER`` and ``CMAKE_CXX_COMPILER`` variables or using the ``CC`` and
|
||||
``CXX`` environment variables. These can be set when configuring CMake or
|
||||
put into a CMake toolchain file. The device compiler must be set to a
|
||||
compiler that supports AMD GPU targets, which is usually Clang.
|
||||
|
||||
The ``find_package(hip)`` provides the ``hip::device`` target to add all the
|
||||
flags for device compilation.
|
||||
|
||||
::
|
||||
|
||||
# Search for rocm in common locations
|
||||
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
|
||||
# Find hip
|
||||
find_package(hip)
|
||||
# Create library
|
||||
add_library(myLib ...)
|
||||
# Link with HIP
|
||||
target_link_libraries(myLib hip::device)
|
||||
|
||||
This project can then be configured with::
|
||||
|
||||
cmake -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ ..
|
||||
|
||||
This uses the device compiler provided by the binary packages from
|
||||
`repo.radeon.com <http://repo.radeon.com>`_.
|
||||
|
||||
.. note::
|
||||
Compiling for the GPU device requires at least C++11. This can be
|
||||
enabled by setting ``CMAKE_CXX_STANDARD`` or setting the correct compiler flags
|
||||
in the CMake toolchain.
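One way to satisfy the C++11 requirement from the CMakeLists.txt file is sketched below. It uses standard CMake variables and assumes they are set before any targets are created::

    # Request C++11 for all targets created after this point
    set(CMAKE_CXX_STANDARD 11)
    # Treat the standard as a hard requirement rather than allowing a
    # fallback to an older one
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
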
|
||||
|
||||
The GPU device code can be built for different GPU architectures by
|
||||
setting the ``GPU_TARGETS`` variable. By default, this is set to all the
|
||||
currently supported architectures for AMD ROCm. It can be set by passing
|
||||
the flag during configuration with ``-DGPU_TARGETS=gfx900``. It can also be
|
||||
set in the CMakeLists.txt as a cached variable before calling
|
||||
``find_package(hip)``::
|
||||
|
||||
# Set the GPU to compile for
|
||||
set(GPU_TARGETS "gfx900" CACHE STRING "GPU targets to compile for")
|
||||
# Search for rocm in common locations
|
||||
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
|
||||
# Find hip
|
||||
find_package(hip)
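Putting the pieces of this section together, a complete CMakeLists.txt for a small device-code project might look like the sketch below. The project name ``my_hip_app`` and the source file ``main.cpp`` are hypothetical, the ``REQUIRED`` and ``PRIVATE`` keywords are additions made here, and the minimum CMake version is taken from the introduction of this page::

    cmake_minimum_required(VERSION 3.5)
    project(my_hip_app CXX)

    # Set the GPU to compile for (must come before find_package(hip))
    set(GPU_TARGETS "gfx900" CACHE STRING "GPU targets to compile for")
    # Search for rocm in common locations
    list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
    # Stop with an error if the hip package cannot be found
    find_package(hip REQUIRED)

    # main.cpp is a hypothetical source file containing HIP kernels
    add_executable(my_hip_app main.cpp)
    # Device compilation requires at least C++11
    set_target_properties(my_hip_app PROPERTIES
        CXX_STANDARD 11
        CXX_STANDARD_REQUIRED ON)
    # hip::device adds the flags needed for device compilation
    target_link_libraries(my_hip_app PRIVATE hip::device)

As with the earlier example, such a project would be configured with ``CMAKE_CXX_COMPILER`` pointing at a compiler that supports AMD GPU targets.
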
|
||||
|
||||
Using ROCm Libraries
|
||||
---------------------------
|
||||
|
||||
Libraries such as rocBLAS, MIOpen, and others support CMake users as
|
||||
well.
|
||||
|
||||
As illustrated in the example below, to use MIOpen from CMake, you can
|
||||
call ``find_package(miopen)``, which provides the ``MIOpen`` CMake target. This
|
||||
can be linked with ``target_link_libraries``::
|
||||
|
||||
# Search for rocm in common locations
|
||||
list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
|
||||
# Find miopen
|
||||
find_package(miopen)
|
||||
# Create library
|
||||
add_library(myLib ...)
|
||||
# Link with miopen
|
||||
target_link_libraries(myLib MIOpen)
|
||||
|
||||
.. note::
|
||||
Most libraries are designed as host-only APIs, so using a GPU device
|
||||
compiler is not necessary for downstream projects unless they use GPU
|
||||
device code.
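The same pattern applies to the other libraries. As a sketch, linking against rocBLAS would use the ``rocblas`` package and the ``roc::rocblas`` target listed in the table below; the ``REQUIRED`` keyword is an addition made here so that configuration stops if the package is missing::

    # Search for rocm in common locations
    list(APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
    # Find rocblas; REQUIRED turns a missing package into a configuration error
    find_package(rocblas REQUIRED)
    # Create library (sources elided, as in the examples above)
    add_library(myLib ...)
    # Link with rocblas through its imported target
    target_link_libraries(myLib roc::rocblas)
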
|
||||
|
||||
|
||||
ROCm CMake Packages
|
||||
--------------------
|
||||
|
||||
+-----------+----------+-------------------------------------------------------+
| Component | Package  | Targets                                               |
+===========+==========+=======================================================+
| HIP       | hip      | hip::host, hip::device                                |
+-----------+----------+-------------------------------------------------------+
| rocPRIM   | rocprim  | roc::rocprim                                          |
+-----------+----------+-------------------------------------------------------+
| rocThrust | rocthrust| roc::rocthrust                                        |
+-----------+----------+-------------------------------------------------------+
| hipCUB    | hipcub   | hip::hipcub                                           |
+-----------+----------+-------------------------------------------------------+
| rocRAND   | rocrand  | roc::rocrand                                          |
+-----------+----------+-------------------------------------------------------+
| rocBLAS   | rocblas  | roc::rocblas                                          |
+-----------+----------+-------------------------------------------------------+
| rocSOLVER | rocsolver| roc::rocsolver                                        |
+-----------+----------+-------------------------------------------------------+
| hipBLAS   | hipblas  | roc::hipblas                                          |
+-----------+----------+-------------------------------------------------------+
| rocFFT    | rocfft   | roc::rocfft                                           |
+-----------+----------+-------------------------------------------------------+
| hipFFT    | hipfft   | hip::hipfft                                           |
+-----------+----------+-------------------------------------------------------+
| rocSPARSE | rocsparse| roc::rocsparse                                        |
+-----------+----------+-------------------------------------------------------+
| hipSPARSE | hipsparse| roc::hipsparse                                        |
+-----------+----------+-------------------------------------------------------+
| rocALUTION|rocalution| roc::rocalution                                       |
+-----------+----------+-------------------------------------------------------+
| RCCL      | rccl     | rccl                                                  |
+-----------+----------+-------------------------------------------------------+
| MIOpen    | miopen   | MIOpen                                                |
+-----------+----------+-------------------------------------------------------+
| MIGraphX  | migraphx | migraphx::migraphx, migraphx::migraphx_c,             |
|           |          | migraphx::migraphx_cpu, migraphx::migraphx_gpu,       |
|           |          | migraphx::migraphx_onnx, migraphx::migraphx_tf        |
+-----------+----------+-------------------------------------------------------+
|
||||
4
docs/sphinx/understand/compiler_disambiguation.md
Normal file
@@ -0,0 +1,4 @@
|
||||
# ROCm Compilers Disambiguation
|
||||
|
||||
Pull from
|
||||
<https://docs.amd.com/bundle/ROCm-Compiler-Reference-Guide-v5.4.1/page/Glossary_of_Compiler_Terms.html>
|
||||
BIN
docs/sphinx/understand/deep_learning/Deep Learning Image 1.png
Normal file
|
After Width: | Height: | Size: 58 KiB |
|
After Width: | Height: | Size: 46 KiB |