mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-10 15:18:11 -05:00
Compare commits
66 Commits
roc-5.5.x
...
docs/5.4.1
| Author | SHA1 | Date |
|---|---|---|
| | 2edcc3a6c6 | |
| | ae2409fa47 | |
| | 574d62b077 | |
| | ff9c523c3e | |
| | 5712fd2b98 | |
| | f0a9e81a9a | |
| | 829d91892b | |
| | 1e5228b65f | |
| | 5ae4c333c5 | |
| | 15292ddebe | |
| | 89986d332d | |
| | de6fc1634a | |
| | 52986c3635 | |
| | b86717e454 | |
| | 7dbd277203 | |
| | 31ee8e712c | |
| | 55eda666d5 | |
| | 01e24da121 | |
| | f68c47d748 | |
| | 2c0a351bbd | |
| | b6509809d3 | |
| | a29205cc5c | |
| | 16c4d22099 | |
| | ed3335c3a5 | |
| | 30f27c4644 | |
| | f4be54f896 | |
| | 9c04aef6a5 | |
| | c7d4e75e95 | |
| | aabbea88f2 | |
| | 7747e130b9 | |
| | a471e8debe | |
| | 8c86526f98 | |
| | a42fae5140 | |
| | bcb3dd3b4a | |
| | 8784fe3fba | |
| | 6e79d204b8 | |
| | 7076bc18ca | |
| | 519df7a51f | |
| | 90c697b6d3 | |
| | 125cc37981 | |
| | 5752b5986c | |
| | 2829c088c2 | |
| | 3b9fb62600 | |
| | b7222caed2 | |
| | c285dd729f | |
| | 0c93636d23 | |
| | 3fa5f1fddc | |
| | 17b029b885 | |
| | 460f46c3be | |
| | 6feca81dd0 | |
| | ec8496041a | |
| | c7350c08ab | |
| | c1809766e6 | |
| | 61df1ec8c6 | |
| | 983987aab5 | |
| | 914b62e219 | |
| | faac45772c | |
| | d206494272 | |
| | 26c73a3986 | |
| | dc74008ac6 | |
| | 108287dcd7 | |
| | 38440915ef | |
| | d9c434881a | |
| | 4c795d45f6 | |
| | ef0a88ea0e | |
| | 34578f0193 | |
2
.github/CODEOWNERS
vendored
@@ -1 +1 @@
|
||||
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu
|
||||
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus
|
||||
|
||||
6
.github/workflows/linting.yml
vendored
@@ -32,10 +32,10 @@ jobs:
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v3
|
||||
- name: Use markdownlint
|
||||
uses: actionshub/markdownlint@v3.1.3
|
||||
- name: Use markdownlint-cli2
|
||||
uses: DavidAnson/markdownlint-cli2-action@v10.0.1
|
||||
with:
|
||||
filesToIgnoreRegex: CHANGELOG.md|(docs\/)?(RELEASE|release).md|tools\/autotag\/templates\/.
|
||||
globs: '**/*.md'
|
||||
|
||||
spelling:
|
||||
name: "Spelling"
|
||||
|
||||
1
.gitignore
vendored
@@ -15,3 +15,4 @@ _readthedocs/
|
||||
# avoid duplicating contributing.md due to conf.py
|
||||
docs/contributing.md
|
||||
docs/release.md
|
||||
docs/CHANGELOG.md
|
||||
|
||||
14
.markdownlint-cli2.yaml
Normal file
@@ -0,0 +1,14 @@
|
||||
config:
|
||||
default: true
|
||||
MD013: false
|
||||
MD026:
|
||||
punctuation: '.,;:!'
|
||||
MD029:
|
||||
style: ordered
|
||||
MD033: false
|
||||
MD034: false
|
||||
MD041: false
|
||||
ignores:
|
||||
- CHANGELOG.md
|
||||
- "{,docs/}{RELEASE,release}.md"
|
||||
- tools/autotag/templates/**/*.md
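The second `ignores` entry above relies on glob brace groups. As a rough illustration of which literal paths that entry stands for, the sketch below expands non-nested brace groups. `expand_braces` is a hypothetical helper written for this note; it is not part of markdownlint-cli2, whose own glob library does the real matching and also handles `**` and nesting, which this sketch does not.

```python
import itertools
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand non-nested {a,b} groups, e.g. '{,docs/}x' -> ['x', 'docs/x']."""
    # re.split with a capturing group keeps each group's contents at odd indices.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    # Every combination of one alternative per group, in left-to-right order.
    return ["".join(combo) for combo in itertools.product(*choices)]


for path in expand_braces("{,docs/}{RELEASE,release}.md"):
    print(path)
```

Under these assumptions the entry covers `RELEASE.md` and `release.md` both at the repository root and under `docs/`, which is the same set of files the earlier `filesToIgnoreRegex` alternative `(docs\/)?(RELEASE|release).md` was aimed at.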
|
||||
@@ -3,12 +3,19 @@
|
||||
|
||||
version: 2
|
||||
|
||||
build:
|
||||
os: ubuntu-22.04
|
||||
tools:
|
||||
python: "3.10"
|
||||
apt_packages:
|
||||
- "doxygen"
|
||||
- "graphviz" # For dot graphs in doxygen
|
||||
|
||||
python:
|
||||
install:
|
||||
- requirements: docs/sphinx/requirements.txt
|
||||
|
||||
sphinx:
|
||||
configuration: docs/conf.py
|
||||
|
||||
formats: [htmlzip, pdf, epub]
|
||||
|
||||
python:
|
||||
version: "3.8"
|
||||
install:
|
||||
- requirements: docs/sphinx/requirements.txt
|
||||
formats: []
|
||||
|
||||
892
CHANGELOG.md
@@ -8,899 +8,13 @@
|
||||
<!-- markdownlint-disable no-blanks-blockquote -->
|
||||
<!-- markdownlint-disable ul-indent -->
|
||||
<!-- markdownlint-disable no-trailing-spaces -->
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
|
||||
The release notes for the ROCm platform.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.5.0
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
### What's New in This Release
|
||||
|
||||
#### HIP Enhancements
|
||||
|
||||
The ROCm v5.5 release consists of the following HIP enhancements:
|
||||
|
||||
##### Enhanced Stack Size Limit
|
||||
|
||||
In this release, the stack size limit is increased from 16k to 131056 bytes (or 128K - 16).
|
||||
Applications that need to change the stack size can use the `hipDeviceSetLimit` API.
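The two limits quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that "16k" and "128K" mean multiples of 1024 bytes, and `fits_stack_limit` is a hypothetical helper for illustration, not a HIP API.

```python
KIB = 1024  # assumption: "k"/"K" in the release note means 1024 bytes

OLD_STACK_LIMIT = 16 * KIB        # previous limit, "16k"
NEW_STACK_LIMIT = 128 * KIB - 16  # new limit, "128K - 16"


def fits_stack_limit(requested_bytes: int, limit: int = NEW_STACK_LIMIT) -> bool:
    """Hypothetical helper: True if a stack size request is within the limit."""
    return 0 < requested_bytes <= limit


print(OLD_STACK_LIMIT, NEW_STACK_LIMIT)
print(fits_stack_limit(64 * KIB), fits_stack_limit(128 * KIB))
```

A request of exactly 128 KiB (131072 bytes) exceeds the new limit by 16 bytes, so an application would have to ask for 131056 bytes or less when it passes a size to `hipDeviceSetLimit`.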
|
||||
|
||||
##### `hipcc` Changes
|
||||
|
||||
The following hipcc changes are implemented in this release:
|
||||
|
||||
- `hipcc` no longer implicitly links to `libpthread` and `librt`, because they are no longer link-time dependencies of HIP programs. Applications that depend on these libraries must link to them explicitly.
|
||||
- `-use-staticlib` and `-use-sharedlib` options are deprecated.
|
||||
|
||||
##### Future Changes
|
||||
|
||||
- The `hipcc` binaries (Perl scripts) will be separated from HIP into the `hipcc` project. In future ROCm releases, users will install the `hipcc` binaries from a separate `hipcc` package.
|
||||
- In a future ROCm release, the following samples will be removed from the `hip-tests` project.
|
||||
- `hipBusBandwidth` at <https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/shipBusBandwidth>
|
||||
- `hipCommander` at <https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/hipCommander>
|
||||
|
||||
Note that the samples will continue to be available in previous release branches.
|
||||
|
||||
##### New HIP APIs in This Release
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> This is a pre-official version (beta) release of the new APIs and may contain unresolved issues.
|
||||
|
||||
###### Memory Management HIP APIs
|
||||
|
||||
The new memory management HIP API is as follows:
|
||||
|
||||
- Sets information on the specified pointer [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipPointerSetAttribute(const void* value, hipPointer_attribute attribute, hipDeviceptr_t ptr);
|
||||
```
|
||||
|
||||
###### Module Management HIP APIs
|
||||
|
||||
The new module management HIP APIs are as follows:
|
||||
|
||||
- Launches kernel `f` with the given launch parameters and shared memory on `stream`, with arguments passed in `kernelParams`, where thread blocks can cooperate and synchronize as they execute.
|
||||
|
||||
```h
|
||||
hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams);
|
||||
```
|
||||
|
||||
- Launches kernels on multiple devices where thread blocks can cooperate and synchronize as they execute.
|
||||
|
||||
```h
|
||||
hipError_t hipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags);
|
||||
```
|
||||
|
||||
###### HIP Graph Management APIs
|
||||
|
||||
The new HIP Graph Management APIs are as follows:
|
||||
|
||||
- Creates a memory allocation node and adds it to a graph [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphAddMemAllocNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipMemAllocNodeParams* pNodeParams);
|
||||
```
|
||||
|
||||
- Returns parameters for a memory allocation node [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphMemAllocNodeGetParams(hipGraphNode_t node, hipMemAllocNodeParams* pNodeParams);
|
||||
```
|
||||
|
||||
- Creates a memory free node and adds it to a graph [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphAddMemFreeNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dev_ptr);
|
||||
```
|
||||
|
||||
- Returns parameters for memory free node [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipGraphMemFreeNodeGetParams(hipGraphNode_t node, void* dev_ptr);
|
||||
```
|
||||
|
||||
- Writes a DOT file describing the graph structure [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags);
|
||||
```
|
||||
|
||||
- Copies attributes from source node to destination node [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipGraphKernelNodeCopyAttributes(hipGraphNode_t hSrc, hipGraphNode_t hDst);
|
||||
```
|
||||
|
||||
- Enables or disables the specified node in the given graphExec [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphNodeSetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int isEnabled);
|
||||
```
|
||||
|
||||
- Queries whether a node in the given graphExec is enabled [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphNodeGetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int* isEnabled);
|
||||
```
|
||||
|
||||
##### OpenMP Enhancements
|
||||
This release consists of the following OpenMP enhancements:
|
||||
|
||||
- Additional support for OMPT functions `get_device_time` and `get_record_type`.
|
||||
- Add support for min/max fast fp atomics on AMD GPUs.
|
||||
- Fix the use of the abs function in C device regions.
|
||||
|
||||
### Deprecations and Warnings
|
||||
|
||||
#### HIP Deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period during which both the Perl scripts and the compiled binaries are available, before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterparts. No user action is required. Once the binaries are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft links will later point to the respective compiled binaries as the default option.
|
||||
|
||||
##### Linux Filesystem Hierarchy Standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux Foundation Filesystem Hierarchy Standard (FHS) in this release so that ROCm components follow open source conventions for Linux-based distributions. While moving to the new filesystem hierarchy, ROCm remains backward compatible with the filesystem hierarchy of version 5.1 and older. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
|
||||
|
||||
##### New Filesystem Hierarchy
|
||||
|
||||
The following is the new filesystem hierarchy:
|
||||
|
||||
```text
/opt/rocm-<ver>
    | --bin
        | --All externally exposed Binaries
    | --libexec
        | --<component>
            | -- Component specific private non-ISA executables (architecture independent)
    | --include
        | -- <component>
            | --<header files>
    | --lib
        | --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
            (public libraries linked with application)
        | --<component> (component specific private library, executable data)
        | --<cmake>
            | --components
                | --<component>.config.cmake
    | --share
        | --html/<component>/*.html
        | --info/<component>/*.[pdf, md, txt]
        | --man
        | --doc
            | --<component>
                | --<licenses>
        | --<component>
            | --<misc files> (arch independent non-executable)
            | --samples
```
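The `lib` entry in the hierarchy above describes a chain of links from the unversioned library name to the fully versioned file. A minimal sketch of that chain, built in a temporary directory, could look as follows. The library name `example`, the version `1.2.3`, and the helper `make_soname_chain` are all hypothetical; the files are empty placeholders, not real shared objects.

```python
import os
import tempfile
from pathlib import Path


def make_soname_chain(lib_dir: Path, name: str, version: str) -> Path:
    """Create lib<name>.so -> lib<name>.so.<major> -> lib<name>.so.<version>."""
    major = version.split(".")[0]
    real = lib_dir / f"lib{name}.so.{version}"
    real.write_bytes(b"")  # placeholder standing in for the real library file

    soname = lib_dir / f"lib{name}.so.{major}"
    os.symlink(real.name, soname)  # name the dynamic loader looks up at run time

    linker_name = lib_dir / f"lib{name}.so"
    os.symlink(soname.name, linker_name)  # name the linker uses for -l<name>
    return linker_name


with tempfile.TemporaryDirectory() as tmp:
    lib_dir = Path(tmp).resolve()
    link = make_soname_chain(lib_dir, "example", "1.2.3")
    # Follow the chain one hop at a time, then all the way to the real file.
    print(os.readlink(link), "->", os.readlink(lib_dir / os.readlink(link)))
    print(link.resolve().name)
```

Because each link targets a name in the same directory, the whole `lib` folder can be relocated without breaking the chain.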
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1 (old) filesystem hierarchy in its next major release.
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward Compatibility with Older Filesystems
|
||||
|
||||
ROCm has moved header files and libraries to their new locations, as indicated in the above structure, and provides symbolic links and wrapper header files in the old locations for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
|
||||
```h
|
||||
// Code snippet from hip_runtime.h
|
||||
#pragma message("This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip")
|
||||
#include "hip/hip_runtime.h"
|
||||
```
|
||||
|
||||
The deprecation plan for the backward-compatibility wrapper header files is as follows:
|
||||
|
||||
- `#pragma` message announcing deprecation -- ROCm v5.2 release
|
||||
- `#pragma` message changed to `#warning` -- Future release
|
||||
- `#warning` changed to `#error` -- Future release
|
||||
- Backward compatibility wrappers removed -- Future release
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
```log
|
||||
$ ls -l /opt/rocm/hip/lib/
|
||||
total 4
|
||||
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
|
||||
lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
|
||||
```
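The backward-compatibility link in the listing above can be reproduced in a scratch directory to see how the old path resolves to the new one. This is a sketch under stated assumptions: the temporary directory stands in for `/opt/rocm`, the library file is an empty placeholder, and `build_compat_layout` is a hypothetical helper, not a ROCm packaging tool.

```python
import os
import tempfile
from pathlib import Path


def build_compat_layout(root: Path) -> Path:
    """Create <root>/lib/libamdhip64.so plus a relative link in <root>/hip/lib/."""
    new_lib = root / "lib" / "libamdhip64.so"
    new_lib.parent.mkdir(parents=True)
    new_lib.write_bytes(b"")  # empty placeholder, not a real shared object

    old_dir = root / "hip" / "lib"
    old_dir.mkdir(parents=True)
    old_link = old_dir / "libamdhip64.so"
    # Same relative target as shown in the `ls -l` listing.
    os.symlink("../../lib/libamdhip64.so", old_link)
    return old_link


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    link = build_compat_layout(root)
    print(os.readlink(link))
    print(link.resolve() == root / "lib" / "libamdhip64.so")
```

The relative target means the old path keeps working regardless of which versioned directory the tree is installed under.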
|
||||
|
||||
##### CMake Config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder.
|
||||
For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) contain soft links to the new CMake config files.
|
||||
|
||||
Example:
|
||||
|
||||
```log
|
||||
$ ls -l /opt/rocm/hip/lib/cmake/hip/
|
||||
total 0
|
||||
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
#### ROCm Support For Code Object V3 Deprecated
|
||||
|
||||
Support for Code Object v3 is deprecated and will be removed in a future release.
|
||||
|
||||
#### Comgr V3.0 Changes
|
||||
|
||||
The following APIs and macros have been marked as deprecated. They are expected to be removed in a future ROCm release that coincides with the release of Comgr v3.0.
|
||||
|
||||
##### API Changes
|
||||
|
||||
- `amd_comgr_action_info_set_options()`
|
||||
- `amd_comgr_action_info_get_options()`
|
||||
|
||||
##### Actions and Data Types
|
||||
|
||||
- `AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES`
|
||||
- `AMD_COMGR_ACTION_COMPILE_SOURCE_TO_FATBIN`
|
||||
|
||||
For replacements, see the `AMD_COMGR_ACTION_INFO_GET`/`SET_OPTION_LIST` APIs and the `AMD_COMGR_ACTION_COMPILE_SOURCE_(WITH_DEVICE_LIBS)_TO_BC` macros.
|
||||
|
||||
#### Deprecated Environment Variables
|
||||
|
||||
The following environment variables are removed in this ROCm release:
|
||||
|
||||
- `GPU_MAX_COMMAND_QUEUES`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_2D_X`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_2D_Y`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_3D_X`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_3D_Y`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_3D_Z`
|
||||
- `GPU_BLIT_ENGINE_TYPE`
|
||||
- `GPU_USE_SYNC_OBJECTS`
|
||||
- `AMD_OCL_SC_LIB`
|
||||
- `AMD_OCL_ENABLE_MESSAGE_BOX`
|
||||
- `GPU_FORCE_64BIT_PTR`
|
||||
- `GPU_FORCE_OCL20_32BIT`
|
||||
- `GPU_RAW_TIMESTAMP`
|
||||
- `GPU_SELECT_COMPUTE_RINGS_ID`
|
||||
- `GPU_USE_SINGLE_SCRATCH`
|
||||
- `GPU_ENABLE_LARGE_ALLOCATION`
|
||||
- `HSA_LOCAL_MEMORY_ENABLE`
|
||||
- `HSA_ENABLE_COARSE_GRAIN_SVM`
|
||||
- `GPU_IFH_MODE`
|
||||
- `OCL_SYSMEM_REQUIREMENT`
|
||||
- `OCL_CODE_CACHE_ENABLE`
|
||||
- `OCL_CODE_CACHE_RESET`
|
||||
|
||||
### Known Issues In This Release
|
||||
|
||||
The following are the known issues in this release.
|
||||
|
||||
#### `DISTRIBUTED`/`TEST_DISTRIBUTED_SPAWN` Fails
|
||||
|
||||
When user applications call `ncclCommAbort` to destruct communicators and then create new
|
||||
communicators repeatedly, subsequent communicators may fail to initialize.
|
||||
|
||||
This issue is under investigation and will be resolved in a future release.
|
||||
|
||||
#### Failures In HIP Directed Tests
|
||||
|
||||
Multiple HIP directed tests fail.
|
||||
|
||||
### Library Changes in ROCm 5.5.0
|
||||
|
||||
| Library | Version |
|---------|---------|
| hipBLAS | 0.53.0 ⇒ [0.54.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.5.0) |
| hipCUB | 2.13.0 ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.5.0) |
| hipFFT | 1.0.10 ⇒ [1.0.11](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.5.0) |
| hipSOLVER | 1.6.0 ⇒ [1.7.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.5.0) |
| hipSPARSE | 2.3.3 ⇒ [2.3.5](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.5.0) |
| rccl | 2.13.4 ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.5.0) |
| rocALUTION | 2.1.3 ⇒ [2.1.8](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.5.0) |
| rocBLAS | 2.46.0 ⇒ [2.47.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.5.0) |
| rocFFT | 1.0.21 ⇒ [1.0.22](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.5.0) |
| rocPRIM | 2.12.0 ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.5.0) |
| rocRAND | 2.10.16 ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.5.0) |
| rocSOLVER | 3.20.0 ⇒ [3.21.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.5.0) |
| rocSPARSE | 2.4.0 ⇒ [2.5.1](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.5.0) |
| rocThrust | [2.17.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.5.0) |
| rocWMMA | 0.9 ⇒ [1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.5.0) |
| Tensile | 4.35.0 ⇒ [4.36.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.5.0) |
|
||||
|
||||
#### hipBLAS 0.54.0
|
||||
|
||||
hipBLAS 0.54.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- added option to opt-in to use __half for hipblasHalf type in the API for c++ users who define HIPBLAS_USE_HIP_HALF
|
||||
- added scripts to plot performance for multiple functions
|
||||
- data driven hipblas-bench and hipblas-test execution via external yaml format data files
|
||||
- client smoke test added for quick validation using command hipblas-test --yaml hipblas_smoke.yaml
|
||||
|
||||
##### Fixed
|
||||
|
||||
- fixed datatype conversion functions to support more rocBLAS/cuBLAS datatypes
|
||||
- fixed geqrf to return successfully when nullptrs are passed in with n == 0 || m == 0
|
||||
- fixed getrs to return successfully when given nullptrs with corresponding size = 0
|
||||
- fixed getrs to give info = -1 when transpose is not an expected type
|
||||
- fixed gels to return successfully when given nullptrs with corresponding size = 0
|
||||
- fixed gels to give info = -1 when transpose is not in ('N', 'T') for real cases or not in ('N', 'C') for complex cases
|
||||
|
||||
##### Changed
|
||||
|
||||
- changed reference code for Windows to OpenBLAS
|
||||
- hipblas client executables all now begin with hipblas- prefix
|
||||
|
||||
#### hipCUB 2.13.1
|
||||
|
||||
hipCUB 2.13.1 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
|
||||
|
||||
##### Changed
|
||||
|
||||
- CUB backend references CUB and Thrust version 1.17.2.
|
||||
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Windows HIP SDK support
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
|
||||
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
|
||||
|
||||
#### hipFFT 1.0.11
|
||||
|
||||
hipFFT 1.0.11 for ROCm 5.5.0
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed old version rocm include/lib folders not removed on upgrade.
|
||||
|
||||
#### hipSOLVER 1.7.0
|
||||
|
||||
hipSOLVER 1.7.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added functions
|
||||
- gesvdj
|
||||
- hipsolverSgesvdj_bufferSize, hipsolverDgesvdj_bufferSize, hipsolverCgesvdj_bufferSize, hipsolverZgesvdj_bufferSize
|
||||
- hipsolverSgesvdj, hipsolverDgesvdj, hipsolverCgesvdj, hipsolverZgesvdj
|
||||
- gesvdjBatched
|
||||
- hipsolverSgesvdjBatched_bufferSize, hipsolverDgesvdjBatched_bufferSize, hipsolverCgesvdjBatched_bufferSize, hipsolverZgesvdjBatched_bufferSize
|
||||
- hipsolverSgesvdjBatched, hipsolverDgesvdjBatched, hipsolverCgesvdjBatched, hipsolverZgesvdjBatched
|
||||
|
||||
#### hipSPARSE 2.3.5
|
||||
|
||||
hipSPARSE 2.3.5 for ROCm 5.5.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed an issue where the rocm folder was not removed on upgrade of meta packages
|
||||
- Fixed a compilation issue with cusparse backend
|
||||
- Added more detailed messages on unit test failures due to missing input data
|
||||
- Improved documentation
|
||||
- Fixed a bug with deprecation messages when using gcc9 (Thanks @Maetveis)
|
||||
|
||||
#### rccl 2.15.5
|
||||
|
||||
RCCL 2.15.5 for ROCm 5.5.0
|
||||
|
||||
##### Changed
|
||||
|
||||
- Compatibility with NCCL 2.15.5
|
||||
- Unit test executable renamed to rccl-UnitTests
|
||||
|
||||
##### Added
|
||||
|
||||
- HW-topology aware binary tree implementation
|
||||
- Experimental support for MSCCL
|
||||
- New unit tests for hipGraph support
|
||||
- NPKit integration
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocm-smi ID conversion
|
||||
- Support for HIP_VISIBLE_DEVICES for unit tests
|
||||
- Support for p2p transfers to non (HIP) visible devices
|
||||
|
||||
##### Removed
|
||||
|
||||
- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
|
||||
|
||||
#### rocALUTION 2.1.8
|
||||
|
||||
rocALUTION 2.1.8 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added build support for Navi32
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed a typo in MPI backend
|
||||
- Fixed a bug with the backend when HIP support is disabled
|
||||
- Fixed a bug in SAAMG hierarchy building on HIP backend
|
||||
- Improved SAAMG hierarchy build performance on HIP backend
|
||||
|
||||
##### Changed
|
||||
|
||||
- LocalVector::GetIndexValues(ValueType\*) is deprecated, use LocalVector::GetIndexValues(const LocalVector&, LocalVector\*) instead
|
||||
- LocalVector::SetIndexValues(const ValueType\*) is deprecated, use LocalVector::SetIndexValues(const LocalVector&, const LocalVector&) instead
|
||||
- LocalMatrix::RSDirectInterpolation(const LocalVector&, const LocalVector&, LocalMatrix\*, LocalMatrix\*) is deprecated, use LocalMatrix::RSDirectInterpolation(const LocalVector&, const LocalVector&, LocalMatrix\*) instead
|
||||
- LocalMatrix::RSExtPIInterpolation(const LocalVector&, const LocalVector&, bool, float, LocalMatrix\*, LocalMatrix\*) is deprecated, use LocalMatrix::RSExtPIInterpolation(const LocalVector&, const LocalVector&, bool, LocalMatrix\*) instead
|
||||
- LocalMatrix::RugeStueben() is deprecated
|
||||
- LocalMatrix::AMGSmoothedAggregation(ValueType, const LocalVector&, const LocalVector&, LocalMatrix\*, LocalMatrix\*, int) is deprecated, use LocalMatrix::AMGAggregation(ValueType, const LocalVector&, const LocalVector&, LocalMatrix\*, int) instead
|
||||
- LocalMatrix::AMGAggregation(const LocalVector&, LocalMatrix\*, LocalMatrix\*) is deprecated, use LocalMatrix::AMGAggregation(const LocalVector&, LocalMatrix\*) instead
|
||||
|
||||
#### rocBLAS 2.47.0
|
||||
|
||||
rocBLAS 2.47.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- added functionality rocblas_geam_ex for matrix-matrix minimum operations
|
||||
- added HIP Graph support as beta feature for rocBLAS Level 1, Level 2, and Level 3(pointer mode host) functions
|
||||
- added beta features API. Exposed using compiler define ROCBLAS_BETA_FEATURES_API
|
||||
- added support for vector initialization in the rocBLAS test framework with negative increments
|
||||
- added windows build documentation for forthcoming support using ROCm HIP SDK
|
||||
- added scripts to plot performance for multiple functions
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- improved performance of Level 2 rocBLAS GEMV for float and double precision. Performance enhanced by 150-200% for certain problem sizes when (m==n) measured on a gfx90a GPU.
|
||||
- improved performance of Level 2 rocBLAS GER for float, double and complex float precisions. Performance enhanced by 5-7% for certain problem sizes measured on a gfx90a GPU.
|
||||
- improved performance of Level 2 rocBLAS SYMV for float and double precisions. Performance enhanced by 120-150% for certain problem sizes measured on both gfx908 and gfx90a GPUs.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- fixed setting of executable mode on client script rocblas_gentest.py to avoid potential permission errors with clients rocblas-test and rocblas-bench
|
||||
- fixed deprecated API compatibility with Visual Studio compiler
|
||||
- fixed test framework memory exception handling for Level 2 functions when the host memory allocation exceeds the available memory
|
||||
|
||||
##### Changed
|
||||
|
||||
- install.sh internally runs rmake.py (also used on windows) and rmake.py may be used directly by developers on linux (use --help)
|
||||
- rocblas client executables all now begin with rocblas- prefix
|
||||
|
||||
##### Removed
|
||||
|
||||
- install.sh removed options -o --cov as now Tensile will use the default COV format, set by cmake define Tensile_CODE_OBJECT_VERSION=default
|
||||
|
||||
#### rocFFT 1.0.22
|
||||
|
||||
rocFFT 1.0.22 for ROCm 5.5.0
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved performance of 1D lengths < 2048 that use Bluestein's algorithm.
|
||||
- Reduced time for generating code during plan creation.
|
||||
- Optimized 3D R2C/C2R lengths 32, 84, 128.
|
||||
- Optimized batched small 1D R2C/C2R cases.
|
||||
|
||||
##### Added
|
||||
|
||||
- Added gfx1101 to default AMDGPU_TARGETS.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Moved client programs to C++17.
|
||||
- Moved planar kernels and infrequently used Stockham kernels to be runtime-compiled.
|
||||
- Moved transpose, real-complex, Bluestein, and Stockham kernels to library kernel cache.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Removed zero-length twiddle table allocations, which fixes errors from hipMallocManaged.
|
||||
- Fixed incorrect freeing of HIP stream handles during twiddle computation when multiple devices are present.
|
||||
|
||||
#### rocPRIM 2.13.0
|
||||
|
||||
rocPRIM 2.13.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- New block level `radix_rank` primitive.
|
||||
- New block level `radix_rank_match` primitive.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Improved the performance of `block_radix_sort` and `device_radix_sort`.
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed benchmark build on Windows
|
||||
|
||||
#### rocRAND 2.10.17
|
||||
|
||||
rocRAND 2.10.17 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
|
||||
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
|
||||
- experimental HIP-CPU feature
|
||||
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
|
||||
|
||||
##### Changed
|
||||
|
||||
- Python 2.7 is no longer officially supported.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Windows HIP SDK support
|
||||
|
||||
#### rocSOLVER 3.21.0
|
||||
|
||||
rocSOLVER 3.21.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- SVD for general matrices using Jacobi algorithm:
|
||||
- GESVDJ (with batched and strided\_batched versions)
|
||||
- LU factorization without pivoting for block tridiagonal matrices:
|
||||
- GEBLTTRF_NPVT (with batched and strided\_batched versions)
|
||||
- Linear system solver without pivoting for block tridiagonal matrices:
|
||||
- GEBLTTRS_NPVT (with batched and strided\_batched versions)
|
||||
- Product of triangular matrices
|
||||
- LAUUM
|
||||
- Added experimental hipGraph support for rocSOLVER functions
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved the performance of SYEVJ/HEEVJ.
|
||||
|
||||
##### Changed
|
||||
|
||||
- STEDC, SYEVD/HEEVD and SYGVD/HEGVD now use a fully implemented Divide and Conquer approach.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- SYEVJ/HEEVJ should now be invariant under matrix scaling.
|
||||
- SYEVJ/HEEVJ should now properly output the eigenvalues when no sweeps are executed.
|
||||
- Fixed GETF2\_NPVT and GETRF\_NPVT input data initialization in tests and benchmarks.
|
||||
- Fixed rocblas missing from the dependency list of the rocsolver deb and rpm packages.
|
||||
|
||||
#### rocSPARSE 2.5.1
|
||||
|
||||
rocSPARSE 2.5.1 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added bsrgemm and spgemm for BSR format
|
||||
- Added bsrgeam
|
||||
- Added build support for Navi32
|
||||
- Added experimental hipGraph support for some rocSPARSE routines
|
||||
- Added csritsv, spitsv csr iterative triangular solve
|
||||
- Added mixed precisions for SpMV
|
||||
- Added batched SpMM for transpose A in COO format with atomic algorithm
|
||||
|
||||
##### Improved
|
||||
|
||||
- Optimization to csr2bsr
|
||||
- Optimization to csr2csr_compress
|
||||
- Optimization to csr2coo
|
||||
- Optimization to gebsr2csr
|
||||
- Optimization to csr2gebsr
|
||||
- Fixes to documentation
|
||||
- Fixes a bug in COO SpMV gridsize
|
||||
- Fixes a bug in SpMM gridsize when using very large matrices
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- In csritlu0, the algorithm rocsparse_itilu0_alg_sync_split_fusion has some accuracy issues to investigate with XNACK enabled. The fallback is rocsparse_itilu0_alg_sync_split.
|
||||
|
||||
#### rocWMMA 1.0
|
||||
|
||||
rocWMMA 1.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added support for wave32 on gfx11+
|
||||
- Added infrastructure changes to support hipRTC
|
||||
- Added performance tracking system
|
||||
|
||||
##### Changed
|
||||
|
||||
- Modified the assignment of hardware information
|
||||
- Modified the data access for unsigned datatypes
|
||||
- Added library config to support multiple architectures
|
||||
|
||||
#### Tensile 4.36.0
|
||||
|
||||
Tensile 4.36.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Add functions for user-driven tuning
|
||||
- Add GFX11 support: HostLibraryTests yamls, rearrange FP32(C)/FP64(C) instruction order, archCaps for instruction renaming condition, adjust vgpr bank for A/B/C for optimization, separate vscnt and vmcnt, dual mac
|
||||
- Add binary search for Grid-Based algorithm
|
||||
- Add reject condition for (StoreCInUnroll + BufferStore=0) and (DirectToVgpr + ScheduleIterAlg<3 + PrefetchGlobalRead==2)
|
||||
- Add support for (DirectToLds + hgemm + NN/NT/TT) and (DirectToLds + hgemm + GlobalLoadVectorWidth < 4)
|
||||
- Add support for (DirectToLds + hgemm(TLU=True only) or sgemm + NumLoadsCoalesced > 1)
|
||||
- Add GSU SingleBuffer algorithm for HSS/BSS
|
||||
- Add gfx900:xnack-, gfx1032, gfx1034, gfx1035
|
||||
- Enable gfx1031 support
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Use AssertSizeLessThan for BufferStoreOffsetLimitCheck if it is smaller than MT1
|
||||
- Improve InitAccVgprOpt
|
||||
|
||||
##### Changed
|
||||
|
||||
- Use global_atomic for GSU instead of flat and global_store for debug code
|
||||
- Replace flat_load/store with global_load/store
|
||||
- Use global_load/store for BufferLoad/Store=0 and enable scheduling
|
||||
- LocalSplitU support for HGEMM+HPA when MFMA disabled
|
||||
- Update Code Object Version
|
||||
- Type cast local memory to COMPUTE_DATA_TYPE in LDS to avoid precision loss
|
||||
- Update asm cap cache arguments
|
||||
- Unify SplitGlobalRead into ThreadSeparateGlobalRead and remove SplitGlobalRead
|
||||
- Change checks, error messages, assembly syntax, and coverage for DirectToLds
|
||||
- Remove unused cmake file
|
||||
- Clean up the LLVM dependency code
|
||||
- Update ThreadSeparateGlobalRead test cases for PrefetchGlobalRead=2
|
||||
- Update sgemm/hgemm test cases for DirectToLds and ThreadSeparateGlobalRead
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Add build-id to header of compiled source kernels
|
||||
- Fix solution index collisions
|
||||
- Fix h beta vectorwidth4 correctness issue for WMMA
|
||||
- Fix an error with BufferStore=0
|
||||
- Fix mismatch issue with (StoreCInUnroll + PrefetchGlobalRead=2)
|
||||
- Fix MoveMIoutToArch bug
|
||||
- Fix flat load correctness issue on I8 and flat store correctness issue
|
||||
- Fix mismatch issue with BufferLoad=0 + TailLoop for large array sizes
|
||||
- Fix code generation error with BufferStore=0 and StoreCInUnrollPostLoop
|
||||
- Fix issues with DirectToVgpr + ScheduleIterAlg<3
|
||||
- Fix mismatch issue with DGEMM TT + LocalReadVectorWidth=2
|
||||
- Fix mismatch issue with PrefetchGlobalRead=2
|
||||
- Fix mismatch issue with DirectToVgpr + PrefetchGlobalRead=2 + small tile size
|
||||
- Fix an error with PersistentKernel=0 + PrefetchAcrossPersistent=1 + PrefetchAcrossPersistentMode=1
|
||||
- Fix mismatch issue with DirectToVgpr + DirectToLds + only 1 iteration in unroll loop case
|
||||
- Remove duplicate GSU kernels: for GSU = 1, GSUAlgorithm SingleBuffer and MultipleBuffer kernels are identical
|
||||
- Fix for failing CI tests due to CpuThreads=0
|
||||
- Fix mismatch issue with DirectToLds + PrefetchGlobalRead=2
|
||||
- Remove the reject condition for ThreadSeparateGlobalRead and DirectToLds (HGEMM, SGEMM only)
|
||||
- Modify reject condition for minimum lanes of ThreadSeparateGlobalRead (SGEMM or larger data type only)
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.4.3
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
### Deprecations and Warnings
|
||||
|
||||
#### HIP Perl Scripts Deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterparts. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft links will then be updated to point to the respective compiled binaries as the default option.
|
||||
|
||||
##### Linux Filesystem Hierarchy Standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux Foundation Filesystem Hierarchy Standard (FHS) in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to the new filesystem hierarchy, ROCm ensures backward compatibility with the filesystem hierarchy of version 5.1 and older. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
|
||||
|
||||
##### New Filesystem Hierarchy
|
||||
|
||||
The following is the new filesystem hierarchy:
|
||||
|
||||
```text
|
||||
/opt/rocm-<ver>
|
||||
| --bin
|
||||
| --All externally exposed Binaries
|
||||
| --libexec
|
||||
| --<component>
|
||||
| -- Component specific private non-ISA executables (architecture independent)
|
||||
| --include
|
||||
| -- <component>
|
||||
| --<header files>
|
||||
| --lib
|
||||
| --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
|
||||
(public libraries linked with application)
|
||||
| --<component> (component specific private library, executable data)
|
||||
| --<cmake>
|
||||
| --components
|
||||
| --<component>.config.cmake
|
||||
| --share
|
||||
| --html/<component>/*.html
|
||||
| --info/<component>/*.[pdf, md, txt]
|
||||
| --man
|
||||
| --doc
|
||||
| --<component>
|
||||
| --<licenses>
|
||||
| --<component>
|
||||
| --<misc files> (arch independent non-executable)
|
||||
| --samples
|
||||
|
||||
```
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1 (old) filesystem hierarchy in its next major release.
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
|
||||
##### Backward Compatibility with Older Filesystems
|
||||
|
||||
ROCm has moved header files and libraries to their new locations, as indicated in the structure above, and has placed symbolic links and wrapper header files in the old locations for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
|
||||
```h
|
||||
// Code snippet from hip_runtime.h
|
||||
#pragma message("This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip")
|
||||
#include "hip/hip_runtime.h"
|
||||
```
|
||||
|
||||
The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
- `#pragma` message announcing deprecation -- ROCm v5.2 release
|
||||
- `#pragma` message changed to `#warning` -- Future release
|
||||
- `#warning` changed to `#error` -- Future release
|
||||
- Backward compatibility wrappers removed -- Future release
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
```log
|
||||
$ ls -l /opt/rocm/hip/lib/
|
||||
total 4
|
||||
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
|
||||
lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
|
||||
```
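The relative link in the listing above can be reproduced in a scratch directory to see how an old-style path resolves to the new location. The following is a minimal C++17 sketch under stated assumptions: `demo`, `libdemo.so`, and the `rocm_layout_sketch` directory are made-up names for illustration, and nothing here touches a real ROCm installation.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Builds a throwaway tree that mimics the layout described above and reports
// whether the old per-component path resolves to the file in the new location.
// "demo" and "libdemo.so" are hypothetical names used only for illustration.
bool legacyPathResolvesToNewLib() {
    const fs::path root = fs::temp_directory_path() / "rocm_layout_sketch";
    fs::remove_all(root);                           // start from a clean slate
    fs::create_directories(root / "lib");           // new location
    fs::create_directories(root / "demo" / "lib");  // old per-component location

    // Stand-in for the real shared library.
    std::ofstream(root / "lib" / "libdemo.so") << "placeholder";

    // Relative link with the same shape as
    // libamdhip64.so -> ../../lib/libamdhip64.so
    fs::create_symlink("../../lib/libdemo.so",
                       root / "demo" / "lib" / "libdemo.so");

    // canonical() follows symlinks, so both paths should name the same file.
    const bool same = fs::canonical(root / "demo" / "lib" / "libdemo.so") ==
                      fs::canonical(root / "lib" / "libdemo.so");
    fs::remove_all(root);                           // clean up the scratch tree
    return same;
}
```

Because the link is relative, the old path keeps working wherever the installation prefix is mounted.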
|
||||
|
||||
##### CMake Config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
```log
|
||||
$ ls -l /opt/rocm/hip/lib/cmake/hip/
|
||||
total 0
|
||||
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
### Fixed Defects
|
||||
|
||||
#### Compiler Improvements
|
||||
|
||||
In ROCm v5.4.3, improvements to the compiler address errors with the following signatures:
|
||||
|
||||
- "error: unhandled SGPR spill to memory"
|
||||
- "cannot scavenge register without an emergency spill slot!"
|
||||
- "error: ran out of registers during register allocation"
|
||||
|
||||
### Known Issues
|
||||
|
||||
#### Compiler Option Error at Runtime
|
||||
|
||||
Some users may encounter a “Cannot find Symbol” error at runtime when using `-save-temps`. While most `-save-temps` use cases work correctly, this error may appear occasionally.
|
||||
|
||||
This issue is under investigation. The known workaround is to avoid `-save-temps` when the error appears.
|
||||
|
||||
### Library Changes in ROCm 5.4.3
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
| hipBLAS | [0.53.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.4.3) |
|
||||
| hipCUB | [2.13.0](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.4.3) |
|
||||
| hipFFT | [1.0.10](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.4.3) |
|
||||
| hipSOLVER | [1.6.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.4.3) |
|
||||
| hipSPARSE | [2.3.3](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.4.3) |
|
||||
| rccl | [2.13.4](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.4.3) |
|
||||
| rocALUTION | [2.1.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.4.3) |
|
||||
| rocBLAS | [2.46.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.4.3) |
|
||||
| rocFFT | 1.0.20 ⇒ [1.0.21](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.4.3) |
|
||||
| rocPRIM | [2.12.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.4.3) |
|
||||
| rocRAND | [2.10.16](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.4.3) |
|
||||
| rocSOLVER | [3.20.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.4.3) |
|
||||
| rocSPARSE | [2.4.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.4.3) |
|
||||
| rocThrust | [2.17.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.4.3) |
|
||||
| rocWMMA | [0.9](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.4.3) |
|
||||
| Tensile | [4.35.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.4.3) |
|
||||
|
||||
#### rocFFT 1.0.21
|
||||
|
||||
rocFFT 1.0.21 for ROCm 5.4.3
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Removed source directory from rocm_install_targets call to prevent installation of rocfft.h in an unintended location.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.4.2
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
### Deprecations and Warnings
|
||||
|
||||
#### HIP Perl Scripts Deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterparts. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft links will then be updated to point to the respective compiled binaries as the default option.
|
||||
|
||||
#### `hipcc` Options Deprecation
|
||||
|
||||
The following hipcc options are being deprecated and will be removed in a future release:
|
||||
|
||||
- The `--amdgpu-target` option is being deprecated, and users must use the `--offload-arch` option to specify the GPU architecture.
|
||||
- The `--amdhsa-code-object-version` option is being deprecated. Users can use the Clang/LLVM option `-mllvm -mcode-object-version` to debug issues related to code object versions.
|
||||
- The `--hipcc-func-supp`/`--hipcc-no-func-supp` options are being deprecated, as the function calls are already supported in production on AMD GPUs.
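For build scripts that still pass the old target option, the first item in the list can be handled mechanically. The sketch below is a hypothetical helper, not part of `hipcc`: it assumes the option is written in the `--amdgpu-target=<arch>` form and rewrites it to `--offload-arch=<arch>`, leaving every other argument untouched.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: rewrite the deprecated spelling to the replacement
// named in the list above. Assumes the "--option=value" form is used.
std::vector<std::string> migrateHipccArgs(const std::vector<std::string>& args) {
    const std::string oldPrefix = "--amdgpu-target=";
    const std::string newPrefix = "--offload-arch=";
    std::vector<std::string> out;
    out.reserve(args.size());
    for (const std::string& arg : args) {
        if (arg.compare(0, oldPrefix.size(), oldPrefix) == 0) {
            // Keep the architecture name, swap only the option name.
            out.push_back(newPrefix + arg.substr(oldPrefix.size()));
        } else {
            out.push_back(arg);
        }
    }
    return out;
}
```

For example, `{"-O2", "--amdgpu-target=gfx906", "main.cpp"}` would come back with its second element changed to `--offload-arch=gfx906`.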
|
||||
|
||||
### Known Issues
|
||||
|
||||
Under certain circumstances typified by high register pressure, users may encounter a compiler abort with one of the following error messages:
|
||||
|
||||
- > `error: unhandled SGPR spill to memory`
|
||||
|
||||
- > `cannot scavenge register without an emergency spill slot!`
|
||||
|
||||
- > `error: ran out of registers during register allocation`
|
||||
|
||||
This is a known issue and will be fixed in a future release.
|
||||
|
||||
### Library Changes in ROCm 5.4.2
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
| hipBLAS | [0.53.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.4.2) |
|
||||
| hipCUB | [2.13.0](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.4.2) |
|
||||
| hipFFT | [1.0.10](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.4.2) |
|
||||
| hipSOLVER | [1.6.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.4.2) |
|
||||
| hipSPARSE | [2.3.3](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.4.2) |
|
||||
| rccl | [2.13.4](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.4.2) |
|
||||
| rocALUTION | [2.1.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.4.2) |
|
||||
| rocBLAS | [2.46.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.4.2) |
|
||||
| rocFFT | [1.0.20](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.4.2) |
|
||||
| rocPRIM | [2.12.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.4.2) |
|
||||
| rocRAND | [2.10.16](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.4.2) |
|
||||
| rocSOLVER | [3.20.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.4.2) |
|
||||
| rocSPARSE | [2.4.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.4.2) |
|
||||
| rocThrust | [2.17.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.4.2) |
|
||||
| rocWMMA | [0.9](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.4.2) |
|
||||
| Tensile | [4.35.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.4.2) |
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.4.1
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
### What's New in This Release
|
||||
|
||||
10
README.md
@@ -1,4 +1,4 @@
|
||||
# AMD ROCm™ Platform - Powering Your GPU Computational Needs
|
||||
# AMD ROCm™ Platform
|
||||
|
||||
ROCm™ is an open-source stack for GPU computation. ROCm is primarily Open-Source
|
||||
Software (OSS) that allows developers the freedom to customize and tailor their
|
||||
@@ -32,7 +32,13 @@ The default.xml file uses the repo Manifest format.
|
||||
The develop branch of this repository contains content for the next
|
||||
ROCm release.
|
||||
|
||||
## How to build documentation via Sphinx
|
||||
## ROCm Documentation
|
||||
|
||||
ROCm Documentation is available online at
|
||||
[rocm.docs.amd.com](https://rocm.docs.amd.com). Source code for the documentation
|
||||
is located in the docs folder of most repositories that are part of ROCm.
|
||||
|
||||
### How to build documentation via Sphinx
|
||||
|
||||
```bash
|
||||
cd docs
|
||||
|
||||
672
RELEASE.md
@@ -15,130 +15,40 @@ The release notes for the ROCm platform.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.5.0
|
||||
## ROCm 5.4.1
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
### What's New in This Release
|
||||
|
||||
#### HIP Enhancements
|
||||
|
||||
The ROCm v5.5 release consists of the following HIP enhancements:
|
||||
The ROCm v5.4.1 release consists of the following new HIP API:
|
||||
|
||||
##### Enhanced Stack Size Limit
|
||||
##### New HIP API - hipLaunchHostFunc
|
||||
|
||||
In this release, the stack size limit is increased from 16k to 131056 bytes (or 128K - 16).
|
||||
Applications that need to update the stack size can use the `hipDeviceSetLimit` API.
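The arithmetic behind the new limit can be checked in plain C++. The sketch below is an illustration only: `kMaxStackBytes` and `clampStackRequest` are hypothetical names, and the HIP call appears only in a comment so the snippet compiles without the HIP runtime.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound quoted in the release note: 128 KiB minus 16 bytes.
constexpr std::size_t kMaxStackBytes = 128 * 1024 - 16;

static_assert(kMaxStackBytes == 131056, "128K - 16 should equal 131056");

// Hypothetical helper: clamp a requested stack size to that bound.
constexpr std::size_t clampStackRequest(std::size_t requested) {
    return requested > kMaxStackBytes ? kMaxStackBytes : requested;
}

// In a HIP program the clamped value would then be passed on, for example:
//   hipDeviceSetLimit(hipLimitStackSize, clampStackRequest(wanted));
```

Clamping before the call is a design choice for this sketch; an application could equally pass the value through and inspect the returned `hipError_t`.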
|
||||
|
||||
##### `hipcc` Changes
|
||||
|
||||
The following hipcc changes are implemented in this release:
|
||||
|
||||
- `hipcc` will not implicitly link to `libpthread` and `librt`, as they are no longer a link-time dependency for HIP programs. Applications that depend on these libraries must explicitly link to them.
|
||||
- `-use-staticlib` and `-use-sharedlib` options are deprecated.
|
||||
|
||||
##### Future Changes
|
||||
|
||||
- Separation of `hipcc` binaries (Perl scripts) from HIP to the `hipcc` project. Users will access a separate `hipcc` package for installing `hipcc` binaries in future ROCm releases.
|
||||
- In a future ROCm release, the following samples will be removed from the `hip-tests` project.
|
||||
- `hipBusBandwidth` at <https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/shipBusBandwidth>
|
||||
- `hipCommander` at <https://github.com/ROCm-Developer-Tools/hip-tests/tree/develop/samples/1_Utils/hipCommander>
|
||||
|
||||
Note that the samples will continue to be available in previous release branches.
|
||||
|
||||
##### New HIP APIs in This Release
|
||||
The following new HIP API is introduced in the ROCm v5.4.1 release.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> This is a pre-official version (beta) release of the new APIs and may contain unresolved issues.
|
||||
> This is a pre-official version (beta) release of the new APIs.
|
||||
|
||||
###### Memory Management HIP APIs
|
||||
```h
|
||||
hipError_t hipLaunchHostFunc(hipStream_t stream, hipHostFn_t fn, void* userData);
|
||||
```
|
||||
|
||||
The new memory management HIP API is as follows:
|
||||
This swaps the stream capture mode of a thread.
|
||||
|
||||
- Sets information on the specified pointer [BETA].
|
||||
```text
|
||||
@param [in] mode - Pointer to mode value to swap with the current mode
|
||||
```
|
||||
|
||||
```h
|
||||
hipError_t hipPointerSetAttribute(const void* value, hipPointer_attribute attribute, hipDeviceptr_t ptr);
|
||||
```
|
||||
This API returns `hipSuccess` or `hipErrorInvalidValue`.
|
||||
|
||||
###### Module Management HIP APIs
|
||||
|
||||
The new module management HIP APIs are as follows:
|
||||
|
||||
- Launches kernel `f` with launch parameters and shared memory on stream with arguments passed to `kernelParams`, where thread blocks can cooperate and synchronize as they execute.
|
||||
|
||||
```h
|
||||
hipError_t hipModuleLaunchCooperativeKernel(hipFunction_t f, unsigned int gridDimX, unsigned int gridDimY, unsigned int gridDimZ, unsigned int blockDimX, unsigned int blockDimY, unsigned int blockDimZ, unsigned int sharedMemBytes, hipStream_t stream, void** kernelParams);
|
||||
```
|
||||
|
||||
- Launches kernels on multiple devices where thread blocks can cooperate and synchronize as they execute.
|
||||
|
||||
```h
|
||||
hipError_t hipModuleLaunchCooperativeKernelMultiDevice(hipFunctionLaunchParams* launchParamsList, unsigned int numDevices, unsigned int flags);
|
||||
```
|
||||
|
||||
###### HIP Graph Management APIs
|
||||
|
||||
The new HIP Graph Management APIs are as follows:
|
||||
|
||||
- Creates a memory allocation node and adds it to a graph [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphAddMemAllocNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, hipMemAllocNodeParams* pNodeParams);
|
||||
```
|
||||
|
||||
- Returns parameters for memory allocation node [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphMemAllocNodeGetParams(hipGraphNode_t node, hipMemAllocNodeParams* pNodeParams);
|
||||
```
|
||||
|
||||
- Creates a memory free node and adds it to a graph [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphAddMemFreeNode(hipGraphNode_t* pGraphNode, hipGraph_t graph, const hipGraphNode_t* pDependencies, size_t numDependencies, void* dev_ptr);
|
||||
```
|
||||
|
||||
- Returns parameters for memory free node [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipGraphMemFreeNodeGetParams(hipGraphNode_t node, void* dev_ptr);
|
||||
```
|
||||
|
||||
- Writes a DOT file describing graph structure [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipGraphDebugDotPrint(hipGraph_t graph, const char* path, unsigned int flags);
|
||||
```
|
||||
|
||||
- Copies attributes from source node to destination node [BETA].
|
||||
|
||||
```h
|
||||
hipError_t hipGraphKernelNodeCopyAttributes(hipGraphNode_t hSrc, hipGraphNode_t hDst);
|
||||
```
|
||||
|
||||
- Enables or disables the specified node in the given graphExec [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphNodeSetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int isEnabled);
|
||||
```
|
||||
|
||||
- Queries whether a node in the given graphExec is enabled [BETA]
|
||||
|
||||
```h
|
||||
hipError_t hipGraphNodeGetEnabled(hipGraphExec_t hGraphExec, hipGraphNode_t hNode, unsigned int* isEnabled);
|
||||
```
|
||||
|
||||
##### OpenMP Enhancements
|
||||
This release consists of the following OpenMP enhancements:
|
||||
|
||||
- Additional support for OMPT functions `get_device_time` and `get_record_type`.
|
||||
- Add support for min/max fast fp atomics on AMD GPUs.
|
||||
- Fix the use of the abs function in C device regions.
|
||||
For more information, refer to the HIP API documentation at /bundle/HIP_API_Guide/page/modules.html.
|
||||
|
||||
### Deprecations and Warnings
|
||||
|
||||
#### HIP Deprecation
|
||||
#### HIP Perl Scripts Deprecation
|
||||
|
||||
The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, compiled binaries will be available as `hipcc.bin` and `hipconfig.bin` as replacements for the Perl scripts.
|
||||
|
||||
@@ -146,548 +56,28 @@ The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, co
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterparts. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft links will then be updated to point to the respective compiled binaries as the default option.
|
||||
|
||||
##### Linux Filesystem Hierarchy Standard for ROCm
|
||||
### IFWI Fixes
|
||||
|
||||
ROCm packages have adopted the Linux Foundation Filesystem Hierarchy Standard (FHS) in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to the new filesystem hierarchy, ROCm ensures backward compatibility with the filesystem hierarchy of version 5.1 and older. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
|
||||
These defects were identified and documented as known issues in previous ROCm releases and are fixed in this release.
|
||||
AMD Instinct™ MI200 Firmware IFWI Maintenance Update #3
|
||||
|
||||
##### New Filesystem Hierarchy
|
||||
This IFWI release fixes the following issue in AMD Instinct™ MI210/MI250 Accelerators.
|
||||
|
||||
The following is the new filesystem hierarchy:
|
||||
After prolonged periods of operation, certain MI200 Instinct™ Accelerators may perform in a degraded way resulting in application failures.
|
||||
|
||||
```text
|
||||
/opt/rocm-<ver>
|
||||
| --bin
|
||||
| --All externally exposed Binaries
|
||||
| --libexec
|
||||
| --<component>
|
||||
| -- Component specific private non-ISA executables (architecture independent)
|
||||
| --include
|
||||
| -- <component>
|
||||
| --<header files>
|
||||
| --lib
|
||||
| --lib<soname>.so -> lib<soname>.so.major -> lib<soname>.so.major.minor.patch
|
||||
(public libraries linked with application)
|
||||
| --<component> (component specific private library, executable data)
|
||||
| --<cmake>
|
||||
| --components
|
||||
| --<component>.config.cmake
|
||||
| --share
|
||||
| --html/<component>/*.html
|
||||
| --info/<component>/*.[pdf, md, txt]
|
||||
| --man
|
||||
| --doc
|
||||
| --<component>
|
||||
| --<licenses>
|
||||
| --<component>
|
||||
| --<misc files> (arch independent non-executable)
|
||||
| --samples
|
||||
In this package, AMD delivers a new firmware version for MI200 GPU accelerators and a firmware installation tool – AMD FW FLASH 1.2.
|
||||
|
||||
```
|
||||
| GPU | Production Part Number | SKU | IFWI Name |
|
||||
|-------|------------|--------|---------------|
|
||||
| MI210 | 113-D673XX | D67302 | D6730200V.110 |
|
||||
| MI210 | 113-D673XX | D67301 | D6730100V.073 |
|
||||
| MI250 | 113-D652XX | D65209 | D6520900.073 |
|
||||
| MI250 | 113-D652XX | D65210 | D6521000.073 |
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will not support backward compatibility with the v5.1 (old) filesystem hierarchy in its next major release.
|
||||
Instructions on how to download and apply MI200 maintenance updates are available at:
|
||||
|
||||
For more information, refer to <https://refspecs.linuxfoundation.org/fhs.shtml>.
|
||||
<https://www.amd.com/en/support/server-accelerators/amd-instinct/amd-instinct-mi-series/amd-instinct-mi210>
|
||||
|
||||
##### Backward Compatibility with Older Filesystems
|
||||
#### AMD Instinct™ MI200 SRIOV Virtualization Support
|
||||
|
||||
ROCm has moved header files and libraries to their new locations, as indicated in the structure above, and has placed symbolic links and wrapper header files in the old locations for backward compatibility.
|
||||
|
||||
> **Note**
|
||||
>
|
||||
> ROCm will continue supporting backward compatibility until the next major release.
|
||||
|
||||
##### Wrapper header files
|
||||
|
||||
Wrapper header files are placed in the old location (`/opt/rocm-xxx/<component>/include`) with a warning message to include files from the new location (`/opt/rocm-xxx/include`) as shown in the example below:
|
||||
|
||||
```h
|
||||
// Code snippet from hip_runtime.h
|
||||
#pragma message("This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip")
|
||||
#include "hip/hip_runtime.h"
|
||||
```
|
||||
|
||||
The wrapper header files’ backward compatibility deprecation is as follows:
|
||||
|
||||
- `#pragma` message announcing deprecation -- ROCm v5.2 release
|
||||
- `#pragma` message changed to `#warning` -- Future release
|
||||
- `#warning` changed to `#error` -- Future release
|
||||
- Backward compatibility wrappers removed -- Future release
|
||||
|
||||
##### Library files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a soft link to the library at the new location.
|
||||
|
||||
Example:
|
||||
|
||||
```log
|
||||
$ ls -l /opt/rocm/hip/lib/
|
||||
total 4
|
||||
drwxr-xr-x 4 root root 4096 May 12 10:45 cmake
|
||||
lrwxrwxrwx 1 root root 24 May 10 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
|
||||
```
|
||||
|
||||
##### CMake Config files
|
||||
|
||||
All CMake configuration files are available in the `/opt/rocm-xxx/lib/cmake/<component>` folder.
|
||||
For backward compatibility, the old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft link to the new CMake config.
|
||||
|
||||
Example:
|
||||
|
||||
```log
|
||||
$ ls -l /opt/rocm/hip/lib/cmake/hip/
|
||||
total 0
|
||||
lrwxrwxrwx 1 root root 42 May 10 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
#### ROCm Support For Code Object V3 Deprecated
|
||||
|
||||
Support for Code Object v3 is deprecated and will be removed in a future release.
|
||||
|
||||
#### Comgr V3.0 Changes
|
||||
|
||||
The following APIs and macros have been marked as deprecated. Their removal is expected in a future ROCm release and coincides with the release of Comgr v3.0.
|
||||
|
||||
##### API Changes
|
||||
|
||||
- `amd_comgr_action_info_set_options()`
|
||||
- `amd_comgr_action_info_get_options()`
|
||||
|
||||
##### Actions and Data Types
|
||||
|
||||
- `AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES`
|
||||
- `AMD_COMGR_ACTION_COMPILE_SOURCE_TO_FATBIN`
|
||||
|
||||
For replacements, see the `AMD_COMGR_ACTION_INFO_GET`/`SET_OPTION_LIST` APIs, and the `AMD_COMGR_ACTION_COMPILE_SOURCE_(WITH_DEVICE_LIBS)_TO_BC` macros.
|
||||
|
||||
#### Deprecated Environment Variables
|
||||
|
||||
The following environment variables are removed in this ROCm release:
|
||||
|
||||
- `GPU_MAX_COMMAND_QUEUES`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_2D_X`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_2D_Y`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_3D_X`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_3D_Y`
|
||||
- `GPU_MAX_WORKGROUP_SIZE_3D_Z`
|
||||
- `GPU_BLIT_ENGINE_TYPE`
|
||||
- `GPU_USE_SYNC_OBJECTS`
|
||||
- `AMD_OCL_SC_LIB`
|
||||
- `AMD_OCL_ENABLE_MESSAGE_BOX`
|
||||
- `GPU_FORCE_64BIT_PTR`
|
||||
- `GPU_FORCE_OCL20_32BIT`
|
||||
- `GPU_RAW_TIMESTAMP`
|
||||
- `GPU_SELECT_COMPUTE_RINGS_ID`
|
||||
- `GPU_USE_SINGLE_SCRATCH`
|
||||
- `GPU_ENABLE_LARGE_ALLOCATION`
|
||||
- `HSA_LOCAL_MEMORY_ENABLE`
|
||||
- `HSA_ENABLE_COARSE_GRAIN_SVM`
|
||||
- `GPU_IFH_MODE`
|
||||
- `OCL_SYSMEM_REQUIREMENT`
|
||||
- `OCL_CODE_CACHE_ENABLE`
|
||||
- `OCL_CODE_CACHE_RESET`
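A launcher or test harness may want to flag any of these variables that are still set, since they no longer have an effect. The following is a minimal sketch of such a check; the function names are hypothetical, and the variable names are copied from the list above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Names copied from the list of removed environment variables above.
const std::vector<std::string>& removedEnvVars() {
    static const std::vector<std::string> names = {
        "GPU_MAX_COMMAND_QUEUES",      "GPU_MAX_WORKGROUP_SIZE_2D_X",
        "GPU_MAX_WORKGROUP_SIZE_2D_Y", "GPU_MAX_WORKGROUP_SIZE_3D_X",
        "GPU_MAX_WORKGROUP_SIZE_3D_Y", "GPU_MAX_WORKGROUP_SIZE_3D_Z",
        "GPU_BLIT_ENGINE_TYPE",        "GPU_USE_SYNC_OBJECTS",
        "AMD_OCL_SC_LIB",              "AMD_OCL_ENABLE_MESSAGE_BOX",
        "GPU_FORCE_64BIT_PTR",         "GPU_FORCE_OCL20_32BIT",
        "GPU_RAW_TIMESTAMP",           "GPU_SELECT_COMPUTE_RINGS_ID",
        "GPU_USE_SINGLE_SCRATCH",      "GPU_ENABLE_LARGE_ALLOCATION",
        "HSA_LOCAL_MEMORY_ENABLE",     "HSA_ENABLE_COARSE_GRAIN_SVM",
        "GPU_IFH_MODE",                "OCL_SYSMEM_REQUIREMENT",
        "OCL_CODE_CACHE_ENABLE",       "OCL_CODE_CACHE_RESET"};
    return names;
}

// Hypothetical helper: returns the removed variables that are still set
// in the current process environment.
std::vector<std::string> findRemovedEnvVarsInUse() {
    std::vector<std::string> found;
    for (const std::string& name : removedEnvVars()) {
        if (std::getenv(name.c_str()) != nullptr) {
            found.push_back(name);
        }
    }
    return found;
}
```

A caller could print the returned names as a warning at startup rather than failing, since leftover variables are harmless once they are ignored.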
|
||||
|
||||
### Known Issues In This Release
|
||||
|
||||
The following are the known issues in this release.
|
||||
|
||||
#### `DISTRIBUTED`/`TEST_DISTRIBUTED_SPAWN` Fails
|
||||
|
||||
When user applications call `ncclCommAbort` to destruct communicators and then create new
|
||||
communicators repeatedly, subsequent communicators may fail to initialize.
|
||||
|
||||
This issue is under investigation and will be resolved in a future release.
|
||||
|
||||
#### Failures In HIP Directed Tests
|
||||
|
||||
Multiple HIP directed tests fail.
|
||||
|
||||
### Library Changes in ROCm 5.5.0
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
| hipBLAS | 0.53.0 ⇒ [0.54.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.5.0) |
|
||||
| hipCUB | 2.13.0 ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.5.0) |
|
||||
| hipFFT | 1.0.10 ⇒ [1.0.11](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.5.0) |
|
||||
| hipSOLVER | 1.6.0 ⇒ [1.7.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.5.0) |
|
||||
| hipSPARSE | 2.3.3 ⇒ [2.3.5](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.5.0) |
|
||||
| rccl | 2.13.4 ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.5.0) |
|
||||
| rocALUTION | 2.1.3 ⇒ [2.1.8](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.5.0) |
|
||||
| rocBLAS | 2.46.0 ⇒ [2.47.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.5.0) |
|
||||
| rocFFT | 1.0.21 ⇒ [1.0.22](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.5.0) |
|
||||
| rocPRIM | 2.12.0 ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.5.0) |
|
||||
| rocRAND | 2.10.16 ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.5.0) |
|
||||
| rocSOLVER | 3.20.0 ⇒ [3.21.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.5.0) |
|
||||
| rocSPARSE | 2.4.0 ⇒ [2.5.1](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.5.0) |
|
||||
| rocThrust | [2.17.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.5.0) |
|
||||
| rocWMMA | 0.9 ⇒ [1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.5.0) |
|
||||
| Tensile | 4.35.0 ⇒ [4.36.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.5.0) |
|
||||
|
||||
#### hipBLAS 0.54.0
|
||||
|
||||
hipBLAS 0.54.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- added option to opt-in to use __half for hipblasHalf type in the API for c++ users who define HIPBLAS_USE_HIP_HALF
|
||||
- added scripts to plot performance for multiple functions
|
||||
- data driven hipblas-bench and hipblas-test execution via external yaml format data files
|
||||
- client smoke test added for quick validation using command hipblas-test --yaml hipblas_smoke.yaml
|
||||
|
||||
##### Fixed
|
||||
|
||||
- fixed datatype conversion functions to support more rocBLAS/cuBLAS datatypes
|
||||
- fixed geqrf to return successfully when nullptrs are passed in with n == 0 || m == 0
|
||||
- fixed getrs to return successfully when given nullptrs with corresponding size = 0
|
||||
- fixed getrs to give info = -1 when transpose is not an expected type
|
||||
- fixed gels to return successfully when given nullptrs with corresponding size = 0
|
||||
- fixed gels to give info = -1 when transpose is not in ('N', 'T') for real cases or not in ('N', 'C') for complex cases
|
||||
|
||||
##### Changed
|
||||
|
||||
- changed reference code for Windows to OpenBLAS
|
||||
- hipblas client executables all now begin with hipblas- prefix
|
||||
|
||||
#### hipCUB 2.13.1
|
||||
|
||||
hipCUB 2.13.1 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
|
||||
|
||||
##### Changed
|
||||
|
||||
- CUB backend references CUB and Thrust version 1.17.2.
|
||||
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Windows HIP SDK support
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
|
||||
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
|
||||
|
||||
#### hipFFT 1.0.11
|
||||
|
||||
hipFFT 1.0.11 for ROCm 5.5.0
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed old-version rocm include/lib folders not being removed on upgrade.
|
||||
|
||||
#### hipSOLVER 1.7.0
|
||||
|
||||
hipSOLVER 1.7.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added functions
|
||||
- gesvdj
|
||||
- hipsolverSgesvdj_bufferSize, hipsolverDgesvdj_bufferSize, hipsolverCgesvdj_bufferSize, hipsolverZgesvdj_bufferSize
|
||||
- hipsolverSgesvdj, hipsolverDgesvdj, hipsolverCgesvdj, hipsolverZgesvdj
|
||||
- gesvdjBatched
|
||||
- hipsolverSgesvdjBatched_bufferSize, hipsolverDgesvdjBatched_bufferSize, hipsolverCgesvdjBatched_bufferSize, hipsolverZgesvdjBatched_bufferSize
|
||||
- hipsolverSgesvdjBatched, hipsolverDgesvdjBatched, hipsolverCgesvdjBatched, hipsolverZgesvdjBatched
|
||||
|
||||
#### hipSPARSE 2.3.5
|
||||
|
||||
hipSPARSE 2.3.5 for ROCm 5.5.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed an issue where the rocm folder was not removed on upgrade of meta packages
|
||||
- Fixed a compilation issue with cusparse backend
|
||||
- Added more detailed messages on unit test failures due to missing input data
|
||||
- Improved documentation
|
||||
- Fixed a bug with deprecation messages when using gcc9 (Thanks @Maetveis)
|
||||
|
||||
#### rccl 2.15.5
|
||||
|
||||
RCCL 2.15.5 for ROCm 5.5.0
|
||||
|
||||
##### Changed
|
||||
|
||||
- Compatibility with NCCL 2.15.5
|
||||
- Unit test executable renamed to rccl-UnitTests
|
||||
|
||||
##### Added
|
||||
|
||||
- HW-topology aware binary tree implementation
|
||||
- Experimental support for MSCCL
|
||||
- New unit tests for hipGraph support
|
||||
- NPKit integration
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocm-smi ID conversion
|
||||
- Support for HIP_VISIBLE_DEVICES for unit tests
|
||||
- Support for p2p transfers to non (HIP) visible devices
|
||||
|
||||
##### Removed
|
||||
|
||||
- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
|
||||
|
||||
#### rocALUTION 2.1.8
|
||||
|
||||
rocALUTION 2.1.8 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added build support for Navi32
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed a typo in MPI backend
|
||||
- Fixed a bug with the backend when HIP support is disabled
|
||||
- Fixed a bug in SAAMG hierarchy building on HIP backend
|
||||
- Improved SAAMG hierarchy build performance on HIP backend
|
||||
|
||||
##### Changed
|
||||
|
||||
- LocalVector::GetIndexValues(ValueType\*) is deprecated, use LocalVector::GetIndexValues(const LocalVector&, LocalVector\*) instead
|
||||
- LocalVector::SetIndexValues(const ValueType\*) is deprecated, use LocalVector::SetIndexValues(const LocalVector&, const LocalVector&) instead
|
||||
- LocalMatrix::RSDirectInterpolation(const LocalVector&, const LocalVector&, LocalMatrix\*, LocalMatrix\*) is deprecated, use LocalMatrix::RSDirectInterpolation(const LocalVector&, const LocalVector&, LocalMatrix\*) instead
|
||||
- LocalMatrix::RSExtPIInterpolation(const LocalVector&, const LocalVector&, bool, float, LocalMatrix\*, LocalMatrix\*) is deprecated, use LocalMatrix::RSExtPIInterpolation(const LocalVector&, const LocalVector&, bool, LocalMatrix\*) instead
|
||||
- LocalMatrix::RugeStueben() is deprecated
|
||||
- LocalMatrix::AMGSmoothedAggregation(ValueType, const LocalVector&, const LocalVector&, LocalMatrix\*, LocalMatrix\*, int) is deprecated, use LocalMatrix::AMGAggregation(ValueType, const LocalVector&, const LocalVector&, LocalMatrix\*, int) instead
|
||||
- LocalMatrix::AMGAggregation(const LocalVector&, LocalMatrix\*, LocalMatrix\*) is deprecated, use LocalMatrix::AMGAggregation(const LocalVector&, LocalMatrix\*) instead
|
||||
|
||||
#### rocBLAS 2.47.0
|
||||
|
||||
rocBLAS 2.47.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- added functionality rocblas_geam_ex for matrix-matrix minimum operations
|
||||
- added HIP Graph support as a beta feature for rocBLAS Level 1, Level 2, and Level 3 (pointer mode host) functions
|
||||
- added beta features API. Exposed using compiler define ROCBLAS_BETA_FEATURES_API
|
||||
- added support for vector initialization in the rocBLAS test framework with negative increments
|
||||
- added Windows build documentation for forthcoming support using ROCm HIP SDK
|
||||
- added scripts to plot performance for multiple functions
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- improved performance of Level 2 rocBLAS GEMV for float and double precision. Performance enhanced by 150-200% for certain problem sizes when (m==n) measured on a gfx90a GPU.
|
||||
- improved performance of Level 2 rocBLAS GER for float, double and complex float precisions. Performance enhanced by 5-7% for certain problem sizes measured on a gfx90a GPU.
|
||||
- improved performance of Level 2 rocBLAS SYMV for float and double precisions. Performance enhanced by 120-150% for certain problem sizes measured on both gfx908 and gfx90a GPUs.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- fixed setting of executable mode on client script rocblas_gentest.py to avoid potential permission errors with clients rocblas-test and rocblas-bench
|
||||
- fixed deprecated API compatibility with Visual Studio compiler
|
||||
- fixed test framework memory exception handling for Level 2 functions when the host memory allocation exceeds the available memory
|
||||
|
||||
##### Changed
|
||||
|
||||
- install.sh internally runs rmake.py (also used on Windows) and rmake.py may be used directly by developers on Linux (use --help)
|
||||
- rocblas client executables all now begin with rocblas- prefix
|
||||
|
||||
##### Removed
|
||||
|
||||
- install.sh removed options -o --cov as now Tensile will use the default COV format, set by cmake define Tensile_CODE_OBJECT_VERSION=default
|
||||
|
||||
#### rocFFT 1.0.22
|
||||
|
||||
rocFFT 1.0.22 for ROCm 5.5.0
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved performance of 1D lengths < 2048 that use Bluestein's algorithm.
|
||||
- Reduced time for generating code during plan creation.
|
||||
- Optimized 3D R2C/C2R lengths 32, 84, 128.
|
||||
- Optimized batched small 1D R2C/C2R cases.
|
||||
|
||||
##### Added
|
||||
|
||||
- Added gfx1101 to default AMDGPU_TARGETS.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Moved client programs to C++17.
|
||||
- Moved planar kernels and infrequently used Stockham kernels to be runtime-compiled.
|
||||
- Moved transpose, real-complex, Bluestein, and Stockham kernels to library kernel cache.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Removed zero-length twiddle table allocations, which fixes errors from hipMallocManaged.
|
||||
- Fixed incorrect freeing of HIP stream handles during twiddle computation when multiple devices are present.
|
||||
|
||||
#### rocPRIM 2.13.0
|
||||
|
||||
rocPRIM 2.13.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- New block level `radix_rank` primitive.
|
||||
- New block level `radix_rank_match` primitive.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Improved the performance of `block_radix_sort` and `device_radix_sort`.
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed benchmark build on Windows
|
||||
|
||||
#### rocRAND 2.10.17
|
||||
|
||||
rocRAND 2.10.17 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
|
||||
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
|
||||
- experimental HIP-CPU feature
|
||||
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
|
||||
|
||||
##### Changed
|
||||
|
||||
- Python 2.7 is no longer officially supported.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Windows HIP SDK support
|
||||
|
||||
#### rocSOLVER 3.21.0
|
||||
|
||||
rocSOLVER 3.21.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- SVD for general matrices using Jacobi algorithm:
|
||||
- GESVDJ (with batched and strided\_batched versions)
|
||||
- LU factorization without pivoting for block tridiagonal matrices:
|
||||
- GEBLTTRF_NPVT (with batched and strided\_batched versions)
|
||||
- Linear system solver without pivoting for block tridiagonal matrices:
|
||||
- GEBLTTRS_NPVT (with batched and strided\_batched versions)
|
||||
- Product of triangular matrices
|
||||
- LAUUM
|
||||
- Added experimental hipGraph support for rocSOLVER functions
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved the performance of SYEVJ/HEEVJ.
|
||||
|
||||
##### Changed
|
||||
|
||||
- STEDC, SYEVD/HEEVD and SYGVD/HEGVD now use a fully implemented Divide and Conquer approach.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- SYEVJ/HEEVJ should now be invariant under matrix scaling.
|
||||
- SYEVJ/HEEVJ should now properly output the eigenvalues when no sweeps are executed.
|
||||
- Fixed GETF2\_NPVT and GETRF\_NPVT input data initialization in tests and benchmarks.
|
||||
- Fixed rocblas missing from the dependency list of the rocsolver deb and rpm packages.
|
||||
|
||||
#### rocSPARSE 2.5.1
|
||||
|
||||
rocSPARSE 2.5.1 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added bsrgemm and spgemm for BSR format
|
||||
- Added bsrgeam
|
||||
- Added build support for Navi32
|
||||
- Added experimental hipGraph support for some rocSPARSE routines
|
||||
- Added csritsv and spitsv (CSR iterative triangular solve)
|
||||
- Added mixed precisions for SpMV
|
||||
- Added batched SpMM for transpose A in COO format with atomic algorithm
|
||||
|
||||
##### Improved
|
||||
|
||||
- Optimization to csr2bsr
|
||||
- Optimization to csr2csr_compress
|
||||
- Optimization to csr2coo
|
||||
- Optimization to gebsr2csr
|
||||
- Optimization to csr2gebsr
|
||||
- Fixes to documentation
|
||||
- Fixes a bug in COO SpMV gridsize
|
||||
- Fixes a bug in SpMM gridsize when using very large matrices
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- In csritlu0, the algorithm rocsparse_itilu0_alg_sync_split_fusion has some accuracy issues to investigate with XNACK enabled. The fallback is rocsparse_itilu0_alg_sync_split.
|
||||
|
||||
#### rocWMMA 1.0
|
||||
|
||||
rocWMMA 1.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added support for wave32 on gfx11+
|
||||
- Added infrastructure changes to support hipRTC
|
||||
- Added performance tracking system
|
||||
|
||||
##### Changed
|
||||
|
||||
- Modified the assignment of hardware information
|
||||
- Modified the data access for unsigned datatypes
|
||||
- Added library config to support multiple architectures
|
||||
|
||||
#### Tensile 4.36.0
|
||||
|
||||
Tensile 4.36.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Add functions for user-driven tuning
|
||||
- Add GFX11 support: HostLibraryTests yamls, rearrange FP32(C)/FP64(C) instruction order, archCaps for instruction renaming condition, adjust vgpr bank for A/B/C for optimization, separate vscnt and vmcnt, dual mac
|
||||
- Add binary search for Grid-Based algorithm
|
||||
- Add reject condition for (StoreCInUnroll + BufferStore=0) and (DirectToVgpr + ScheduleIterAlg<3 + PrefetchGlobalRead==2)
|
||||
- Add support for (DirectToLds + hgemm + NN/NT/TT) and (DirectToLds + hgemm + GlobalLoadVectorWidth < 4)
|
||||
- Add support for (DirectToLds + hgemm(TLU=True only) or sgemm + NumLoadsCoalesced > 1)
|
||||
- Add GSU SingleBuffer algorithm for HSS/BSS
|
||||
- Add gfx900:xnack-, gfx1032, gfx1034, gfx1035
|
||||
- Enable gfx1031 support
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Use AssertSizeLessThan for BufferStoreOffsetLimitCheck if it is smaller than MT1
|
||||
- Improve InitAccVgprOpt
|
||||
|
||||
##### Changed
|
||||
|
||||
- Use global_atomic for GSU instead of flat and global_store for debug code
|
||||
- Replace flat_load/store with global_load/store
|
||||
- Use global_load/store for BufferLoad/Store=0 and enable scheduling
|
||||
- LocalSplitU support for HGEMM+HPA when MFMA disabled
|
||||
- Update Code Object Version
|
||||
- Type cast local memory to COMPUTE_DATA_TYPE in LDS to avoid precision loss
|
||||
- Update asm cap cache arguments
|
||||
- Unify SplitGlobalRead into ThreadSeparateGlobalRead and remove SplitGlobalRead
|
||||
- Change checks, error messages, assembly syntax, and coverage for DirectToLds
|
||||
- Remove unused cmake file
|
||||
- Clean up the LLVM dependency code
|
||||
- Update ThreadSeparateGlobalRead test cases for PrefetchGlobalRead=2
|
||||
- Update sgemm/hgemm test cases for DirectToLds and ThreadSeparateGlobalRead
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Add build-id to header of compiled source kernels
|
||||
- Fix solution index collisions
|
||||
- Fix h beta vectorwidth4 correctness issue for WMMA
|
||||
- Fix an error with BufferStore=0
|
||||
- Fix mismatch issue with (StoreCInUnroll + PrefetchGlobalRead=2)
|
||||
- Fix MoveMIoutToArch bug
|
||||
- Fix flat load correctness issue on I8 and flat store correctness issue
|
||||
- Fix mismatch issue with BufferLoad=0 + TailLoop for large array sizes
|
||||
- Fix code generation error with BufferStore=0 and StoreCInUnrollPostLoop
|
||||
- Fix issues with DirectToVgpr + ScheduleIterAlg<3
|
||||
- Fix mismatch issue with DGEMM TT + LocalReadVectorWidth=2
|
||||
- Fix mismatch issue with PrefetchGlobalRead=2
|
||||
- Fix mismatch issue with DirectToVgpr + PrefetchGlobalRead=2 + small tile size
|
||||
- Fix an error with PersistentKernel=0 + PrefetchAcrossPersistent=1 + PrefetchAcrossPersistentMode=1
|
||||
- Fix mismatch issue with DirectToVgpr + DirectToLds + only 1 iteration in unroll loop case
|
||||
- Remove duplicate GSU kernels: for GSU = 1, GSUAlgorithm SingleBuffer and MultipleBuffer kernels are identical
|
||||
- Fix for failing CI tests due to CpuThreads=0
|
||||
- Fix mismatch issue with DirectToLds + PrefetchGlobalRead=2
|
||||
- Remove the reject condition for ThreadSeparateGlobalRead and DirectToLds (HGEMM, SGEMM only)
|
||||
- Modify reject condition for minimum lanes of ThreadSeparateGlobalRead (SGEMM or larger data type only)
|
||||
Maintenance update #3, combined with ROCm 5.4.1, now provides SRIOV virtualization support for all AMD Instinct™ MI200 devices.
|
||||
|
||||
@@ -12,7 +12,7 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
|
||||
fetch="https://github.com/GPUOpen-Tools/" />
|
||||
<remote name="KhronosGroup"
|
||||
fetch="https://github.com/KhronosGroup/" />
|
||||
<default revision="refs/tags/rocm-5.5.0"
|
||||
<default revision="refs/tags/rocm-5.5.1"
|
||||
remote="roc-github"
|
||||
sync-c="true"
|
||||
sync-j="4" />
|
||||
|
||||
52
docs/conf.py
@@ -5,40 +5,21 @@
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html
|
||||
|
||||
import shutil
|
||||
shutil.copy2('../CONTRIBUTING.md','./contributing.md')
|
||||
shutil.copy2('../RELEASE.md','./release.md')
|
||||
|
||||
|
||||
from rocm_docs import ROCmDocs
|
||||
|
||||
# working anchors that linkcheck cannot find
|
||||
linkcheck_anchors_ignore = [
|
||||
'd90e61',
|
||||
'd1667e113',
|
||||
'd2999e60',
|
||||
'building-from-source',
|
||||
'use-the-rocm-build-tool-rbuild',
|
||||
'use-cmake-to-build-migraphx',
|
||||
'example'
|
||||
]
|
||||
linkcheck_ignore = [
|
||||
# site to be built
|
||||
"https://rocmdocs.amd.com/projects/ROCmCC/en/latest/",
|
||||
"https://rocmdocs.amd.com/projects/amdsmi/en/latest/",
|
||||
"https://rocmdocs.amd.com/projects/rdc/en/latest/",
|
||||
"https://rocmdocs.amd.com/projects/rocmsmi/en/latest/",
|
||||
"https://rocmdocs.amd.com/projects/roctracer/en/latest/",
|
||||
"https://rocmdocs.amd.com/projects/MIGraphX/en/latest/",
|
||||
"https://rocmdocs.amd.com/projects/rocprofiler/en/latest/",
|
||||
# correct links that linkcheck times out on
|
||||
"https://github.com/ROCm-Developer-Tools/HIP-VS/blob/master/README.md",
|
||||
r"https://www.amd.com/system/files/.*.pdf",
|
||||
"https://www.amd.com/en/developer/aocc.html",
|
||||
"https://www.amd.com/en/support/linux-drivers",
|
||||
"https://www.amd.com/en/technologies/infinity-hub",
|
||||
r"https://bitbucket.org/icl/magma/*",
|
||||
"http://cs231n.stanford.edu/"
|
||||
]
|
||||
|
||||
shutil.copy2('../CONTRIBUTING.md','./contributing.md')
|
||||
shutil.copy2('../RELEASE.md','./release.md')
|
||||
# Keep capitalization due to similar linking on GitHub's markdown preview.
|
||||
shutil.copy2('../CHANGELOG.md','./CHANGELOG.md')
|
||||
|
||||
# configurations for PDF output by Read the Docs
|
||||
project = "ROCm Documentation"
|
||||
author = "Advanced Micro Devices, Inc."
|
||||
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
|
||||
version = "5.4.1"
|
||||
release = "5.4.1"
|
||||
|
||||
setting_all_article_info = True
|
||||
all_article_info_os = ["linux"]
|
||||
@@ -73,7 +54,7 @@ article_pages = [
|
||||
{"file":"how_to/system_debugging", "os":["linux"]},
|
||||
{"file":"how_to/tensorflow_install/tensorflow_install", "os":["linux"]},
|
||||
|
||||
{"file":"examples/ai_ml_inferencing", "os":["linux"]},
|
||||
{"file":"examples/machine_learning", "os":["linux"]},
|
||||
{"file":"examples/inception_casestudy/inception_casestudy", "os":["linux"]},
|
||||
|
||||
{"file":"understand/file_reorg", "os":["linux"]},
|
||||
@@ -83,8 +64,13 @@ article_pages = [
|
||||
|
||||
external_toc_path = "./sphinx/_toc.yml"
|
||||
|
||||
docs_core = ROCmDocs("ROCm Documentation")
|
||||
docs_core = ROCmDocs("ROCm 5.4.1 Documentation Home")
|
||||
docs_core.setup()
|
||||
|
||||
external_projects_current_project = "rocm"
|
||||
|
||||
for sphinx_var in ROCmDocs.SPHINX_VARS:
|
||||
globals()[sphinx_var] = getattr(docs_core, sphinx_var)
|
||||
html_theme_options = {
|
||||
"link_main_doc": False
|
||||
}
|
||||
|
||||
@@ -1,44 +0,0 @@
|
||||
# Deploy
|
||||
|
||||
Please follow the guides below to begin your ROCm journey. ROCm can be consumed
|
||||
via many mechanisms.
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
|
||||
::::{grid-item-card}
|
||||
:padding: 2
|
||||
Quick Start
|
||||
^^^
|
||||
|
||||
- [Linux](quick_start)
|
||||
- [Windows](hip_sdk_install_win/hip_sdk_install_win)
|
||||
|
||||
::::
|
||||
|
||||
::::{grid-item-card}
|
||||
:padding: 2
|
||||
Docker
|
||||
^^^
|
||||
|
||||
- [Guide](deploy/docker)
|
||||
- [Dockerhub](https://hub.docker.com/u/rocm/)
|
||||
|
||||
::::
|
||||
|
||||
::::{grid-item-card}
|
||||
:padding: 2
|
||||
[Advanced](deploy/advanced)
|
||||
^^^
|
||||
|
||||
- [Uninstall](deploy/advanced/uninstall)
|
||||
- [Multi-ROCm Installations](deploy/advanced/multi)
|
||||
- [spack](deploy/advanced/spack)
|
||||
- [Build from Source](deploy/advanced/build_source)
|
||||
|
||||
::::
|
||||
|
||||
:::::
|
||||
|
||||
## Related Information
|
||||
|
||||
[Release Information](release)
|
||||
@@ -4,9 +4,9 @@
|
||||
|
||||
Docker containers share the kernel with the host operating system, therefore the
|
||||
ROCm kernel-mode driver must be installed on the host. Please refer to
|
||||
[](/deploy/linux/install) for details. The other user-space parts
|
||||
(like the HIP-runtime or math libraries) of the ROCm stack will be loaded from
|
||||
the container image and don't need to be installed to the host.
|
||||
{ref}`using-the-package-manager` on installing `amdgpu-dkms`. The other
|
||||
user-space parts (like the HIP-runtime or math libraries) of the ROCm stack will
|
||||
be loaded from the container image and don't need to be installed to the host.
|
||||
|
||||
(docker-access-gpus-in-container)=
|
||||
|
||||
|
||||
@@ -1,17 +1,13 @@
|
||||
# Deploy ROCm on Linux
|
||||
|
||||
Please start with the [Quick Start Linux](quick_start) or follow the detailed instructions below.
|
||||
Start with {doc}`/deploy/linux/quick_start` or follow the detailed
|
||||
instructions below.
|
||||
|
||||
::::{grid} 2 3 3 3
|
||||
## Prepare to Install
|
||||
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Overview
|
||||
:link: install
|
||||
:link-type: doc
|
||||
|
||||
Overview and comparison of the different ways to install ROCm.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Prerequisites
|
||||
:link: prerequisites
|
||||
:link-type: doc
|
||||
@@ -19,37 +15,39 @@ Overview and comparison of the different ways to install ROCm.
|
||||
The prerequisites page lists the required steps *before* installation.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Installation
|
||||
:link: install
|
||||
:::{grid-item-card} Install Choices
|
||||
:link: install_overview
|
||||
:link-type: doc
|
||||
|
||||
Detailed steps to install with the package manager or with the installation
|
||||
script, including multi-version installation. Recommended for most users.
|
||||
Package manager vs AMDGPU Installer
|
||||
|
||||
Standard Packages vs Multi-Version Packages
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Upgrading
|
||||
:link: upgrade
|
||||
::::
|
||||
|
||||
## Choose your install method
|
||||
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Package Manager
|
||||
:link: os-native/index
|
||||
:link-type: doc
|
||||
|
||||
Instructions for upgrading an existing ROCm installation.
|
||||
Directly use your distribution's package manager to install ROCm.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Uninstallation
|
||||
:link: uninstall
|
||||
:::{grid-item-card} AMDGPU Installer
|
||||
:link: installer/index
|
||||
:link-type: doc
|
||||
|
||||
Steps for removing ROCm packages, libraries, and tools.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Package Manager Integration
|
||||
:link: package_manager_integration
|
||||
:link-type: doc
|
||||
|
||||
Information about (meta-)packages in the ROCm ecosystem.
|
||||
Use an installer tool that orchestrates changes via the package
|
||||
manager.
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## See Also
|
||||
|
||||
- [GPU and OS Support Linux](../../gpu_os_support.md)
|
||||
- {doc}`/release/gpu_os_support`
|
||||
|
||||
@@ -1,956 +0,0 @@
|
||||
# Installation (Linux)
|
||||
|
||||
Installing can be done in one of two ways, depending on your preference:
|
||||
|
||||
- Using an installer script
|
||||
- Through your system's package manager
|
||||
|
||||
```{attention}
|
||||
For information on installing ROCm on devices with NVIDIA GPUs, refer to the HIP
|
||||
Installation Guide.
|
||||
```
|
||||
|
||||
(install-script-method)=
|
||||
|
||||
## Installer Script Method
|
||||
|
||||
The installer script method automates the installation process for the AMDGPU
|
||||
and ROCm stack. The installer script handles the complete installation process
|
||||
for ROCm, including setting up the repository, cleaning the system, updating,
|
||||
and installing the desired drivers and meta-packages. With this approach, the
|
||||
system has more control over the ROCm installation process. Thus, those who are
|
||||
less familiar with standard Linux commands can choose this method for ROCm
|
||||
installation.
|
||||
|
||||
For AMDGPU and ROCm installation using the installer script method on Linux
|
||||
distributions, follow these steps:
|
||||
|
||||
1. **Meet prerequisites** – Ensure the Prerequisites are met before downloading
|
||||
and installing the installer script.
|
||||
|
||||
2. **Download and install the installer script** – Ensure you download and
|
||||
install the installer script from the recommended URL.
|
||||
|
||||
```{tip}
|
||||
The installer package is updated periodically to resolve known issues and add
|
||||
new features. The links for each Linux distribution always point to the latest
|
||||
available build.
|
||||
```
|
||||
|
||||
3. **Use the installer script on Linux distributions** – Ensure you execute the
|
||||
script for installing use cases.
|
||||
|
||||
### Download and Install the Installer Script
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
<!-- markdownlint-disable-next-line MD013 -->
|
||||
::::{rubric} To download the amdgpu-install script on the system, use the following commands.
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/focal/amdgpu-install_5.4.50403-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.4.50403-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/jammy/amdgpu-install_5.4.50403-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.4.50403-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
<!-- markdownlint-disable-next-line MD013 -->
|
||||
::::{rubric} To download the amdgpu-install script on the system, use the following commands.
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.4.3/rhel/8.6/amdgpu-install-5.4.50403-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.4.3/rhel/8.7/amdgpu-install-5.4.50403-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.4.3/rhel/9.1/amdgpu-install-5.4.50403-1.el9.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
<!-- markdownlint-disable-next-line MD013 -->
|
||||
::::{rubric} To download the amdgpu-install script on the system, use the following commands.
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Service Pack 4
|
||||
:sync: SLES15-SP4
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.4.3/sle/15.4/amdgpu-install-5.4.50403-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
::::::
|
||||
|
||||
### Using the Installer Script for Single-version ROCm Installation
|
||||
|
||||
To install use cases specific to your requirements, use the installer
|
||||
`amdgpu-install` as follows:
|
||||
|
||||
- To install a single use case:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --usecase=rocm
|
||||
```
|
||||
|
||||
- To install kernel-mode driver:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --usecase=dkms
|
||||
```
|
||||
|
||||
- To install multiple use cases:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --usecase=hiplibsdk,rocm
|
||||
```
|
||||
|
||||
- To display a list of available use cases:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --list-usecase
|
||||
```
|
||||
|
||||
The following is a sample of the output of the command above:
|
||||
|
||||
```{note}
|
||||
The list in this section represents only a sample of available use cases for ROCm:
|
||||
```
|
||||
|
||||
```none
|
||||
If --usecase option is not present, the default selection is "graphics,opencl,hip"
|
||||
|
||||
Available use cases:
|
||||
rocm(for users and developers requiring full ROCm stack)
|
||||
- OpenCL (ROCr/KFD based) runtime
|
||||
- HIP runtimes
|
||||
- Machine learning framework
|
||||
- All ROCm libraries and applications
|
||||
- ROCm Compiler and device libraries
|
||||
- ROCr runtime and thunk
|
||||
lrt(for users of applications requiring ROCm runtime)
|
||||
- ROCm Compiler and device libraries
|
||||
- ROCr runtime and thunk
|
||||
opencl(for users of applications requiring OpenCL on Vega or
|
||||
later products)
|
||||
- ROCr based OpenCL
|
||||
- ROCm Language runtime
|
||||
|
||||
openclsdk (for application developers requiring ROCr based OpenCL)
|
||||
- ROCr based OpenCL
|
||||
- ROCm Language runtime
|
||||
- development and SDK files for ROCr based OpenCL
|
||||
|
||||
hip(for users of HIP runtime on AMD products)
|
||||
- HIP runtimes
|
||||
hiplibsdk (for application developers requiring HIP on AMD products)
|
||||
- HIP runtimes
|
||||
- ROCm math libraries
|
||||
- HIP development libraries
|
||||
```
|
||||
|
||||
```{tip}
|
||||
Adding `-y` as a parameter to `amdgpu-install` skips user prompts (for
|
||||
automation). Example: `amdgpu-install -y --usecase=rocm`
|
||||
```
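As a minimal sketch of how the `-y` flag and a comma-separated `--usecase` list might be combined in an automation script, the snippet below builds the command line and prints it rather than running it. The helper name `join_usecases` is hypothetical and is not part of `amdgpu-install`.

```shell
# Hypothetical helper: join the given use cases with commas.
# The function body runs in a subshell so the IFS change stays local.
join_usecases() (
    IFS=,
    printf '%s\n' "$*"
)

# Dry run: print the non-interactive command instead of executing it.
echo "sudo amdgpu-install -y --usecase=$(join_usecases hiplibsdk rocm)"
```

Printing the command first makes it easy to review what an unattended install would do before removing the `echo`.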
|
||||
|
||||
### Using Installer Script in Docker
|
||||
|
||||
When the installation is initiated in Docker, the installer tries to install the
|
||||
use case along with the kernel-mode driver. However, you cannot install the
|
||||
kernel-mode driver in a Docker container. To skip the installation of the
|
||||
kernel-mode driver, proceed with the `--no-dkms` option, as shown below:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --usecase=rocm --no-dkms
|
||||
```
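A script that runs both on hosts and in containers could add `--no-dkms` only when needed. The sketch below assumes that the presence of `/.dockerenv` indicates a Docker container; this is a common heuristic, not something `amdgpu-install` checks itself, and the helper name `install_args` is hypothetical.

```shell
# Hypothetical helper: print the amdgpu-install arguments for a use case.
# Assumption: a file named /.dockerenv exists inside Docker containers.
# The marker path can be overridden (second argument) to try the logic anywhere.
install_args() (
    usecase="$1"
    marker="${2:-/.dockerenv}"
    args="--usecase=$usecase"
    if [ -e "$marker" ]; then
        # The kernel-mode driver cannot be installed inside a container.
        args="$args --no-dkms"
    fi
    printf '%s\n' "$args"
)

# Dry run: print the command instead of executing it.
echo "sudo amdgpu-install $(install_args rocm)"
```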
|
||||
|
||||
### Using the Installer Script for Multi-version ROCm Installation
|
||||
|
||||
The multi-version ROCm installation requires you to download and install the
|
||||
latest ROCm release installer from the list of ROCm releases you want to install
|
||||
simultaneously on your system.
|
||||
|
||||
**Example:** If you want to install ROCm releases 4.5.0, 4.5.1, and 5.4.3
|
||||
simultaneously, you are required to download the installer from the latest ROCm
|
||||
release v5.4.3.
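
To illustrate the rule in the example above, the following sketch picks the newest release from a list so that its installer can be downloaded. It assumes GNU coreutils `sort`, whose `-V` option orders version strings; the function name `latest_release` is hypothetical.

```shell
# Hypothetical helper: print the newest of the given release numbers.
# Assumes GNU sort; -V compares version strings component by component,
# so 5.10.0 sorts after 5.9.1.
latest_release() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

latest_release 4.5.0 4.5.1 5.4.3
```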
|
||||
|
||||
To download and install the installer, refer to the [Download and Install the
|
||||
Installer Script](#download-and-install-the-installer-script) section.
|
||||
|
||||
```{attention}
|
||||
If the existing ROCm release contains non-versioned ROCm packages, uninstall
|
||||
those packages before proceeding with the multi-version installation to avoid
|
||||
conflicts.
|
||||
```
|
||||
|
||||
#### Add Required ROCm Repositories
|
||||
|
||||
Add the required repositories using the following steps:
|
||||
|
||||
```{important}
|
||||
Add the AMDGPU and ROCm repositories manually for all ROCm releases you want to
|
||||
install except the latest one. The amdgpu-install script automatically adds the
|
||||
required repositories for the latest release.
|
||||
```
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
Name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/$ver/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
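
The loops above depend on every iteration adding an entry rather than replacing the file. The following sketch illustrates that pattern against a scratch file, which is an assumption made so that it can run without root access and without touching `/etc/apt`. The `jammy` codename is likewise only an example.

```shell
# Scratch file standing in for /etc/apt/sources.list.d/rocm.list (assumption for illustration).
list_file="$(mktemp)"
codename="jammy"   # assumption: Ubuntu 22.04; use "focal" for Ubuntu 20.04

for ver in 5.0.2 5.1.4 5.2.5 5.3.3; do
    # -a is the short form of --append: each pass adds a line instead of replacing the file.
    echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver $codename main" \
        | tee -a "$list_file" > /dev/null
done

# Expect one "deb" line per release; a count of 1 would mean the file was overwritten.
entry_count="$(grep -c '^deb ' "$list_file")"
echo "repository entries written: $entry_count"
rm -f "$list_file"
```

Counting the entries after the loop is a cheap way to confirm that all requested releases are registered before running the package manager update.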
|
||||
|
||||
#### Use the Installer to Install Multi-version ROCm Meta-packages
|
||||
|
||||
Use the installer script as given below:
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-1>
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-2>
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-3>
|
||||
```
|
||||
|
||||
```{tip}
|
||||
If the kernel-mode driver is already present on the system and you do not want
|
||||
to upgrade it, use the `--no-dkms` option to skip the installation of the
|
||||
kernel-mode driver, as shown in the following samples:
|
||||
```
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=4.5.0 --no-dkms
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.3 --no-dkms
|
||||
```
|
||||
|
||||
The following is an example of a ROCm multi-version installation. The kernel-mode
driver associated with ROCm release v5.4.3 is installed because it is the latest
release in the list.
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=4.5.0
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=4.5.2
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.3
|
||||
```
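
When several releases are installed in one go, the repeated commands can be generated in a loop. The sketch below only prints the commands so that they can be reviewed first. `print_install_cmds` is a hypothetical helper, and it uses only the `-y`, `--usecase`, `--rocmrelease`, and `--no-dkms` options described in this guide.

```shell
# Print one amdgpu-install command per ROCm release.
# $1: extra flags (for example "--no-dkms", or "" for none); remaining arguments: releases.
print_install_cmds() {
    extra_flags="$1"
    shift
    for rel in "$@"; do
        echo "sudo amdgpu-install -y --usecase=rocm --rocmrelease=$rel${extra_flags:+ $extra_flags}"
    done
}

print_install_cmds "--no-dkms" 4.5.0 4.5.2 5.4.3
```

Piping the output to `sh` would execute the commands. Printing them first keeps the step auditable.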
|
||||
|
||||
## Package Manager Method
|
||||
|
||||
The package manager method involves setting up the repositories manually, then
updating the package index and installing or uninstalling meta-packages. It uses
the standard commands of the distribution's package manager, such as `apt`,
`yum`, or `zypper`.
|
||||
|
||||
The functions of a package manager installation system are:
|
||||
|
||||
- Grouping packages based on function
|
||||
- Extracting package archives
|
||||
- Ensuring a package is installed together with all of its dependencies
- Looking up, downloading, installing, or updating packages from a remote
repository
|
||||
- Ensuring the authenticity and integrity of the package
|
||||
|
||||
### Installing ROCm on Linux Distributions
|
||||
|
||||
For a fresh ROCm installation using the package manager method on a Linux
|
||||
distribution, follow the steps below:
|
||||
|
||||
1. **Meet prerequisites** – Ensure the Prerequisites are met before the ROCm
|
||||
installation.
|
||||
|
||||
2. **Install kernel headers and development packages** – Ensure kernel headers
|
||||
and development packages are installed on the system.
|
||||
|
||||
3. **Select the base URLs for AMDGPU and ROCm stack repository** – Ensure the
|
||||
base URLs for AMDGPU and ROCm stack repositories are selected.
|
||||
|
||||
4. **Add the AMDGPU stack repository** – Ensure the AMDGPU stack repository is
|
||||
added.
|
||||
|
||||
5. **Install the kernel-mode driver and reboot the system** – Ensure the
|
||||
kernel-mode driver is installed and the system is rebooted.
|
||||
|
||||
6. **Add ROCm stack repository** – Ensure the ROCm stack repository is added.
|
||||
|
||||
7. **Install single-version or multi-version ROCm meta-packages** – Install the
|
||||
desired meta-packages.
|
||||
|
||||
8. **Verify installation for the applicable distributions** – Verify if the
|
||||
installation is successful.
|
||||
|
||||
```{important}
|
||||
You cannot install a kernel-mode driver in a Docker container. Refer to the
|
||||
sections below for specific commands to install the AMDGPU and ROCm stack on
|
||||
various Linux distributions.
|
||||
```
|
||||
|
||||
#### Understanding the Release-specific AMDGPU and ROCm Stack Repositories on Linux Distributions
|
||||
|
||||
The release-specific repositories consist of packages from a specific release of
|
||||
the AMDGPU stack and ROCm stack. The repositories are not updated for the latest
|
||||
packages with subsequent releases. When a new ROCm release is available, the new
|
||||
repository, specific to that release, is added. You can select a specific
|
||||
release to install, update the previously installed single version to the later
|
||||
available release, or add the latest version of ROCm along with the currently
|
||||
installed version by using the multi-version ROCm packages.
|
||||
|
||||
```{note}
|
||||
Users installing multiple versions of the ROCm stack must use the
|
||||
release-specific base URL.
|
||||
```
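
Because each release has its own repository, the base URLs can be derived from the release number. The following sketch assumes the Ubuntu URL layout used in the examples of this guide. The helper names are hypothetical, and other distributions use different paths.

```shell
# Build release-specific apt base URLs from a ROCm release number.
# Assumption: the layout mirrors the Ubuntu examples in this guide.
amdgpu_apt_url() {
    echo "https://repo.radeon.com/amdgpu/$1/ubuntu"
}

rocm_apt_url() {
    echo "https://repo.radeon.com/rocm/apt/$1"
}

release="5.4.3"
echo "AMDGPU: $(amdgpu_apt_url "$release")"
echo "ROCm:   $(rocm_apt_url "$release")"
```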
|
||||
|
||||
#### Using the Package Manager
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{rubric} Installation of Kernel Headers and Development Packages
|
||||
::::
|
||||
|
||||
The following instructions to install kernel headers and development packages
|
||||
apply to all versions and kernels of Ubuntu. The ROCm installation requires the
`linux-headers` and `linux-modules-extra` packages whose versions match the
running kernel's version.
|
||||
|
||||
**Example:** If the system is running the Linux kernel version
|
||||
`5.15.0-41-generic`, you must install the identical versions of the `linux-headers`
and development packages. Refer to {ref}`check-kernel-info` for how to check
|
||||
the system's kernel version.
|
||||
|
||||
To check the `linux-headers` and `linux-modules-extra` package versions,
|
||||
follow these steps:
|
||||
|
||||
1. For the Ubuntu/Debian environment, execute the following command to verify
|
||||
the kernel headers and development packages are installed with the
|
||||
respective versions:
|
||||
|
||||
```shell
|
||||
sudo dpkg -l | grep linux-headers
|
||||
```
|
||||
|
||||
The command indicates if there are Linux headers installed as shown below:
|
||||
|
||||
```none
|
||||
ii linux-headers-5.15.0-41-generic 5.15.0-41.44~20.04.1 amd64 Linux kernel headers for version 5.15.0 on 64 bit x86 SMP
|
||||
```
|
||||
|
||||
2. Execute the following command to check whether the development packages are
|
||||
installed:
|
||||
|
||||
```shell
|
||||
sudo dpkg -l | grep linux-modules-extra
|
||||
```
|
||||
|
||||
The command mentioned above lists the installed `linux-modules-extra`
|
||||
packages like the output below:
|
||||
|
||||
```none
|
||||
ii linux-modules-extra-5.15.0-41-generic 5.15.0-41.44~20.04.1 amd64 Linux kernel extra modules for version 5.15.0 on 64 bit x86 SMP
|
||||
```
|
||||
|
||||
3. If the matching versions of the Linux headers and development packages are
not installed on the system, execute the following command to install
them:
|
||||
|
||||
```shell
|
||||
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
|
||||
```
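
The three checks above can be combined into one pass. This is a sketch under the assumption that `dpkg-query` is available, which is the case on Debian and Ubuntu. `kernel_pkgs` is a hypothetical helper that derives the package names from the running kernel release.

```shell
# Print the package names that must match the running kernel.
kernel_pkgs() {
    release="${1:-$(uname -r)}"
    echo "linux-headers-$release linux-modules-extra-$release"
}

# Report the install state of each package (skipped where dpkg-query is absent).
if command -v dpkg-query > /dev/null 2>&1; then
    for pkg in $(kernel_pkgs); do
        if dpkg-query -W -f='${Status}' "$pkg" 2> /dev/null | grep -q 'install ok installed'; then
            echo "$pkg: installed"
        else
            echo "$pkg: missing"
        fi
    done
fi
```

Any package reported as missing can then be installed with the `apt install` command from step 3.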
|
||||
|
||||
::::{rubric} Adding the AMDGPU and ROCm Stack Repositories
|
||||
::::
|
||||
|
||||
1. Add GPG Key for AMDGPU and ROCm Stack
|
||||
|
||||
Add the GPG key for AMDGPU and ROCm repositories. For Debian-based systems
|
||||
like Ubuntu, configure the Debian ROCm repository as follows:
|
||||
|
||||
```shell
|
||||
curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/rocm-keyring.gpg
|
||||
```
|
||||
|
||||
```{note}
|
||||
The GPG key may change; ensure it is updated when installing a new release. If
|
||||
the key signature verification fails while updating, re-add the key from the
ROCm apt repository as described above. The current `rocm.gpg.key` is not
available in a standard key ring distribution, but has the following SHA1
checksum: `73f5d8100de6048aa38a8b84cd9a87f05177d208 rocm.gpg.key`
|
||||
```
|
||||
|
||||
2. Add the AMDGPU Stack Repository and Install the Kernel-mode Driver
|
||||
|
||||
```{attention}
|
||||
If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
To add the AMDGPU stack repository, follow these steps:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/5.4.3/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/5.4.3/ubuntu jammy main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Install the kernel-mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
3. Add the ROCm Stack Repository and Install Meta-packages
|
||||
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3 5.4.3; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3 5.4.3; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Install the packages of your choice as either a single-version or a
multi-version ROCm installation. For more information on single-version and
multi-version installations, refer to {ref}`installation-types`.
|
||||
For a comprehensive list of meta-packages, refer to
|
||||
{ref}`meta-package-desc`.
|
||||
|
||||
- Sample Single-version installation
|
||||
|
||||
```shell
|
||||
sudo apt install rocm-hip-sdk
|
||||
```
|
||||
|
||||
- Sample Multi-version installation
|
||||
|
||||
```{important}
|
||||
If the existing ROCm release contains non-versioned ROCm packages, you must
|
||||
uninstall those packages before proceeding to the multi-version installation
|
||||
to avoid conflicts.
|
||||
```
|
||||
|
||||
```shell
|
||||
sudo apt install rocm-hip-sdk5.4.3 rocm-hip-sdk5.2.5
|
||||
```
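
Returning to the GPG key step in this tab: the note there lists a SHA1 checksum for `rocm.gpg.key`, which can be compared against a downloaded copy before the key is trusted. The following is a minimal sketch, assuming the key was saved as `rocm.gpg.key` in the current directory and that `sha1sum` from GNU coreutils is available. `verify_sha1` is a hypothetical helper name.

```shell
# Return success when the SHA1 of a file equals the expected hex digest.
verify_sha1() {
    file="$1"
    expected="$2"
    actual="$(sha1sum "$file" | cut -d ' ' -f 1)"
    [ "$actual" = "$expected" ]
}

# Check a previously downloaded key (skipped if the file is absent).
key_file="rocm.gpg.key"
if [ -f "$key_file" ]; then
    if verify_sha1 "$key_file" "73f5d8100de6048aa38a8b84cd9a87f05177d208"; then
        echo "checksum matches"
    else
        echo "checksum mismatch: do not install this key" >&2
    fi
fi
```

Since the key may change between releases, a mismatch can also mean that the published checksum is newer than the one quoted here.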
|
||||
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{rubric} Installation of Kernel Headers and Development Packages
|
||||
::::
|
||||
|
||||
The ROCm installation requires the `kernel-headers` and `kernel-devel`
packages whose versions match the running kernel's version.
|
||||
|
||||
**Example:** If the system is running Linux kernel version
|
||||
`3.10.0-1160.el7.x86_64`, you must install the identical versions of kernel
|
||||
headers and development packages. Refer to {ref}`check-kernel-info` for how to
|
||||
check the system's kernel version.
|
||||
|
||||
To check the `kernel-headers` and `kernel-devel` package versions,
|
||||
follow these steps:
|
||||
|
||||
1. To verify you have the supported version of the installed kernel headers,
|
||||
type the following on the command line:
|
||||
|
||||
```shell
|
||||
sudo yum list installed kernel-headers
|
||||
```
|
||||
|
||||
The command mentioned above displays the list of kernel headers versions
|
||||
currently present on your system. Verify if the listed kernel headers have
|
||||
the same versions as the kernel.
|
||||
|
||||
2. The following command lists the development packages on your system. Verify
|
||||
if the listed development package's version number matches the kernel
|
||||
version number:
|
||||
|
||||
```shell
|
||||
sudo yum list installed kernel-devel
|
||||
```
|
||||
|
||||
3. If the matching versions of the kernel headers and development packages
are not installed on the system, execute the command below to install them:
|
||||
|
||||
```shell
|
||||
sudo yum install "kernel-headers-$(uname -r)" "kernel-devel-$(uname -r)"
|
||||
```
|
||||
|
||||
::::{rubric} Adding the AMDGPU and ROCm Stack Repositories
|
||||
::::
|
||||
|
||||
1. Add the AMDGPU Stack Repository and Install the Kernel-mode Driver
|
||||
|
||||
```{attention}
|
||||
If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
Name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
Name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
Name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Install the kernel-mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo yum install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
2. Add the ROCm Stack Repository and Install Meta-packages
|
||||
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3 5.4.3; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
Name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/$ver/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
Install the packages of your choice as either a single-version or a
multi-version ROCm installation. For more information on single-version and
multi-version installations, refer to {ref}`installation-types`.
|
||||
For a comprehensive list of meta-packages, refer to
|
||||
{ref}`meta-package-desc`.
|
||||
|
||||
- Sample Single-version installation
|
||||
|
||||
```shell
|
||||
sudo yum install rocm-hip-sdk
|
||||
```
|
||||
|
||||
- Sample Multi-version installation
|
||||
|
||||
```{important}
|
||||
If the existing ROCm release contains non-versioned ROCm packages, you must
|
||||
uninstall those packages before proceeding to the multi-version installation
|
||||
to avoid conflicts.
|
||||
```
|
||||
|
||||
```shell
|
||||
sudo yum install rocm-hip-sdk5.4.3 rocm-hip-sdk5.2.5
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
::::{rubric} Installation of Kernel Headers and Development Packages
|
||||
::::
|
||||
|
||||
ROCm installation requires the `kernel-default-devel` and `kernel-default`
packages whose versions match the running kernel's version.
|
||||
|
||||
**Example:** If the system is running the Linux kernel version
|
||||
`5.3.18-57_11.0.18`, you must install the same versions of Linux headers and
|
||||
development packages. Refer to {ref}`check-kernel-info` for how to check
|
||||
the system's kernel version.
|
||||
|
||||
To check the `kernel-default-devel` and `kernel-default` package versions, follow
|
||||
these steps:
|
||||
|
||||
1. Ensure that the correct versions of the `kernel-default-devel` and
`kernel-default` packages are installed. The following command shows
information about both packages, including whether they are installed:
|
||||
|
||||
```shell
|
||||
sudo zypper info kernel-default-devel kernel-default
|
||||
```
|
||||
|
||||
```{note}
|
||||
The next step is only required if the above command shows that the
`kernel-default-devel` and `kernel-default` packages matching the running
kernel release are not installed on your system.
|
||||
```
|
||||
|
||||
2. If the required version of packages does not exist on the system, install
|
||||
with the command below:
|
||||
|
||||
```shell
|
||||
sudo zypper install kernel-default-devel kernel-default
|
||||
```
|
||||
|
||||
::::{rubric} Adding the AMDGPU and ROCm Stack Repositories
|
||||
::::
|
||||
|
||||
1. Add the AMDGPU Stack Repository and Install the Kernel-mode Driver
|
||||
|
||||
```{attention}
|
||||
If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
Install the kernel-mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
2. Add the ROCm Stack Repository and Install Meta-packages
|
||||
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
for ver in 5.0.2 5.1.4 5.2.5 5.3.3 5.4.3; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
Install the packages of your choice as either a single-version or a
multi-version ROCm installation. For more information on single-version and
multi-version installations, refer to {ref}`installation-types`.
|
||||
For a comprehensive list of meta-packages, refer to
|
||||
{ref}`meta-package-desc`.
|
||||
|
||||
- Sample Single-version installation
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk
|
||||
```
|
||||
|
||||
- Sample Multi-version installation
|
||||
|
||||
```{important}
|
||||
If the existing ROCm release contains non-versioned ROCm packages, you must
|
||||
uninstall those packages before proceeding to the multi-version installation
|
||||
to avoid conflicts.
|
||||
```
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.4.3 rocm-hip-sdk5.2.5
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
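
The `.repo` files written by the RHEL and SLES loops above need one uniquely named section per release. The sketch below generates stanzas into a scratch file and checks for repeated section names. The scratch file, the `rocm_repo_stanza` helper, and the lowercase `name=` key are assumptions for illustration, and the `baseurl` is copied from the RHEL example rather than verified for every release.

```shell
# Emit one repository stanza for a ROCm release; the section name must be unique per release.
rocm_repo_stanza() {
    ver="$1"
    cat <<EOF
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/$ver/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
}

repo_file="$(mktemp)"   # scratch stand-in for /etc/yum.repos.d/rocm.repo
for ver in 5.2.5 5.4.3; do
    rocm_repo_stanza "$ver" >> "$repo_file"
done

# Count stanzas and confirm that no section name is repeated.
stanza_count="$(grep -c '^\[ROCm-' "$repo_file")"
duplicate_sections="$(grep '^\[' "$repo_file" | sort | uniq -d)"
echo "stanzas: $stanza_count, duplicates: ${duplicate_sections:-none}"
rm -f "$repo_file"
```

Running the same duplicate check against the real `.repo` file is useful after re-running a loop that appends, since appending twice repeats every section.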
|
||||
|
||||
(post-install-actions-linux)=
|
||||
|
||||
## Post-install Actions and Verification Process
|
||||
|
||||
The post-install actions listed here are optional and depend on your use case,
|
||||
but are generally useful. Verification of the install is advised.
|
||||
|
||||
### Post-install Actions
|
||||
|
||||
1. Instruct the system linker where to find the shared objects (`.so` files) for
|
||||
ROCm applications.
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
|
||||
/opt/rocm/lib
|
||||
/opt/rocm/lib64
|
||||
EOF
|
||||
sudo ldconfig
|
||||
```
|
||||
|
||||
```{note}
|
||||
Multi-version installations require extra care. Having multiple versions on
|
||||
the system linker library search path is not advised. Take care at both
compile time and run time to ensure that the proper libraries are
picked up. You can override `ld.so.conf` entries on a case-by-case basis
using the `LD_LIBRARY_PATH` environment variable.
|
||||
```
|
||||
|
||||
2. Add binary paths to the `PATH` environment variable.
|
||||
|
||||
```shell
|
||||
export PATH=$PATH:/opt/rocm-5.4.3/bin:/opt/rocm-5.4.3/opencl/bin
|
||||
```
|
||||
|
||||
```{attention}
|
||||
When using CMake to build applications, having the ROCm install location on
|
||||
the PATH subtly affects how ROCm libraries are searched for. See [Config Mode
|
||||
Search Procedure](https://cmake.org/cmake/help/latest/command/find_package.html#config-mode-search-procedure)
|
||||
and [CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH](https://cmake.org/cmake/help/latest/variable/CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH.html)
|
||||
for details.
|
||||
|
||||
(Entries in the `PATH` minus `bin` and `sbin` are added to library search
|
||||
paths, therefore this convenience will affect builds and result in ROCm
|
||||
libraries almost always being found. This may be an issue when you're
|
||||
developing these libraries or want to use self-built versions of them.)
|
||||
```
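
On a multi-version system, the two post-install actions above can be applied per shell session instead of system-wide. The following sketch assumes that each release is installed under `/opt/rocm-<version>` with `bin` and `lib` subdirectories. `use_rocm` is a hypothetical helper name.

```shell
# Point the current shell at one installed ROCm release.
# Assumption: releases live under /opt/rocm-<version> with bin/ and lib/ subdirectories.
use_rocm() {
    rocm_root="/opt/rocm-$1"
    PATH="$rocm_root/bin:$PATH"
    LD_LIBRARY_PATH="$rocm_root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

use_rocm 5.4.3
echo "PATH now starts with: ${PATH%%:*}"
```

The sketch prepends rather than appends so that the selected release takes precedence over any other release already on the search paths. The CMake caveat above applies to this approach as well.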
|
||||
|
||||
(verifying-kernel-mode-driver-installation)=
|
||||
|
||||
### Verifying Kernel-mode Driver Installation
|
||||
|
||||
Check the installation of the kernel-mode driver by typing the command given
|
||||
below:
|
||||
|
||||
```shell
|
||||
dkms status
|
||||
```
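
In automation, the output of `dkms status` can be filtered for the `amdgpu` module. This sketch assumes that each output line starts with the module name and contains the word `installed` once the module is built and installed. The exact format may differ between DKMS versions, and `has_amdgpu_module` is a hypothetical helper.

```shell
# Succeed when dkms status output (read from stdin) reports an installed amdgpu module.
# Assumption: each line starts with the module name and includes its state.
has_amdgpu_module() {
    grep -q '^amdgpu.*installed'
}

# Only query DKMS where the tool exists.
if command -v dkms > /dev/null 2>&1; then
    if dkms status | has_amdgpu_module; then
        echo "amdgpu kernel-mode driver: installed"
    else
        echo "amdgpu kernel-mode driver: not reported by dkms"
    fi
fi
```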
|
||||
|
||||
### Verifying ROCm Installation
|
||||
|
||||
After completing the ROCm installation, execute either of the following commands
to verify that the installation is successful. If your GPUs are listed, the
installation is considered successful:
|
||||
|
||||
```shell
|
||||
/opt/rocm/bin/rocminfo
|
||||
# OR
|
||||
/opt/rocm/opencl/bin/clinfo
|
||||
```
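
For unattended checks, the verification tools can be probed without reading their full output. The sketch below reports whether each tool is present and exits cleanly. `check_tool` is a hypothetical helper, and a clean exit is only a coarse signal, so the listed GPUs should still be inspected by hand.

```shell
# Run a verification tool quietly and report whether it is present and exits cleanly.
check_tool() {
    tool="$1"
    if [ ! -x "$tool" ]; then
        echo "missing: $tool"
    elif "$tool" > /dev/null 2>&1; then
        echo "ok: $tool"
    else
        echo "failed: $tool"
    fi
}

check_tool /opt/rocm/bin/rocminfo
check_tool /opt/rocm/opencl/bin/clinfo
```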
|
||||
|
||||
### Verifying Package Installation
|
||||
|
||||
To ensure the packages are installed successfully, use the following commands:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
```shell
|
||||
sudo apt list --installed
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
sudo yum list installed
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
sudo zypper search --installed-only
|
||||
```
|
||||
|
||||
:::
|
||||
::::
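
The listing commands above print every installed package, so filtering for ROCm-related names makes the result easier to read. The following sketch picks the first package manager found on the system. The helper name `list_installed_cmd` and the filter pattern are assumptions, and the pattern may match unrelated packages.

```shell
# Map a package manager name to the listing command shown in the tabs above.
list_installed_cmd() {
    case "$1" in
        apt)    echo "apt list --installed" ;;
        yum)    echo "yum list installed" ;;
        zypper) echo "zypper search --installed-only" ;;
        *)      return 1 ;;
    esac
}

# Use the first manager found on this system and keep only ROCm-related lines.
for mgr in apt yum zypper; do
    if command -v "$mgr" > /dev/null 2>&1; then
        $(list_installed_cmd "$mgr") 2> /dev/null | grep -i -E 'rocm|amdgpu|^hip' \
            || echo "no ROCm packages listed"
        break
    fi
done
```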
|
||||
@@ -1,111 +1,38 @@
|
||||
# Installation Overview (Linux)
|
||||
# ROCm Installation Options (Linux)
|
||||
|
||||
This document is intended for users familiar with Linux and discusses the
|
||||
installation of ROCm on various distributions.
|
||||
Users installing ROCm must choose between various installation options. A new
|
||||
user should follow the [Quick Start guide](./quick_start).
|
||||
|
||||
The guide provides instructions for the following:
|
||||
## Package Manager versus AMDGPU Installer?
|
||||
|
||||
- Kernel-mode driver installation
|
||||
- ROCm single-version and multi-version installation
|
||||
- ROCm and kernel-mode driver version upgrade
|
||||
- ROCm single-version and multi-version uninstallation
|
||||
- Kernel-mode driver uninstallation
|
||||
ROCm supports two methods for installation:
|
||||
|
||||
```{note}
|
||||
The rest of this document refers to _Radeon™ Software for Linux_ as the `amdgpu`
|
||||
stack and `amdgpu-dkms` driver as the kernel-mode driver.
|
||||
```
|
||||
- Directly using the Linux distribution's package manager
|
||||
- The `amdgpu-install` script
|
||||
|
||||
## Installation Methods
|
||||
There is no difference in the final installation state when choosing either
|
||||
option.
|
||||
|
||||
It is customary for Linux installers to integrate into the system's package
|
||||
manager. There are two notable groups of package sources:
|
||||
Using the distribution's package manager lets the user install,
|
||||
upgrade and uninstall using familiar commands and workflows. Third party
|
||||
ecosystem support is the same as your OS package manager.
|
||||
|
||||
- AMD-hosted repositories maintained by AMD available to register on supported
|
||||
Linux distribution versions. For a complete list of AMD-supported platforms,
|
||||
refer to the article: [GPU and OS Support](/release/gpu_os_support).
|
||||
- Distribution-hosted repositories maintained by the developer of said Linux
|
||||
distribution. These require little to no setup from the user, but aren't tested
|
||||
by AMD. For support on these installations, contact the relevant maintainers.
|
||||
The `amdgpu-install` script is a wrapper around the package manager. It
installs the same packages as the package manager method.
|
||||
|
||||
AMD also provides installer scripts for those that wish to drive installations
|
||||
in a more manual fashion.
|
||||
|
||||
## Package Licensing
|
||||
|
||||
```{attention}
|
||||
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
|
||||
subject to the license agreement enclosed in the directory for the binary, which is
|
||||
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
|
||||
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
|
||||
the terms and conditions of this license agreement. If you do not agree to the
|
||||
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
|
||||
AOCC CPU Optimizations.
|
||||
```
|
||||
|
||||
For the rest of the ROCm packages, you can find the licensing information at the
|
||||
following location: `/opt/rocm/share/doc/<component-name>/`
|
||||
|
||||
For example, you can fetch the licensing information of the `amd_comgr`
|
||||
component (Code Object Manager) from the `amd_comgr` folder. A file named
|
||||
`LICENSE.txt` contains the license details at:
|
||||
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`
|
||||
|
||||
### Package Manager Integration
|
||||
|
||||
Integrating with the distribution's package manager lets the user install,
|
||||
upgrade and uninstall using familiar commands and workflows. The actual commands
|
||||
vary from distribution to distribution. For more information, refer to
|
||||
[Package Manager Integration](package_manager_integration).
|
||||
|
||||
### Installer Script
|
||||
|
||||
The `amdgpu-install` script streamlines the installation process by:
|
||||
|
||||
- Abstracting the distribution-specific package installation logic
|
||||
- Performing the repository setup
|
||||
- Allowing you to specify the use case and automating the installation of all
|
||||
the required packages
|
||||
- Installing multiple ROCm releases simultaneously on a system
|
||||
- Automating updating local repository information through enhanced
|
||||
functionality of the `amdgpu-install` script
|
||||
- Performing post-install checks to verify whether the installation was
|
||||
completed successfully
|
||||
- Upgrading the installed ROCm release
|
||||
- Uninstalling the installed single-version or multi-version ROCm releases
|
||||
|
||||
```{tip}
|
||||
The installer script is provided for convenience. It doesn't do anything the
|
||||
user otherwise couldn't. It automates some tasks surrounding installation, such
|
||||
as registering/unregistering and driving the system's package manager, but the
|
||||
bulk of the work will still be done by the system's package manager. As is the
|
||||
case with most convenience wrappers, some degree of customization is lost for
|
||||
the sake of simplicity.
|
||||
```
|
||||
|
||||
#### Use cases
|
||||
|
||||
The installer script introduces the notion of "use cases", which denote usage
|
||||
patterns or reasons why someone installs ROCm. This allows users to install
only the parts of the ROCm ecosystem that concern them, resulting in a
smaller installation footprint and faster installs/upgrades.
|
||||
|
||||
Some of the ROCm-specific use cases the installer supports are:
|
||||
|
||||
- OpenCL (ROCr/KFD based) runtime
|
||||
- HIP runtimes
|
||||
- ROCm libraries and applications
|
||||
- ROCm Compiler and device libraries
|
||||
- Kernel-mode driver
|
||||
|
||||
For more information, refer to the How to Install ROCm section in this guide.
|
||||
The installer automates the installation process for the AMDGPU
|
||||
and ROCm stack. It handles the complete installation process
|
||||
for ROCm, including setting up the repository, cleaning the system, updating,
|
||||
and installing the desired drivers and meta-packages. Users who are
|
||||
less familiar with the package manager can choose this method for ROCm
|
||||
installation.
|
||||
|
||||
(installation-types)=
|
||||
|
||||
## Installation types
|
||||
## Single Version ROCm install versus Multi-Version
|
||||
|
||||
This section discusses the single-version and multi-version installation of the
|
||||
ROCm software stack.
|
||||
ROCm packages are versioned with both semantic versioning that is package
|
||||
specific and a ROCm release version.
|
||||
|
||||
### Single-version Installation
|
||||
|
||||
@@ -123,8 +50,14 @@ The multi-version installation refers to the following:
|
||||
ability to support multiple versions of packages simultaneously.
|
||||
- Use of versioned ROCm meta-packages.
|
||||
|
||||
```{attention}
|
||||
ROCm packages that were previously installed from a single-version installation
|
||||
must be removed before proceeding with the multi-version installation to avoid
|
||||
conflicts.
|
||||
```
|
||||
|
||||
```{note}
|
||||
Multiversion install is not available for the AMDGPU stack.
|
||||
Multiversion install is not available for the kernel driver module, also referred to as AMDGPU.
|
||||
```
|
||||
|
||||
The following image demonstrates the difference between single-version and
|
||||
|
||||
31
docs/deploy/linux/installer/index.md
Normal file
@@ -0,0 +1,31 @@
|
||||
# AMDGPU Install Script
|
||||
|
||||
::::{grid} 2 3 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Install
|
||||
:link: install
|
||||
:link-type: doc
|
||||
|
||||
How to install ROCm?
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Upgrade
|
||||
:link: upgrade
|
||||
:link-type: doc
|
||||
|
||||
Instructions for upgrading an existing ROCm installation.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Uninstall
|
||||
:link: uninstall
|
||||
:link-type: doc
|
||||
|
||||
Steps for removing ROCm packages, libraries, and tools.
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## See Also
|
||||
|
||||
- {doc}`/release/gpu_os_support`
|
||||
299
docs/deploy/linux/installer/install.md
Normal file
@@ -0,0 +1,299 @@
|
||||
# Installation with install script
|
||||
|
||||
Prior to beginning, please ensure you have the [prerequisites](../prerequisites)
|
||||
installed.
|
||||
|
||||
## Download the Installer Script
|
||||
|
||||
To download and install the `amdgpu-install` script on the system, use the
|
||||
following commands based on your distribution.
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.4.1/ubuntu/focal/amdgpu-install_5.4.50401-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.4.50401-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.4.1/ubuntu/jammy/amdgpu-install_5.4.50401-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.4.50401-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.4.1/rhel/8.6/amdgpu-install-5.4.50401-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.4.1/rhel/8.7/amdgpu-install-5.4.50401-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.4.1/rhel/9.1/amdgpu-install-5.4.50401-1.el9.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Service Pack 4
|
||||
:sync: SLES15-SP4
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.4.1/sle/15.4/amdgpu-install-5.4.50401-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
::::::
|
||||
|
||||
## Use cases
|
||||
|
||||
Instead of installing individual applications or libraries, the installer script
|
||||
groups packages into specific use cases, matching typical workflows and runtimes.
|
||||
|
||||
To display a list of available use cases, execute the command:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --list-usecase
|
||||
```
|
||||
|
||||
The available use cases will be printed in a format similar to the example
|
||||
output below.
|
||||
|
||||
```none
|
||||
If --usecase option is not present, the default selection is "graphics,opencl,hip"
|
||||
|
||||
Available use cases:
|
||||
rocm(for users and developers requiring full ROCm stack)
|
||||
- OpenCL (ROCr/KFD based) runtime
|
||||
- HIP runtimes
|
||||
- Machine learning framework
|
||||
- All ROCm libraries and applications
|
||||
- ROCm Compiler and device libraries
|
||||
- ROCr runtime and thunk
|
||||
lrt(for users of applications requiring ROCm runtime)
|
||||
- ROCm Compiler and device libraries
|
||||
- ROCr runtime and thunk
|
||||
opencl(for users of applications requiring OpenCL on Vega or
|
||||
later products)
|
||||
- ROCr based OpenCL
|
||||
- ROCm Language runtime
|
||||
|
||||
openclsdk (for application developers requiring ROCr based OpenCL)
|
||||
- ROCr based OpenCL
|
||||
- ROCm Language runtime
|
||||
- development and SDK files for ROCr based OpenCL
|
||||
|
||||
hip(for users of HIP runtime on AMD products)
|
||||
- HIP runtimes
|
||||
hiplibsdk (for application developers requiring HIP on AMD products)
|
||||
- HIP runtimes
|
||||
- ROCm math libraries
|
||||
- HIP development libraries
|
||||
```
|
||||
|
||||
To install use cases specific to your requirements, use the installer
|
||||
`amdgpu-install` as follows:
|
||||
|
||||
- To install a single use case, add it with the `--usecase` option:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --usecase=rocm
|
||||
```
|
||||
|
||||
- For multiple use cases, separate them with commas:
|
||||
|
||||
```shell
|
||||
sudo amdgpu-install --usecase=hiplibsdk,rocm
|
||||
```
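
The comma-separated form can also be assembled from a list of use case names, which may be convenient in provisioning scripts. The sketch below is illustrative only: `join_usecases` is a hypothetical helper, not part of `amdgpu-install`, and the final line prints the command instead of running it.

```shell
# Join use case names into the comma-separated form expected by --usecase.
# join_usecases is a hypothetical helper, not part of amdgpu-install.
join_usecases() {
    local IFS=,
    printf '%s\n' "$*"
}

# Print (rather than run) the resulting command for review.
echo "sudo amdgpu-install --usecase=$(join_usecases hiplibsdk rocm)"
```

Printing the command first makes it easy to review the selection before anything is installed.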
|
||||
|
||||
## Single-version ROCm Installation
|
||||
|
||||
By default (without the `--rocmrelease` option)
|
||||
the installer script will install packages in the single-version layout.
|
||||
|
||||
## Multi-version ROCm Installation
|
||||
|
||||
For the multi-version ROCm installation, you must use the installer script from
|
||||
the latest release of ROCm that you wish to install.
|
||||
|
||||
**Example:** If you want to install ROCm releases 5.3.3 and 5.4.1
|
||||
simultaneously, you are required to download the installer from the latest ROCm
|
||||
release v5.4.1.
|
||||
|
||||
### Add Required Repositories
|
||||
|
||||
You must add the ROCm repositories manually for all ROCm releases
|
||||
you want to install except the latest one. The `amdgpu-install` script
|
||||
automatically adds the required repositories for the latest release.
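
As a rough sketch of the rule above, the releases that need a manually added repository are all requested releases except the newest. `manual_repo_releases` is a hypothetical helper, and the example assumes a `sort` that supports version sorting with `-V`, as GNU coreutils does.

```shell
# Print every release except the newest one, one per line.
# manual_repo_releases is a hypothetical helper; `sort -V` (version sort)
# is assumed to be available, as it is in GNU coreutils.
manual_repo_releases() {
    printf '%s\n' "$@" | sort -V | sed '$d'
}

manual_repo_releases 5.4.1 5.3.3
```

With the releases from the example above, only 5.3.3 is printed, because the installer script handles the 5.4.1 repository itself.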
|
||||
|
||||
Run the following commands based on your distribution to add the repositories:
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/$ver/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/$ver/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/$ver/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
|
||||
|
||||
### Install packages
|
||||
|
||||
Use the installer script as given below:
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-1>
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-2>
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-3>
|
||||
```
|
||||
|
||||
The following is an example of a ROCm multi-version installation. The kernel-mode
|
||||
driver associated with ROCm release v5.4.1 is installed, as it is the latest
|
||||
release in the list.
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.3.3
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.1
|
||||
```
|
||||
|
||||
## Additional options
|
||||
|
||||
### Unattended installation
|
||||
|
||||
Adding `-y` as a parameter to `amdgpu-install` skips user prompts (for
|
||||
automation). Example: `amdgpu-install -y --usecase=rocm`
|
||||
|
||||
### Skipping kernel mode driver installation
|
||||
|
||||
The installer script tries to install the kernel mode driver along with the
|
||||
requested use cases. This might be unnecessary, as in the case of Docker
|
||||
containers, or you may wish to keep a specific version when using multi-version
|
||||
installation, and not have the last installed version overwrite the kernel mode
|
||||
driver.
|
||||
|
||||
To skip the installation of the kernel-mode driver, add the `--no-dkms` option
|
||||
when calling the installer script.
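
The two options in this section can be combined, for example when preparing a container image that relies on the host's kernel-mode driver. The following is a minimal sketch under that assumption. `build_install_cmd` is a hypothetical helper that only prints the command, so the result can be inspected before it is run with `sudo`.

```shell
# Assemble an unattended, user-space-only install command.
# build_install_cmd is a hypothetical helper; it only prints the command.
build_install_cmd() {
    usecase=$1
    echo "amdgpu-install -y --usecase=${usecase} --no-dkms"
}

build_install_cmd rocm
```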
|
||||
25
docs/deploy/linux/installer/uninstall.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# Installer Script Uninstallation (Linux)
|
||||
|
||||
To uninstall all ROCm packages and the kernel-mode driver, use the following
|
||||
commands.
|
||||
|
||||
::::{rubric} Uninstalling Single-Version Install
|
||||
::::
|
||||
|
||||
```console shell
|
||||
sudo amdgpu-install --uninstall
|
||||
```
|
||||
|
||||
::::{rubric} Uninstalling a Specific ROCm Release
|
||||
::::
|
||||
|
||||
```console shell
|
||||
sudo amdgpu-install --uninstall --rocmrelease=<release-number>
|
||||
```
|
||||
|
||||
::::{rubric} Uninstalling all ROCm Releases
|
||||
::::
|
||||
|
||||
```console shell
|
||||
sudo amdgpu-install --uninstall --rocmrelease=all
|
||||
```
|
||||
5
docs/deploy/linux/installer/upgrade.md
Normal file
@@ -0,0 +1,5 @@
|
||||
# Upgrading with the Installer Script (Linux)
|
||||
|
||||
The upgrade procedure with the installer script is exactly the same as
|
||||
installing for the first time. Refer to the {doc}`install`
|
||||
section for the exact procedure to follow.
|
||||
38
docs/deploy/linux/os-native/index.md
Normal file
@@ -0,0 +1,38 @@
|
||||
# Installation via Package manager
|
||||
|
||||
::::{grid} 2 3 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Install
|
||||
:link: install
|
||||
:link-type: doc
|
||||
|
||||
How to install ROCm?
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Upgrade
|
||||
:link: upgrade
|
||||
:link-type: doc
|
||||
|
||||
Instructions for upgrading an existing ROCm installation.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Uninstall
|
||||
:link: uninstall
|
||||
:link-type: doc
|
||||
|
||||
Steps for removing ROCm packages, libraries, and tools.
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Package Manager Integration
|
||||
:link: package_manager_integration
|
||||
:link-type: doc
|
||||
|
||||
Information about packages.
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
## See Also
|
||||
|
||||
- {doc}`/release/gpu_os_support`
|
||||
465
docs/deploy/linux/os-native/install.md
Normal file
@@ -0,0 +1,465 @@
|
||||
# Installation (Linux)
|
||||
|
||||
## Understanding the Release-specific AMDGPU and ROCm Repositories on Linux Distributions
|
||||
|
||||
The release-specific repositories consist of packages from a specific release of
|
||||
AMDGPU and ROCm. The repositories are not updated with the latest
|
||||
packages with subsequent releases. When a new ROCm release is available, the new
|
||||
repository, specific to that release, is added. You can select a specific
|
||||
release to install, update the previously installed single version to the later
|
||||
available release, or add the latest version of ROCm along with the currently
|
||||
installed version by using the multi-version ROCm packages.
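
Because each release has its own repository, the release number is part of the repository URL. The sketch below illustrates this for Ubuntu. `rocm_apt_line` is a hypothetical helper, and the URL layout and keyring path are the ones used in the Ubuntu steps of this guide.

```shell
# Print the apt source line for one ROCm release and Ubuntu codename.
# rocm_apt_line is a hypothetical helper; the URL layout and keyring path
# are copied from the Ubuntu steps in this guide.
rocm_apt_line() {
    ver=$1
    codename=$2
    echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/${ver} ${codename} main"
}

rocm_apt_line 5.4.1 jammy
```

Selecting a different release then amounts to changing the first argument.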
|
||||
|
||||
## Step by Step Instructions
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{rubric} 1. Download and convert the package signing key
|
||||
::::
|
||||
|
||||
```shell
|
||||
# Make the directory if it doesn't exist yet.
|
||||
# This location is recommended by the distribution maintainers.
|
||||
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
|
||||
# Download the key, convert the signing-key to a full
|
||||
# keyring required by apt and store in the keyring directory
|
||||
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
|
||||
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
|
||||
```
|
||||
|
||||
```{note}
|
||||
The GPG key may change; ensure it is updated when installing a new release. If
|
||||
the key signature verification fails while updating, re-add the key from the
|
||||
ROCm repository to apt as described above. The current `rocm.gpg.key` is not
|
||||
available in a standard keyring distribution, but has the following SHA-1
|
||||
hash: `73f5d8100de6048aa38a8b84cd9a87f05177d208 rocm.gpg.key`
|
||||
```
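
One way to check a downloaded key against the hash in the note is sketched below. `verify_sha1` is a hypothetical helper built on `sha1sum` from coreutils, and the usage lines assume the key was saved as `rocm.gpg.key` in the current directory.

```shell
# Compare a file's SHA-1 digest with an expected value.
# verify_sha1 is a hypothetical helper built on coreutils' sha1sum.
verify_sha1() {
    file=$1
    expected=$2
    actual=$(sha1sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage (the digest is the one quoted in the note above):
# wget https://repo.radeon.com/rocm/rocm.gpg.key
# verify_sha1 rocm.gpg.key 73f5d8100de6048aa38a8b84cd9a87f05177d208 && echo "key OK"
```

If the key has been rotated since this release, the comparison fails and the expected digest must come from a trusted source.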
|
||||
|
||||
::::{rubric} 2. Add the AMDGPU Repository and Install the Kernel-mode Driver
|
||||
::::
|
||||
|
||||
```{tip}
|
||||
If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
To add the AMDGPU repository, follow these steps:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
# amdgpu repository for focal
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.4.1/ubuntu focal main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
# amdgpu repository for jammy
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.4.1/ubuntu jammy main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Install the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
::::{rubric} 3. Add the ROCm Repository
|
||||
::::
|
||||
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
# ROCm repositories for focal
|
||||
for ver in 5.3.3 5.4.1; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
|
||||
| sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
# ROCm repositories for jammy
|
||||
for ver in 5.3.3 5.4.1; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
|
||||
| sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
::::{rubric} 4. Install packages
|
||||
::::
|
||||
|
||||
Install packages of your choice in a single-version ROCm install or
|
||||
in a multi-version ROCm install fashion. For more information on what
|
||||
single/multi-version installations are, refer to {ref}`installation-types`.
|
||||
For a comprehensive list of meta-packages, refer to
|
||||
{ref}`meta-package-desc`.
|
||||
|
||||
- Sample Single-version installation
|
||||
|
||||
```shell
|
||||
sudo apt install rocm-hip-sdk
|
||||
```
|
||||
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo apt install rocm-hip-sdk5.4.1 rocm-hip-sdk5.3.3
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{rubric} 1. Add the AMDGPU Stack Repository and Install the Kernel-mode Driver
|
||||
::::
|
||||
|
||||
```{tip}
|
||||
If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Install the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo yum install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
::::{rubric} 2. Add the ROCm Stack Repository
|
||||
::::
|
||||
|
||||
To add the ROCm repository, use the following steps, based on your distribution:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.1; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/$ver/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.1; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/$ver/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
::::{rubric} 3. Install packages
|
||||
::::
|
||||
|
||||
Install packages of your choice in a single-version ROCm install or
|
||||
in a multi-version ROCm install fashion. For more information on what
|
||||
single/multi-version installations are, refer to {ref}`installation-types`.
|
||||
For a comprehensive list of meta-packages, refer to
|
||||
{ref}`meta-package-desc`.
|
||||
|
||||
- Sample Single-version installation
|
||||
|
||||
```shell
|
||||
sudo yum install rocm-hip-sdk
|
||||
```
|
||||
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo yum install rocm-hip-sdk5.4.1 rocm-hip-sdk5.3.3
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
::::{rubric} 1. Add the AMDGPU Repository and Install the Kernel-mode Driver
|
||||
::::
|
||||
|
||||
```{tip}
|
||||
If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
Install the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
::::{rubric} 2. Add the ROCm Stack Repository
|
||||
::::
|
||||
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.1; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
done
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
::::{rubric} 3. Install packages
|
||||
::::
|
||||
|
||||
Install packages of your choice in a single-version ROCm install or
|
||||
in a multi-version ROCm install fashion. For more information on what
|
||||
single/multi-version installations are, refer to {ref}`installation-types`.
|
||||
For a comprehensive list of meta-packages, refer to
|
||||
{ref}`meta-package-desc`.
|
||||
|
||||
- Sample Single-version installation
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk
|
||||
```
|
||||
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.4.1 rocm-hip-sdk5.3.3
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
|
||||
|
||||
(post-install-actions-linux)=
|
||||
|
||||
## Post-install Actions and Verification Process
|
||||
|
||||
The post-install actions listed here are optional and depend on your use case,
|
||||
but are generally useful. Verification of the install is advised.
|
||||
|
||||
### Post-install Actions
|
||||
|
||||
1. Instruct the system linker where to find the shared objects (`.so` files) for
|
||||
ROCm applications.
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/ld.so.conf.d/rocm.conf <<EOF
|
||||
/opt/rocm/lib
|
||||
/opt/rocm/lib64
|
||||
EOF
|
||||
sudo ldconfig
|
||||
```
|
||||
|
||||
```{note}
|
||||
Multi-version installations require extra care. Having multiple versions on
|
||||
the system linker library search path is not advised. One must take care both
|
||||
at compile time and at run time to ensure that the proper libraries are
|
||||
picked up. You can override `ld.so.conf` entries on a case-by-case basis
|
||||
using the `LD_LIBRARY_PATH` environment variable.
|
||||
```
|
||||
|
||||
2. Add binary paths to the `PATH` environment variable.
|
||||
|
||||
```shell
|
||||
export PATH=$PATH:/opt/rocm-5.4.1/bin:/opt/rocm-5.4.1/opencl/bin
|
||||
```
|
||||
|
||||
```{attention}
|
||||
When using CMake to build applications, having the ROCm install location on
|
||||
the PATH subtly affects how ROCm libraries are searched for. See [Config Mode
|
||||
Search Procedure](https://cmake.org/cmake/help/latest/command/find_package.html#config-mode-search-procedure)
|
||||
and [CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH](https://cmake.org/cmake/help/latest/variable/CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH.html)
|
||||
for details.
|
||||
|
||||
(Entries in the `PATH` minus `bin` and `sbin` are added to library search
|
||||
paths, therefore this convenience will affect builds and result in ROCm
|
||||
libraries almost always being found. This may be an issue when you're
|
||||
developing these libraries or want to use self-built versions of them.)
|
||||
```
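
For multi-version installations, a per-command override is one way to select a release without touching `ld.so.conf`. The following is a sketch only. `rocm_lib_path` is a hypothetical helper, and it assumes the versioned `/opt/rocm-<version>` layout together with the `lib` and `lib64` subdirectories shown in the steps above.

```shell
# Print the library search path for one versioned ROCm install.
# rocm_lib_path is a hypothetical helper; it assumes the /opt/rocm-<version>
# layout and the lib and lib64 subdirectories used elsewhere in this guide.
rocm_lib_path() {
    ver=$1
    echo "/opt/rocm-${ver}/lib:/opt/rocm-${ver}/lib64"
}

# Run a single command against ROCm 5.3.3 without editing ld.so.conf
# (./my_app is a placeholder for your own binary):
# LD_LIBRARY_PATH="$(rocm_lib_path 5.3.3)" ./my_app
rocm_lib_path 5.3.3
```

Setting the variable on the command line limits the override to that one process.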
|
||||
|
||||
(verifying-kernel-mode-driver-installation)=
|
||||
|
||||
### Verifying Kernel-mode Driver Installation
|
||||
|
||||
Check the installation of the kernel-mode driver by typing the command given
|
||||
below:
|
||||
|
||||
```shell
|
||||
dkms status
|
||||
```
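
In scripts, the output of `dkms status` can be checked for the driver module. The sketch below makes a loose assumption about the output format, which differs between DKMS versions: it only looks for the words `amdgpu` and `installed` on the same line. `has_amdgpu_module` is a hypothetical helper.

```shell
# Succeed if the dkms status text on stdin lists an amdgpu module as installed.
# has_amdgpu_module is a hypothetical helper; the exact output format of
# `dkms status` varies between versions, so this only looks for two keywords.
has_amdgpu_module() {
    grep 'amdgpu' | grep -q 'installed'
}

# Usage: dkms status | has_amdgpu_module && echo "amdgpu kernel module present"
```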
|
||||
|
||||
### Verifying ROCm Installation
|
||||
|
||||
After completing the ROCm installation, execute the following commands on the
|
||||
system to verify if the installation is successful. If you see your GPUs listed
|
||||
by both commands, the installation is considered successful:
|
||||
|
||||
```shell
|
||||
/opt/rocm/bin/rocminfo
|
||||
# OR
|
||||
/opt/rocm/opencl/bin/clinfo
|
||||
```
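
For automated checks, the two commands can be wrapped so that each reports a pass or fail line. This is a minimal sketch. `check_tool` is a hypothetical helper, and a zero exit status only shows that the tool ran, so the output should still be inspected to confirm that the GPUs are listed.

```shell
# Run a diagnostic tool quietly and report whether it succeeded.
# check_tool is a hypothetical helper, not part of ROCm.
check_tool() {
    if "$@" > /dev/null 2>&1; then
        echo "OK:   $1"
    else
        echo "FAIL: $1"
        return 1
    fi
}

# Usage:
# check_tool /opt/rocm/bin/rocminfo
# check_tool /opt/rocm/opencl/bin/clinfo
```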
|
||||
|
||||
### Verifying Package Installation
|
||||
|
||||
To ensure the packages are installed successfully, use the following commands:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
```shell
|
||||
sudo apt list --installed
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
sudo yum list installed
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
sudo zypper search --installed-only
|
||||
```
|
||||
|
||||
:::
|
||||
::::
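
The full package lists are long, so it can help to filter them by name. The sketch below keeps only lines containing `rocm` or `amdgpu`. `filter_rocm_packages` is a hypothetical helper, and this name-based filter misses ROCm packages whose names contain neither string.

```shell
# Keep only lines that mention ROCm or AMDGPU packages (case-insensitive).
# filter_rocm_packages is a hypothetical helper that reads a package list on stdin.
filter_rocm_packages() {
    grep -i -E 'rocm|amdgpu'
}

# Usage on Ubuntu: apt list --installed 2>/dev/null | filter_rocm_packages
```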
|
||||
@@ -1,23 +1,9 @@
|
||||
# Uninstallation (Linux)
|
||||
# Uninstallation with package manager (Linux)
|
||||
|
||||
Uninstallation of ROCm entails removing ROCm packages, tools, and libraries from
|
||||
the system.
|
||||
|
||||
You can uninstall using the following methods:
|
||||
|
||||
- Package manager uninstallation
|
||||
- Uninstallation using the uninstall script
|
||||
|
||||
```{attention}
|
||||
Use the same uninstall method that you used to install ROCm. Mixing procedures
|
||||
is untested and may result in inconsistent system state.
|
||||
```
|
||||
|
||||
## Package Manager Method
|
||||
|
||||
The package manager uninstallation offers a method for a clean uninstallation
|
||||
process for ROCm. This section describes how to uninstall the ROCm instance from
|
||||
various Linux distributions.
|
||||
This section describes how to uninstall ROCm with the Linux distribution's
|
||||
package manager. This method should be used if ROCm was installed via the package
|
||||
manager. If the installer script was used for installation, then it should be
|
||||
used for uninstallation too, refer to {doc}`/deploy/linux/installer/uninstall`.
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
@@ -182,31 +168,3 @@ sudo zypper remove --clean-deps amdgpu-dkms
|
||||
|
||||
:::::
|
||||
::::::
|
||||
|
||||
## Installer Script Method
|
||||
|
||||
::::{rubric} Uninstalling Single-Version Install
|
||||
::::
|
||||
|
||||
```console shell
|
||||
sudo amdgpu-install --uninstall
|
||||
```
|
||||
|
||||
```{note}
|
||||
This command uninstalls all ROCm packages associated with the installed ROCm
|
||||
release along with the kernel-mode driver.
|
||||
```
|
||||
|
||||
::::{rubric} Uninstalling a Specific ROCm Release
|
||||
::::
|
||||
|
||||
```console shell
|
||||
sudo amdgpu-install --uninstall --rocmrelease=<release-number>
|
||||
```
|
||||
|
||||
::::{rubric} Uninstalling all ROCm Releases
|
||||
::::
|
||||
|
||||
```console shell
|
||||
sudo amdgpu-install --uninstall --rocmrelease=all
|
||||
```
|
||||
292
docs/deploy/linux/os-native/upgrade.md
Normal file
@@ -0,0 +1,292 @@
|
||||
# Upgrade ROCm with the package manager
|
||||
|
||||
This section explains how to upgrade the existing AMDGPU driver and ROCm
|
||||
packages to the latest version using your distribution's package manager.
|
||||
|
||||
```{note}
|
||||
Package upgrade is applicable to single-version packages only. If the preference
|
||||
is to install an updated version of ROCm along with the currently
|
||||
installed version, refer to the [](install) page.
|
||||
```
|
||||
|
||||
## Upgrade Steps
|
||||
|
||||
### Update the AMDGPU repository
|
||||
|
||||
Execute the commands below based on your distribution to point the `amdgpu`
|
||||
repository to the new release.
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
# amdgpu repository for focal
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.4.1/ubuntu focal main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
# amdgpu repository for jammy
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.4.1/ubuntu jammy main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
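
The Ubuntu entries above differ only in the release number and the distribution codename. As a minimal sketch, the hypothetical helper `amdgpu_apt_line` (not part of ROCm) prints the matching source line so that it can be reviewed before it is written to `/etc/apt/sources.list.d/amdgpu.list`. The keyring path is the one used in the commands above.

```shell
# Hypothetical helper (not shipped with ROCm): print the amdgpu apt source
# line for a given release and Ubuntu codename.
amdgpu_apt_line() {
    version="$1"    # release number, e.g. 5.4.1
    codename="$2"   # Ubuntu codename, e.g. focal or jammy
    printf 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/%s/ubuntu %s main\n' \
        "$version" "$codename"
}

# Print the line for release 5.4.1 on Ubuntu 22.04 (jammy).
amdgpu_apt_line 5.4.1 jammy
```

Piping the output to `sudo tee /etc/apt/sources.list.d/amdgpu.list` should reproduce the command shown in the Ubuntu tabs.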
|
||||
|
||||
### Upgrade the kernel-mode driver & reboot
|
||||
|
||||
Upgrade the kernel mode driver and reboot the system using the following
|
||||
commands based on your distribution:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
```shell
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
sudo yum install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
### Update the ROCm repository
|
||||
|
||||
Execute the commands below based on your distribution to point the `rocm`
|
||||
repository to the new release.
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.4.1 focal main" \
|
||||
| sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.4.1 jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.4.1]
|
||||
name=ROCm5.4.1
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.4.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.4.1]
|
||||
name=ROCm5.4.1
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.4.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.4.1]
|
||||
name=ROCm5.4.1
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/5.4.1/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
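
The RHEL repository files above differ only in the RHEL major version and the ROCm release. The sketch below uses a hypothetical helper, `rocm_repo_stanza` (not part of ROCm), that prints the stanza to standard output so it can be inspected before being written to `/etc/yum.repos.d/rocm.repo`. The field values are copied from the tabs above.

```shell
# Hypothetical helper (not shipped with ROCm): print a rocm.repo stanza for a
# given RHEL major version and ROCm release.
rocm_repo_stanza() {
    rhel_major="$1"   # 8 or 9
    version="$2"      # e.g. 5.4.1
    cat <<EOF
[ROCm-${version}]
name=ROCm${version}
baseurl=https://repo.radeon.com/rocm/rhel${rhel_major}/${version}/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
}

# Print the stanza for ROCm 5.4.1 on RHEL 9.
rocm_repo_stanza 9 5.4.1
```

Once the output looks right, it can be piped to `sudo tee /etc/yum.repos.d/rocm.repo`, followed by `sudo yum clean all` as in the steps above.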
|
||||
|
||||
### Upgrade the ROCm packages
|
||||
|
||||
Your packages can now be upgraded through their meta-packages. See the following
|
||||
example based on your distribution:
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
```shell
|
||||
sudo apt install --only-upgrade rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
sudo yum update rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys update rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::
|
||||
::::
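
After the package manager finishes, it can help to confirm that the installed release is at least the one the repositories point to. The sketch below assumes GNU `sort` (for the `-V` option) and uses a hypothetical helper, `version_at_least`. The installed version is passed in as a placeholder rather than detected, because where it is recorded may differ between installations.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
# Relies on the version-aware ordering of GNU sort (-V).
version_at_least() {
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
}

installed="5.4.1"   # placeholder; substitute the version reported on your system
if version_at_least "$installed" "5.4.1"; then
    echo "ROCm $installed meets the 5.4.1 target"
else
    echo "ROCm $installed is older than 5.4.1"
fi
```

A version-aware comparison is used because a plain string comparison would order `5.10.0` before `5.4.1`.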
|
||||
|
||||
## Verification Process
|
||||
|
||||
To verify that the upgrade was successful, refer to the
|
||||
{ref}`post-install-actions-linux` given in the
|
||||
[Installation](install) section.
|
||||
@@ -49,59 +49,128 @@ Verify the kernel version using the following steps:
|
||||
uname -srmv
|
||||
```
|
||||
|
||||
2. Confirm that the obtained kernel version information matches with System
|
||||
Requirements.
|
||||
|
||||
**Example:** The output of the command above lists the kernel version in the
|
||||
following format:
|
||||
|
||||
```shell
|
||||
```output
|
||||
Linux 5.15.0-46-generic #44~20.04.5-Ubuntu SMP Fri Jun 24 13:27:29 UTC 2022 x86_64
|
||||
```
|
||||
|
||||
## Confirm the System has a ROCm-Capable GPU
|
||||
2. Confirm that the obtained kernel version information matches the system
|
||||
requirements as listed in {ref}`supported_distributions`.
|
||||
|
||||
The ROCm platform is designed to support the following GPUs:
|
||||
## Additional package repositories
|
||||
|
||||
```{table} GPU Support for ROCm Programming Models
|
||||
:name: gpu-support
|
||||
| **Classification** | **GPU Name** | **GFX ID** | **Product Id** |
|
||||
|:------------------:|:-------------------------:|:----------:|:--------------:|
|
||||
| **GFX9 GPUs** | AMD Radeon Instinct™ MI50 | gfx906 | Vega 20 |
|
||||
| **GFX9 GPUs** | AMD Radeon Instinct™ MI60 | gfx906 | Vega 20 |
|
||||
| **GFX9 GPUs** | AMD Radeon™ VII | gfx906 | Vega 20 |
|
||||
| **GFX9 GPUs** | AMD Radeon™ Pro VII | gfx906 | Vega 20 |
|
||||
| **RDNA GPUs** | AMD Radeon™ Pro W6800 | gfx1030 | Navi 21 GL-XL |
|
||||
| **RDNA GPUs** | AMD Radeon™ Pro V620 | gfx1030 | Navi 21 GL-XE |
|
||||
| **CDNA GPUs** | AMD Instinct™ MI100 | gfx908 | Arcturus |
|
||||
| **CDNA GPUs** | AMD Instinct™ MI200 | gfx90a | Aldebaran |
|
||||
On some distributions the ROCm packages depend on packages outside the default
|
||||
package repositories. These extra repositories need to be enabled before
|
||||
installation. Follow the instructions below based on your distribution.
|
||||
|
||||
::::::{tab-set}
|
||||
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
All packages are available in the default Ubuntu repositories, therefore
|
||||
no additional repositories need to be added.
|
||||
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{rubric} 1. Add the EPEL repository
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
|
||||
sudo rpm -ivh epel-release-latest-8.noarch.rpm
|
||||
```
|
||||
|
||||
### Verify Your System Has a ROCm-Capable GPU
|
||||
:::
|
||||
:::{tab-item} RHEL 9
|
||||
|
||||
To verify that your system has a ROCm-capable GPU, use these steps:
|
||||
```shell
|
||||
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
|
||||
sudo rpm -ivh epel-release-latest-9.noarch.rpm
|
||||
```
|
||||
|
||||
1. Enter the following command:
|
||||
:::
|
||||
::::
|
||||
|
||||
```shell
|
||||
lspci | grep -i display
|
||||
```
|
||||
::::{rubric} 2. Enable the CodeReady Linux Builder repository
|
||||
::::
|
||||
|
||||
The command displays the details of detected GPUs on the system in the
|
||||
following format in the case of AMD Instinct™ MI200:
|
||||
Run the following command and follow the instructions.
|
||||
|
||||
```text
|
||||
c1:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran
|
||||
c5:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Aldebaran
|
||||
```
|
||||
```shell
|
||||
sudo crb enable
|
||||
```
|
||||
|
||||
2. Verify from the output that the listed product names match with the Product
|
||||
Id given in the table above.
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
|
||||
### Setting Permissions for Groups
|
||||
Add the Perl languages repository.
|
||||
|
||||
```shell
|
||||
sudo zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
|
||||
|
||||
## Kernel headers and development packages
|
||||
|
||||
The driver package uses
|
||||
[{abbr}`DKMS (Dynamic Kernel Module Support)`][DKMS-wiki] to build
|
||||
the `amdgpu-dkms` module (driver) for the installed kernels. This requires the
|
||||
Linux kernel headers and modules to be installed for each. Usually these are
|
||||
automatically installed with the kernel, but if you have multiple kernel
|
||||
versions or you have downloaded the kernel images and not the kernel
|
||||
meta-packages then they must be manually installed.
|
||||
|
||||
[DKMS-wiki]: https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support
|
||||
|
||||
To install for the currently active kernel run the command corresponding
|
||||
to your distribution.
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
```shell
|
||||
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
sudo yum install kernel-headers kernel-devel
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
sudo zypper install kernel-default-devel
|
||||
```
|
||||
|
||||
:::
|
||||
::::
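
Before installing the packages above, it may be worth checking whether headers for the running kernel are already present. On common distributions the header tree is reachable through the `build` link under `/lib/modules/<release>`. The helper name `kernel_build_dir` below is hypothetical, and the check is a sketch rather than an official ROCm verification step.

```shell
# Hypothetical helper: print the conventional location of the build (header)
# tree for a kernel release; defaults to the running kernel.
kernel_build_dir() {
    release="${1:-$(uname -r)}"
    printf '/lib/modules/%s/build\n' "$release"
}

dir="$(kernel_build_dir)"
if [ -d "$dir" ]; then
    echo "Kernel headers found at $dir"
else
    echo "No headers at $dir; install them with the command for your distribution"
fi
```

Passing a release string explicitly, for example `kernel_build_dir 5.15.0-46-generic`, allows the same check for kernels other than the one currently booted.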
|
||||
|
||||
## Setting Permissions for Groups
|
||||
|
||||
This section provides steps to add any current user to a video group to access
|
||||
GPU resources.
|
||||
Use of the video group is recommended for all ROCm-supported operating
|
||||
systems.
|
||||
|
||||
1. To check the groups in your system, issue the following command:
|
||||
|
||||
@@ -109,21 +178,17 @@ GPU resources.
|
||||
groups
|
||||
```
|
||||
|
||||
2. Add yourself to the `render` or `video` group using the following instruction:
|
||||
2. Add yourself to the `render` and `video` groups using the following command:
|
||||
|
||||
```shell
|
||||
sudo usermod -a -G render $LOGNAME
|
||||
# OR
|
||||
sudo usermod -a -G video $LOGNAME
|
||||
sudo usermod -a -G render,video $LOGNAME
|
||||
```
|
||||
|
||||
3. Use of the video group is recommended for all ROCm-supported operating
|
||||
systems.
|
||||
To add all future users to the `video` and `render` groups by default, run
|
||||
the following commands:
|
||||
|
||||
To add all future users to the `video` and `render` groups by default, run the following commands:
|
||||
|
||||
```shell
|
||||
echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf
|
||||
echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf
|
||||
echo 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf
|
||||
```
|
||||
```shell
|
||||
echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf
|
||||
echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf
|
||||
echo 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf
|
||||
```
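
To confirm that the group change was recorded, the membership can be queried from the group database. This is a minimal sketch using a hypothetical helper, `in_group`. Note that `id -nG <user>` reflects the change immediately, while an already running session only picks up new groups after logging out and back in.

```shell
# Hypothetical helper: succeed when user $1 belongs to group $2.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

user="$(id -un)"
for group in render video; do
    if in_group "$user" "$group"; then
        echo "$user is in $group"
    else
        echo "$user is NOT in $group"
    fi
done
```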
|
||||
|
||||
@@ -1,48 +1,5 @@
|
||||
# Quick Start (Linux)
|
||||
|
||||
## Install Prerequisites
|
||||
|
||||
The driver package uses
|
||||
[{abbr}`DKMS (Dynamic Kernel Module Support)`][DKMS-wiki] to build
|
||||
the `amdgpu-dkms` module (driver) for the installed kernels. This requires the Linux
|
||||
kernel headers and modules to be installed for each. Usually these are
|
||||
automatically installed with the kernel, but if you have multiple kernel
|
||||
versions or you have downloaded the kernel images and not the kernel
|
||||
meta-packages then they must be manually installed.
|
||||
|
||||
[DKMS-wiki]: https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support
|
||||
|
||||
To install for the currently active kernel run the command corresponding
|
||||
to your distribution.
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
```shell
|
||||
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
```shell
|
||||
sudo yum install kernel-headers kernel-devel
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
sudo zypper install kernel-default-devel
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
## Add Repositories
|
||||
|
||||
::::::{tab-set}
|
||||
@@ -72,11 +29,11 @@ wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
|
||||
```shell
|
||||
# Kernel driver repository for focal
|
||||
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu focal main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.4.1/ubuntu focal main
|
||||
EOF
|
||||
# ROCm repository for focal
|
||||
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian focal main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.4.1 focal main
|
||||
EOF
|
||||
```
|
||||
|
||||
@@ -87,11 +44,12 @@ EOF
|
||||
```shell
|
||||
# Kernel driver repository for jammy
|
||||
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu jammy main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.4.1/ubuntu jammy main
|
||||
EOF
|
||||
# ROCm repository for jammy
|
||||
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.4.1 jammy main
|
||||
EOF
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
```
|
||||
|
||||
@@ -122,7 +80,7 @@ sudo apt update
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.6/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/8.6/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -131,7 +89,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.4.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -149,7 +107,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.7/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/8.7/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -158,7 +116,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.4.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -176,7 +134,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.1/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/rhel/9.1/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -185,7 +143,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.4.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -221,7 +179,7 @@ sudo yum clean all
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.1/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
|
||||
@@ -1,282 +0,0 @@
|
||||
# Upgrade (Linux)
|
||||
|
||||
This section explains how to upgrade the existing kernel-mode driver and ROCm
|
||||
packages to the latest version. The assumption is that you already have a
|
||||
version of the kernel-mode driver and the ROCm software stack installed on
|
||||
the system.
|
||||
|
||||
```{note}
|
||||
Package upgrade is applicable to single-version packages only. If the preference
|
||||
is to install an updated version of the ROCm stack along with the currently
|
||||
installed version, refer to the [](install) page.
|
||||
```
|
||||
|
||||
You may use the following upgrade methods to upgrade ROCm:
|
||||
|
||||
- Package manager method
|
||||
- Installer script method
|
||||
|
||||
## Package Manager Method
|
||||
|
||||
To upgrade the system with the desired ROCm release using the package manager
|
||||
method, follow the steps below:
|
||||
|
||||
1. **Update the AMDGPU stack repository** – Ensure you have updated the AMDGPU
|
||||
repository.
|
||||
|
||||
2. **Upgrade the kernel-mode driver and reboot the system** – Ensure you have
|
||||
upgraded the kernel-mode driver and rebooted the system.
|
||||
|
||||
3. **Update the ROCm repository** – Ensure you have updated the ROCm repository
|
||||
with the desired ROCm release.
|
||||
|
||||
4. **Upgrade the ROCm meta-packages** – Upgrade the ROCm meta-packages.
|
||||
|
||||
5. **Verify the upgrade for the applicable distributions** – Verify if the
|
||||
upgrade is successful.
|
||||
|
||||
To upgrade ROCm on different Linux distributions, refer to the sections below
|
||||
for specific commands.
|
||||
|
||||
::::::{tab-set}
|
||||
:::::{tab-item} Ubuntu
|
||||
:sync: ubuntu
|
||||
|
||||
::::{rubric} Update the AMDGPU Stack Repository
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/5.4.3/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/5.4.3/ubuntu jammy main' | sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Upgrade the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo apt install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
::::{rubric} Update the ROCm Stack Repository
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Ubuntu 20.04
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/5.4.3 focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Ubuntu 22.04
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/5.4.3 jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
::::{rubric} Upgrade the ROCm Meta-packages
|
||||
::::
|
||||
|
||||
Your packages can be upgraded now through their meta-packages, for example:
|
||||
|
||||
```shell
|
||||
sudo apt install --only-upgrade rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} Red Hat Enterprise Linux
|
||||
:sync: RHEL
|
||||
|
||||
::::{rubric} Update the AMDGPU Stack Repository
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
Name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
Name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
Name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
::::{rubric} Upgrade the Kernel-mode Driver and Reboot the System
|
||||
::::
|
||||
|
||||
Upgrade the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo yum install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
::::{rubric} Update the ROCm Repository
|
||||
::::
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.4.3]
|
||||
Name=ROCm5.4.3
|
||||
baseurl=https://repo.radeon.com/rocm/5.4.3/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
::::{rubric} Upgrade the ROCm Meta-packages
|
||||
::::
|
||||
|
||||
Your packages can be upgraded now through their meta-packages, for example:
|
||||
|
||||
```shell
|
||||
sudo yum update rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
|
||||
::::{rubric} Update the AMDGPU Stack Repository
|
||||
::::
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.4.3/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
::::{rubric} Upgrade the Kernel-mode Driver and Reboot the System
|
||||
::::
|
||||
|
||||
Upgrade the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
sudo reboot
|
||||
```
|
||||
|
||||
::::{rubric} Update the ROCm Stack Repository
|
||||
::::
|
||||
|
||||
```shell
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/5.4.3/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
::::{rubric} Upgrade the ROCm Meta-packages
|
||||
::::
|
||||
|
||||
Your packages can be upgraded now through their meta-packages, for example:
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys update -y rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::::
|
||||
::::::
|
||||
|
||||
## Installer Script Method
|
||||
|
||||
The installer script method automates the upgrade process for the AMDGPU and
|
||||
ROCm stack. The `amdgpu-install` script handles the complete upgrade process for
|
||||
ROCm, including updating the required repositories and upgrading the desired
|
||||
meta-packages.
|
||||
|
||||
The upgrade procedure is exactly the same as a first-time installation. Refer
|
||||
to the {ref}`install-script-method` section for the exact procedure to follow.
|
||||
|
||||
## Verification Process
|
||||
|
||||
To verify if the upgrade is successful, refer to the
|
||||
{ref}`post-install-actions-linux` given in the
|
||||
[Installation](install) section.
|
||||
@@ -1,5 +0,0 @@
|
||||
# AI/ML/Inferencing
|
||||
|
||||
To demonstrate some of the potential usages of ROCm for AI/ML/DL/Inferencing we
|
||||
provide a detailed example of a
|
||||
[ROCm implementation of Inception v3 using the PyTorch framework](./inception_casestudy/inception_casestudy.md).
|
||||
25
docs/examples/all.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# All Tutorial Material
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} ROCm Examples
|
||||
:link: https://github.com/amd/rocm-examples
|
||||
:link-type: url
|
||||
Sample code demonstrating and explaining the use of the HIP API as well as
|
||||
ROCm-accelerated domain libraries.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} AI/ML/Inferencing
|
||||
:link: machine_learning/all
|
||||
:link-type: doc
|
||||
Detailed walkthroughs of specific use-cases driven by frameworks using ROCm
|
||||
acceleration.
|
||||
|
||||
- [Implementing Inception V3 on ROCm with PyTorch](machine_learning/pytorch_inception.md)
|
||||
- [Optimizing Inference with MIGraphX](machine_learning/migraphx_optimization.md)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
20
docs/examples/machine_learning/all.md
Normal file
@@ -0,0 +1,20 @@
|
||||
# Machine Learning, Deep Learning, and Artificial Intelligence
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Inception V3 with PyTorch
|
||||
:link: pytorch_inception
|
||||
:link-type: doc
|
||||
A collection of detailed and guided examples for working with Inception V3 with PyTorch on ROCm.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Optimizing Inference with MIGraphX
|
||||
:link: migraphx_optimization
|
||||
:link-type: doc
|
||||
Walkthroughs of optimizing inference using MIGraphX.
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
338
docs/examples/machine_learning/migraphx_optimization.md
Normal file
@@ -0,0 +1,338 @@
|
||||
# Inference Optimization with MIGraphX
|
||||
|
||||
The following sections cover inferencing and introduce MIGraphX.
|
||||
|
||||
## Inference
|
||||
|
||||
Inference is where the capabilities learned during Deep Learning training are put to work. It refers to using a fully trained neural network to make predictions on unseen data that the model has never encountered. Deep Learning inference is performed by feeding new data, such as new images, to the network so that the Deep Neural Network (DNN) can classify it.
|
||||
|
||||
Taking MNIST as an example, the DNN can be fed new images of handwritten digits, which it then classifies. A fully trained DNN should make accurate predictions about what an image represents; inference cannot happen without training.
|
||||
|
||||
## MIGraphX Introduction
|
||||
|
||||
MIGraphX is a graph compiler that accelerates Machine Learning inference and can target AMD GPUs and CPUs. It accelerates Machine Learning models through several graph-level transformations and optimizations, including:
|
||||
|
||||
- Operator fusion
|
||||
|
||||
- Arithmetic simplifications
|
||||
|
||||
- Dead-code elimination
|
||||
|
||||
- Common subexpression elimination (CSE)
|
||||
|
||||
- Constant propagation
|
||||
|
||||
After performing these transformations, MIGraphX emits code for the AMD GPU by calling MIOpen or rocBLAS, or by creating HIP kernels for a particular operator. MIGraphX can also target CPUs using the DNNL or ZenDNN libraries.
|
||||
|
||||
MIGraphX provides easy-to-use APIs in C++ and Python to import machine learning models in ONNX or TensorFlow format. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation in which each operator in the model is mapped to an operator within MIGraphX. Each of these operators defines various attributes, such as:
|
||||
|
||||
- Number of arguments
|
||||
|
||||
- Type of arguments
|
||||
|
||||
- Shape of arguments
|
||||
|
||||
After optimization passes, all these operators get mapped to different kernels on GPUs or CPUs.
|
||||
|
||||
After a model is imported into MIGraphX, it is represented as a `migraphx::program`. A `migraphx::program` is made up of `migraphx::module` objects. The program can consist of several modules, but it always has one `main_module`. Modules are made up of `migraphx::instruction_ref` objects. Instructions contain the `migraphx::op` and the arguments to the operator.
|
||||
|
||||
## Installing MIGraphX
|
||||
|
||||
There are three options to get started with MIGraphX installation. MIGraphX depends on ROCm libraries, so the steps below assume that ROCm is already installed on the machine.
|
||||
|
||||
### Option 1: Installing Binaries
|
||||
|
||||
To install MIGraphX on Debian-based systems like Ubuntu, use the following command:
|
||||
|
||||
```bash
|
||||
sudo apt update && sudo apt install -y migraphx
|
||||
```
|
||||
|
||||
The header files and libraries are installed under `/opt/rocm-<version>`, where `<version>` is the ROCm version.
|
||||
|
||||
### Option 2: Building from Source
|
||||
|
||||
There are two ways to build the MIGraphX sources.
|
||||
|
||||
- [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and build the libraries with just one command.
|
||||
|
||||
or
|
||||
|
||||
- [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
|
||||
|
||||
For detailed steps on building from source and installing dependencies, refer to the following `README` file:
|
||||
|
||||
[https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source)
|
||||
|
||||
### Option 3: Use Docker
|
||||
|
||||
To use Docker, follow these steps:
|
||||
|
||||
1. The easiest way to set up the development environment is to use Docker. To build the Docker image from scratch, first clone the MIGraphX repository by running:
|
||||
|
||||
```bash
|
||||
git clone --recursive https://github.com/ROCmSoftwarePlatform/AMDMIGraphX
|
||||
```
|
||||
|
||||
2. The repository contains a Dockerfile from which you can build a Docker image as:
|
||||
|
||||
```bash
|
||||
docker build -t migraphx .
|
||||
```
|
||||
|
||||
3. Then, to enter the development environment, use `docker run`:
|
||||
|
||||
```bash
|
||||
docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/code/AMDMIGraphX -w /code/AMDMIGraphX --group-add video -it migraphx
|
||||
```
|
||||
|
||||
The Docker image contains all the prerequisites required for the installation, so users can go to the folder `/code/AMDMIGraphX` and follow the steps mentioned in [Option 2: Building from Source](#option-2-building-from-source).
|
||||
|
||||
## MIGraphX Example
|
||||
|
||||
MIGraphX provides both C++ and Python APIs. The following sections show examples of both using the Inception v3 model. To walk through the examples, fetch the Inception v3 ONNX model by running the following:
|
||||
|
||||
```py
|
||||
import torch
|
||||
import torchvision.models as models
|
||||
inception = models.inception_v3(pretrained=True)
|
||||
torch.onnx.export(inception,torch.randn(1,3,299,299), "inceptioni1.onnx")
|
||||
```
|
||||
|
||||
This will create `inceptioni1.onnx`, which can be imported in MIGraphX using C++ or Python API.
|
||||
|
||||
### MIGraphX Python API
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. To import the MIGraphX module in a Python script, set `PYTHONPATH` to the MIGraphX libraries installation. If binaries are installed using the steps mentioned in [Option 1: Installing Binaries](#option-1-installing-binaries), perform the following action:
|
||||
|
||||
```bash
|
||||
export PYTHONPATH=$PYTHONPATH:/opt/rocm/
|
||||
```
|
||||
|
||||
2. The following script shows how to use the Python API to import the ONNX model, compile it, and run inference on it. Set `LD_LIBRARY_PATH` to `/opt/rocm/` if required.
|
||||
|
||||
```py
|
||||
# import migraphx and numpy
|
||||
import migraphx
|
||||
import numpy as np
|
||||
# import and parse inception model
|
||||
model = migraphx.parse_onnx("inceptioni1.onnx")
|
||||
# compile model for the GPU target
|
||||
model.compile(migraphx.get_target("gpu"))
|
||||
# optionally print compiled model
|
||||
model.print()
|
||||
# create random input image
|
||||
input_image = np.random.rand(1, 3, 299, 299).astype('float32')
|
||||
# feed image to model, 'x.1' is the input param name
|
||||
results = model.run({'x.1': input_image})
|
||||
# get the results back
|
||||
result_np = np.array(results[0])
|
||||
# print the inferred class of the input image
|
||||
print(np.argmax(result_np))
|
||||
```
|
||||
|
||||
Find additional examples of the Python API in the `/examples` directory of the MIGraphX repository.
|
||||
|
||||
### MIGraphX C++ API
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. The following is a minimal example that shows how to use the MIGraphX C++ API to load an ONNX file, compile it for the GPU, and run inference on it. To use the MIGraphX C++ API, you only need to include the `migraphx.hpp` header. This example runs inference on the Inception v3 model.
|
||||
|
||||
```c++
|
||||
#include <vector>
|
||||
#include <string>
|
||||
#include <algorithm>
|
||||
#include <ctime>
|
||||
#include <cstdlib>
#include <iostream>
|
||||
#include <migraphx/migraphx.hpp>
|
||||
|
||||
int main(int argc, char** argv)
|
||||
{
|
||||
migraphx::program prog;
|
||||
migraphx::onnx_options onnx_opts;
|
||||
// import and parse onnx file into migraphx::program
|
||||
prog = migraphx::parse_onnx("inceptioni1.onnx", onnx_opts);
|
||||
// print imported model
|
||||
prog.print();
|
||||
migraphx::target targ = migraphx::target("gpu");
|
||||
migraphx::compile_options comp_opts;
|
||||
comp_opts.set_offload_copy();
|
||||
// compile for the GPU
|
||||
prog.compile(targ, comp_opts);
|
||||
// print the compiled program
|
||||
prog.print();
|
||||
// randomly generate input image
|
||||
// of shape (1, 3, 299, 299)
|
||||
std::srand(unsigned(std::time(nullptr)));
|
||||
std::vector<float> input_image(1*299*299*3);
|
||||
std::generate(input_image.begin(), input_image.end(), std::rand);
|
||||
// users need to provide data for the input
|
||||
// parameters in order to run inference
|
||||
// you can query the migraphx program for its parameters
|
||||
migraphx::program_parameters prog_params;
|
||||
auto param_shapes = prog.get_parameter_shapes();
|
||||
auto input = param_shapes.names().front();
|
||||
// create argument for the parameter
|
||||
prog_params.add(input, migraphx::argument(param_shapes[input], input_image.data()));
|
||||
// run inference
|
||||
auto outputs = prog.eval(prog_params);
|
||||
// read back the output
|
||||
float* results = reinterpret_cast<float*>(outputs[0].data());
|
||||
float* max = std::max_element(results, results + 1000);
|
||||
int answer = max - results;
|
||||
std::cout << "answer: " << answer << std::endl;
|
||||
}
|
||||
```
|
||||
|
||||
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
|
||||
|
||||
```cmake
|
||||
cmake_minimum_required(VERSION 3.5)
|
||||
project (CAI)
|
||||
|
||||
set (CMAKE_CXX_STANDARD 14)
|
||||
set (EXAMPLE inception_inference)
|
||||
|
||||
list (APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
|
||||
find_package (migraphx)
|
||||
|
||||
message("source file: " ${EXAMPLE}.cpp " ---> bin: " ${EXAMPLE})
|
||||
add_executable(${EXAMPLE} ${EXAMPLE}.cpp)
|
||||
|
||||
target_link_libraries(${EXAMPLE} migraphx::c)
|
||||
```
|
||||
|
||||
3. To build the executable file, run the following from the directory containing the `inception_inference.cpp` file:
|
||||
|
||||
```bash
|
||||
mkdir build
|
||||
cd build
|
||||
cmake ..
|
||||
make -j$(nproc)
|
||||
./inception_inference
|
||||
```
|
||||
|
||||
:::{note}
|
||||
Set `LD_LIBRARY_PATH` to `/opt/rocm/lib` if required during the build. Additional examples can be found in the MIGraphX repository under the `/examples/` directory.
|
||||
:::
|
||||
|
||||
## Tuning MIGraphX
|
||||
|
||||
MIGraphX uses MIOpen kernels to target AMD GPUs. For a model compiled with MIGraphX, tune MIOpen to pick the best possible kernel implementation. MIOpen tuning can result in a significant performance boost. Tuning can be done by setting the environment variable `MIOPEN_FIND_ENFORCE=3`.
|
||||
|
||||
:::{note}
|
||||
The tuning process can take a long time to finish.
|
||||
:::
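
As a minimal sketch of the tuning setup described above, the variable can also be set from within a Python script. The assumption here is that MIOpen reads `MIOPEN_FIND_ENFORCE` when it is first used, so the variable is set before `migraphx` is imported; the commented-out lines reuse the calls from the Python API example and are not executed in this sketch.

```python
import os

# Assumption: MIOpen picks up this variable when it is first used, so it is
# set before migraphx is imported or a model is compiled.
os.environ["MIOPEN_FIND_ENFORCE"] = "3"

# With MIGraphX installed, compilation would follow as in the Python example:
# import migraphx
# model = migraphx.parse_onnx("inceptioni1.onnx")
# model.compile(migraphx.get_target("gpu"))

print(os.environ["MIOPEN_FIND_ENFORCE"])
```

Exporting the variable in the shell before launching the program achieves the same effect and avoids any dependence on import order.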
|
||||
|
||||
**Example:** The average inference time of the inception model example shown previously over 100 iterations using untuned kernels is 0.01383ms. After tuning, it reduces to 0.00459ms, which is a 3x improvement. This result is from ROCm v4.5 on a MI100 GPU.
|
||||
|
||||
:::{note}
|
||||
The results may vary depending on the system configurations.
|
||||
:::
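
The improvement quoted in the example can be checked with a line of arithmetic. The sketch below uses only the two averages stated above; the variable names are illustrative.

```python
# Average inference times quoted in the example above (milliseconds).
untuned_ms = 0.01383
tuned_ms = 0.00459

# Speedup is the ratio of the untuned time to the tuned time.
speedup = untuned_ms / tuned_ms

# Equivalent view: the fraction of the original time that was saved.
reduction_pct = (1 - tuned_ms / untuned_ms) * 100

print(f"speedup: {speedup:.2f}x, time reduced by {reduction_pct:.1f}%")
```

The ratio comes out at roughly 3, which matches the stated 3x improvement.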
|
||||
|
||||
For reference, the following console output shows the inference times of only the first 10 iterations for both untuned and tuned kernels:
|
||||
|
||||
```console
|
||||
### UNTUNED ###
|
||||
iterator : 0
|
||||
Inference complete
|
||||
Inference time: 0.063ms
|
||||
iterator : 1
|
||||
Inference complete
|
||||
Inference time: 0.008ms
|
||||
iterator : 2
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 3
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 4
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 5
|
||||
Inference complete
|
||||
Inference time: 0.008ms
|
||||
iterator : 6
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 7
|
||||
Inference complete
|
||||
Inference time: 0.028ms
|
||||
iterator : 8
|
||||
Inference complete
|
||||
Inference time: 0.029ms
|
||||
iterator : 9
|
||||
Inference complete
|
||||
Inference time: 0.029ms
|
||||
|
||||
### TUNED ###
|
||||
iterator : 0
|
||||
Inference complete
|
||||
Inference time: 0.063ms
|
||||
iterator : 1
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 2
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 3
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 4
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 5
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 6
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 7
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 8
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 9
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
```
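
In both runs above, the first iteration is much slower than the rest, so an average is more informative when that warm-up iteration is excluded. The following is a hedged sketch of how such per-iteration timings could be collected and averaged; it is not the script that produced the output above, and `time_inference` and `steady_state_mean` are hypothetical helper names. A stand-in workload is timed here so the sketch runs without a GPU.

```python
import time
from statistics import mean


def time_inference(run_once, iterations=10):
    """Call run_once() repeatedly and return per-iteration wall time in ms."""
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def steady_state_mean(timings_ms, warmup=1):
    """Average the timings after dropping the first `warmup` iterations."""
    return mean(timings_ms[warmup:])


# Stand-in workload. With MIGraphX this could instead be, for example:
#   lambda: model.run({'x.1': input_image})
timings = time_inference(lambda: sum(range(1000)), iterations=10)
print(f"mean after warm-up: {steady_state_mean(timings):.4f} ms")
```

Passing the inference call as a callable keeps the timing logic independent of the framework, so the same helper can be reused for tuned and untuned runs.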
|
||||
|
||||
### YModel
|
||||
|
||||
The best inference performance through MIGraphX depends on having tuned kernel configurations stored in a local User Database (DB) under `/home`. If a user were to move their model to a different server or allow a different user to use it, they would have to run through the MIOpen tuning process again to populate the new User DB with the best kernel configurations and corresponding solvers.
|
||||
|
||||
Tuning is time consuming, and if the users have not performed tuning, they would see discrepancies between expected or claimed inference performance and actual inference performance. This has led to repetitive and time-consuming tuning tasks for each user.
|
||||
|
||||
MIGraphX introduces a feature, known as YModel, that stores the kernel config parameters found during tuning into a `.mxr` file. This ensures the same level of expected performance, even when a model is copied to a different user/system.
|
||||
|
||||
The YModel feature is available starting from ROCm 5.4.1 and UIF 1.1.
|
||||
|
||||
#### YModel Example
|
||||
|
||||
Through the `migraphx-driver` functionality, you can generate `.mxr` files with tuning information stored inside them by passing the additional flags `--binary --output model.mxr` to `migraphx-driver` along with the rest of the necessary flags.
|
||||
|
||||
For example, to generate a `.mxr` file from the ONNX model, use the following:
|
||||
|
||||
```bash
|
||||
./path/to/migraphx-driver compile --onnx resnet50.onnx --enable-offload-copy --binary --output resnet50.mxr
|
||||
```
|
||||
|
||||
To run generated `.mxr` files through `migraphx-driver`, use the following:
|
||||
|
||||
```bash
|
||||
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
|
||||
```
|
||||
|
||||
Alternatively, you can use MIGraphX's C++ or Python API to generate a `.mxr` file. Refer to {numref}`image018` for an example.
|
||||
|
||||
```{figure} ../../data/understand/deep_learning/image.018.png
|
||||
:name: image018
|
||||
---
|
||||
align: center
|
||||
---
|
||||
Generating a `.mxr` File
|
||||
```
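
As a text companion to the figure, the following is a minimal sketch of generating and reloading a `.mxr` file through the Python API. It assumes the `migraphx.save` and `migraphx.load` functions of the Python module and may differ from what the figure shows; `compile_and_save` and `load_compiled` are hypothetical helper names, and the file names in the usage note are placeholders.

```python
def compile_and_save(onnx_path, mxr_path):
    """Parse an ONNX model, compile it for the GPU, and save it as a .mxr file."""
    # Imported inside the function so the sketch can be read and loaded
    # on a machine without ROCm and MIGraphX installed.
    import migraphx

    model = migraphx.parse_onnx(onnx_path)
    model.compile(migraphx.get_target("gpu"))
    migraphx.save(model, mxr_path)
    return model


def load_compiled(mxr_path):
    """Load a previously saved .mxr file back into a program."""
    import migraphx

    return migraphx.load(mxr_path)
```

On a machine with MIGraphX installed, `compile_and_save("resnet50.onnx", "resnet50.mxr")` followed by `load_compiled("resnet50.mxr")` would mirror the two `migraphx-driver` commands shown earlier.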
|
||||
@@ -1,4 +1,4 @@
|
||||
# Training and Inference Walk-through: Inception V3 with PyTorch
|
||||
# Inception V3 with PyTorch
|
||||
|
||||
## Deep Learning Training
|
||||
|
||||
@@ -15,11 +15,11 @@ Training occurs in multiple phases for every batch of training data. {numref}`Ty
|
||||
:::{table} Types of Training Phases
|
||||
:name: TypesOfTrainingPhases
|
||||
:widths: auto
|
||||
| Types of Phases | |
|
||||
| ----------- | ----------- |
|
||||
| Forward Pass | The input features are fed into the model, whose parameters may be randomly initialized initially. Activations (outputs) of each layer are retained during this pass to help in the loss gradient computation during the backward pass. |
|
||||
| Loss Computation | The output is compared against the target outputs, and the loss is computed. |
|
||||
| Backward Pass | The loss is propagated backward, and the model's error gradients are computed and stored for each trainable parameter. |
|
||||
| Types of Phases | |
|
||||
| ----------------- | --- |
|
||||
| Forward Pass | The input features are fed into the model, whose parameters may be randomly initialized initially. Activations (outputs) of each layer are retained during this pass to help in the loss gradient computation during the backward pass. |
|
||||
| Loss Computation | The output is compared against the target outputs, and the loss is computed. |
|
||||
| Backward Pass | The loss is propagated backward, and the model's error gradients are computed and stored for each trainable parameter. |
|
||||
| Optimization Pass | The optimization algorithm updates the model parameters using the stored error gradients. |
|
||||
:::
|
||||
|
||||
@@ -44,19 +44,19 @@ The following sections contain case studies for the Inception v3 model.
|
||||
|
||||
### Inception v3 with PyTorch
|
||||
|
||||
Convolution Neural Networks are forms of artificial neural networks commonly used for image processing. One of the core layers of such a network is the convolutional layer, which convolves the input with a weight tensor and passes the result to the next layer. Inception v3 [1] is an architectural development over the ImageNet competition-winning entry, AlexNet, using more profound and broader networks while attempting to meet computational and memory budgets.
|
||||
Convolution Neural Networks are forms of artificial neural networks commonly used for image processing. One of the core layers of such a network is the convolutional layer, which convolves the input with a weight tensor and passes the result to the next layer. Inception v3[^inception_arch] is an architectural development over the ImageNet competition-winning entry, AlexNet, using more profound and broader networks while attempting to meet computational and memory budgets.
|
||||
|
||||
The implementation uses PyTorch as a framework. This case study utilizes `torchvision` [2], a repository of popular datasets and model architectures, for obtaining the model. `torchvision` also provides pre-trained weights as a starting point to develop new models or fine-tune the model for a new task.
|
||||
The implementation uses PyTorch as a framework. This case study utilizes `torchvision`[^torch_vision], a repository of popular datasets and model architectures, for obtaining the model. `torchvision` also provides pre-trained weights as a starting point to develop new models or fine-tune the model for a new task.
|
||||
|
||||
#### Evaluating a Pre-Trained Model
|
||||
|
||||
The Inception v3 model introduces a simple image classification task with the pre-trained model. This does not involve training but utilizes an already pre-trained model from `torchvision`.
|
||||
|
||||
This example is adapted from the PyTorch research hub page on Inception v3 [3].
|
||||
This example is adapted from the PyTorch research hub page on Inception v3[^torch_vision_inception].
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Run the PyTorch ROCm-based Docker image or refer to the section [Installing PyTorch](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Frameworks_Installation.html#d1667e113) for setting up a PyTorch environment on ROCm.
|
||||
1. Run the PyTorch ROCm-based Docker image or refer to the section [Installing PyTorch](/how_to/pytorch_install/pytorch_install.md) for setting up a PyTorch environment on ROCm.
|
||||
|
||||
```dockerfile
|
||||
docker run -it -v $HOME:/data --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
|
||||
@@ -146,16 +146,16 @@ The previous section focused on downloading and using the Inception v3 model for
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Run the PyTorch ROCm Docker image or refer to the section [Installing PyTorch](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Frameworks_Installation.html#d1667e113) for setting up a PyTorch environment on ROCm.
|
||||
1. Run the PyTorch ROCm Docker image or refer to the section [Installing PyTorch](how_to/pytorch_install/pytorch_install.md) for setting up a PyTorch environment on ROCm.
|
||||
|
||||
```dockerfile
|
||||
docker pull rocm/pytorch:latest
|
||||
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
|
||||
```
|
||||
|
||||
2. Download an ImageNet database. For this example, the `tiny-imagenet-200` [4], a smaller ImageNet variant with 200 image classes and a training dataset with 100,000 images, was downsized to 64x64 color images.
|
||||
2. Download an ImageNet database. For this example, the `tiny-imagenet-200`[^Stanford_deep_learning], a smaller ImageNet variant with 200 image classes and a training dataset with 100,000 images, was downsized to 64x64 color images.
|
||||
|
||||
```py
|
||||
```bash
|
||||
wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
|
||||
```
|
||||
|
||||
@@ -357,7 +357,7 @@ Follow these steps:
|
||||
model.to(device)
|
||||
```
|
||||
|
||||
13. Set the loss criteria. For this example, Cross Entropy Loss [5] is used.
|
||||
13. Set the loss criteria. For this example, Cross Entropy Loss[^cross_entropy] is used.
|
||||
|
||||
```py
|
||||
criterion = torch.nn.CrossEntropyLoss()
|
||||
@@ -583,7 +583,7 @@ Follow these steps:
|
||||
import torch.optim as optim
|
||||
```
|
||||
|
||||
10. Set the loss criteria. For this example, Cross Entropy Loss [5] is used.
|
||||
10. Set the loss criteria. For this example, Cross Entropy Loss[^cross_entropy] is used.
|
||||
|
||||
```py
|
||||
criterion = nn.CrossEntropyLoss()
|
||||
@@ -1164,7 +1164,7 @@ To prepare the data for training, follow these steps:
|
||||
---
|
||||
```
|
||||
|
||||
8. A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), use [losses.BinaryCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/BinaryCrossentropy) loss function.
|
||||
8. A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), use [`losses.BinaryCrossentropy`](https://www.tensorflow.org/api_docs/python/tf/keras/losses/BinaryCrossentropy) loss function.
|
||||
|
||||
```py
|
||||
model.compile(loss=losses.BinaryCrossentropy(from_logits=True),
|
||||
@@ -1272,422 +1272,14 @@ To prepare the data for training, follow these steps:
|
||||
export_model.predict(examples)
|
||||
```
|
||||
|
||||
## Optimization
|
||||
|
||||
The following sections cover inferencing and introduces MIGraphX.
|
||||
|
||||
### Inferencing
|
||||
|
||||
The inference is where capabilities learned during Deep Learning training are put to work. It refers to using a fully trained neural network to make conclusions (predictions) on unseen data that the model has never interacted with before. Deep Learning inferencing is achieved by feeding new data, such as new images, to the network, giving the Deep Neural Network a chance to classify the image.
|
||||
|
||||
Taking our previous example of MNIST, the DNN can be fed new images of handwritten digit images, allowing the neural network to classify digits. A fully trained DNN should make accurate predictions about what an image represents, and inference cannot happen without training.
|
||||
|
||||
### MIGraphX Introduction
|
||||
|
||||
MIGraphX is a graph compiler focused on accelerating the Machine Learning inference that can target AMD GPUs and CPUs. MIGraphX accelerates the Machine Learning models by leveraging several graph-level transformations and optimizations. These optimizations include:
|
||||
|
||||
- Operator fusion
|
||||
|
||||
- Arithmetic simplifications
|
||||
|
||||
- Dead-code elimination
|
||||
|
||||
- Common subexpression elimination (CSE)
|
||||
|
||||
- Constant propagation
|
||||
|
||||
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling to MIOpen or rocBLAS or creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.
|
||||
|
||||
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
|
||||
|
||||
- Number of arguments
|
||||
|
||||
- Type of arguments
|
||||
|
||||
- Shape of arguments
|
||||
|
||||
After optimization passes, all these operators get mapped to different kernels on GPUs or CPUs.
|
||||
|
||||
After importing a model into MIGraphX, the model is represented as `migraphx::program`. `migraphx::program` is made up of `migraphx::module`. The program can consist of several modules, but it always has one main_module. Modules are made up of `migraphx::instruction_ref`. Instructions contain the `migraphx::op` and arguments to the operator.
|
||||
|
||||
### MIGraphX Installation
|
||||
|
||||
There are three options to get started with MIGraphX installation. MIGraphX depends on ROCm libraries; assume that the machine has ROCm installed.
|
||||
|
||||
#### Option 1: Installing Binaries
|
||||
|
||||
To install MIGraphX on Debian-based systems like Ubuntu, use the following command:
|
||||
|
||||
```bash
|
||||
sudo apt update && sudo apt install -y migraphx
|
||||
```
|
||||
|
||||
The header files and libraries are installed under `/opt/rocm-\<version\>`, where \<version\> is the ROCm version.
|
||||
|
||||
#### Option 2: Building from Source
|
||||
|
||||
There are two ways to build the MIGraphX sources.
|
||||
|
||||
- [Use the ROCm build tool](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-the-rocm-build-tool-rbuild) - This approach uses [rbuild](https://github.com/RadeonOpenCompute/rbuild) to install the prerequisites and build the libraries with just one command.
|
||||
|
||||
or
|
||||
|
||||
- [Use CMake](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#use-cmake-to-build-migraphx) - This approach uses a script to install the prerequisites, then uses CMake to build the source.
|
||||
|
||||
For detailed steps on building from source and installing dependencies, refer to the following `README` file:
|
||||
|
||||
[https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX#building-from-source)
|
||||
|
||||
#### Option 3: Use Docker
|
||||
|
||||
To use Docker, follow these steps:
|
||||
|
||||
1. The easiest way to set up the development environment is to use Docker. To build Docker from scratch, first clone the MIGraphX repository by running:
|
||||
|
||||
```bash
|
||||
git clone --recursive https://github.com/ROCmSoftwarePlatform/AMDMIGraphX
|
||||
```
|
||||
|
||||
2. The repository contains a Dockerfile from which you can build a Docker image as:
|
||||
|
||||
```bash
|
||||
docker build -t migraphx .
|
||||
```
|
||||
|
||||
3. Then to enter the development environment, use Docker run:
|
||||
|
||||
```bash
|
||||
docker run --device='/dev/kfd' --device='/dev/dri' -v=`pwd`:/code/AMDMIGraphX -w /code/AMDMIGraphX --group-add video -it migraphx
|
||||
```
|
||||
|
||||
The Docker image contains all the prerequisites required for the installation, so users can go to the folder /code/AMDMIGraphX and follow the steps mentioned in [Option 2: Building from Source](#option-2-building-from-source).
|
||||
|
||||
### MIGraphX Example
|
||||
|
||||
MIGraphX provides both C++ and Python APIs. The following sections show examples of both using the Inception v3 model. To walk through the examples, fetch the Inception v3 ONNX model by running the following:
|
||||
|
||||
```py
|
||||
import torch
|
||||
import torchvision.models as models
|
||||
inception = models.inception_v3(pretrained=True)
|
||||
torch.onnx.export(inception,torch.randn(1,3,299,299), "inceptioni1.onnx")
|
||||
```
|
||||
|
||||
This will create `inceptioni1.onnx`, which can be imported in MIGraphX using C++ or Python API.
|
||||
|
||||
### MIGraphX Python API
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. To import the MIGraphX module in Python script, set `PYTHONPATH` to the MIGraphX libraries installation. If binaries are installed using steps mentioned in [Option 1: Installing Binaries](#option-1-installing-binaries), perform the following action:
|
||||
|
||||
```py
|
||||
export PYTHONPATH=$PYTHONPATH:/opt/rocm/
|
||||
```
|
||||
|
||||
2. The following script shows the usage of Python API to import the ONNX model, compile it, and run inference on it. Set `LD_LIBRARY_PATH` to `/opt/rocm/` if required.
|
||||
|
||||
```py
|
||||
# import migraphx and numpy
|
||||
import migraphx
|
||||
import numpy as np
|
||||
# import and parse inception model
|
||||
model = migraphx.parse_onnx("inceptioni1.onnx")
|
||||
# compile model for the GPU target
|
||||
model.compile(migraphx.get_target("gpu"))
|
||||
# optionally print compiled model
|
||||
model.print()
|
||||
# create random input image
|
||||
input_image = np.random.rand(1, 3, 299, 299).astype('float32')
|
||||
# feed image to model, 'x.1` is the input param name
|
||||
results = model.run({'x.1': input_image})
|
||||
# get the results back
|
||||
result_np = np.array(results[0])
|
||||
# print the inferred class of the input image
|
||||
print(np.argmax(result_np))
|
||||
```
|
||||
|
||||
Find additional examples of Python API in the /examples directory of the MIGraphX repository.
|
||||
|
||||
### MIGraphX C++ API
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. The following is a minimalist example that shows the usage of MIGraphX C++ API to load ONNX file, compile it for the GPU, and run inference on it. To use MIGraphX C++ API, you only need to load the `migraphx.hpp` file. This example runs inference on the Inception v3 model.
|
||||
|
||||
```c++
|
||||
#include <vector>
|
||||
#include <string>
|
||||
#include <algorithm>
|
||||
#include <ctime>
|
||||
#include <random>
|
||||
#include <migraphx/migraphx.hpp>
|
||||
|
||||
int main(int argc, char** argv)
|
||||
{
|
||||
migraphx::program prog;
|
||||
migraphx::onnx_options onnx_opts;
|
||||
// import and parse onnx file into migraphx::program
|
||||
prog = parse_onnx("inceptioni1.onnx", onnx_opts);
|
||||
// print imported model
|
||||
prog.print();
|
||||
migraphx::target targ = migraphx::target("gpu");
|
||||
migraphx::compile_options comp_opts;
|
||||
comp_opts.set_offload_copy();
|
||||
// compile for the GPU
|
||||
prog.compile(targ, comp_opts);
|
||||
// print the compiled program
|
||||
prog.print();
|
||||
// randomly generate input image
|
||||
// of shape (1, 3, 299, 299)
|
||||
std::srand(unsigned(std::time(nullptr)));
|
||||
std::vector<float> input_image(1*299*299*3);
|
||||
std::generate(input_image.begin(), input_image.end(), std::rand);
|
||||
// users need to provide data for the input
|
||||
// parameters in order to run inference
|
||||
// you can query into migraph program for the parameters
|
||||
migraphx::program_parameters prog_params;
|
||||
auto param_shapes = prog.get_parameter_shapes();
|
||||
auto input = param_shapes.names().front();
|
||||
// create argument for the parameter
|
||||
prog_params.add(input, migraphx::argument(param_shapes[input], input_image.data()));
|
||||
// run inference
|
||||
auto outputs = prog.eval(prog_params);
|
||||
// read back the output
|
||||
float* results = reinterpret_cast<float*>(outputs[0].data());
|
||||
float* max = std::max_element(results, results + 1000);
|
||||
int answer = max - results;
|
||||
std::cout << "answer: " << answer << std::endl;
|
||||
}
|
||||
```
|
||||
|
||||
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
|
||||
|
||||
```py
|
||||
cmake_minimum_required(VERSION 3.5)
|
||||
project (CAI)
|
||||
|
||||
set (CMAKE_CXX_STANDARD 14)
|
||||
set (EXAMPLE inception_inference)
|
||||
|
||||
list (APPEND CMAKE_PREFIX_PATH /opt/rocm/hip /opt/rocm)
|
||||
find_package (migraphx)
|
||||
|
||||
message("source file: " ${EXAMPLE}.cpp " ---> bin: " ${EXAMPLE})
|
||||
add_executable(${EXAMPLE} ${EXAMPLE}.cpp)
|
||||
|
||||
target_link_libraries(${EXAMPLE} migraphx::c)
|
||||
```
|
||||
|
||||
3. To build the executable file, run the following from the directory containing the `inception_inference.cpp` file:
|
||||
|
||||
```py
|
||||
mkdir build
|
||||
cd build
|
||||
cmake ..
|
||||
make -j$(nproc)
|
||||
./inception_inference
|
||||
```
|
||||
|
||||
:::{note}
|
||||
Set `LD_LIBRARY_PATH` to `/opt/rocm/lib` if required during the build. Additional examples can be found in the MIGraphX repository under the `/examples/` directory.
|
||||
:::
|
||||
|
||||
### Tuning MIGraphX
|
||||
|
||||
MIGraphX uses MIOpen kernels to target AMD GPU. For the model compiled with MIGraphX, tune MIOpen to pick the best possible kernel implementation. The MIOpen tuning results in a significant performance boost. Tuning can be done by setting the environment variable MIOPEN_FIND_ENFORCE=3.
|
||||
|
||||
:::{note}
|
||||
The tuning process can take a long time to finish.
|
||||
:::
|
||||
|
||||
**Example:** The average inference time of the inception model example shown previously over 100 iterations using untuned kernels is 0.01383ms. After tuning, it reduces to 0.00459ms, which is a 3x improvement. This result is from ROCm v4.5 on a MI100 GPU.
|
||||
|
||||
:::{note}
|
||||
The results may vary depending on the system configurations.
|
||||
:::
|
||||
|
||||
For reference, the following code snippet shows inference runs for only the first 10 iterations for both tuned and untuned kernels:
|
||||
|
||||
```py
|
||||
### UNTUNED ###
|
||||
iterator : 0
|
||||
Inference complete
|
||||
Inference time: 0.063ms
|
||||
iterator : 1
|
||||
Inference complete
|
||||
Inference time: 0.008ms
|
||||
iterator : 2
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 3
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 4
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 5
|
||||
Inference complete
|
||||
Inference time: 0.008ms
|
||||
iterator : 6
|
||||
Inference complete
|
||||
Inference time: 0.007ms
|
||||
iterator : 7
|
||||
Inference complete
|
||||
Inference time: 0.028ms
|
||||
iterator : 8
|
||||
Inference complete
|
||||
Inference time: 0.029ms
|
||||
iterator : 9
|
||||
Inference complete
|
||||
Inference time: 0.029ms
|
||||
|
||||
### TUNED ###
|
||||
iterator : 0
|
||||
Inference complete
|
||||
Inference time: 0.063ms
|
||||
iterator : 1
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 2
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 3
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 4
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 5
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 6
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 7
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 8
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
iterator : 9
|
||||
Inference complete
|
||||
Inference time: 0.004ms
|
||||
```
|
||||
|
||||
#### YModel
|
||||
|
||||
The best inference performance through MIGraphX depends on having tuned kernel configurations stored in a local User Database (DB) under /home. If users move their model to a different server or let a different user run it, they have to run through the MIOpen tuning process again to populate the new User DB with the best kernel configurations and corresponding solvers.
|
||||
|
||||
Tuning is time consuming, and users who have not performed tuning see a gap between the expected or claimed inference performance and the performance they actually get. As a result, each user ends up repeating the same time-consuming tuning tasks.
|
||||
|
||||
MIGraphX introduces a feature, known as YModel, that stores the kernel config parameters found during tuning into a `.mxr` file. This ensures the same level of expected performance, even when a model is copied to a different user/system.
|
||||
|
||||
The YModel feature is available starting from ROCm 5.4.1 and UIF 1.1.
|
||||
|
||||
##### YModel Example
|
||||
|
||||
Through `migraphx-driver`, you can generate `.mxr` files with the tuning information stored inside them by passing the additional flags `--binary --output model.mxr` along with the rest of the necessary flags.
|
||||
|
||||
For example, to generate a `.mxr` file from the ONNX model, use the following:
|
||||
|
||||
```bash
|
||||
./path/to/migraphx-driver compile --onnx resnet50.onnx --enable-offload-copy --binary --output resnet50.mxr
|
||||
```
|
||||
|
||||
To run generated `.mxr` files through `migraphx-driver`, use the following:
|
||||
|
||||
```bash
|
||||
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
|
||||
```
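
The two driver invocations above can also be scripted. The following is a minimal sketch, not an official workflow. It assembles the same `compile` and `run` commands, using only the flags shown above, and executes them only when `migraphx-driver` is found on `PATH` and the ONNX file exists. The helper names and the assumption that the driver is on `PATH` are illustrative.

```python
import os
import shlex
import shutil
import subprocess


def compile_cmd(driver, onnx_path, mxr_path):
    """Command that compiles an ONNX model and saves it as a .mxr file."""
    return [driver, "compile", "--onnx", onnx_path,
            "--enable-offload-copy", "--binary", "--output", mxr_path]


def run_cmd(driver, mxr_path):
    """Command that runs a previously generated .mxr file."""
    return [driver, "run", "--migraphx", mxr_path, "--enable-offload-copy"]


# Assumption: the driver is on PATH; otherwise the commands are only printed.
found = shutil.which("migraphx-driver")
driver = found or "migraphx-driver"
onnx_path, mxr_path = "resnet50.onnx", "resnet50.mxr"

for cmd in (compile_cmd(driver, onnx_path, mxr_path), run_cmd(driver, mxr_path)):
    print(shlex.join(cmd))
    if found and os.path.exists(onnx_path):
        subprocess.run(cmd, check=True)  # raises if the driver reports an error
```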
|
||||
|
||||
Alternatively, you can use MIGraphX's C++ or Python API to generate a `.mxr` file. Refer to {numref}`image018` for an example.
|
||||
|
||||
```{figure} ../../data/understand/deep_learning/image.018.png
|
||||
:name: image018
|
||||
---
|
||||
align: center
|
||||
---
|
||||
Generating a `.mxr` File
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Q: What do I do if I get this error when trying to run PyTorch:**
|
||||
|
||||
```bash
|
||||
hipErrorNoBinaryForGPU: Unable to find code object for all current devices!
|
||||
```
|
||||
|
||||
Ans: The error denotes that the installation of PyTorch and/or other dependencies or libraries does not support the current GPU.
|
||||
|
||||
**Workaround:**
|
||||
|
||||
To implement a workaround, follow these steps:
|
||||
|
||||
1. Confirm that the hardware supports the ROCm stack. Refer to the Hardware and Software Support document at [https://docs.amd.com](https://docs.amd.com).
|
||||
|
||||
2. Determine the gfx target.
|
||||
|
||||
```bash
|
||||
rocminfo | grep gfx
|
||||
```
|
||||
|
||||
3. Check if PyTorch is compiled with the correct gfx target.
|
||||
|
||||
```bash
|
||||
TORCHDIR=$( dirname $( python3 -c 'import torch; print(torch.__file__)' ) )
|
||||
roc-obj-ls -v $TORCHDIR/lib/libtorch_hip.so # check for gfx target
|
||||
```
|
||||
|
||||
:::{note}
|
||||
If the GPU's gfx target is not among the compiled targets and you build PyTorch from source, recompile it with the correct gfx target. For wheels or Docker installation, contact ROCm support [6].
|
||||
:::
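
Steps 2 and 3 amount to comparing two sets of gfx names. The sketch below shows that comparison under stated assumptions. The helper names are hypothetical, the two sample strings are stand-ins rather than real tool output, and the regular expression simply picks out tokens of the form `gfx` followed by hexadecimal digits.

```python
import re
import shutil
import subprocess


def gfx_targets(text):
    """Return the sorted, de-duplicated gfx target names found in `text`."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", text)))


def missing_targets(device_targets, built_targets):
    """Device targets that the build does not list."""
    return [t for t in device_targets if t not in built_targets]


# Stand-in strings for illustration only, not real tool output.
device_text = "gfx1030 gfx1030"
build_text = "gfx900 gfx906 gfx908 gfx90a"
print(missing_targets(gfx_targets(device_text), gfx_targets(build_text)))

# On a ROCm system, the same parser can read the real rocminfo output.
if shutil.which("rocminfo"):
    output = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
    print(gfx_targets(output))
```

A non-empty result from `missing_targets` means that at least one device target is absent from the build.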
|
||||
|
||||
**Q: Why am I unable to access Docker or GPU in user accounts?**
|
||||
|
||||
Ans: Ensure that the user is added to docker, video, and render Linux groups as described in the ROCm Installation Guide at [https://docs.amd.com](https://docs.amd.com).
|
||||
|
||||
**Q: Which consumer GPUs does ROCm support?**
|
||||
|
||||
Ans: ROCm supports gfx1030, which is the Navi 21 series.
|
||||
|
||||
**Q: Can I install PyTorch directly on bare metal?**
|
||||
|
||||
Ans: Bare-metal installation of PyTorch is supported through wheels. Refer to Option 2: Install PyTorch Using Wheels Package in the section [Installing PyTorch](/ROCm/docs/how_to/pytorch_install/pytorch_install) of this guide for more information.
|
||||
|
||||
**Q: How do I profile PyTorch workloads?**
|
||||
|
||||
Ans: Use the PyTorch Profiler \[6\] to profile GPU kernels on ROCm.
|
||||
|
||||
**Q: Can I run ROCm on Windows?**
|
||||
|
||||
Ans: ROCm is not supported on Windows.
|
||||
|
||||
## References
|
||||
|
||||
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," CoRR, p. abs/1512.00567, 2015
|
||||
[^inception_arch]: C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," CoRR, p. abs/1512.00567, 2015
|
||||
|
||||
PyTorch, \[Online\]. Available: [https://pytorch.org/vision/stable/index.html](https://pytorch.org/vision/stable/index.html)
|
||||
[^torch_vision]: PyTorch, \[Online\]. Available: [https://pytorch.org/vision/stable/index.html](https://pytorch.org/vision/stable/index.html)
|
||||
|
||||
PyTorch, \[Online\]. Available: [https://pytorch.org/hub/pytorch_vision_inception_v3/](https://pytorch.org/hub/pytorch_vision_inception_v3/)
|
||||
[^torch_vision_inception]: PyTorch, \[Online\]. Available: [https://pytorch.org/hub/pytorch_vision_inception_v3/](https://pytorch.org/hub/pytorch_vision_inception_v3/)
|
||||
|
||||
Stanford, \[Online\]. Available: [http://cs231n.stanford.edu/](http://cs231n.stanford.edu/)
|
||||
[^Stanford_deep_learning]: Stanford, \[Online\]. Available: [http://cs231n.stanford.edu/](http://cs231n.stanford.edu/)
|
||||
|
||||
Wikipedia, \[Online\]. Available: [https://en.wikipedia.org/wiki/Cross_entropy](https://en.wikipedia.org/wiki/Cross_entropy)
|
||||
|
||||
AMD, "ROCm issues," \[Online\]. Available: [https://github.com/RadeonOpenCompute/ROCm/issues](https://github.com/RadeonOpenCompute/ROCm/issues)
|
||||
|
||||
PyTorch, \[Online image\]. [https://pytorch.org/assets/brand-guidelines/PyTorch-Brand-Guidelines.pdf](https://pytorch.org/assets/brand-guidelines/PyTorch-Brand-Guidelines.pdf)
|
||||
|
||||
TensorFlow, \[Online image\]. [https://www.tensorflow.org/extras/tensorflow_brand_guidelines.pdf](https://www.tensorflow.org/extras/tensorflow_brand_guidelines.pdf)
|
||||
|
||||
MAGMA, \[Online image\]. [https://bitbucket.org/icl/magma/src/master/docs/](https://bitbucket.org/icl/magma/src/master/docs/)
|
||||
|
||||
Advanced Micro Devices, Inc., \[Online\]. Available: [https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/](https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/)
|
||||
|
||||
Advanced Micro Devices, Inc., \[Online\]. Available: [https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki)
|
||||
|
||||
Docker, \[Online\]. [https://docs.docker.com/get-started/overview/](https://docs.docker.com/get-started/overview/)
|
||||
|
||||
Torchvision, \[Online\]. Available [https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision](https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision)
|
||||
[^cross_entropy]: Wikipedia, \[Online\]. Available: [https://en.wikipedia.org/wiki/Cross_entropy](https://en.wikipedia.org/wiki/Cross_entropy)
|
||||
56
docs/examples/troubleshooting.md
Normal file
@@ -0,0 +1,56 @@
|
||||
|
||||
# Troubleshooting
|
||||
|
||||
**Q: What do I do if I get this error when trying to run PyTorch:**
|
||||
|
||||
```bash
|
||||
hipErrorNoBinaryForGPU: Unable to find code object for all current devices!
|
||||
```
|
||||
|
||||
Ans: The error denotes that the installation of PyTorch and/or other
|
||||
dependencies or libraries does not support the current GPU.
|
||||
|
||||
**Workaround:**
|
||||
|
||||
To implement a workaround, follow these steps:
|
||||
|
||||
1. Confirm that the hardware supports the ROCm stack. Refer to
|
||||
{ref}`supported_gpus`.
|
||||
|
||||
2. Determine the gfx target.
|
||||
|
||||
```bash
|
||||
rocminfo | grep gfx
|
||||
```
|
||||
|
||||
3. Check if PyTorch is compiled with the correct gfx target.
|
||||
|
||||
```bash
|
||||
TORCHDIR=$( dirname $( python3 -c 'import torch; print(torch.__file__)' ) )
|
||||
roc-obj-ls -v $TORCHDIR/lib/libtorch_hip.so # check for gfx target
|
||||
```
|
||||
|
||||
:::{note}
|
||||
If the GPU's gfx target is not among the compiled targets and you build PyTorch
from source, recompile it with the correct gfx target. For wheels or Docker
installation, contact ROCm support [^ROCm_issues].
|
||||
:::
|
||||
|
||||
**Q: Why am I unable to access Docker or GPU in user accounts?**
|
||||
|
||||
Ans: Ensure that the user is added to docker, video, and render Linux groups as
|
||||
described in the ROCm Installation Guide at {ref}`setting_group_permissions`.
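
To see which of these groups the current account lacks, a check along the following lines can help. It is a sketch that assumes a Linux system and uses Python's standard `grp` and `os` modules. The group names come from the answer above and the helper names are illustrative.

```python
import grp
import os

# Groups named in the answer above.
REQUIRED_GROUPS = ("docker", "video", "render")


def missing_groups(current, required=REQUIRED_GROUPS):
    """Return the required groups that are absent from `current`."""
    return [name for name in required if name not in current]


def current_group_names():
    """Names of the groups the running process belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A group ID without a name entry is skipped.
            pass
    return names


missing = missing_groups(current_group_names())
print("Missing groups:", ", ".join(missing) if missing else "none")
```

Group changes usually take effect only after the user logs in again, so a freshly added group may still be reported as missing in an existing session.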
|
||||
|
||||
**Q: Can I install PyTorch directly on bare metal?**
|
||||
|
||||
Ans: Bare-metal installation of PyTorch is supported through wheels. Refer to
|
||||
Option 2: Install PyTorch Using Wheels Package in the section
|
||||
{ref}`install_pytorch_using_wheels` of this guide for more information.
|
||||
|
||||
**Q: How do I profile PyTorch workloads?**
|
||||
|
||||
Ans: Use the PyTorch Profiler to profile GPU kernels on ROCm.
|
||||
|
||||
------
|
||||
|
||||
[^ROCm_issues]: AMD, "ROCm issues," \[Online\]. Available: [https://github.com/RadeonOpenCompute/ROCm/issues](https://github.com/RadeonOpenCompute/ROCm/issues)
|
||||
34
docs/how_to/all.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# All How-To Material
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Tuning Guides
|
||||
:link: tuning_guides/index
|
||||
:link-type: doc
|
||||
Use case-specific system setup and tuning guides.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Deep Learning Guide
|
||||
:link: deep_learning_rocm
|
||||
:link-type: doc
|
||||
Installation of various Deep Learning frameworks and applications.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} GPU-Enabled MPI
|
||||
:link: gpu_aware_mpi
|
||||
:link-type: doc
|
||||
This chapter shows how to set up Open MPI with the ROCm platform.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} System Debugging Guide
|
||||
:link: system_debugging
|
||||
:link-type: doc
|
||||
Useful commands to debug misbehaving ROCm installations.
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
@@ -1,7 +1,10 @@
|
||||
# Deep Learning Guide
|
||||
|
||||
The following sections cover the different framework installations for ROCm and
|
||||
Deep Learning applications. {numref}`Rocm-Compat-Frameworks-Flowchart` provides the sequential flow for the use of each framework. Refer to the ROCm Compatible Frameworks Release Notes for each framework's most current release notes at [Framework Release Notes](https://docs.amd.com/bundle/ROCm-Compatible-Frameworks-Release-Notes/page/Framework_Release_Notes.html).
|
||||
Deep Learning applications. {numref}`Rocm-Compat-Frameworks-Flowchart` provides
|
||||
the sequential flow for the use of each framework. Refer to the ROCm Compatible
|
||||
Frameworks Release Notes for each framework's most current release notes at
|
||||
{ref}`ml_framework_compat_matrix`.
|
||||
|
||||
```{figure} ../data/how_to/magma_install/image.005.png
|
||||
:name: Rocm-Compat-Frameworks-Flowchart
|
||||
@@ -14,5 +17,5 @@ ROCm Compatible Frameworks Flowchart
|
||||
## Frameworks Installation
|
||||
|
||||
- [How to Install PyTorch?](pytorch_install/pytorch_install)
|
||||
- [How to Install Tensorflow?](tensorflow_install/tensorflow_install)
|
||||
- [How to Install Magma?](magma_install/magma_install)
|
||||
- [How to Install Magma?](tensorflow_install/tensorflow_install)
|
||||
|
||||
@@ -61,7 +61,7 @@ The next step is to set up UCX by compiling its source code and install it:
|
||||
```shell
|
||||
export UCX_DIR=$INSTALL_DIR/ucx
|
||||
cd $BUILD_DIR
|
||||
git clone https://github.com/openucx/ucx.git -b v1.13.0
|
||||
git clone https://github.com/openucx/ucx.git -b v1.14.1
|
||||
cd ucx
|
||||
./autogen.sh
|
||||
mkdir build
|
||||
@@ -75,6 +75,10 @@ make -j $(nproc)
|
||||
make -j $(nproc) install
|
||||
```
|
||||
|
||||
The following
|
||||
[table](../release/3rd_party_support_matrix.md#communication-libraries)
|
||||
documents the compatibility of UCX versions with ROCm versions.
|
||||
|
||||
## Install Open MPI
|
||||
|
||||
These are the steps to build Open MPI:
|
||||
@@ -89,6 +93,7 @@ cd ompi
|
||||
mkdir build
|
||||
cd build
|
||||
../configure --prefix=$OMPI_DIR --with-ucx=$UCX_DIR \
|
||||
--with-rocm=/opt/rocm \
|
||||
--enable-mca-no-build=btl-uct --enable-mpi1-compatibility \
|
||||
CC=clang CXX=clang++ FC=flang
|
||||
make -j $(nproc)
|
||||
@@ -97,7 +102,7 @@ make -j $(nproc) install
|
||||
|
||||
## ROCm-enabled OSU
|
||||
|
||||
he OSU Micro Benchmarks v5.9 (OMB) can be used to evaluate the performance of
|
||||
The OSU Micro Benchmarks v5.9 (OMB) can be used to evaluate the performance of
|
||||
various primitives with an AMD GPU device and ROCm support. This functionality
|
||||
is exposed when configured with `--enable-rocm` option. We can use the following
|
||||
steps to compile OMB:
|
||||
@@ -118,13 +123,21 @@ make -j $(nproc)
|
||||
|
||||
## Intra-node Run
|
||||
|
||||
Before running an Open MPI job, it is essential to set some environment variables to
|
||||
ensure that the correct version of Open MPI and UCX is being used.
|
||||
|
||||
```shell
|
||||
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$UCX_DIR/lib:/opt/rocm/lib
|
||||
export PATH=$OMPI_DIR/bin:$PATH
|
||||
```
|
||||
|
||||
The following command runs the OSU bandwidth benchmark between the first two GPU
|
||||
devices (i.e., GPU 0 and GPU 1, same OAM) by default inside the same node. It
|
||||
measures the unidirectional bandwidth from the first device to the other.
|
||||
|
||||
```shell
|
||||
$OMPI_DIR/bin/mpirun -np 2 --mca btl '^openib' \
|
||||
-x UCX_TLS=sm,self,rocm_copy,rocm_ipc \
|
||||
$OMPI_DIR/bin/mpirun -np 2 \
|
||||
-x UCX_TLS=sm,self,rocm \
|
||||
--mca pml ucx mpi/pt2pt/osu_bw -d rocm D D
|
||||
```
|
||||
|
||||
@@ -146,3 +159,37 @@ connection:
|
||||
:alt: OSU execution showing transfer bandwidth increasing alongside payload inc.
|
||||
Inter-GPU bandwidth with various payload sizes.
|
||||
:::
|
||||
|
||||
## Collective Operations
|
||||
|
||||
Collective Operations on GPU buffers are best handled through the
|
||||
Unified Collective Communication Library (UCC) component in Open MPI.
|
||||
For this, the UCC library has to be configured and compiled with ROCm
|
||||
support. An example for configuring UCC and Open MPI with ROCm support
|
||||
is shown below:
|
||||
|
||||
```shell
|
||||
export UCC_DIR=$INSTALL_DIR/ucc
|
||||
git clone https://github.com/openucx/ucc.git
|
||||
cd ucc
|
||||
./configure --with-rocm=/opt/rocm \
|
||||
--with-ucx=$UCX_DIR \
|
||||
--prefix=$UCC_DIR
|
||||
make -j && make install
|
||||
|
||||
# Configure and compile Open MPI with UCX, UCC, and ROCm support
|
||||
cd ompi
|
||||
./configure --with-rocm=/opt/rocm \
|
||||
--with-ucx=$UCX_DIR \
|
||||
--with-ucc=$UCC_DIR \
|
||||
--prefix=$OMPI_DIR
|
||||
```
|
||||
|
||||
Using the UCC component with an MPI application requires setting some
|
||||
additional parameters:
|
||||
|
||||
```shell
|
||||
mpirun --mca pml ucx --mca osc ucx \
|
||||
--mca coll_ucc_enable 1 \
|
||||
--mca coll_ucc_priority 100 -np 64 ./my_mpi_app
|
||||
```
|
||||
|
||||
@@ -14,10 +14,12 @@ automatic differentiation. Other advanced features include:
|
||||
|
||||
### Installing PyTorch
|
||||
|
||||
To install ROCm on bare metal, refer to the section
|
||||
[ROCm Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60).
|
||||
The recommended option to get a PyTorch environment is through Docker. However,
|
||||
installing the PyTorch wheels package on bare metal is also supported.
|
||||
To install ROCm on bare metal, refer to the sections
|
||||
[GPU and OS Support (Linux)](../../release/gpu_os_support.md) and
|
||||
[Compatibility](../../release/compatibility.md) for hardware, software and
|
||||
3rd-party framework compatibility between ROCm and PyTorch. The recommended
|
||||
option to get a PyTorch environment is through Docker. However, installing the
|
||||
PyTorch wheels package on bare metal is also supported.
|
||||
|
||||
#### Option 1 (Recommended): Use Docker Image with PyTorch Pre-Installed
|
||||
|
||||
@@ -51,6 +53,8 @@ Follow these steps:
|
||||
onto the container.
|
||||
:::
|
||||
|
||||
(install_pytorch_using_wheels)=
|
||||
|
||||
#### Option 2: Install PyTorch Using Wheels Package
|
||||
|
||||
PyTorch supports the ROCm platform by providing tested wheels packages. To
|
||||
@@ -77,9 +81,9 @@ To install PyTorch using the wheels package, follow these installation steps:
|
||||
|
||||
b. Download a base OS Docker image and install ROCm following the
|
||||
installation directions in the section
|
||||
[Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60).
|
||||
ROCm 5.2 is installed in this example, as supported by the installation
|
||||
matrix from <http://pytorch.org/>.
|
||||
[Installation](../../deploy/linux/install.md). ROCm 5.2 is installed in
|
||||
this example, as supported by the installation matrix from
|
||||
<http://pytorch.org/>.
|
||||
|
||||
or
|
||||
|
||||
@@ -152,7 +156,7 @@ Follow these steps:
|
||||
cd ~
|
||||
git clone https://github.com/pytorch/pytorch.git
|
||||
cd pytorch
|
||||
git submodule update --init –recursive
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
4. Build PyTorch for ROCm.
|
||||
@@ -194,7 +198,7 @@ Follow these steps:
|
||||
|
||||
```bash
|
||||
python3 tools/amd_build/build_amd.py
|
||||
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install ––user
|
||||
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
|
||||
```
|
||||
|
||||
#### Option 4: Install Using PyTorch Upstream Docker File
|
||||
@@ -217,7 +221,7 @@ Follow these steps:
|
||||
cd ~
|
||||
git clone https://github.com/pytorch/pytorch.git
|
||||
cd pytorch
|
||||
git submodule update --init –recursive
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
2. Build the PyTorch Docker image.
|
||||
|
||||
@@ -52,7 +52,7 @@ Debug messages when developing/debugging base ROCm driver. You could enable the
|
||||
|
||||
## Turn Off Page Retry on GFX9/Vega Devices
|
||||
|
||||
`sudo –s`
|
||||
`sudo -s`
|
||||
|
||||
`echo 1 > /sys/module/amdkfd/parameters/noretry`
|
||||
|
||||
@@ -65,4 +65,4 @@ Debug messages when developing/debugging base ROCm driver. You could enable the
|
||||
## PCIe-Debug
|
||||
|
||||
Refer to ROCm PCIe Debug, <a href="https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug" target="_blank">https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug</a>.
|
||||
For information on how to debug and profile HIP applications, see <a href="https://rocmdocs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html" target="_blank">https://rocmdocs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html</a>
|
||||
For information on how to debug and profile HIP applications, see {doc}`hip:how_to_guides/debugging`
|
||||
|
||||
@@ -16,8 +16,8 @@ The following sections contain options for installing TensorFlow.
|
||||
#### Option 1: Install TensorFlow Using Docker Image
|
||||
|
||||
To install ROCm on bare metal, follow the section
|
||||
[ROCm Installation](https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4-/page/Prerequisites.html#d2999e60).
|
||||
The recommended option to get a TensorFlow environment is through Docker.
|
||||
[Installation (Linux)](../../deploy/linux/install.md). The recommended option to
|
||||
get a TensorFlow environment is through Docker.
|
||||
|
||||
Using Docker provides portability and access to a prebuilt Docker container that
|
||||
has been rigorously tested within AMD. This might also save compilation time and
|
||||
@@ -45,7 +45,7 @@ To install TensorFlow using the wheels package, follow these steps:
|
||||
1. Check the Python version.
|
||||
|
||||
```bash
|
||||
python3 –version
|
||||
python3 --version
|
||||
```
|
||||
|
||||
| If: | Then: |
|
||||
@@ -105,7 +105,7 @@ To install TensorFlow using the wheels package, follow these steps:
|
||||
5. Install TensorFlow for the Python version as indicated in Step 2.
|
||||
|
||||
```bash
|
||||
/usr/bin/python[version] -m pip install --user tensorflow-rocm==[wheel-version] –upgrade
|
||||
/usr/bin/python[version] -m pip install --user tensorflow-rocm==[wheel-version] --upgrade
|
||||
```
|
||||
|
||||
For a valid wheel version for a ROCm release, refer to the instruction below:
|
||||
|
||||
@@ -1,5 +1,7 @@
|
||||
# Tuning Guides
|
||||
|
||||
Use case-specific system setup and tuning guides.
|
||||
|
||||
## High Performance Computing
|
||||
|
||||
High Performance Computing (HPC) workloads have unique requirements. The default
|
||||
|
||||
@@ -83,78 +83,97 @@ available as listed in {numref}`mi100-bios`.
|
||||
- AMD CBS / NBIO Common Options
|
||||
- IOMMU
|
||||
- Disable
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- PCIe Ten Bit Tag Support
|
||||
- Enable
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- Preferred IO
|
||||
- Manual
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- Preferred IO Bus
|
||||
- "Use lspci to find pci device id"
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- Enhanced Preferred IO Mode
|
||||
- Enable
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Determinism Control
|
||||
- Manual
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Determinism Slider
|
||||
- Power
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- cTDP Control
|
||||
- Manual
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- cTDP
|
||||
- 240
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Package Power Limit Control
|
||||
- Manual
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Package Power Limit
|
||||
- 240
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- xGMI Link Width Control
|
||||
- Manual
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- xGMI Force Link Width
|
||||
- 2
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- xGMI Force Link Width Control
|
||||
- Force
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- APBDIS
|
||||
- 1
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- DF C-states
|
||||
- Auto
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Fixed SOC P-state
|
||||
- P0
|
||||
-
|
||||
*
|
||||
- AMD CBS / UMC Common Options / DDR4 Common Options
|
||||
- Enforce POR
|
||||
- Accept
|
||||
-
|
||||
*
|
||||
- AMD CBS / UMC Common Options / DDR4 Common Options / Enforce POR
|
||||
- Overclock
|
||||
- Enabled
|
||||
-
|
||||
*
|
||||
- AMD CBS / UMC Common Options / DDR4 Common Options / Enforce POR
|
||||
- Memory Clock Speed
|
||||
|
||||
@@ -27,8 +27,6 @@ Analogous settings for other non-AMI System BIOS providers could be set
|
||||
similarly. For systems with Intel processors, some settings may not apply or be
|
||||
available as listed in {numref}`mi200-bios`.
|
||||
|
||||
Table 2: Recommended settings for the system BIOS in a GIGABYTE platform.
|
||||
|
||||
```{list-table} Recommended settings for the system BIOS in a GIGABYTE platform.
|
||||
:header-rows: 1
|
||||
:name: mi200-bios
|
||||
@@ -82,30 +80,37 @@ Table 2: Recommended settings for the system BIOS in a GIGABYTE platform.
|
||||
- AMD CBS / NBIO Common Options
|
||||
- IOMMU
|
||||
- Disable
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- PCIe Ten Bit Tag Support
|
||||
- Auto
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- Preferred IO
|
||||
- Bus
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- Preferred IO Bus
|
||||
- "Use lspci to find pci device id"
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options
|
||||
- Enhanced Preferred IO Mode
|
||||
- Enable
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Determinism Control
|
||||
- Manual
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Determinism Slider
|
||||
- Power
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- cTDP Control
|
||||
@@ -115,6 +120,7 @@ Table 2: Recommended settings for the system BIOS in a GIGABYTE platform.
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- cTDP
|
||||
- 280
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Package Power Limit Control
|
||||
@@ -124,6 +130,7 @@ Table 2: Recommended settings for the system BIOS in a GIGABYTE platform.
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Package Power Limit
|
||||
- 280
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- xGMI Link Width Control
|
||||
@@ -133,30 +140,37 @@ Table 2: Recommended settings for the system BIOS in a GIGABYTE platform.
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- xGMI Force Link Width
|
||||
- 2
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- xGMI Force Link Width Control
|
||||
- Force
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- APBDIS
|
||||
- 1
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- DF C-states
|
||||
- Enabled
|
||||
-
|
||||
*
|
||||
- AMD CBS / NBIO Common Options / SMU Common Options
|
||||
- Fixed SOC P-state
|
||||
- P0
|
||||
-
|
||||
*
|
||||
- AMD CBS / UMC Common Options / DDR4 Common Options
|
||||
- Enforce POR
|
||||
- Accept
|
||||
-
|
||||
*
|
||||
- AMD CBS / UMC Common Options / DDR4 Common Options / Enforce POR
|
||||
- Overclock
|
||||
- Enabled
|
||||
-
|
||||
*
|
||||
- AMD CBS / UMC Common Options / DDR4 Common Options / Enforce POR
|
||||
- Memory Clock Speed
|
||||
|
||||
@@ -1,4 +0,0 @@
|
||||
# Inference Optimization Using MIGraphX
|
||||
|
||||
Pull content from
|
||||
<https://docs.amd.com/bundle/ROCm-Deep-Learning-Guide-v5.4.1/page/Optimization.html>
|
||||
@@ -1,4 +1,4 @@
|
||||
# AMD ROCm™ Platform - Powering Your GPU Computational Needs
|
||||
# AMD ROCm™ Documentation
|
||||
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
@@ -14,7 +14,7 @@ agile, flexible, rapid and secure manner. [more...](rocm)
|
||||
::::
|
||||
|
||||
::::{grid-item}
|
||||
:::{dropdown} [Deploy ROCm](deploy)
|
||||
:::{dropdown} Deploy ROCm
|
||||
|
||||
- {doc}`/deploy/linux/index`
|
||||
- {doc}`/deploy/docker`
|
||||
@@ -44,14 +44,14 @@ agile, flexible, rapid and secure manner. [more...](rocm)
|
||||
[APIs and Reference](reference/all)
|
||||
^^^
|
||||
|
||||
- [Compilers and Development Tools](reference/compilers)
|
||||
- [HIP](reference/hip)
|
||||
- [OpenMP](reference/openmp/openmp)
|
||||
- [Math Libraries](reference/gpu_libraries/math)
|
||||
- [C++ Primitives Libraries](reference/gpu_libraries/c++_primitives)
|
||||
- [Communication Libraries](reference/gpu_libraries/communication)
|
||||
- [AI Libraries](reference/ai_tools)
|
||||
- [Computer Vision](reference/computer_vision)
|
||||
- [OpenMP](reference/openmp/openmp)
|
||||
- [Compilers and Tools](reference/compilers)
|
||||
- [Management Tools](reference/management_tools)
|
||||
- [Validation Tools](reference/validation_tools)
|
||||
|
||||
@@ -59,23 +59,24 @@ agile, flexible, rapid and secure manner. [more...](rocm)
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
Understand ROCm
|
||||
[Understand ROCm](understand/all)
|
||||
^^^
|
||||
|
||||
- [Compiler Disambiguation](understand/compiler_disambiguation)
|
||||
- [Using CMake](understand/cmake_packages)
|
||||
- [ROCm File Reorganization White Paper](understand/file_reorg)
|
||||
- [GPU Architecture](understand/gpu_arch)
|
||||
- [Linux Folder Structure Reorganization](understand/file_reorg)
|
||||
- [GPU Isolation Techniques](understand/gpu_isolation)
|
||||
- [GPU Architecture](understand/gpu_arch)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
How to Guides
|
||||
[How to Guides](how_to/all)
|
||||
^^^
|
||||
|
||||
- [System Tuning for Various Architectures](how_to/tuning_guides/index)
|
||||
- [GPU Aware MPI](how_to/gpu_aware_mpi)
|
||||
- [Setting up for Deep Learning with ROCm](how_to/deep_learning_rocm)
|
||||
- [Magma Installation](how_to/magma_install/magma_install)
|
||||
- [PyTorch Installation](how_to/pytorch_install/pytorch_install)
|
||||
@@ -86,12 +87,13 @@ How to Guides
|
||||
|
||||
:::{grid-item-card}
|
||||
:padding: 2
|
||||
Examples
|
||||
[Tutorials & Examples](examples/all)
|
||||
^^^
|
||||
|
||||
- [ROCm Examples](https://github.com/amd/rocm-examples)
|
||||
- [AI/ML/Inferencing](examples/ai_ml_inferencing)
|
||||
- [Inception V3 with PyTorch](examples/inception_casestudy/inception_casestudy)
|
||||
- [Examples](https://github.com/amd/rocm-examples)
|
||||
- [ML, DL, and AI](examples/machine_learning/all)
|
||||
- [](examples/machine_learning/pytorch_inception)
|
||||
- [](examples/machine_learning/migraphx_optimization)
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
@@ -3,24 +3,24 @@
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [MIOpen](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)
|
||||
:::{grid-item-card} {doc}`MIOpen <miopen:index>`
|
||||
AMD's library for high performance machine learning primitives.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)
|
||||
- {doc}`Documentation <miopen:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Composable Kernel](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)
|
||||
:::{grid-item-card} {doc}`Composable Kernel <composable_kernel:index>`
|
||||
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)
|
||||
- {doc}`Documentation <composable_kernel:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [MIGraphX](https://rocmdocs.amd.com/projects/MIGraphX/en/latest/)
|
||||
:::{grid-item-card} {doc}`MIGraphX <amdmigraphx:index>`
|
||||
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/MIGraphX/en/latest/)
|
||||
- {doc}`Documentation <amdmigraphx:index>`
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
:::{grid-item-card} [HIP](./hip)
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime.
|
||||
|
||||
- [HIP Runtime API Manual](https://rocmdocs.amd.com/projects/hipBLAS/en/latest/)
|
||||
- {doc}`hip:doxygen/html/index`
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
|
||||
:::
|
||||
@@ -25,33 +25,33 @@ HIP Math Libraries support the following domains:
|
||||
:::{grid-item-card} [C++ Primitive Libraries](./gpu_libraries/c++_primitives)
|
||||
ROCm template libraries for C++ primitives and algorithms are as follows:
|
||||
|
||||
- [rocPRIM](https://rocprim.readthedocs.io/en/latest/)
|
||||
- [rocThrust](https://rocthrust.readthedocs.io/en/latest/)
|
||||
- [hipCUB](https://hipcub.readthedocs.io/en/latest/)
|
||||
- {doc}`rocPRIM <rocprim:index>`
|
||||
- {doc}`rocThrust <rocthrust:index>`
|
||||
- {doc}`hipCUB <hipcub:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Communication Libraries](gpu_libraries/communication)
|
||||
Inter- and intra-node communication is supported by the following projects:
|
||||
|
||||
- [RCCL](https://rocmdocs.amd.com/projects/rccl/en/latest/)
|
||||
- {doc}`RCCL <rccl:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [AI Libraries](./ai_tools)
|
||||
Libraries related to AI.
|
||||
|
||||
- [MIOpen](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)
|
||||
- [Composable Kernel](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)
|
||||
- [MIGraphX](https://rocmdocs.amd.com/projects/MIGraphX/en/latest/)
|
||||
- {doc}`MIOpen <miopen:index>`
|
||||
- {doc}`Composable Kernel <composable_kernel:index>`
|
||||
- {doc}`MIGraphX <amdmigraphx:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Computer Vision](./computer_vision)
|
||||
Computer vision related projects.
|
||||
|
||||
- [MIVisionX](https://rocmdocs.amd.com/projects/MIVisionX/en/latest)
|
||||
- [rocAL](https://rocmdocs.amd.com/projects/rocAL/en/latest)
|
||||
- {doc}`MIVisionX <mivisionx:README>`
|
||||
- {doc}`rocAL <rocal:README>`
|
||||
|
||||
:::
|
||||
|
||||
@@ -63,25 +63,25 @@ Computer vision related projects.
|
||||
|
||||
:::{grid-item-card} [Compilers and Tools](compilers)
|
||||
|
||||
- [ROCmCC](https://rocmdocs.amd.com/projects/ROCmCC/en/latest/)
|
||||
- [ROCgdb](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)
|
||||
- [ROCProfiler](https://rocmdocs.amd.com/projects/rocprofiler/en/latest/)
|
||||
- [ROCTracer](https://rocmdocs.amd.com/projects/roctracer/en/latest/)
|
||||
- [ROCmCC](/reference/rocmcc/rocmcc)
|
||||
- {doc}`ROCgdb <rocgdb:index>`
|
||||
- {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
- {doc}`ROCTracer <roctracer:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Management Tools](management_tools)
|
||||
|
||||
- [AMD SMI](https://rocmdocs.amd.com/projects/amdsmi/en/latest/)
|
||||
- [ROCm SMI](https://rocmdocs.amd.com/projects/rocmsmi/en/latest/)
|
||||
- [ROCm Datacenter Tool](https://rocmdocs.amd.com/projects/rdc/en/latest/)
|
||||
- AMD SMI
|
||||
- [ROCm SMI](https://rocmdocs.amd.com/projects/rocm_smi_lib/en/latest/)
|
||||
- {doc}`ROCm Datacenter Tool <rdc:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Validation Tools](validation_tools)
|
||||
|
||||
- [ROCm Validation Suite](https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/latest/)
|
||||
- [TransferBench](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
|
||||
- {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
|
||||
- {doc}`TransferBench <transferbench:index>`
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -3,32 +3,47 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [ROCmCC](https://rocmdocs.amd.com/projects/ROCmCC/en/latest/)
|
||||
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance computing on AMD GPUs and CPUs and supports various heterogeneous programming models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/ROCmCC/en/latest/)
|
||||
:::{grid-item-card} ROCmCC
|
||||
:link: /reference/rocmcc/rocmcc
|
||||
:link-type: doc
|
||||
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
|
||||
computing on AMD GPUs and CPUs and supports various heterogeneous programming
|
||||
models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCgdb](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)
|
||||
:::{grid-item-card} ROCgdb
|
||||
:link: rocgdb:index
|
||||
:link-type: doc
|
||||
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCProfiler](https://rocmdocs.amd.com/projects/rocprofiler/en/latest/)
|
||||
:::{grid-item-card} ROCProfiler
|
||||
:link: rocprofiler:rocprof
|
||||
:link-type: doc
|
||||
ROC profiler library. Profiling with performance counters and derived metrics. Library supports GFX8/GFX9. Hardware specific low-level performance analysis interface for profiling of GPU compute applications. The profiling includes hardware performance counters with complex performance metrics.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocprofiler/en/latest/)
|
||||
:::
|
||||
|
||||
:::{grid-item-card} ROCTracer
|
||||
:link: roctracer:index
|
||||
:link-type: doc
|
||||
Callback/Activity Library for performance tracing of AMD GPUs
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCTracer](https://rocmdocs.amd.com/projects/roctracer/en/latest/)
|
||||
Callback/Activity Library for Performance tracing AMD GPU's
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/roctracer/en/latest/)
|
||||
:::{grid-item-card} ROCdbgapi
|
||||
:link: rocdbgapi:index
|
||||
:link-type: doc
|
||||
The AMD Debugger API is a library that provides all the support necessary for a
|
||||
debugger and other tools to perform low level control of the execution and
|
||||
inspection of execution state of AMD's commercially available GPU architectures.
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
|
||||
## See Also
|
||||
|
||||
- [Compiler Disambiguation](../understand/compiler_disambiguation.md)
|
||||
|
||||
@@ -3,17 +3,17 @@
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [MIVisionX](https://rocmdocs.amd.com/projects/MIVisionX/en/latest/)
|
||||
:::{grid-item-card} {doc}`MIVisionX <mivisionx:README>`
|
||||
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/MIVisionX/en/latest/)
|
||||
- {doc}`Documentation <mivisionx:README>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [rocAL](https://rocmdocs.amd.com/projects/rocAL/en/latest/)
|
||||
:::{grid-item-card} {doc}`rocAL <rocal:README>`
|
||||
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocAL/en/latest/)
|
||||
- {doc}`Documentation <rocal:README>`
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -1,37 +0,0 @@
|
||||
# Framework Compatibility
|
||||
|
||||
The ROCm release supports the most recent and two prior releases of PyTorch and TensorFlow.
|
||||
|
||||
Legends:
|
||||
|
||||
Blue: Shows compatibility tested versions
|
||||
|
||||
Gray: Not tested
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
## Supported Frameworks
|
||||
|
||||
This section contains the latest release notes for each framework compatible with ROCm™ and Deep Learning (DL) applications.
|
||||
|
||||
The ROCm 5.4 platform supports the following frameworks:
|
||||
|
||||
- PyTorch v1.12.1
|
||||
|
||||
- MAGMA v2.5.4
|
||||
|
||||
- TensorFlow v2.10.0
|
||||
|
||||
### PyTorch
|
||||
|
||||
For the latest release of PyTorch, refer to <a href="https://github.com/pytorch/pytorch/releases/" target="_blank">https://github.com/pytorch/pytorch/releases/</a>
|
||||
|
||||
### MAGMA
|
||||
|
||||
For the latest release of MAGMA, refer to <a href="https://icl.utk.edu/magma/index.html" target="_blank">https://icl.utk.edu/magma/index.html</a>
|
||||
|
||||
### TensorFlow
|
||||
|
||||
For the latest release of TensorFlow, refer to <a href="https://github.com/tensorflow/tensorflow/releases/" target="_blank">https://github.com/tensorflow/tensorflow/releases</a>
|
||||
@@ -5,33 +5,33 @@ ROCm template libraries for algorithms are as follows:
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [rocPRIM](https://rocmdocs.amd.com/projects/rocPRIM/en/latest/)
|
||||
:::{grid-item-card} {doc}`rocPRIM <rocprim:index>`
|
||||
rocPRIM is an AMD GPU optimized template library of algorithm primitives, like
|
||||
transforms, reductions, scans, etc. It also serves as a common back-end for
|
||||
similar libraries found inside ROCm.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocPRIM/en/latest/)
|
||||
- {doc}`Documentation <rocprim:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [rocThrust](https://rocmdocs.amd.com/projects/rocThrust/en/latest/)
|
||||
:::{grid-item-card} {doc}`rocThrust <rocthrust:index>`
|
||||
rocThrust is a template library of algorithm primitives with a Thrust-compatible
|
||||
interface. Their CPU back-ends are identical, while the GPU back-end calls into
|
||||
rocPRIM.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocThrust/en/latest/)
|
||||
- {doc}`Documentation <rocthrust:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipCUB](https://rocmdocs.amd.com/projects/hipCUB/en/latest/)
|
||||
:::{grid-item-card} {doc}`hipCUB <hipcub:index>`
|
||||
hipCUB is a template library of algorithm primitives with a CUB-compatible
|
||||
interface. Its back-end is rocPRIM.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipCUB/en/latest/)
|
||||
- {doc}`Documentation <hipcub:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
|
||||
|
||||
|
||||
@@ -3,13 +3,13 @@
|
||||
:::::{grid} 1 1 1 1
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [RCCL](https://rocmdocs.amd.com/projects/rccl/en/latest/)
|
||||
:::{grid-item-card} {doc}`RCCL <rccl:index>`
|
||||
RCCL (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs,
|
||||
implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and all-to-all.
|
||||
The collective operations are implemented using ring and tree algorithms and have been optimized for
|
||||
throughput and latency.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rccl/en/latest/)
|
||||
- {doc}`Documentation <rccl:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
|
||||
|
||||
|
||||
@@ -5,20 +5,20 @@ ROCm libraries for FFT are as follows:
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [rocFFT](https://rocmdocs.amd.com/projects/rocFFT/en/latest/)
|
||||
:::{grid-item-card} {doc}`rocFFT <rocfft:index>`
|
||||
rocFFT is an AMD GPU optimized library for FFT.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocFFT/en/latest/)
|
||||
- {doc}`Documentation <rocfft:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipFFT](https://rocmdocs.amd.com/projects/hipFFT/en/latest/)
|
||||
:::{grid-item-card} {doc}`hipFFT <hipfft:index>`
|
||||
hipFFT is a compatibility layer for GPU accelerated FFT optimized for AMD GPUs
|
||||
using rocFFT. hipFFT allows for a common interface for other non AMD GPU
|
||||
FFT libraries.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipFFT/en/latest/)
|
||||
- {doc}`Documentation <hipfft:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -5,85 +5,85 @@ ROCm libraries for linear algebra are as follows:
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [rocBLAS](https://rocmdocs.amd.com/projects/rocBLAS/en/develop/)
|
||||
:::{grid-item-card} {doc}`rocBLAS <rocblas:index>`
|
||||
`rocBLAS` is an AMD GPU optimized library for BLAS (Basic Linear Algebra Subprograms).
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocBLAS/en/develop/)
|
||||
- {doc}`Documentation <rocblas:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipBLAS](https://rocmdocs.amd.com/projects/hipBLAS/en/develop/)
|
||||
:::{grid-item-card} {doc}`hipBLAS <hipblas:index>`
|
||||
`hipBLAS` is a compatibility layer for GPU accelerated BLAS optimized for AMD GPUs
|
||||
via `rocBLAS` and `rocSOLVER`. `hipBLAS` allows for a common interface for other GPU
|
||||
BLAS libraries.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipBLAS/en/develop/)
|
||||
- {doc}`Documentation <hipblas:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipBLASLt](https://rocmdocs.amd.com/projects/hipBLASLt/en/develop/)
|
||||
:::{grid-item-card} {doc}`hipBLASLt <hipblaslt:index>`
|
||||
`hipBLASLt` is a library that provides general matrix-matrix operations with a
|
||||
flexible API and extends functionality beyond that of a traditional BLAS library.
|
||||
`hipBLASLt` exposes its APIs in the HIP programming language, with an underlying
|
||||
optimized generator as a back-end kernel provider.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipBLASLt/en/develop/)
|
||||
- {doc}`Documentation <hipblaslt:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [rocALUTION](https://rocmdocs.amd.com/projects/rocALUTION/en/develop/)
|
||||
:::{grid-item-card} {doc}`rocALUTION <rocalution:index>`
|
||||
`rocALUTION` is a sparse linear algebra library with focus on exploring
|
||||
fine-grained parallelism on top of AMD's ROCm runtime and toolchains, targeting
|
||||
modern CPU and GPU platforms.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocALUTION/en/develop/)
|
||||
- {doc}`Documentation <rocalution:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [rocWMMA](https://rocmdocs.amd.com/projects/rocWMMA/en/develop/)
|
||||
:::{grid-item-card} {doc}`rocWMMA <rocwmma:index>`
|
||||
`rocWMMA` provides an API to break down mixed precision matrix multiply-accumulate
|
||||
(MMA) problems into fragments and distributes these over GPU wavefronts.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocWMMA/en/develop/)
|
||||
- {doc}`Documentation <rocwmma:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [rocSOLVER](https://rocmdocs.amd.com/projects/rocSOLVER/en/develop/)
|
||||
:::{grid-item-card} {doc}`rocSOLVER <rocsolver:index>`
|
||||
`rocSOLVER` provides a subset of LAPACK (Linear Algebra Package) functionality on the ROCm platform.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocSOLVER/en/develop/)
|
||||
- {doc}`Documentation <rocsolver:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipSOLVER](https://rocmdocs.amd.com/projects/hipSOLVER/en/develop/)
|
||||
:::{grid-item-card} {doc}`hipSOLVER <hipsolver:index>`
|
||||
`hipSOLVER` is a LAPACK marshalling library supporting both `rocSOLVER` and `cuSOLVER`
|
||||
as backends whilst exporting a unified interface.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipSOLVER/en/develop/)
|
||||
- {doc}`Documentation <hipsolver:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [rocSPARSE](https://rocmdocs.amd.com/projects/rocSOLVER/en/develop/)
|
||||
:::{grid-item-card} {doc}`rocSPARSE <rocsparse:index>`
|
||||
`rocSPARSE` is a library to provide BLAS for sparse computations.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocSOLVER/en/develop/)
|
||||
- {doc}`Documentation <rocsparse:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipSPARSE](https://rocmdocs.amd.com/projects/hipSOLVER/en/develop/)
|
||||
:::{grid-item-card} {doc}`hipSPARSE <hipsparse:index>`
|
||||
`hipSPARSE` is a marshalling library to provide sparse BLAS functionality,
|
||||
supporting both `rocSPARSE` and `cuSPARSE` as backends.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipSOLVER/en/develop/)
|
||||
- {doc}`Documentation <hipsparse:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -12,30 +12,35 @@ vendor libraries as their back-ends. Due to their static dispatch nature, suppor
|
||||
at compile-time of the hipLIB in question. For dynamic dispatch between vendor implementations, refer to the
|
||||
[Orochi](https://github.com/GPUOpen-LibrariesAndSDKs/Orochi) library.
|
||||
|
||||
::::{grid} 1 2 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [Linear Algebra Libraries](linear_algebra)
|
||||
|
||||
- [rocBLAS](https://rocmdocs.amd.com/projects/rocBLAS/en/develop/)
|
||||
- [hipBLAS](https://rocmdocs.amd.com/projects/hipBLAS/en/develop/)
|
||||
- [hipBLASLt](https://rocmdocs.amd.com/projects/hipBLASLt/en/develop/)
|
||||
- [rocALUTION](https://rocmdocs.amd.com/projects/rocALUTION/en/develop/)
|
||||
- [rocWMMA](https://rocmdocs.amd.com/projects/rocWMMA/en/develop/)
|
||||
- [rocSOLVER](https://rocmdocs.amd.com/projects/rocSOLVER/en/develop/)
|
||||
- [hipSOLVER](https://rocmdocs.amd.com/projects/hipSOLVER/en/develop/)
|
||||
- [rocSPARSE](https://rocmdocs.amd.com/projects/rocSPARSE/en/develop/)
|
||||
- [hipSPARSE](https://rocmdocs.amd.com/projects/hipSPARSE/en/develop/)
|
||||
- {doc}`rocBLAS <rocblas:index>`
|
||||
- {doc}`hipBLAS <hipblas:index>`
|
||||
- {doc}`hipBLASLt <hipblaslt:index>`
|
||||
- {doc}`rocALUTION <rocalution:index>`
|
||||
- {doc}`rocWMMA <rocwmma:index>`
|
||||
- {doc}`rocSOLVER <rocsolver:index>`
|
||||
- {doc}`hipSOLVER <hipsolver:index>`
|
||||
- {doc}`rocSPARSE <rocsparse:index>`
|
||||
- {doc}`hipSPARSE <hipsparse:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Fast Fourier Transforms](fft)
|
||||
|
||||
- [rocFFT](https://rocmdocs.amd.com/projects/rocFFT/en/develop/)
|
||||
- [hipFFT](https://rocmdocs.amd.com/projects/hipFFT/en/develop/)
|
||||
- {doc}`rocFFT <rocfft:index>`
|
||||
- {doc}`hipFFT <hipfft:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Random Numbers](rand)
|
||||
|
||||
- [rocRAND](https://rocmdocs.amd.com/projects/rocRAND/en/develop/)
|
||||
- [hipRAND](https://rocmdocs.amd.com/projects/hipRAND/en/develop/)
|
||||
- {doc}`rocRAND <rocrand:index>`
|
||||
- {doc}`hipRAND <hiprand:index>`
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
@@ -3,21 +3,21 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [rocRAND](https://rocmdocs.amd.com/projects/rocRAND/en/latest/)
|
||||
:::{grid-item-card} {doc}`rocRAND <rocrand:index>`
|
||||
rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRNG).
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocRAND/en/latest/)
|
||||
- {doc}`Documentation <rocrand:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [hipRAND](https://rocmdocs.amd.com/projects/hipRAND/en/latest/)
|
||||
hipRAND is a compatibility layer for GPU accelerated FFT optimized for AMD GPUs
|
||||
using rocFFT. hipFFT allows for a common interface for other non AMD GPU
|
||||
FFT libraries.
|
||||
:::{grid-item-card} {doc}`hipRAND <hiprand:index>`
|
||||
hipRAND is a compatibility layer for GPU accelerated pseudo-random number
|
||||
generation (PRNG) optimized for AMD GPUs using rocRAND. hipRAND allows for a
|
||||
common interface for other non AMD GPU PRNG libraries.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/hipRAND/en/latest/)
|
||||
- {doc}`Documentation <hiprand:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -1,16 +1,18 @@
|
||||
# HIP
|
||||
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime. This page introduces the HIP runtime and other HIP libraries and tools.
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime. This
|
||||
page introduces the HIP runtime and other HIP libraries and tools.
|
||||
|
||||
## HIP Runtime
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [HIP Runtime](https://rocmdocs.amd.com/projects/HIP/en/develop/)
|
||||
The HIP Runtime is used to enable GPU acceleration for all HIP language based products.
|
||||
:::{grid-item-card} {doc}`HIP Runtime <hip:index>`
|
||||
The HIP Runtime is used to enable GPU acceleration for all HIP language based
|
||||
products.
|
||||
|
||||
- [HIP Runtime API Manual](https://rocmdocs.amd.com/projects/HIP/en/develop/)
|
||||
- {doc}`hip:doxygen/html/index`
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
|
||||
:::
|
||||
@@ -19,14 +21,14 @@ The HIP Runtime is used to enable GPU acceleration for all HIP language based pr
|
||||
|
||||
## Porting tools
|
||||
|
||||
:::::{grid} 1 1 1 1
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [HIPify](https://rocm.docs.amd.com/projects/HIPIFY/en/latest/)
|
||||
HIPify assists with porting applications from based on CUDA to the HIP Runtime. Supported
|
||||
CUDA APIs are documented here as well.
|
||||
:::{grid-item-card} {doc}`HIPIFY <hipify:index>`
|
||||
HIPIFY assists with porting applications based on CUDA to the HIP Runtime.
|
||||
Supported CUDA APIs are documented here as well.
|
||||
|
||||
- [Reference Manual](https://rocm.docs.amd.com/projects/HIPIFY/en/latest/)
|
||||
- {doc}`Reference Manual <hipify:index>`
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -3,31 +3,28 @@
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [AMD SMI](https://rocmdocs.amd.com/projects/amdsmi/en/latest/)
|
||||
GO AMD SMI provides GO binding for [E-SMI In-Band C library](https://github.com/amd/esmi_ib_library),
|
||||
[ROCm SMI Library](https://github.com/RadeonOpenCompute/rocm_smi_lib), and any
|
||||
GO language application that needs to link with these libraries and call the APIs
|
||||
from the GO application. The GO binding are imported in the
|
||||
[AMD SMI Exporter](https://github.com/amd/amd_smi_exporter) to export information
|
||||
provided by the AMD E-SMI inband library and the ROCm SMI GPU library to the Prometheus server.
|
||||
:::{grid-item-card} AMD SMI
|
||||
The AMD System Management Interface Library, or AMD SMI library, is a C library for Linux that provides a user space interface for applications to monitor and control AMD devices.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/amdsmi/en/latest/)
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/amdsmi)
|
||||
- [Examples](https://github.com/amd/go_amd_smi#example)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCm SMI](https://rocmdocs.amd.com/projects/rocmsmi/en/latest/)
|
||||
:::{grid-item-card} [ROCm SMI](https://rocmdocs.amd.com/projects/rocm_smi_lib/en/latest/)
|
||||
This tool acts as a command line interface for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocmsmi/en/latest/)
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocm_smi_lib/en/latest/)
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
|
||||
- [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCm Datacenter Tool](https://rocmdocs.amd.com/projects/rdc/en/latest/)
|
||||
:::{grid-item-card} {doc}`ROCm Datacenter Tool <rdc:index>`
|
||||
The ROCm™ Data Center Tool simplifies the administration and addresses key infrastructure challenges in AMD GPUs in cluster and data center environments.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rdc/en/latest/)
|
||||
- {doc}`Documentation <rdc:index>`
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/rdc)
|
||||
- [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
|
||||
|
||||
:::
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# OpenMP Support in ROCm
|
||||
|
||||
## Introduction to OpenMP Support Guide
|
||||
## Introduction
|
||||
|
||||
The ROCm™ installation includes an LLVM-based implementation that fully supports
|
||||
the OpenMP 4.5 standard and a subset of OpenMP 5.0, 5.1, and 5.2 standards.
|
||||
@@ -9,8 +9,7 @@ Along with host APIs, the OpenMP compilers support offloading code and data onto
|
||||
GPU devices. This document briefly describes the installation location of the
|
||||
OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
|
||||
with OpenMP applications. The GPUs supported are the same as those supported by
|
||||
this ROCm release. See the list of supported GPUs in the installation guide at
|
||||
[https://docs.amd.com/](https://docs.amd.com/).
|
||||
this ROCm release. See the list of supported GPUs in {doc}`/release/gpu_os_support`.
|
||||
|
||||
### Installation
|
||||
|
||||
@@ -97,7 +96,7 @@ code compiled with AOMP:
|
||||
```
|
||||
|
||||
The stats option produces timestamps for the kernels. Look into the output
|
||||
CSV file for the field, `Durations`, which is useful in getting an
|
||||
CSV file for the field, `DurationNs`, which is useful in getting an
|
||||
understanding of the critical kernels in the code.
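
As a rough illustration of how the `DurationNs` field can be used, the following Python sketch totals the duration per kernel and lists the longest-running ones. The `KernelName` column name, the helper name `slowest_kernels`, and the sample rows are assumptions made for this example; check the header of the CSV file that `rocprof` actually writes on your system before relying on them.

```python
import csv
import io


def slowest_kernels(csv_text, top=3, name_col="KernelName", dur_col="DurationNs"):
    """Return up to `top` (kernel name, total duration in ns) pairs, longest first.

    `name_col` is an assumed column name; `dur_col` is the field named in the text.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Accumulate the duration of every launch of the same kernel.
        totals[row[name_col]] = totals.get(row[name_col], 0) + int(row[dur_col])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Made-up sample rows, standing in for a real rocprof results file.
sample = """KernelName,DurationNs
vec_add,1200
matmul,98000
vec_add,1300
"""

print(slowest_kernels(sample))
```

For a real run, the contents of the generated CSV file would be passed in place of `sample`.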
|
||||
|
||||
Apart from `--stats`, the option `--timestamp on` produces a timestamp for
|
||||
@@ -110,7 +109,7 @@ code compiled with AOMP:
|
||||
an XML file as an input.
|
||||
|
||||
For more details on `rocprof`, refer to the ROCm Profiling Tools document on
|
||||
<https://docs.amd.com>.
|
||||
{doc}`rocprofiler:rocprof`.
|
||||
|
||||
### Using Tracing Options
|
||||
|
||||
@@ -118,7 +117,7 @@ For more details on `rocprof`, refer to the ROCm Profiling Tools document on
|
||||
program with:
|
||||
|
||||
```bash
|
||||
-Wl,–rpath,/opt/rocm-{version}/lib -lamdhip64
|
||||
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
|
||||
```
|
||||
|
||||
The following tracing options are widely used to generate useful information:
|
||||
@@ -137,7 +136,7 @@ Navigate to Chrome or Perfetto and load the JSON file to see the timeline of the
|
||||
HSA calls.
|
||||
|
||||
For more details on tracing, refer to the ROCm Profiling Tools document on
|
||||
<https://docs.amd.com>.
|
||||
{doc}`rocprofiler:rocprof`.
|
||||
|
||||
### Environment Variables
|
||||
|
||||
@@ -157,6 +156,8 @@ For more details on tracing, refer to the ROCm Profiling Tools document on
|
||||
The OpenMP programming model is greatly enhanced with the following new features
|
||||
implemented in the past releases.
|
||||
|
||||
(openmp_usm)=
|
||||
|
||||
### Unified Shared Memory
|
||||
|
||||
Unified Shared Memory (USM) provides a pointer-based approach to memory
|
||||
|
||||
@@ -666,9 +666,8 @@ The following OpenMP pragma is available on MI200, and it must be executed with
|
||||
omp requires unified_shared_memory
|
||||
```
|
||||
|
||||
For more details on
|
||||
[USM](https://docs.amd.com/bundle/OpenMP-Support-Guide-v5.4/page/OpenMP_Features.html#d90e61),
|
||||
refer to the OpenMP Support Guide at [https://docs.amd.com](https://docs.amd.com).
|
||||
For more details on USM, refer to the {ref}`openmp_usm` section of the OpenMP
|
||||
Guide.
|
||||
|
||||
### Support Status of Other Clang Options
|
||||
|
||||
|
||||
@@ -3,19 +3,19 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [RVS](https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/latest/)
|
||||
:::{grid-item-card} {doc}`RVS <rocmvalidationsuite:index>`
|
||||
The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
|
||||
|
||||
- [Documentation](https://rocm.docs.amd.com/projects/ROCmValidationSuite/en/latest/)
|
||||
- {doc}`Documentation <rocmvalidationsuite:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [TransferBench](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
|
||||
:::{grid-item-card} {doc}`TransferBench <transferbench:index>`
|
||||
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
|
||||
- {doc}`Documentation <transferbench:index>`
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://rocmdocs.amd.com/projects/TransferBench/en/develop/examples/index.html#examples)
|
||||
- {doc}`transferbench:examples/index`
|
||||
|
||||
:::
|
||||
|
||||
|
||||
50
docs/release/3rd_party_support_matrix.md
Normal file
@@ -0,0 +1,50 @@
|
||||
# 3rd Party Support Matrix
|
||||
|
||||
ROCm™ supports various 3rd party libraries and frameworks. Supported versions
|
||||
are tested and known to work. Non-supported versions of 3rd parties may also
|
||||
work, but aren't tested.
|
||||
|
||||
(ml_framework_compat_matrix)=
|
||||
|
||||
## Deep Learning
|
||||
|
||||
ROCm releases support the most recent and two prior releases of PyTorch and
|
||||
TensorFlow.
|
||||
|
||||
| ROCm | [PyTorch](https://github.com/pytorch/pytorch/releases/) | [TensorFlow](https://github.com/tensorflow/tensorflow/releases/) | [MAGMA](https://icl.utk.edu/magma/index.html) |
|
||||
|:------|:--------------------------:|:--------------------:|:-----:|
|
||||
| 5.0.2 | 1.8, 1.9, 1.10 | 2.6, 2.7, 2.8 | |
|
||||
| 5.1.3 | 1.9, 1.10, 1.11 | 2.7, 2.8, 2.9 | |
|
||||
| 5.2.x | 1.10, 1.11, 1.12 | 2.8, 2.9, 2.9 | |
|
||||
| 5.3.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10 | |
|
||||
| 5.4.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10, 2.11 | 2.5.4 |
|
||||
|
||||
## Communication libraries
|
||||
|
||||
ROCm supports [OpenUCX](https://openucx.org/), "an open-source,
|
||||
production-grade communication framework for data-centric and high-performance
|
||||
applications".
|
||||
|
||||
| UCX version | ROCm 5.4 and older | ROCm 5.5 and newer |
|
||||
|:----------|:------------------:|:------------------:|
|
||||
| ≤1.14.0 | COMPATIBLE | INCOMPATIBLE |
|
||||
| 1.14.1+ | COMPATIBLE | COMPATIBLE |
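
The UCX table can be read as a simple rule, sketched here in Python. This is only an illustration: the function names `parse_version` and `ucx_compatible` are hypothetical, and the sketch assumes the first row covers every UCX version up to and including 1.14.0 and that "ROCm 5.5 and newer" means any version at or above 5.5.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.14.1" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def ucx_compatible(ucx_version, rocm_version):
    """Apply the UCX/ROCm table: UCX 1.14.1+ works with every listed ROCm
    release, while older UCX versions only work with ROCm 5.4 and older."""
    ucx = parse_version(ucx_version)
    rocm = parse_version(rocm_version)
    if ucx >= (1, 14, 1):
        return True
    # Older UCX: compatible only with ROCm releases before 5.5.
    return rocm < (5, 5)


print(ucx_compatible("1.14.0", "5.4.3"))
print(ucx_compatible("1.14.0", "5.5.0"))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "1.9" as newer than "1.14".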
|
||||
## Algorithm libraries

ROCm releases provide algorithm libraries with interfaces compatible with
contemporary CUDA / NVIDIA HPC SDK alternatives.

- Thrust → rocThrust
- CUB → hipCUB

| ROCm | Thrust / CUB | HPC SDK |
|:------|:------------:|:-------:|
| 5.0.2 | 1.14 | 21.9 |
| 5.1.3 | 1.15 | 22.1 |
| 5.2.x | 1.15 | 22.2, 22.3 |
| 5.3.x | 1.16 | 22.7 |
| 5.4.x | 1.16 | 22.9 |

For the latest documentation of these libraries, refer to the
[associated documentation](../reference/gpu_libraries/c%2B%2B_primitives.md).
@@ -1,5 +1,29 @@
|
||||
# Compatibility
|
||||
|
||||
[Frameworks Support Matrix](docker_support_matrix.md)
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
[Framework Compatibility](../reference/framework_compatibility/framework_compatibility)
|
||||
:::{grid-item-card} User space & Kernel Fusion Driver
|
||||
Forward and backward compatibility of ROCm user space components and the
|
||||
kernel space Kernel Fusion Driver (KFD).
|
||||
|
||||
- [User/Kernel-Space Support Matrix](./user_kernel_space_compat_matrix.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Docker Image Support
|
||||
ROCm releases several Docker container images.
|
||||
|
||||
- [Docker Image Support Matrix](./docker_image_support_matrix.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} 3rd Party Support
|
||||
Several 3rd party libraries ship with ROCm enablement, and several ROCm
|
||||
components provide interfaces compatible with 3rd party solutions.
|
||||
|
||||
- [3rd Party Support Matrix](./3rd_party_support_matrix.md)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Frameworks Support Matrix
|
||||
# Docker Image Support Matrix
|
||||
|
||||
The software support matrices for ROCm container releases is listed.
|
||||
|
||||
@@ -61,7 +61,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
||||
* [tensorflow-rocm 2.13.0]()
|
||||
* `tensorflow-rocm` 2.13.0
|
||||
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
|
||||
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
|
||||
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
|
||||
@@ -71,7 +71,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
||||
* [tensorflow-rocm 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
|
||||
* [`tensorflow-rocm` 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
|
||||
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
|
||||
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
|
||||
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
|
||||
@@ -81,7 +81,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
||||
* [tensorflow-rocm 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
|
||||
* [`tensorflow-rocm` 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
|
||||
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
|
||||
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
|
||||
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
|
||||
@@ -10,9 +10,9 @@ AMD ROCm™ Platform supports the following Linux distributions.
|
||||
|--------------------|-----------------------|--------------------|
|
||||
| RHEL 9.1 | x86-64 | 5.14 |
|
||||
| RHEL 8.6 to 8.7 | x86-64 | 4.18 |
|
||||
| SLES 15 SP4 | x86-64 | |
|
||||
| SLES 15 SP4 | x86-64 | 5.14.21 |
|
||||
| Ubuntu 20.04.5 LTS | x86-64 | 5.15 |
|
||||
| Ubuntu 22.04.1 LTS | x86-64 | 5.15, OEM 5.17 |
|
||||
| Ubuntu 22.04.1 LTS | x86-64 | 5.15, 5.17 OEM |
|
||||
|
||||
## Virtualization Support
|
||||
|
||||
@@ -24,20 +24,25 @@ ROCm supports virtualization for select GPUs only as shown below.
|
||||
| VMWare | ESXi 8 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
|
||||
| VMWare | ESXi 7 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
|
||||
|
||||
(supported_gpus)=
|
||||
|
||||
## GPU Support Table
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Instinct™
|
||||
:::{tab-item} AMD Instinct™
|
||||
:sync: instinct
|
||||
|
||||
Use Driver Shipped with ROCm
|
||||
| GPU | Architecture | Product | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|
||||
|:-----------------:|:---------------:|:-------:|:--------------------------------------------------------------------:|:------------------------------------:|:-----------:|
|
||||
| AMD Instinct™ MI250X | CDNA2 | Full | gfx90a | Supported | Unsupported |
|
||||
| AMD Instinct™ MI250 | CDNA2 | Full | gfx90a | Supported | Unsupported |
|
||||
| AMD Instinct™ MI210 | CDNA2 | Full | gfx90a | Supported | Unsupported |
|
||||
| AMD Instinct™ MI100 | CDNA | Full | gfx908 | Supported | Unsupported |
|
||||
| AMD Instinct™ MI50 | Vega | Full | gfx906 | Supported | Unsupported |
|
||||
|
||||
| Product Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) |Support |
|
||||
|:------------:|:------------:|:--------------------------------------------------------------------:|:-------:|
|
||||
| AMD Instinct™ MI250X | CDNA2 | gfx90a | ✅ |
|
||||
| AMD Instinct™ MI250 | CDNA2 | gfx90a | ✅ |
|
||||
| AMD Instinct™ MI210 | CDNA2 | gfx90a | ✅ |
|
||||
| AMD Instinct™ MI100 | CDNA | gfx908 | ✅ |
|
||||
| AMD Instinct™ MI50 | GCN5.1 | gfx906 | ✅ |
|
||||
| AMD Instinct™ MI25 | GCN5.0 | gfx900 | ❌ |
|
||||
|
||||
:::
|
||||
|
||||
@@ -46,11 +51,11 @@ Use Driver Shipped with ROCm
|
||||
|
||||
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|
||||
|
||||
This table is incomplete.
|
||||
| GPU | Architecture | SW Level | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|
||||
|:-----------------:|:---------------:|:--------:|:--------------------------------------------------------------------:|:------------------------------------:|:-----------:|
|
||||
| AMD Radeon™ Pro W6800 | RDNA2 | Full | gfx1030 | Supported | Supported |
|
||||
| AMD Radeon™ Pro V620 | RDNA2 | Full | gfx1030 | Supported | Unsupported |
|
||||
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|
||||
|:----:|:------------:|:-------------------------------------------------------------------:|:-------:|
|
||||
| AMD Radeon™ Pro W6800 | RDNA2 | gfx1030 | ✅ |
|
||||
| AMD Radeon™ Pro V620 | RDNA2 | gfx1030 | ✅ |
|
||||
| AMD Radeon™ Pro VII | GCN5.1 | gfx906 | ✅ |
|
||||
|
||||
:::
|
||||
|
||||
@@ -59,56 +64,9 @@ This table is incomplete.
|
||||
|
||||
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|
||||
|
||||
This table is incomplete.
|
||||
| GPU | Architecture | SW Level | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Linux | Windows |
|
||||
|:------------------:|:--------------:|:----------:|:--------------------------------------------------------------------:|:------------------------------------:|:-----------:|
|
||||
| AMD Radeon™ RX 6900 XT | RDNA2 |HIP SDK | gfx1030 | Supported | Supported |
|
||||
| AMD Radeon™ RX 6600 | RDNA2 |HIP Runtime | gfx1031 | Supported | Supported |
|
||||
| AMD Radeon™ VII | Vega |Full | gfx906 | Supported | Unsupported |
|
||||
| AMD Radeon™ R9 Fury | Fiji |NA | gfx803 | Community | Unsupported |
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
### Software Enablement Level
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} AMD Instinct™
|
||||
:sync: instinct
|
||||
|
||||
Instinct™ accelerators support the full stack available in ROCm. Instinct™
|
||||
accelerators are Linux only.
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} AMD Radeon Pro™
|
||||
:sync: radeonpro
|
||||
|
||||
ROCm software support varies by GPU type and Operating System. ROCm ecosystem
|
||||
products are three software stack enablement levels that correspond as
|
||||
described below:
|
||||
|
||||
- Full includes all software that is part of the ROCm ecosystem. Please see
|
||||
[article](link) for details of ROCm.
|
||||
- HIP SDK includes the HIP Runtime and a selection of GPU libraries for compute.
|
||||
Please see [article](link) for details of HIP SDK.
|
||||
- HIP Runtime enables the use of the HIP Runtime only.
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} AMD Radeon™
|
||||
:sync: radeon
|
||||
ROCm software support varies by GPU type and Operating System. ROCm ecosystem
|
||||
products are three software stack enablement levels that correspond as described
|
||||
below:
|
||||
|
||||
- Full includes all software that is part of the ROCm ecosystem. Please see
|
||||
[article](link) for details of ROCm.
|
||||
- HIP SDK includes the HIP Runtime and a selection of GPU libraries for compute.
|
||||
Please see [article](link) for details of HIP SDK.
|
||||
- HIP enables the use of the HIP Runtime only.
|
||||
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|
||||
|:----:|:------------:|:-------------------------------------------------------------------:|:-------:|
|
||||
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
|
||||
|
||||
:::
|
||||
|
||||
@@ -116,47 +74,11 @@ below:
|
||||
|
||||
### Support Status
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Instinct™
|
||||
:sync: instinct
|
||||
|
||||
- Supported - AMD enables these GPUs in our software distributions for the
|
||||
corresponding ROCm product.
|
||||
- Unsupported - This configuration is not enabled in our software distributions.
|
||||
- Deprecated - Support will be removed in a future release.
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Radeon Pro™
|
||||
:sync: radeonpro
|
||||
|
||||
GPU support levels for Radeon Pro™
|
||||
|
||||
- Supported - AMD enables these GPUs in our software distributions for the
|
||||
corresponding ROCm product.
|
||||
- Unsupported - This configuration is not enabled in our software distributions.
|
||||
- Deprecated - Support will be removed in a future release.
|
||||
- Community - AMD does not enable these GPUs in our software distributions but
|
||||
end users are free to enable these GPUs themselves.
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Radeon™
|
||||
:sync: radeon
|
||||
|
||||
Support levels for Radeon™ GPUs:
|
||||
|
||||
- Supported - AMD enables these GPUs in our software distributions for the
|
||||
corresponding ROCm product.
|
||||
- Unsupported - This configuration is not enabled in our software distributions.
|
||||
- Deprecated - Support will be removed in a future release.
|
||||
- Community - AMD does not enable these GPUs in our software distributions but
|
||||
end users are free to enable these GPUs themselves.
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
- ✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
the corresponding ROCm product.
|
||||
- ⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
- ❌: **Unsupported** - This configuration is not enabled in our software
|
||||
distributions.
|
||||
|
||||
## CPU Support
|
||||
|
||||
|
||||
@@ -27,7 +27,7 @@ The table is ordered to follow ROCm's manifest file.
|
||||
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
|
||||
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
|
||||
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
|
||||
| rocm-llvm-alt | [AMD Proprietary License]()
|
||||
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula)
|
||||
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
|
||||
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
|
||||
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
|
||||
@@ -102,3 +102,23 @@ AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
|
||||
Advanced Micro Devices, Inc. Other product names used in this publication are
|
||||
for identification purposes only and may be trademarks of their respective
|
||||
companies.
|
||||
|
||||
## Package Licensing
|
||||
|
||||
```{attention}
|
||||
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
|
||||
subject to the license agreement enclosed in the directory for the binary and is
|
||||
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
|
||||
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
|
||||
the terms and conditions of this license agreement. If you do not agree to the
|
||||
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
|
||||
AOCC CPU Optimizations.
|
||||
```
|
||||
|
||||
For the rest of the ROCm packages, you can find the licensing information at the
|
||||
following location: `/opt/rocm/share/doc/<component-name>/`
|
||||
|
||||
For example, you can fetch the licensing information of the `_amd_comgr_`
|
||||
component (Code Object Manager) from the `amd_comgr` folder. A file named
|
||||
`LICENSE.txt` contains the license details at:
|
||||
`/opt/rocm-5.4.1/share/doc/amd_comgr/LICENSE.txt`
|
||||
|
||||
15
docs/release/user_kernel_space_compat_matrix.md
Normal file
@@ -0,0 +1,15 @@
# User/Kernel-Space Support Matrix

ROCm™ provides forward and backward compatibility between the Kernel Fusion
Driver (KFD) and its user space software for +/- 2 releases. This table shows
the compatibility combinations that are currently supported.

| KFD | Tested user space versions |
|:------|:--------------------------:|
| 5.0.2 | 5.1.0, 5.2.0 |
| 5.1.0 | 5.0.2 |
| 5.1.3 | 5.2.0, 5.3.0 |
| 5.2.0 | 5.0.2, 5.1.3 |
| 5.2.3 | 5.3.0, 5.4.0 |
| 5.3.0 | 5.1.3, 5.2.3 |
| 5.4.0 | 5.2.3, 5.3.3 |

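As a minimal sketch of how the matrix above might be consulted from a script, the table can be encoded as a lookup. The names `TESTED_USER_SPACE` and `is_tested_combination` are hypothetical and not part of any ROCm tool; the data is copied from the table, and the function reports only what the table lists, not the wider "+/- 2 releases" policy.

```python
# KFD version -> user space versions listed as tested in the table above.
TESTED_USER_SPACE = {
    "5.0.2": ("5.1.0", "5.2.0"),
    "5.1.0": ("5.0.2",),
    "5.1.3": ("5.2.0", "5.3.0"),
    "5.2.0": ("5.0.2", "5.1.3"),
    "5.2.3": ("5.3.0", "5.4.0"),
    "5.3.0": ("5.1.3", "5.2.3"),
    "5.4.0": ("5.2.3", "5.3.3"),
}


def is_tested_combination(kfd_version, user_space_version):
    """Return True only if the table lists this KFD/user space pairing."""
    return user_space_version in TESTED_USER_SPACE.get(kfd_version, ())
```

A pairing missing from the table (including an unknown KFD version) yields `False`, which here means "not listed as tested", not "known to fail".
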
@@ -1,6 +1,6 @@
|
||||
# AMD ROCm™ Platform - Overview
|
||||
# What is ROCm?
|
||||
|
||||
ROCm™ is an open-source stack for GPU computation. ROCm is primarily Open-Source
|
||||
ROCm is an open-source stack for GPU computation. ROCm is primarily Open-Source
|
||||
Software (OSS) that allows developers the freedom to customize and tailor their
|
||||
GPU software for their own needs while collaborating with a community of other
|
||||
developers, and helping each other find solutions in an agile, flexible, rapid
|
||||
|
||||
@@ -19,27 +19,42 @@ subtrees:
|
||||
title: Installation Overview
|
||||
- file: deploy/linux/prerequisites
|
||||
title: Prerequisites
|
||||
- file: deploy/linux/install
|
||||
title: Installation
|
||||
- file: deploy/linux/upgrade
|
||||
title: Upgrade
|
||||
- file: deploy/linux/uninstall
|
||||
title: Uninstallation
|
||||
- file: deploy/linux/package_manager_integration
|
||||
- file: deploy/linux/os-native/index
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: deploy/linux/os-native/install
|
||||
title: Installation
|
||||
- file: deploy/linux/os-native/upgrade
|
||||
title: Upgrade
|
||||
- file: deploy/linux/os-native/uninstall
|
||||
title: Uninstallation
|
||||
- file: deploy/linux/os-native/package_manager_integration
|
||||
- file: deploy/linux/installer/index
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: deploy/linux/installer/install
|
||||
title: Installation
|
||||
- file: deploy/linux/installer/upgrade
|
||||
title: Upgrade
|
||||
- file: deploy/linux/installer/uninstall
|
||||
title: Uninstallation
|
||||
- file: deploy/docker
|
||||
title: Docker
|
||||
|
||||
- caption: Release Info
|
||||
entries:
|
||||
- file: release
|
||||
- file: CHANGELOG
|
||||
title: Changelog
|
||||
- file: release/gpu_os_support
|
||||
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
|
||||
title: Known Issues
|
||||
- file: release/compatibility
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: release/docker_support_matrix
|
||||
- file: reference/framework_compatibility/framework_compatibility
|
||||
- file: release/user_kernel_space_compat_matrix
|
||||
- file: release/docker_image_support_matrix
|
||||
- file: release/3rd_party_support_matrix
|
||||
- file: release/licensing
|
||||
|
||||
|
||||
@@ -47,26 +62,26 @@ subtrees:
|
||||
entries:
|
||||
- file: reference/all
|
||||
- file: reference/compilers
|
||||
title: Compilers and Development Tools
|
||||
title: Compilers and Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/rocmcc/rocmcc
|
||||
title: ROCmCC
|
||||
- url: https://rocmdocs.amd.com/projects/ROCmCC/en/{branch}/
|
||||
- url: ${project:rocgdb}
|
||||
title: ROCgdb
|
||||
- url: https://rocmdocs.amd.com/projects/ROCgdb/en/hybrid/
|
||||
- url: ${project:rocprofiler}
|
||||
title: rocprofiler
|
||||
- url: https://rocmdocs.amd.com/projects/rocprofiler/en/{branch}/
|
||||
- url: ${project:roctracer}
|
||||
title: roctracer
|
||||
- url: https://rocmdocs.amd.com/projects/roctracer/en/{branch}/
|
||||
title: ROCdbgapi
|
||||
- url: ${project:rocdbgapi}
|
||||
title: ROCdbgapi
|
||||
- file: reference/hip
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: HIP Runtime API
|
||||
url: https://rocmdocs.amd.com/projects/HIP/en/{branch}/
|
||||
url: ${project:hip}
|
||||
- title: HIPify - Port Your Code
|
||||
url: https://advanced-micro-devices-demo--737.com.readthedocs.build/projects/HIPIFY/en/737/
|
||||
url: ${project:hipify}
|
||||
- file: reference/openmp/openmp
|
||||
title: OpenMP
|
||||
- file: reference/gpu_libraries/math
|
||||
@@ -77,72 +92,72 @@ subtrees:
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocBLAS
|
||||
url: https://rocmdocs.amd.com/projects/rocBLAS/en/{branch}/
|
||||
url: ${project:rocblas}
|
||||
- title: hipBLAS
|
||||
url: https://rocmdocs.amd.com/projects/hipBLAS/en/{branch}/
|
||||
url: ${project:hipblas}
|
||||
- title: hipBLASLt
|
||||
url: https://rocm.docs.amd.com/projects/hipBLASLt/en/{branch}/
|
||||
url: ${project:hipblaslt}
|
||||
- title: rocALUTION
|
||||
url: https://rocm.docs.amd.com/projects/rocALUTION/en/{branch}/
|
||||
url: ${project:rocalution}
|
||||
- title: rocWMMA
|
||||
url: https://rocm.docs.amd.com/projects/rocWMMA/en/{branch}/
|
||||
url: ${project:rocwmma}
|
||||
- title: rocSOLVER
|
||||
url: https://rocm.docs.amd.com/projects/rocSOLVER/en/{branch}/
|
||||
url: ${project:rocsolver}
|
||||
- title: hipSOLVER
|
||||
url: https://rocm.docs.amd.com/projects/hipSOLVER/en/{branch}/
|
||||
url: ${project:hipsolver}
|
||||
- title: rocSPARSE
|
||||
url: https://rocm.docs.amd.com/projects/rocSPARSE/en/{branch}/
|
||||
url: ${project:rocsparse}
|
||||
- title: hipSPARSE
|
||||
url: https://rocm.docs.amd.com/projects/hipSPARSE/en/{branch}/
|
||||
url: ${project:hipsparse}
|
||||
- file: reference/gpu_libraries/fft
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocFFT
|
||||
url: https://rocm.docs.amd.com/projects/rocFFT/en/{branch}/
|
||||
url: ${project:rocfft}
|
||||
- title: hipFFT
|
||||
url: https://rocm.docs.amd.com/projects/hipFFT/en/{branch}/
|
||||
url: ${project:hipfft}
|
||||
- file: reference/gpu_libraries/rand
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocRAND
|
||||
url: https://rocm.docs.amd.com/projects/rocRAND/en/{branch}/
|
||||
url: ${project:rocrand}
|
||||
- title: hipRAND
|
||||
url: https://rocm.docs.amd.com/projects/hipRAND/en/{branch}/
|
||||
url: ${project:hiprand}
|
||||
- file: reference/gpu_libraries/c++_primitives
|
||||
title: C++ Primitive Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: rocPRIM
|
||||
url: https://rocm.docs.amd.com/projects/rocPRIM/en/{branch}/
|
||||
url: ${project:rocprim}
|
||||
- entries:
|
||||
- title: hipCUB
|
||||
url: https://rocm.docs.amd.com/projects/hipCUB/en/{branch}/
|
||||
url: ${project:hipcub}
|
||||
- entries:
|
||||
- title: rocThrust
|
||||
url: https://rocm.docs.amd.com/projects/rocThrust/en/{branch}/
|
||||
url: ${project:rocthrust}
|
||||
- file: reference/gpu_libraries/communication
|
||||
title: Communication Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: RCCL
|
||||
url: https://rocm.docs.amd.com/projects/rccl/en/{branch}/
|
||||
url: ${project:rccl}
|
||||
- file: reference/ai_tools
|
||||
title: AI Libraries
|
||||
subtrees:
|
||||
- entries:
|
||||
- title: MIOpen - Machine Intelligence
|
||||
url: https://rocm.docs.amd.com/projects/MIOpen/en/{branch}/
|
||||
url: ${project:miopen}
|
||||
- title: Composable Kernel
|
||||
url: https://rocm.docs.amd.com/projects/composable_kernel/en/{branch}/
|
||||
url: ${project:composable_kernel}
|
||||
- title: MIGraphX - Graph Optimization
|
||||
url: https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
|
||||
url: ${project:amdmigraphx}
|
||||
- file: reference/computer_vision
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocm.docs.amd.com/projects/MIVisionX/en/{branch}/
|
||||
- url: ${project:mivisionx}
|
||||
title: MIVisionX
|
||||
- entries:
|
||||
- url: https://rocm.docs.amd.com/projects/rocAL/en/{branch}/
|
||||
- url: ${project:rocal}
|
||||
title: rocAL
|
||||
- file: reference/management_tools
|
||||
title: Management Tools
|
||||
@@ -150,20 +165,21 @@ subtrees:
|
||||
- entries:
|
||||
- url: https://rocm.docs.amd.com/projects/amdsmi/en/{branch}/
|
||||
title: AMD SMI
|
||||
- url: https://rocm.docs.amd.com/projects/rocmsmi/en/{branch}/
|
||||
- url: https://rocm.docs.amd.com/projects/rocm_smi_lib/en/{branch}/
|
||||
title: ROCm SMI
|
||||
- url: https://rocm.docs.amd.com/projects/rdc/en/{branch}/
|
||||
- url: ${project:rdc}
|
||||
title: ROCm Datacenter Tool
|
||||
- file: reference/validation_tools
|
||||
title: Validation Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- url: https://rocm.docs.amd.com/projects/rvs/en/{branch}/
|
||||
- url: ${project:rocmvalidationsuite}
|
||||
title: RVS
|
||||
- url: https://rocm.docs.amd.com/projects/TransferBench/en/{branch}/
|
||||
- url: ${project:transferbench}
|
||||
title: TransferBench
|
||||
- caption: Understand ROCm
|
||||
entries:
|
||||
- file: understand/all.md
|
||||
- title: Compiler Disambiguation
|
||||
file: understand/compiler_disambiguation
|
||||
- file: understand/cmake_packages
|
||||
@@ -176,8 +192,10 @@ subtrees:
|
||||
title: MI250
|
||||
- file: understand/gpu_arch/mi100
|
||||
title: MI100
|
||||
- file: understand/More-about-how-ROCm-uses-PCIe-Atomics
|
||||
- caption: How to Guides
|
||||
entries:
|
||||
- file: how_to/all
|
||||
- title: Tuning Guides
|
||||
file: how_to/tuning_guides/index.md
|
||||
subtrees:
|
||||
@@ -197,15 +215,17 @@ subtrees:
|
||||
- file: how_to/gpu_aware_mpi
|
||||
- file: how_to/system_debugging
|
||||
|
||||
- caption: Examples
|
||||
- caption: Tutorials & Examples
|
||||
file: examples
|
||||
entries:
|
||||
- title: ROCm Examples
|
||||
url: https://github.com/amd/rocm-examples
|
||||
- file: examples/ai_ml_inferencing
|
||||
title: AI/ML/Inferencing
|
||||
- title: Machine Learning
|
||||
file: examples/machine_learning/all
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: examples/inception_casestudy/inception_casestudy
|
||||
- entries:
|
||||
- file: examples/machine_learning/pytorch_inception
|
||||
- file: examples/machine_learning/migraphx_optimization
|
||||
|
||||
- caption: About
|
||||
entries:
|
||||
|
||||
@@ -1 +1,2 @@
|
||||
rocm-docs-core==0.11.0
|
||||
rocm-docs-core==1.8.0
|
||||
sphinx-reredirects
|
||||
|
||||
@@ -1,110 +1,106 @@
|
||||
#
|
||||
# This file is autogenerated by pip-compile with Python 3.8
|
||||
# This file is autogenerated by pip-compile with Python 3.10
|
||||
# by the following command:
|
||||
#
|
||||
# pip-compile sphinx/requirements.in
|
||||
# pip-compile requirements.in
|
||||
#
|
||||
accessible-pygments==0.0.3
|
||||
accessible-pygments==0.0.5
|
||||
# via pydata-sphinx-theme
|
||||
alabaster==0.7.13
|
||||
alabaster==1.0.0
|
||||
# via sphinx
|
||||
babel==2.11.0
|
||||
babel==2.16.0
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
beautifulsoup4==4.11.2
|
||||
beautifulsoup4==4.12.3
|
||||
# via pydata-sphinx-theme
|
||||
breathe==4.34.0
|
||||
breathe==4.35.0
|
||||
# via rocm-docs-core
|
||||
certifi==2022.12.7
|
||||
certifi==2024.8.30
|
||||
# via requests
|
||||
cffi==1.15.1
|
||||
cffi==1.17.1
|
||||
# via
|
||||
# cryptography
|
||||
# pynacl
|
||||
charset-normalizer==2.1.1
|
||||
charset-normalizer==3.3.2
|
||||
# via requests
|
||||
click==8.1.3
|
||||
click==8.1.7
|
||||
# via sphinx-external-toc
|
||||
cryptography==40.0.2
|
||||
cryptography==43.0.1
|
||||
# via pyjwt
|
||||
deprecated==1.2.13
|
||||
deprecated==1.2.14
|
||||
# via pygithub
|
||||
docutils==0.19
|
||||
docutils==0.21.2
|
||||
# via
|
||||
# breathe
|
||||
# myst-parser
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
fastjsonschema==2.16.3
|
||||
fastjsonschema==2.20.0
|
||||
# via rocm-docs-core
|
||||
gitdb==4.0.10
|
||||
gitdb==4.0.11
|
||||
# via gitpython
|
||||
gitpython==3.1.30
|
||||
gitpython==3.1.43
|
||||
# via rocm-docs-core
|
||||
idna==3.4
|
||||
idna==3.10
|
||||
# via requests
|
||||
imagesize==1.4.1
|
||||
# via sphinx
|
||||
jinja2==3.1.2
|
||||
jinja2==3.1.4
|
||||
# via
|
||||
# myst-parser
|
||||
# sphinx
|
||||
linkify-it-py==1.0.3
|
||||
# via myst-parser
|
||||
markdown-it-py==2.2.0
|
||||
markdown-it-py==3.0.0
|
||||
# via
|
||||
# mdit-py-plugins
|
||||
# myst-parser
|
||||
markupsafe==2.1.2
|
||||
markupsafe==2.1.5
|
||||
# via jinja2
|
||||
mdit-py-plugins==0.3.4
|
||||
mdit-py-plugins==0.4.2
|
||||
# via myst-parser
|
||||
mdurl==0.1.2
|
||||
# via markdown-it-py
|
||||
myst-parser[linkify]==1.0.0
|
||||
myst-parser==4.0.0
|
||||
# via rocm-docs-core
|
||||
packaging==23.0
|
||||
packaging==24.1
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
pycparser==2.21
|
||||
pycparser==2.22
|
||||
# via cffi
|
||||
pydata-sphinx-theme==0.13.3
|
||||
pydata-sphinx-theme==0.15.4
|
||||
# via
|
||||
# rocm-docs-core
|
||||
# sphinx-book-theme
|
||||
pygithub==1.58.1
|
||||
pygithub==2.4.0
|
||||
# via rocm-docs-core
|
||||
pygments==2.14.0
|
||||
pygments==2.18.0
|
||||
# via
|
||||
# accessible-pygments
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
pyjwt[crypto]==2.6.0
|
||||
pyjwt[crypto]==2.9.0
|
||||
# via pygithub
|
||||
pynacl==1.5.0
|
||||
# via pygithub
|
||||
pytz==2022.7.1
|
||||
# via babel
|
||||
pyyaml==6.0
|
||||
pyyaml==6.0.2
|
||||
# via
|
||||
# myst-parser
|
||||
# rocm-docs-core
|
||||
# sphinx-external-toc
|
||||
requests==2.28.1
|
||||
requests==2.32.3
|
||||
# via
|
||||
# pygithub
|
||||
# sphinx
|
||||
rocm-docs-core==0.11.0
|
||||
rocm-docs-core==1.8.0
|
||||
# via -r requirements.in
|
||||
smmap==5.0.0
|
||||
smmap==5.0.1
|
||||
# via gitdb
|
||||
snowballstemmer==2.2.0
|
||||
# via sphinx
|
||||
soupsieve==2.4
|
||||
soupsieve==2.6
|
||||
# via beautifulsoup4
|
||||
sphinx==5.3.0
|
||||
sphinx==8.0.2
|
||||
# via
|
||||
# breathe
|
||||
# myst-parser
|
||||
@@ -115,33 +111,40 @@ sphinx==5.3.0
|
||||
# sphinx-design
|
||||
# sphinx-external-toc
|
||||
# sphinx-notfound-page
|
||||
sphinx-book-theme==1.0.1
|
||||
# sphinx-reredirects
|
||||
sphinx-book-theme==1.1.3
|
||||
# via rocm-docs-core
|
||||
sphinx-copybutton==0.5.1
|
||||
sphinx-copybutton==0.5.2
|
||||
# via rocm-docs-core
|
||||
sphinx-design==0.4.1
|
||||
sphinx-design==0.6.1
|
||||
# via rocm-docs-core
|
||||
sphinx-external-toc==0.3.1
|
||||
sphinx-external-toc==1.0.1
|
||||
# via rocm-docs-core
|
||||
sphinx-notfound-page==0.8.3
|
||||
sphinx-notfound-page==1.0.4
|
||||
# via rocm-docs-core
|
||||
sphinxcontrib-applehelp==1.0.4
|
||||
sphinx-reredirects==0.1.5
|
||||
# via -r requirements.in
|
||||
sphinxcontrib-applehelp==2.0.0
|
||||
# via sphinx
|
||||
sphinxcontrib-devhelp==1.0.2
|
||||
sphinxcontrib-devhelp==2.0.0
|
||||
# via sphinx
|
||||
sphinxcontrib-htmlhelp==2.0.1
|
||||
sphinxcontrib-htmlhelp==2.1.0
|
||||
# via sphinx
|
||||
sphinxcontrib-jsmath==1.0.1
|
||||
# via sphinx
|
||||
sphinxcontrib-qthelp==1.0.3
|
||||
sphinxcontrib-qthelp==2.0.0
|
||||
# via sphinx
|
||||
sphinxcontrib-serializinghtml==1.1.5
|
||||
sphinxcontrib-serializinghtml==2.0.0
|
||||
# via sphinx
|
||||
typing-extensions==4.5.0
|
||||
# via pydata-sphinx-theme
|
||||
uc-micro-py==1.0.1
|
||||
# via linkify-it-py
|
||||
urllib3==1.26.13
|
||||
# via requests
|
||||
wrapt==1.14.1
|
||||
tomli==2.0.1
|
||||
# via sphinx
|
||||
typing-extensions==4.12.2
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# pygithub
|
||||
urllib3==2.2.3
|
||||
# via
|
||||
# pygithub
|
||||
# requests
|
||||
wrapt==1.16.0
|
||||
# via deprecated
|
||||
|
||||
149
docs/understand/More-about-how-ROCm-uses-PCIe-Atomics.rst
Normal file
@@ -0,0 +1,149 @@
===========================
How ROCm uses PCIe Atomics
===========================


ROCm PCIe Feature and Overview BAR Memory
==========================================


ROCm is an extension of the HSA platform architecture, so it shares the HSA queuing model, memory model, and signaling and synchronization protocols. Platform atomics are integral to queuing and signaling memory operations where there may be multiple writers across CPU and GPU agents.

The full list of HSA system architecture platform requirements is available here: `HSA Sys Arch Features <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf>`_.

The ROCm Platform uses the PCI Express 3.0 (PCIe 3.0) Atomic Read-Modify-Write Transactions feature, which extends inter-processor synchronization mechanisms to I/O in order to support the defined set of HSA capabilities needed for queuing and signaling memory operations.

The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, and ``SWAP`` atomics. The AtomicOps are initiated by the
I/O device and support 32-bit, 64-bit, and 128-bit operands; the target address must be naturally aligned to the operation size.

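The natural-alignment requirement can be illustrated with a short Python check. This is a sketch for explanation only; the function name `is_naturally_aligned` is hypothetical, and operand sizes are given in bytes (4, 8, and 16 for the 32-bit, 64-bit, and 128-bit operands mentioned above).

```python
def is_naturally_aligned(address, operand_bytes):
    """Return True if `address` is a multiple of the AtomicOp operand size."""
    if operand_bytes not in (4, 8, 16):
        raise ValueError("operand size must be 4, 8, or 16 bytes")
    return address % operand_bytes == 0
```

For instance, address `0x1004` is naturally aligned for a 32-bit operand but not for a 64-bit one.
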

Platform atomics are used in ROCm in the following ways:

* Update HSA queue’s read_dispatch_id: 64-bit atomic add used by the command processor on the GPU agent to update the packet ID it processed.
* Update HSA queue’s write_dispatch_id: 64-bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
* Update HSA Signals: 64-bit atomic ops are used for CPU & GPU synchronization.

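To show why an atomic add on ``write_dispatch_id`` makes multi-writer insertion safe, here is a toy Python simulation. It is not ROCm or HSA runtime code: the class `DispatchQueue` and the helper `reserve_slots` are hypothetical, and a `threading.Lock` merely stands in for the atomicity that the hardware fetch-add provides.

```python
import threading


class DispatchQueue:
    """Toy stand-in for an HSA queue's write_dispatch_id counter."""

    def __init__(self):
        self.write_dispatch_id = 0
        self._lock = threading.Lock()  # emulates an atomic fetch-add

    def fetch_add(self, count=1):
        """Atomically add `count` and return the previous value."""
        with self._lock:
            previous = self.write_dispatch_id
            self.write_dispatch_id = previous + count
            return previous


def reserve_slots(queue, writers, per_writer):
    """Let several writer threads each claim packet slots from the queue."""
    claimed = []
    claimed_lock = threading.Lock()

    def worker():
        for _ in range(per_writer):
            slot = queue.fetch_add()  # each writer gets a unique packet ID
            with claimed_lock:
                claimed.append(slot)

    threads = [threading.Thread(target=worker) for _ in range(writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return claimed
```

Because every writer obtains its slot through the same atomic add, no two writers ever receive the same packet ID, regardless of how the threads interleave.
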
|
||||
|
||||
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
|
||||
|
||||
To support AtomicOp routing between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
|
||||
|
||||
If your system has a PCI Express switch, it needs to support AtomicOp routing. AtomicOp requests are permitted only if a component’s ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. If AtomicOp Routing Support=1, routing is supported; if AtomicOp Routing Support=0, routing is not supported.
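
The capability checks described above can be sketched as a small Python decoder. The bit positions are assumptions based on my reading of the Device Capabilities 2 and Device Control 2 register layouts and should be verified against the PCIe specification revision you target; ``decode_devcap2`` and ``atomic_path_usable`` are hypothetical helper names, and the raw register values are assumed to have been read from configuration space by some other means.

```python
# Assumed bit positions in Device Capabilities 2 (DEVCAP2).
DEVCAP2_ATOMIC_ROUTE = 1 << 6    # AtomicOp Routing Supported
DEVCAP2_ATOMIC_COMP32 = 1 << 7   # 32-bit AtomicOp Completer Supported
DEVCAP2_ATOMIC_COMP64 = 1 << 8   # 64-bit AtomicOp Completer Supported
DEVCAP2_ATOMIC_COMP128 = 1 << 9  # 128-bit CAS Completer Supported

# Assumed bit position in Device Control 2 (DEVCTL2).
DEVCTL2_ATOMIC_REQ_EN = 1 << 6   # AtomicOp Requester Enable


def decode_devcap2(devcap2):
    """Return the AtomicOp-related capability bits as booleans."""
    return {
        "routing": bool(devcap2 & DEVCAP2_ATOMIC_ROUTE),
        "completer_32bit": bool(devcap2 & DEVCAP2_ATOMIC_COMP32),
        "completer_64bit": bool(devcap2 & DEVCAP2_ATOMIC_COMP64),
        "completer_128bit_cas": bool(devcap2 & DEVCAP2_ATOMIC_COMP128),
    }


def atomic_path_usable(requester_devctl2, intermediate_devcap2, completer_devcap2):
    """Check one requester-to-completer path for 64-bit AtomicOps.

    requester_devctl2:    DEVCTL2 of the device issuing the AtomicOp.
    intermediate_devcap2: DEVCAP2 of every switch/root port on the path.
    completer_devcap2:    DEVCAP2 of the component completing the AtomicOp.
    """
    if not requester_devctl2 & DEVCTL2_ATOMIC_REQ_EN:
        return False  # requests are not permitted from this component
    if not all(cap & DEVCAP2_ATOMIC_ROUTE for cap in intermediate_devcap2):
        return False  # some component on the path cannot route AtomicOps
    return bool(completer_devcap2 & DEVCAP2_ATOMIC_COMP64)
```

The path check makes the failure mode concrete: a single intermediate component without routing support is enough to make AtomicOps fail, even when both endpoints advertise the capability.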
|
||||
|
||||
An Atomic Operation is a Non-Posted transaction supporting 32-bit and 64-bit address formats; there must be a Completion response containing the result of the operation. Errors associated with the operation (an uncorrectable error accessing the target location or carrying out the Atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor to Completer Abort (CA) or Unsupported Request (UR).
|
||||
|
||||
To understand more about how PCIe atomic operations work, see `PCIe Atomics <https://pcisig.com/sites/default/files/specification_documents/ECN_Atomic_Ops_080417.pdf>`_.
|
||||
|
||||
`Linux Kernel Patch to pci_enable_atomic_request <https://patchwork.kernel.org/patch/7261731/>`_
|
||||
|
||||
There are also a number of papers which talk about these new capabilities:
|
||||
|
||||
* `Atomic Read Modify Write Primitives by Intel <https://www.intel.es/content/dam/doc/white-paper/atomic-read-modify-write-primitives-i-o-devices-paper.pdf>`_
|
||||
* `PCI express 3 Accelerator Whitepaper by Intel <https://www.intel.sg/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf>`_
|
||||
* `Intel PCIe Generation 3 Hotchips Paper <https://www.hotchips.org/wp-content/uploads/hc_archives/hc21/1_sun/HC21.23.1.SystemInterconnectTutorial-Epub/HC21.23.131.Ajanovic-Intel-PCIeGen3.pdf>`_
|
||||
* `PCIe Generation 4 Base Specification includes Atomics Operation <http://composter.com.ua/documents/PCI_Express_Base_Specification_Revision_4.0.Ver.0.3.pdf>`_
|
||||
|
||||
Other I/O devices with PCIe Atomics support
|
||||
|
||||
* `Mellanox ConnectX-5 InfiniBand Card <http://www.mellanox.com/related-docs/prod_adapter_cards/PB_ConnectX-5_VPI_Card.pdf>`_
|
||||
* `Cray Aries Interconnect <http://www.hoti.org/hoti20/slides/Bob_Alverson.pdf>`_
|
||||
* `Xilinx PCIe Ultrascale Whitepaper <https://www.xilinx.com/support/documentation/white_papers/wp464-PCIe-ultrascale.pdf>`_
|
||||
* `Xilinx 7 Series Devices <https://www.xilinx.com/support/documentation/ip_documentation/pcie_7x/v3_1/pg054-7series-pcie.pdf>`_
|
||||
|
||||
Future bus technology with richer I/O Atomics Operation Support
|
||||
|
||||
* `GenZ <http://genzconsortium.org/faq/gen-z-technology/#33/>`_
|
||||
|
||||
New PCIe endpoints with support beyond AMD Ryzen and EPYC CPUs and Intel Haswell or newer CPUs with PCIe Generation 3.0 support:
|
||||
|
||||
* `Mellanox Bluefield SOC <http://www.mellanox.com/related-docs/npu-multicore-processors/PB_Bluefield_SoC.pdf>`_
|
||||
* `Cavium Thunder X2 <http://www.cavium.com/ThunderX2_ARM_Processors.html>`_
|
||||
|
||||
In ROCm, we also take advantage of PCIe ID based ordering technology for P2P when the GPU originates two writes to two different targets:
|
||||
|
||||
| 1. write to another GPU memory,
|
||||
|
||||
| 2. then write to system memory to indicate transfer complete.
|
||||
|
||||
They are routed off to different ends of the computer, but we want to make sure the write to system memory that indicates transfer completion occurs AFTER the P2P write to the GPU has completed.
|
||||
|
||||
`Good Paper on Understanding PCIe Generation 3 Throughput <https://www.altera.com/en_US/pdfs/literature/an/an690.pdf>`_
|
||||
|
||||
BAR Memory Overview
|
||||
*******************
|
||||
On a Xeon E5 based system, above 4GB PCIe addressing can be turned on in the BIOS; if so, you need to set the MMIO base address (MMIOH Base) and range (MMIO High Size) in the BIOS.
|
||||
|
||||
On a Supermicro system, you need to see the following in the system BIOS:
|
||||
|
||||
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
|
||||
|
||||
* Advanced->PCIe/PCI/PnP Configuration->MMIOH Base = 512G
|
||||
|
||||
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
|
||||
|
||||
When Large BAR capability is supported, there is a Large BAR VBIOS which also disables the IO BAR.
|
||||
|
||||
For GFX9 and Vega10, which have up to 44-bit physical addresses and 48-bit virtual addresses:
|
||||
|
||||
* BAR0-1 registers: 64bit, prefetchable, GPU memory. 8GB or 16GB depending on Vega10 SKU. Must be placed < 2^44 to support P2P access from other Vega10.
|
||||
* BAR2-3 registers: 64bit, prefetchable, Doorbell. Must be placed < 2^44 to support P2P access from other Vega10.
|
||||
* BAR4 register: Optional, not a boot device.
|
||||
* BAR5 register: 32bit, non-prefetchable, MMIO. Must be placed < 4GB.
|
||||
|
||||
Here is how our BAR works on GFX8 GPUs with a 40-bit physical address limit ::
|
||||
|
||||
11:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev c1)
|
||||
|
||||
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0b35
|
||||
|
||||
Flags: bus master, fast devsel, latency 0, IRQ 119
|
||||
|
||||
Memory at bf40000000 (64-bit, prefetchable) [size=256M]
|
||||
|
||||
Memory at bf50000000 (64-bit, prefetchable) [size=2M]
|
||||
|
||||
I/O ports at 3000 [size=256]
|
||||
|
||||
Memory at c7400000 (32-bit, non-prefetchable) [size=256K]
|
||||
|
||||
Expansion ROM at c7440000 [disabled] [size=128K]
|
||||
|
||||
Legend:
|
||||
|
||||
1 : GPU Frame Buffer BAR – In this example it happens to be 256M, but typically this will be the size of the GPU memory (typically 4GB+). This BAR has to be placed < 2^40 to allow peer-to-peer access from other GFX8 AMD GPUs. For GFX9 (Vega GPU) the BAR has to be placed < 2^44 to allow peer-to-peer access from other GFX9 AMD GPUs.
|
||||
|
||||
2 : Doorbell BAR – The size of the BAR is typically < 10MB (currently fixed at 2MB) for this generation of GPUs. This BAR has to be placed < 2^40 to allow peer-to-peer access from other current generation AMD GPUs.
|
||||
|
||||
3 : IO BAR - This is for legacy VGA and boot device support, but since the GPUs in this project are not VGA devices (headless), this is not a concern even if the SBIOS does not set it up.
|
||||
|
||||
4 : MMIO BAR – This is required for the AMD Driver SW to access the configuration registers. Since the remainder of the BAR available is only 1 DWORD (32bit), this is placed < 4GB. This is fixed at 256KB.
|
||||
|
||||
5 : Expansion ROM – This is required for the AMD Driver SW to access the GPU’s video-bios. This is currently fixed at 128KB.
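
The placement rules in the legend can be checked with simple arithmetic. The sketch below is a hedged Python illustration that uses the addresses and sizes from the ``lspci`` output above; ``bar_below`` and ``check_gpu_bars`` are hypothetical helpers, and treating "placed < 2^N" as "the whole BAR range ends at or below 2^N" is an assumption.

```python
MIB = 1 << 20
KIB = 1 << 10


def bar_below(base, size, limit):
    """True when the whole BAR range [base, base + size) lies below limit."""
    return base + size <= limit


def check_gpu_bars(frame_buffer, doorbell, mmio, physical_address_bits):
    """Check (base, size) pairs against the placement rules.

    physical_address_bits: 40 for GFX8, 44 for GFX9, per the text above.
    """
    p2p_limit = 1 << physical_address_bits
    return {
        "frame_buffer": bar_below(*frame_buffer, p2p_limit),
        "doorbell": bar_below(*doorbell, p2p_limit),
        "mmio": bar_below(*mmio, 1 << 32),  # MMIO BAR must sit below 4GB
    }


# Values taken from the Fiji lspci listing above.
fiji = check_gpu_bars(
    frame_buffer=(0xBF40000000, 256 * MIB),
    doorbell=(0xBF50000000, 2 * MIB),
    mmio=(0xC7400000, 256 * KIB),
    physical_address_bits=40,
)
```

With the MMIOH Base = 512G and MMIO High Size = 256G settings shown earlier, the high MMIO window ends at 768G, which is below 2^40 (1024G), so BARs assigned inside that window satisfy the GFX8 limit.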
|
||||
|
||||
Excerpts from Overview of Changes to PCI Express 3.0
|
||||
====================================================
|
||||
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
|
||||
********************************************************
|
||||
Atomic Operations – Goal:
|
||||
*************************
|
||||
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
|
||||
|
||||
* Fetch and Add – uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
|
||||
|
||||
* Unconditional Swap – uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
|
||||
|
||||
* Compare and Swap – uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
|
||||
|
||||
* AtomicOpCompletion – new completion to give the result for an atomic request and indicate that the atomicity of the transaction has been maintained.
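
The three request types listed above can be modeled in a few lines of Python to make their read-modify-write semantics concrete. This is a software model under stated assumptions, not a description of any hardware or driver: ``AtomicOpTarget`` is a hypothetical class, memory is treated as little-endian, each operation returns the original value of the target location as its completion data, and the allowed operand widths (32/64 bits for Fetch and Add and Swap, 32/64/128 bits for Compare and Swap) reflect my understanding of the PCIe specification.

```python
class AtomicOpTarget:
    """Toy completer: a byte-addressable memory that services AtomicOps."""

    def __init__(self, size_bytes):
        self.mem = bytearray(size_bytes)

    def _check(self, address, width, allowed):
        if width not in allowed:
            raise ValueError(f"unsupported operand width: {width} bytes")
        if address % width != 0:
            # Natural alignment is required, otherwise the TLP is malformed.
            raise ValueError("address not naturally aligned to operand size")

    def _read(self, address, width):
        return int.from_bytes(self.mem[address:address + width], "little")

    def _write(self, address, width, value):
        self.mem[address:address + width] = value.to_bytes(width, "little")

    def fetch_add(self, address, operand, width=8):
        """Add operand to the target and return the original value."""
        self._check(address, width, (4, 8))
        old = self._read(address, width)
        self._write(address, width, (old + operand) % (1 << (8 * width)))
        return old

    def swap(self, address, operand, width=8):
        """Unconditionally write operand and return the original value."""
        self._check(address, width, (4, 8))
        old = self._read(address, width)
        self._write(address, width, operand)
        return old

    def compare_and_swap(self, address, compare, swap, width=8):
        """Write swap only if the target equals compare; return the original."""
        self._check(address, width, (4, 8, 16))
        old = self._read(address, width)
        if old == compare:
            self._write(address, width, swap)
        return old
```

A requester can tell whether a Compare and Swap took effect by comparing the returned value with the compare operand it sent.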
|
||||
|
||||
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
|
||||
|
||||
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
|
||||
|
||||
ID-based Ordering – Goal:
|
||||
*************************
|
||||
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
|
||||
|
||||
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
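
The attribute encoding described above can be sketched as follows. This is an illustrative Python model only: the bit assignments (Attr[2] for ID-based ordering, Attr[1] for relaxed ordering, Attr[0] for No Snoop) and the passing rule for posted writes are my reading of the PCIe ordering rules and should be checked against the specification; ``ordering_mode`` and ``posted_write_may_pass`` are hypothetical names.

```python
ATTR_NO_SNOOP = 0b001  # Attr[0]
ATTR_RO = 0b010        # Attr[1], relaxed ordering
ATTR_IDO = 0b100       # Attr[2], ID-based ordering (previously reserved)


def ordering_mode(attr):
    """Name the ordering model selected by the RO and IDO bits."""
    relaxed = bool(attr & ATTR_RO)
    id_based = bool(attr & ATTR_IDO)
    if relaxed and id_based:
        return "relaxed + ID-based ordering"
    if id_based:
        return "ID-based ordering"
    if relaxed:
        return "relaxed ordering"
    return "default (strong) ordering"


def posted_write_may_pass(later_attr, later_requester_id, earlier_requester_id):
    """May a later posted write pass an earlier one in a queue?

    Assumed rule: RO allows passing any earlier posted write; IDO allows
    passing only when the two requests come from different requester IDs.
    """
    if later_attr & ATTR_RO:
        return True
    if later_attr & ATTR_IDO:
        return later_requester_id != earlier_requester_id
    return False
```

Under this model, two writes from the same requester keep their relative order when only the IDO bit is set.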
|
||||
|
||||
To read more on the new PCIe Gen 3 options, see https://www.mindshare.com/files/resources/PCIe%203-0.pdf
|
||||
|
||||
|
||||
47
docs/understand/all.md
Normal file
@@ -0,0 +1,47 @@
|
||||
# All Explanation Material
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} Compiler Nomenclature
|
||||
:link: compiler_disambiguation
|
||||
:link-type: doc
|
||||
ROCm ships multiple compilers of varying origins and purposes. This article
|
||||
disambiguates compiler naming used throughout the documentation.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Using CMake
|
||||
:link: cmake_packages
|
||||
:link-type: doc
|
||||
ROCm components ship with first-party CMake support. This article details how that
|
||||
support works and how to use it.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} Linux Folder Structure Reorganization
|
||||
:link: file_reorg
|
||||
:link-type: doc
|
||||
ROCm™ packages have adopted the Linux foundation file system hierarchy standard
|
||||
to ensure ROCm components follow open source conventions for Linux-based
|
||||
distributions.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} GPU Isolation Techniques
|
||||
:link: gpu_isolation
|
||||
:link-type: doc
|
||||
Restricting the access of applications to a subset of GPUs, aka isolating GPUs,
|
||||
allows users to hide GPU resources from programs.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} GPU Architectures
|
||||
:link: gpu_arch
|
||||
:link-type: doc
|
||||
AMD documentation around architectural details from both the CDNA and RDNA
|
||||
product lines.
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
@@ -179,7 +179,7 @@ This project can then be configured with for eg.
|
||||
- Linux: ``cmake -D CMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++``
|
||||
|
||||
Which use the device compiler provided from the binary packages of
|
||||
`ROCm HIP SDK <https://www.amd.com/en/graphics/servers-solutions-rocm>`_ and
|
||||
`ROCm HIP SDK <https://www.amd.com/en/developer/rocm-hub.html>`_ and
|
||||
`repo.radeon.com <https://repo.radeon.com>`_ respectively.
|
||||
|
||||
When using the CXX language support to compile HIP device code, selecting the
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
# ROCm Compilers Disambiguation
|
||||
|
||||
The following table summarizes the widely used terms in this document.
|
||||
ROCm ships multiple compilers of varying origins and purposes. This article
|
||||
disambiguates compiler naming used throughout the documentation.
|
||||
|
||||
## Compiler Terms
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ distributions. Following is the ROCm proposed file structure.
|
||||
| -- architecture dependent libraries and binaries used internally by components
|
||||
| -- cmake
|
||||
| -- <component>
|
||||
| --<component>.config.cmake
|
||||
| --<component>-config.cmake
|
||||
| -- libexec
|
||||
| -- <component>
|
||||
| -- non ISA/architecture independent executables used internally by components
|
||||
@@ -162,7 +162,6 @@ correct header file and use correct search paths.
|
||||
|
||||
## References
|
||||
|
||||
ROCm deprecation warning :
|
||||
<https://docs.amd.com/bundle/ROCm-Release-Notes-v5.4.3/page/Deprecations_and_Warnings.html>
|
||||
{ref}`ROCm deprecation warning <5_4_0_filesystem_reorg_deprecation_notice>`
|
||||
|
||||
Linux File System Standard : <https://refspecs.linuxfoundation.org/fhs.shtml>
|
||||
[Linux File System Standard](https://refspecs.linuxfoundation.org/fhs.shtml)
|
||||
|
||||
@@ -1,79 +0,0 @@
|
||||
# ISV Deployment Guide (Windows)
|
||||
|
||||
## Abstract
|
||||
|
||||
ISVs deploying applications using the HIP SDK depend on the AMD GPU Drivers, HIP
|
||||
Runtime Library and HIP SDK Libraries. A compatibility matrix table provides
|
||||
details on AMD’s support model. AMD GPU Drivers are distributed with a HIP
|
||||
Runtime included. Each HIP Runtime is associated with a HIP compiler version.
|
||||
Applications built with a particular HIP compiler should document its associated
|
||||
HIP Runtime version and AMD GPU Driver as minimum version requirements for its
|
||||
end users. Applications do not distribute the HIP Runtime. Instead, end users
|
||||
will use the HIP Runtime provided by an AMD GPU Driver. AMD provides backward
|
||||
compatibility for applications dynamically linked to the HIP Runtime based on
|
||||
our Driver and HIP support policy. ISV applications using the HIP SDK Libraries,
|
||||
for example hipBLAS, should distribute the HIP SDK Library as part of its
|
||||
installer package. It is recommended not to require end users to install the
|
||||
HIP SDK. AMD provides backward compatibility for AMD Driver and HIP Runtime for
|
||||
the HIP SDK Libraries based on our support policy. AMD support policy for Visual
|
||||
Studio and other third-party compilers is documented here.
|
||||
|
||||
## Introduction
|
||||
|
||||
This guide is intended for Independent Software Vendors (ISVs) and other
|
||||
software developers intending to build applications with the HIP SDK for
|
||||
Windows. The HIP SDK is intended for developer distribution in contrast to the
|
||||
AMD GPU driver which is intended for all end users. The guide discusses how to
|
||||
use and distribute components from the HIP SDK. The HIP SDK is the collection of
|
||||
the AMD GPU Driver, HIP Runtime and the HIP Libraries. These three parts are
|
||||
distributed in the HIP SDK installer. The compatibility and versioning relation
|
||||
between these three parts is documented here. AMD’s support policies for the
|
||||
developer tools allows the ISVs the stability to plan the usage of a tool chain.
|
||||
|
||||
## Recommended Library Distribution Model
|
||||
|
||||
The HIP SDK is distributed via a Windows installer. This distribution system is
|
||||
only intended for software developers and testers. AMD recommends that end users
|
||||
of the program built against HIP SDK components do not have a requirement to
|
||||
install the HIP SDK. There are two types of ISV applications that use the HIP
|
||||
SDK as follows.
|
||||
|
||||
The first group of ISV applications have a dependency on the HIP Runtime and
|
||||
select HIP Header Only Libraries (rocPRIM, hipCUB and rocThrust). This group of
|
||||
ISV applications need to require their end users install an AMD GPU Driver. Each
|
||||
AMD GPU driver has a HIP runtime library bundled with it. The ISV application
|
||||
should ensure that the HIP runtime library has a minimum version associated with
|
||||
it. As the HIP runtime library does not have semantic versioning, the ISV
|
||||
application cannot check for compatibility. However, AMD is committed to not
|
||||
breaking API/ABI compatibility unless the major version number of the HIP
|
||||
runtime is incremented. ISV applications may run without user warning if the HIP
|
||||
major version available in the driver is the same as the HIP major version
|
||||
associated with the compiler it was built with. The ISV at its discretion may
|
||||
throw a warning if the HIP major version is higher than the associated HIP major
|
||||
version of the compiler it was built with.
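
The version policy in the paragraph above can be expressed as a small decision function. The Python sketch below only illustrates the policy and is not part of the HIP SDK: ``check_hip_runtime`` is a hypothetical helper, versions are assumed to be available as ``(major, minor)`` tuples, and how an application obtains the installed runtime version is left out.

```python
def check_hip_runtime(built_with, available, minimum=None):
    """Classify the installed HIP runtime against the build-time version.

    built_with: (major, minor) of the HIP version tied to the compiler used.
    available:  (major, minor) of the HIP runtime shipped by the GPU driver.
    minimum:    lowest acceptable version; defaults to built_with.

    Returns "ok", "warn", or "reject".
    """
    if minimum is None:
        minimum = built_with
    if available < minimum:
        return "reject"  # runtime is older than the documented minimum
    if available[0] > built_with[0]:
        return "warn"    # newer major version: API/ABI may have changed
    return "ok"          # same major version: run without a warning
```

Whether the "warn" case is shown to the user is at the ISV's discretion, as the text states.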
|
||||
|
||||
The second group of ISV applications has a dependency on the HIP Runtime and one
|
||||
or more Dynamically Linked HIP Libraries including the HIP RT library. ISV
|
||||
applications with this dependency need to ensure the end user installs an AMD
|
||||
GPU Driver and is recommended to distribute the dynamically linked HIP library
|
||||
in the installer package of its application. This allows end users to avoid
|
||||
installing the HIP SDK. One benefit of this model is smaller disk space required
|
||||
as only required binaries are distributed by the ISV application. It also avoids
|
||||
the end user having to agree to licensing agreements for the entire HIP SDK.
|
||||
The version checks recommended for the ISV application including dynamically
|
||||
linked HIP Libraries follow the same requirements as the ISV applications that
|
||||
only have the HIP Runtime and header only library. In addition, each dynamically
|
||||
linked HIP library also has a minimum HIP runtime requirement. Checks for the
|
||||
minimum HIP version for each dynamically linked HIP library may be added at the
|
||||
ISV's discretion. Usually, the minimum HIP version check for the HIP runtime is
|
||||
sufficient if dynamically linked HIP libraries come from the same SDK package as
|
||||
the HIP compiler.
|
||||
|
||||
Please note AMD does not support static linking to any components distributed in
|
||||
the HIP SDK.
|
||||
|
||||
## Conclusion
|
||||
|
||||
This guide provides a limited set of guidance for ISV application deployment.
|
||||
Please refer to the HIP API guides for the SDK and HIP Optimization guides for
|
||||
more information.
|
||||
@@ -1,23 +0,0 @@
|
||||
all
|
||||
# Extend line length
|
||||
rule 'MD013', :line_length => 99999
|
||||
|
||||
rule 'MD026', :punctuation => '.,;:!'
|
||||
|
||||
# Use "1. 2. 3."-style numbered lists instead of "1. 1. 1."
|
||||
rule 'MD029', :style => :ordered
|
||||
|
||||
# Allow in-line HTML
|
||||
exclude_rule 'MD033'
|
||||
|
||||
exclude_rule 'MD034'
|
||||
|
||||
exclude_rule 'MD041'
|
||||
|
||||
|
||||
|
||||
# False positives, see: https://github.com/markdownlint/markdownlint/issues/374
|
||||
exclude_rule 'MD005'
|
||||
|
||||
# False positives, see: https://github.com/markdownlint/markdownlint/issues/313
|
||||
exclude_rule 'MD007'
|
||||
@@ -60,7 +60,8 @@ ROCDebugger Machine Interface (MI) extends support to lanes. The following enhan
|
||||
|
||||
- MI varobjs are now lane-aware.
|
||||
|
||||
For more information, refer to the ROC Debugger User Guide at <https://docs.amd.com>.
|
||||
For more information, refer to the ROC Debugger User Guide at
|
||||
{doc}`ROCgdb <rocgdb:index>`.
|
||||
|
||||
##### Enhanced - clone-inferior Command
|
||||
|
||||
@@ -82,7 +83,7 @@ This release includes support for AMD Radeon™ Pro W6800, in addition to other
|
||||
|
||||
- Various other bug fixes and performance improvements
|
||||
|
||||
For more information, see <https://docs.amd.com/bundle/MIOpen_gh-pages/page/releasenotes.html>
|
||||
For more information, see {doc}`Documentation <miopen:index>`.
|
||||
|
||||
#### Checkpoint Restore Support With CRIU
|
||||
|
||||
|
||||
@@ -271,7 +271,8 @@ The new APIs for virtual memory management are as follows:
|
||||
hipError_t hipMemUnmap(void* ptr, size_t size);
|
||||
```
|
||||
|
||||
For more information, refer to the HIP API documentation at <https://docs.amd.com/bundle/HIP_API_Guide/page/modules.html>
|
||||
For more information, refer to the HIP API documentation at
|
||||
{doc}`hip:.doxygen/docBin/html/modules`.
|
||||
|
||||
##### Planned HIP Changes in Future Releases
|
||||
|
||||
@@ -287,7 +288,8 @@ This release introduces a new ROCm C++ library for accelerating mixed precision
|
||||
|
||||
rocWMMA is released as a header library and includes test and sample projects to validate and illustrate example usages of the C++ API. GEMM matrix multiplication is used as primary validation given the heavy precedent for the library. However, the usage portfolio is growing significantly and demonstrates different ways rocWMMA may be consumed.
|
||||
|
||||
For more information, refer to <https://docs.amd.com/category/libraries>.
|
||||
For more information, refer to
|
||||
[Communication Libraries](../../../../docs/reference/gpu_libraries/communication.md).
|
||||
|
||||
#### OpenMP Enhancements in This Release
|
||||
|
||||
|
||||
@@ -95,6 +95,8 @@ The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, co
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
|
||||
(5_4_0_filesystem_reorg_deprecation_notice)=
|
||||
|
||||
##### Linux Filesystem Hierarchy Standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux foundation filesystem hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new filesystem hierarchy, ROCm ensures backward compatibility with its 5.1 version or older filesystem hierarchy. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
|
||||
@@ -205,9 +207,8 @@ The test was incorrectly using the `hipDeviceAttributePageableMemoryAccess` devi
|
||||
|
||||
`hipHostMalloc()` allocates memory with fine-grained access by default when the environment variable `HIP_HOST_COHERENT=1` is used.
|
||||
|
||||
For more information, refer to the HIP Programming Guide at
|
||||
For more information, refer to {doc}`hip:.doxygen/docBin/html/index`.
|
||||
|
||||
<https://docs.amd.com/bundle/HIP-Programming-Guide-v5.4/page/Introduction_to_HIP_Programming_Guide.html>
|
||||
|
||||
#### SoftHang with `hipStreamWithCUMask` test on AMD Instinct™
|
||||
|
||||
|
||||
11
tools/autotag/templates/rocm_changes/5.5.1.md
Normal file
@@ -0,0 +1,11 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
### What's New in This Release
|
||||
|
||||
#### HIP API Change
|
||||
|
||||
The following HIP API is updated in the ROCm v5.5.1 release.
|
||||
|
||||
##### `hipDeviceSetCacheConfig`
|
||||
|
||||
- The return value for `hipDeviceSetCacheConfig` is updated from `hipErrorNotSupported` to `hipSuccess`
|
||||