Compare commits


7 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Sam Wu | 3ef81535df | Update documentation requirements | 2024-09-16 10:12:12 -08:00 |
| Sam Wu | 78380916b3 | Update documentation requirements | 2024-06-06 16:58:16 -06:00 |
| Sam Wu | 9f6cef51e1 | Fix RTD config | 2024-05-02 08:53:26 -06:00 |
| Sam Wu | 3f5f3a6fc7 | Update documentation requirements | 2024-05-01 16:58:35 -06:00 |
| Sam Wu | 55711837bc | Update documentation requirements | 2024-05-01 16:50:34 -06:00 |
| Sam Wu | 4595b88df7 | add version to html title | 2023-08-04 17:14:50 -06:00 |
| Sam Wu | 5ff050428b | rocm-docs-core v0.18.3 | 2023-06-30 09:31:15 -06:00 |
19 changed files with 902 additions and 457 deletions


@@ -3,20 +3,19 @@
 version: 2
-sphinx:
-  configuration: docs/conf.py
-formats: [htmlzip]
-python:
-  install:
-    - requirements: docs/sphinx/requirements.txt
 build:
   os: ubuntu-22.04
   tools:
     python: "3.10"
   apt_packages:
     - "doxygen"
     - "gfortran" # For pre-processing fortran sources
     - "graphviz" # For dot graphs in doxygen
+python:
+  install:
+    - requirements: docs/sphinx/requirements.txt
+sphinx:
+  configuration: docs/conf.py
+formats: []


@@ -15,6 +15,637 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.1.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-blanks-blockquote -->
### What's New in This Release
#### HIP Enhancements
The ROCm v5.1 release consists of the following HIP enhancements.
##### HIP Installation Guide Updates
The HIP Installation Guide is updated to include installation and building HIP from source on the AMD and NVIDIA platforms.
Refer to the HIP Installation Guide v5.1 for more details.
##### Support for HIP Graph
ROCm v5.1 extends support for HIP Graph.
##### Planned Changes for HIP in Future Releases
###### Separation of hiprtc (libhiprtc) library from hip runtime (amdhip64)
On ROCm/Linux, to maintain backward compatibility, the HIP runtime library (amdhip64) will continue to include hiprtc symbols in future releases. This backward-compatible support may be discontinued by removing hiprtc symbols from the HIP runtime library (amdhip64) in the next major release.
###### hipDeviceProp_t Structure Enhancements
Changes to the hipDeviceProp_t structure in the next major release may result in backward incompatibility. More details on these changes will be provided in subsequent releases.
#### ROCDebugger Enhancements
##### Multi-language Source Level Debugger
The compiler now generates source-level variable and function argument debug information.
Accuracy is guaranteed when the compiler options `-g -O0` are used; this applies only to HIP.
This enhancement enables ROCDebugger users to interact with the HIP source-level variables and function arguments.
> **Note**
>
> The newly suggested compiler `-g` option must be used instead of the previously suggested `-ggdb` option. Although the effect of these two options is currently equivalent, this is not guaranteed for the future and might be changed by the upstream LLVM community.
##### Machine Interface Lanes Support
ROCDebugger Machine Interface (MI) extends support to lanes. The following enhancements are made:
- Added a new `-lane-info` command, listing the current thread's lanes.
- The `-thread-select` command now supports a lane switch to switch to a specific lane of a thread:
```sh
-thread-select -l LANE THREAD
```
- The `=thread-selected` notification gained a `lane-id` attribute. This enables the frontend to know which lane of the thread was selected.
- The `*stopped` asynchronous record gained `lane-id` and `hit-lanes` attributes. The former indicates which lane is selected, and the latter indicates which lanes explain the stop.
- MI commands now accept a global `--lane` option, similar to the global `--thread` and `--frame` options.
- MI varobjs are now lane-aware.
For more information, refer to the ROC Debugger User Guide at <https://docs.amd.com>.
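A minimal MI session sketch using the lane-aware commands and options listed above (the thread and lane numbers and the `value` expression are placeholder examples; `-data-evaluate-expression` is the standard GDB/MI command and is not specific to this release):
```sh
# Select lane 2 of thread 3 using the new -l switch of -thread-select.
-thread-select -l 2 3
# List the lanes of the current thread.
-lane-info
# Any MI command also accepts the global --lane option, for example:
-data-evaluate-expression --thread 3 --lane 2 value
```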
##### Enhanced - clone-inferior Command
The clone-inferior command now ensures that the TTY, CMD, ARGS, and AMDGPU PRECISE-MEMORY settings are copied from the original inferior to the new one. All modifications to the environment variables done using the 'set environment' or 'unset environment' commands are also copied to the new inferior.
#### MIOpen Support for RDNA GPUs
This release includes support for AMD Radeon™ Pro W6800, in addition to other bug fixes and performance improvements as listed below:
- MIOpen now supports RDNA GPUs!! (via MIOpen PRs 973, 780, 764, 740, 739, 677, 660, 653, 493, 498)
- Fixed a correctness issue with ImplicitGemm algorithm
- Updated the performance data for new kernel versions
- Improved MIOpen build time by splitting large kernel header files
- Fixed an issue in reduction kernels for padded tensors
- Various other bug fixes and performance improvements
For more information, see <https://docs.amd.com/bundle/MIOpen_gh-pages/page/releasenotes.html>
#### Checkpoint Restore Support With CRIU
The new Checkpoint Restore in Userspace (CRIU) functionality is implemented to support AMD GPU and ROCm applications.
CRIU is a userspace tool to Checkpoint and Restore an application.
CRIU previously lacked support for checkpointing and restoring applications that use device files such as a GPU. With this ROCm release, CRIU is enhanced with a new plugin to support AMD GPUs, which includes:
- Single and Multi GPU systems (Gfx9)
- Checkpoint / Restore on a different system
- Checkpoint / Restore inside a docker container
- PyTorch
- Tensorflow
- Using CRIU Image Streamer
For more information, refer to <https://github.com/checkpoint-restore/criu/tree/criu-dev/plugins/amdgpu>
> **Note**
>
> The CRIU plugin (amdgpu_plugin) is merged upstream with the CRIU repository. The KFD kernel patches are also available upstream with the amd-staging-drm-next branch (public) and the ROCm 5.1 release branch.
> **Note**
>
> This is a Beta release of the Checkpoint and Restore functionality, and some features are not available in this release.
For more information, refer to the following websites:
- <https://github.com/RadeonOpenCompute/criu/blob/amdgpu_plugin-03252022/Documentation/amdgpu_plugin.txt>
- <https://criu.org/Main_Page>
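A hedged sketch of a basic checkpoint/restore cycle with the upstream `criu` command-line tool (standard `criu dump`/`criu restore` options; the PID and image directory are placeholders, and the amdgpu_plugin is loaded by CRIU itself):
```sh
# Checkpoint a running ROCm application into an image directory.
sudo criu dump --tree <PID> --images-dir ./ckpt --shell-job

# Restore it later, possibly on a different system or inside a container,
# from the same image directory.
sudo criu restore --images-dir ./ckpt --shell-job
```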
### Fixed Defects
The following defects are fixed in this release.
#### Driver Fails To Load after Installation
The issue with the driver failing to load after ROCm installation is now fixed.
The driver installs successfully, and the server reboots with working rocminfo and clinfo.
#### ROCDebugger Fixed Defects
##### Breakpoints in GPU kernel code Before Kernel Is Loaded
Previously, setting a breakpoint in device code by line number before the device code was loaded into the program resulted in ROCgdb incorrectly moving the breakpoint to the first following line that contains host code.
Now, the breakpoint is left pending. When the GPU kernel gets loaded, the breakpoint resolves to a location in the kernel.
##### Registers Invalidated After Write
Previously, the stale just-written value was presented as a current value.
ROCgdb now invalidates the cached values of registers whose content might differ after being written. For example, registers with read-only bits.
ROCgdb also invalidates all volatile registers when a volatile register is written. For example, writing VCC invalidates the content of STATUS as STATUS.VCCZ may change.
##### Scheduler-locking and GPU Wavefronts
When scheduler-locking is in effect (for example, via the `set scheduler-locking` command), new wavefronts created by a resumed thread or wavefront, whether CPU or GPU, are held in the halt state.
##### ROCDebugger Fails Before Completion of Kernel Execution
It was possible (although erroneous) for a debugger to load GPU code in memory, send it to the device, start executing a kernel on the device, and dispose of the original code before the kernel had finished execution. If a breakpoint was hit after this point, the debugger failed with an internal error while trying to access the debug information.
This issue is now fixed by ensuring that the debugger keeps a local copy of the original code and debug information.
### Known Issues
#### Random Memory Access Fault Errors Observed While Running Math Libraries Unit Tests
**Issue:** Random memory access fault issues are observed while running Math libraries unit tests. This issue is encountered in ROCm v5.0, ROCm v5.0.1, and ROCm v5.0.2.
Note, the faults only occur in the SRIOV environment.
**Workaround:** Use SDMA to update the page table. The guest setup steps are as follows:
```sh
sudo modprobe amdgpu vm_update_mode=0
```
To verify the setting, run the following on the guest:
```sh
cat /sys/module/amdgpu/parameters/vm_update_mode
```
The expected output is 0.
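If the workaround needs to persist across reboots, a module options file is the usual mechanism; a sketch, assuming a Debian/Ubuntu guest (the file name is arbitrary):
```sh
# Persist the amdgpu module parameter across reboots (file name is arbitrary).
echo "options amdgpu vm_update_mode=0" | sudo tee /etc/modprobe.d/amdgpu-vm-update.conf
# Rebuild the initramfs in case amdgpu is loaded at early boot.
sudo update-initramfs -u
```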
#### CU Masking Causes Application to Freeze
Using CU masking results in an application freeze or exceptionally slow execution. This issue is observed only in the GFX10 suite of products.
This issue is under active investigation at this time.
#### Failed Checkpoint in Docker Containers
A defect with Ubuntu images kernel-5.13-30-generic and kernel-5.13-35-generic with Overlay FS results in incorrect reporting of the mount ID.
This issue with Ubuntu causes CRIU checkpointing to fail in Docker containers.
As a workaround, use an older version of the kernel. For example, Ubuntu 5.11.0-46-generic.
#### Issue with Restoring Workloads Using Cooperative Groups Feature
Workloads that use the cooperative groups function to ensure all waves can be resident at the same time may fail to restore correctly.
This issue is under investigation and will be fixed in a future release.
#### Radeon Pro V620 and W6800 Workstation GPUs
##### No Support for ROCDebugger on SRIOV
ROCDebugger is not supported in the SRIOV environment on any GPU.
This is a known issue and will be fixed in a future release.
#### Random Error Messages in ROCm SMI for SR-IOV
Random error messages are generated by unsupported functions or commands.
This is a known issue and will be fixed in a future release.
### Library Changes in ROCM 5.1.0
| Library | Version |
|---------|---------|
| hipBLAS | 0.49.0 ⇒ [0.50.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.1.0) |
| hipCUB | 2.10.13 ⇒ [2.11.0](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.1.0) |
| hipFFT | 1.0.4 ⇒ [1.0.7](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.1.0) |
| hipSOLVER | 1.2.0 ⇒ [1.3.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.1.0) |
| hipSPARSE | 2.0.0 ⇒ [2.1.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.1.0) |
| rccl | 2.10.3 ⇒ [2.11.4](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.1.0) |
| rocALUTION | 2.0.1 ⇒ [2.0.2](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.1.0) |
| rocBLAS | 2.42.0 ⇒ [2.43.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.1.0) |
| rocFFT | 1.0.13 ⇒ [1.0.16](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.1.0) |
| rocPRIM | 2.10.12 ⇒ [2.10.13](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.1.0) |
| rocRAND | 2.10.12 ⇒ [2.10.13](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.1.0) |
| rocSOLVER | 3.16.0 ⇒ [3.17.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.1.0) |
| rocSPARSE | 2.0.0 ⇒ [2.1.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.1.0) |
| rocThrust | 2.13.0 ⇒ [2.14.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.1.0) |
| Tensile | 4.31.0 ⇒ [4.32.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.1.0) |
#### hipBLAS 0.50.0
hipBLAS 0.50.0 for ROCm 5.1.0
##### Added
- Added library version and device information to hipblas-test output
- Added --rocsolver-path command line option to choose path to pre-built rocSOLVER, as
absolute or relative path
- Added --cmake_install command line option to update cmake to minimum version if required
- Added cmake-arg parameter to pass in cmake arguments while building
- Added infrastructure to support readthedocs hipBLAS documentation.
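A hypothetical invocation sketch combining the new install-script options listed above (the rocSOLVER path is a placeholder; all other behavior is the script's default):
```sh
# Build against a pre-built rocSOLVER, letting the script update CMake first
# if the installed version is below the required minimum.
./install.sh --rocsolver-path /opt/rocm/rocsolver --cmake_install
```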
##### Fixed
- Added `hipblasVersionMinor` define. `hipblaseVersionMinor` remains defined for backwards compatibility.
- Doxygen warnings in hipblas.h header file.
##### Changed
- rocblas-path command line option can be specified as either absolute or relative path
- Help message improvements in install.sh and rmake.py
- Updated googletest dependency from 1.10.0 to 1.11.0
#### hipCUB 2.11.0
hipCUB 2.11.0 for ROCm 5.1.0
##### Added
- Device segmented sort
- Warp merge sort, WarpMask and thread sort from cub 1.15.0 supported in hipCUB
- Device three way partition
##### Changed
- Device_scan and device_segmented_scan: inclusive_scan now uses the input-type as accumulator-type, exclusive_scan uses initial-value-type.
  - This particularly changes behaviour of small-size input types with large-size output types (e.g. short input, int output).
  - And low-res input with high-res output (e.g. float input, double output)
- Block merge sort no longer supports non-power-of-two block sizes
#### hipFFT 1.0.7
hipFFT 1.0.7 for ROCm 5.1.0
##### Changed
- Use fft_params struct for accuracy and benchmark clients.
#### hipSOLVER 1.3.0
hipSOLVER 1.3.0 for ROCm 5.1.0
##### Added
- Added functions
- gels
- hipsolverSSgels_bufferSize, hipsolverDDgels_bufferSize, hipsolverCCgels_bufferSize, hipsolverZZgels_bufferSize
- hipsolverSSgels, hipsolverDDgels, hipsolverCCgels, hipsolverZZgels
- Added library version and device information to hipsolver-test output.
- Added compatibility API with hipsolverDn prefix.
- Added compatibility-only functions
- gesvdj
- hipsolverDnSgesvdj_bufferSize, hipsolverDnDgesvdj_bufferSize, hipsolverDnCgesvdj_bufferSize, hipsolverDnZgesvdj_bufferSize
- hipsolverDnSgesvdj, hipsolverDnDgesvdj, hipsolverDnCgesvdj, hipsolverDnZgesvdj
- gesvdjBatched
- hipsolverDnSgesvdjBatched_bufferSize, hipsolverDnDgesvdjBatched_bufferSize, hipsolverDnCgesvdjBatched_bufferSize, hipsolverDnZgesvdjBatched_bufferSize
- hipsolverDnSgesvdjBatched, hipsolverDnDgesvdjBatched, hipsolverDnCgesvdjBatched, hipsolverDnZgesvdjBatched
- syevj
- hipsolverDnSsyevj_bufferSize, hipsolverDnDsyevj_bufferSize, hipsolverDnCheevj_bufferSize, hipsolverDnZheevj_bufferSize
- hipsolverDnSsyevj, hipsolverDnDsyevj, hipsolverDnCheevj, hipsolverDnZheevj
- syevjBatched
- hipsolverDnSsyevjBatched_bufferSize, hipsolverDnDsyevjBatched_bufferSize, hipsolverDnCheevjBatched_bufferSize, hipsolverDnZheevjBatched_bufferSize
- hipsolverDnSsyevjBatched, hipsolverDnDsyevjBatched, hipsolverDnCheevjBatched, hipsolverDnZheevjBatched
- sygvj
- hipsolverDnSsygvj_bufferSize, hipsolverDnDsygvj_bufferSize, hipsolverDnChegvj_bufferSize, hipsolverDnZhegvj_bufferSize
- hipsolverDnSsygvj, hipsolverDnDsygvj, hipsolverDnChegvj, hipsolverDnZhegvj
##### Changed
- The rocSOLVER backend now allows hipsolverXXgels and hipsolverXXgesv to be called in-place when B == X.
- The rocSOLVER backend now allows rwork to be passed as a null pointer to hipsolverXgesvd.
##### Fixed
- bufferSize functions will now return HIPSOLVER_STATUS_NOT_INITIALIZED instead of HIPSOLVER_STATUS_INVALID_VALUE when both handle and lwork are null.
- Fixed rare memory allocation failure in syevd/heevd and sygvd/hegvd caused by improper workspace array allocation outside of rocSOLVER.
#### hipSPARSE 2.1.0
hipSPARSE 2.1.0 for ROCm 5.1.0
##### Added
- Added gtsv_interleaved_batch and gpsv_interleaved_batch routines
- Add SpGEMM_reuse
##### Changed
- Replaced BUILD_CUDA with USE_CUDA in the install script and CMake files
- Updated googletest to 1.11
##### Improved
- Fixed a bug in SpMM Alg versioning
##### Known Issues
- none
#### rccl 2.11.4
RCCL 2.11.4 for ROCm 5.1.0
##### Added
- Compatibility with NCCL 2.11.4
##### Known Issues
- Managed memory is not currently supported for clique-based kernels
#### rocALUTION 2.0.2
rocALUTION 2.0.2 for ROCm 5.1.0
##### Added
- Added out-of-place matrix transpose functionality
- Added `LocalVector<bool>`
#### rocBLAS 2.43.0
rocBLAS 2.43.0 for ROCm 5.1.0
##### Added
- Option to install script for number of jobs to use for rocBLAS and Tensile compilation (-j, --jobs)
- Option to install script to build clients without using any Fortran (--clients_no_fortran)
- rocblas_client_initialize function, to perform rocBLAS initialization for clients (benchmark/test) and report the execution time.
- Added tests for output of reduction functions when given bad input
- Added user specified initialization (rand_int/trig_float/hpl) for initializing matrices and vectors in rocblas-bench
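A hypothetical usage sketch of the new build and benchmarking options listed above (`--initialization` is an assumed spelling for the rocblas-bench initialization flag, and the gemm sizes are placeholders):
```sh
# Build rocBLAS with a bounded number of parallel jobs and without Fortran clients.
./install.sh --jobs 8 --clients_no_fortran

# Benchmark a gemm with trig_float initialization instead of the hpl default.
./rocblas-bench -f gemm -m 1024 -n 1024 -k 1024 --initialization trig_float
```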
##### Optimizations
- Improved performance of trsm with side == left and n == 1
- Improved performance of trsm with side == left and m <= 32, and with side == right and n <= 32
##### Changed
- For syrkx and trmm internal API use rocblas_stride datatype for offset
- For non-batched and batched gemm_ex functions if the C matrix pointer equals the D matrix pointer (aliased) their respective type and leading dimension arguments must now match
- Test client dependencies updated to GTest 1.11
- Moved non-global false positives reported by cppcheck from file-based suppression to inline suppression. File-based suppression will only be used for global false positives.
- Help menu messages in install.sh
- For the ger function, typecast the 'lda' (offset) datatype to size_t during offset calculation to avoid overflow, and remove duplicate template functions.
- Modified default initialization from rand_int to hpl for initializing matrices and vectors in rocblas-bench
##### Fixed
- For function trmv (non-transposed cases) avoid overflow in offset calculation
- Fixed cppcheck errors/warnings
- Fixed doxygen warnings
#### rocFFT 1.0.16
rocFFT 1.0.16 for ROCm 5.1.0
##### Changed
- Supported unaligned tile dimension for SBRC_2D kernels.
- Improved (more RAII) test and benchmark infrastructure.
- Enabled runtime compilation of length-2304 FFT kernel during plan creation.
##### Optimizations
- Optimized more large 1D cases by using L1D_CC plan.
- Optimized 3D 200^3 C2R case.
- Optimized 1D 2^30 double precision on MI200.
##### Fixed
- Fixed correctness of some R2C transforms with unusual strides.
##### Removed
- The hipFFT API (header) has been removed after a long deprecation period. Please use the [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT) package/repository to obtain the hipFFT API.
#### rocPRIM 2.10.13
rocPRIM 2.10.13 for ROCm 5.1.0
##### Fixed
- Fixed radix sort int64_t bug introduced in [2.10.11]
##### Added
- Future value
- Added device partition_three_way to partition input to three output iterators based on two predicates
##### Changed
- The reduce/scan algorithm precision issues in the tests have been resolved for half types.
##### Known Issues
- device_segmented_radix_sort unit test failing for HIP on Windows
#### rocRAND 2.10.13
rocRAND 2.10.13 for ROCm 5.1.0
##### Added
- Generating a random sequence of different sizes now produces the same sequence without gaps, independent of how many values are generated per call.
  - Only in the case of XORWOW, MRG32K3A, PHILOX4X32_10, SOBOL32 and SOBOL64
  - This only holds true if the size in each call is a divisor of the distribution's `output_width`, due to performance
  - Similarly, the output pointer has to be aligned to `output_width * sizeof(output_type)`
##### Changed
- [hipRAND](https://github.com/ROCmSoftwarePlatform/hipRAND.git) split into a separate package
- Header file installation location changed to match other libraries.
  - Using the `rocrand.h` header file should now use `#include <rocrand/rocrand.h>`, rather than `#include <rocrand.h>`
- rocRAND still includes hipRAND using a submodule
  - The rocRAND package also sets the provides field with hipRAND, so projects which require hipRAND can begin to specify it.
##### Fixed
- Fixed offset behaviour for the XORWOW, MRG32K3A and PHILOX4X32_10 generators; setting the offset now correctly generates the same sequence starting from the offset.
  - Only uniform int and float will work, as these can be generated with a single call to the generator
##### Known Issues
- kernel_xorwow unit test is failing for certain GPU architectures.
#### rocSOLVER 3.17.0
rocSOLVER 3.17.0 for ROCm 5.1.0
##### Optimized
- Optimized non-pivoting and batch cases of the LU factorization
##### Fixed
- Fixed missing synchronization in SYTRF with `rocblas_fill_lower` that could potentially
result in incorrect pivot values.
- Fixed multi-level logging output to file with the `ROCSOLVER_LOG_PATH`,
`ROCSOLVER_LOG_TRACE_PATH`, `ROCSOLVER_LOG_BENCH_PATH` and `ROCSOLVER_LOG_PROFILE_PATH`
environment variables.
- Fixed performance regression in the batched LU factorization of tiny matrices
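For reference, a sketch of routing the multi-level logging output using the environment variables named above (the target paths are placeholders; enabling the individual logging layers is a separate step not shown here):
```sh
export ROCSOLVER_LOG_PATH=/tmp/rocsolver
export ROCSOLVER_LOG_TRACE_PATH=/tmp/rocsolver/trace.log
export ROCSOLVER_LOG_BENCH_PATH=/tmp/rocsolver/bench.log
export ROCSOLVER_LOG_PROFILE_PATH=/tmp/rocsolver/profile.log
```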
#### rocSPARSE 2.1.0
rocSPARSE 2.1.0 for ROCm 5.1.0
##### Added
- gtsv_interleaved_batch
- gpsv_interleaved_batch
- SpGEMM_reuse
- Allow copying of mat info struct
##### Improved
- Optimization for SDDMM
- Allow unsorted matrices in csrgemm multipass algorithm
##### Known Issues
- none
#### rocThrust 2.14.0
rocThrust 2.14.0 for ROCm 5.1.0
##### Added
- Updated to match upstream Thrust 1.15.0
##### Known Issues
- async_copy, partition, and stable_sort_by_key unit tests are failing on HIP on Windows.
#### Tensile 4.32.0
Tensile 4.32.0 for ROCm 5.1.0
##### Added
- Better control of parallelism to control memory usage
- Support for multiprocessing on Windows for TensileCreateLibrary
- New JSD metric and metric selection functionality
- Initial changes to support two-tier solution selection
##### Optimized
- Optimized runtime of TensileCreateLibraries by reducing max RAM usage
- StoreCInUnroll additional optimizations plus adaptive K support
- DGEMM NN optimizations with PrefetchGlobalRead(PGR)=2 support
##### Changed
- Update Googletest to 1.11.0
##### Removed
- Remove no longer supported benchmarking steps
-------------------
## ROCm 5.0.2
<!-- markdownlint-disable first-line-h1 -->
### Fixed Defects
The following defects are fixed in the ROCm v5.0.2 release.
#### Issue with hostcall Facility in HIP Runtime
In ROCm v5.0, when using the “assert()” call in a HIP kernel, the compiler may sometimes fail to emit kernel metadata related to the hostcall facility, which results in incomplete initialization of the hostcall facility in the HIP runtime. This can cause the HIP kernel to crash when it attempts to execute the “assert()” call.
The root cause was an incorrect check in the compiler to determine whether the hostcall facility is required by the kernel. This is fixed in the ROCm v5.0.2 release.
The resolution includes a compiler change, which emits the required metadata by default, unless the compiler can prove that the hostcall facility is not required by the kernel. This ensures that the “assert()” call never fails.
> **Note**
>
> This fix may lead to breakage in some OpenMP offload use cases, which use print inside a target region and result in an abort in device code. The issue will be fixed in a future release.
#### Compatibility Matrix Updates to ROCm Deep Learning Guide
The compatibility matrix in the AMD Deep Learning Guide is updated for ROCm v5.0.2.
### Library Changes in ROCM 5.0.2
| Library | Version |
|---------|---------|
| hipBLAS | [0.49.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.0.2) |
| hipCUB | [2.10.13](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.0.2) |
| hipFFT | [1.0.4](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.0.2) |
| hipSOLVER | [1.2.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.0.2) |
| hipSPARSE | [2.0.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.0.2) |
| rccl | [2.10.3](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.0.2) |
| rocALUTION | [2.0.1](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.0.2) |
| rocBLAS | [2.42.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.0.2) |
| rocFFT | [1.0.13](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.0.2) |
| rocPRIM | [2.10.12](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.0.2) |
| rocRAND | [2.10.12](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.0.2) |
| rocSOLVER | [3.16.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.0.2) |
| rocSPARSE | [2.0.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.0.2) |
| rocThrust | [2.13.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.0.2) |
| Tensile | [4.31.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.0.2) |
-------------------
## ROCm 5.0.1
<!-- markdownlint-disable first-line-h1 -->
### Deprecations and Warnings
#### Refactor of HIPCC/HIPCONFIG
In prior ROCm releases, by default, the hipcc/hipconfig Perl scripts were used to identify and set target compiler options, target platform, compiler, and runtime appropriately.
In ROCm v5.0.1, hipcc.bin and hipconfig.bin have been added as the compiled binary implementations of hipcc and hipconfig. These new binaries are currently a work in progress and are considered and marked as experimental. ROCm plans to fully transition to hipcc.bin and hipconfig.bin in a future ROCm release. The existing hipcc and hipconfig Perl scripts are renamed to hipcc.pl and hipconfig.pl respectively. New top-level hipcc and hipconfig Perl scripts are created, which can switch between the Perl script or the compiled binary based on the environment variable HIPCC_USE_PERL_SCRIPT.
In ROCm 5.0.1, by default, this environment variable is set to use hipcc and hipconfig through the Perl scripts.
Subsequently, Perl scripts will no longer be available in ROCm in a future release.
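A sketch of switching between the two implementations through the environment variable named above (treating it as a 0/1 toggle is an assumption):
```sh
# Dispatch to the Perl scripts (the ROCm 5.0.1 default behavior).
export HIPCC_USE_PERL_SCRIPT=1
hipcc --version

# Try the experimental compiled binaries instead.
export HIPCC_USE_PERL_SCRIPT=0
hipcc --version
```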
### Library Changes in ROCM 5.0.1
| Library | Version |
|---------|---------|
| hipBLAS | [0.49.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.0.1) |
| hipCUB | [2.10.13](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.0.1) |
| hipFFT | [1.0.4](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.0.1) |
| hipSOLVER | [1.2.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.0.1) |
| hipSPARSE | [2.0.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.0.1) |
| rccl | [2.10.3](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.0.1) |
| rocALUTION | [2.0.1](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.0.1) |
| rocBLAS | [2.42.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.0.1) |
| rocFFT | [1.0.13](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.0.1) |
| rocPRIM | [2.10.12](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.0.1) |
| rocRAND | [2.10.12](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.0.1) |
| rocSOLVER | [3.16.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.0.1) |
| rocSPARSE | [2.0.0](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.0.1) |
| rocThrust | [2.13.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.0.1) |
| Tensile | [4.31.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.0.1) |
-------------------
## ROCm 5.0.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-blanks-blockquote -->


@@ -15,404 +15,219 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.0.0
## ROCm 5.1.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
#### HIP Enhancements
The ROCm v5.0 release consists of the following HIP enhancements.
The ROCm v5.1 release consists of the following HIP enhancements.
##### HIP Installation Guide Updates
The HIP Installation Guide is updated to include building HIP from source on the NVIDIA platform.
The HIP Installation Guide is updated to include installation and building HIP from source on the AMD and NVIDIA platforms.
Refer to the HIP Installation Guide v5.0 for more details.
Refer to the HIP Installation Guide v5.1 for more details.
##### Managed Memory Allocation
##### Support for HIP Graph
Managed memory, including the `__managed__` keyword, is now supported in the HIP combined host/device compilation. Through unified memory allocation, managed memory allows data to be shared and accessible to both the CPU and GPU using a single pointer. The allocation is managed by the AMD GPU driver using the Linux Heterogeneous Memory Management (HMM) mechanism. The user can call managed memory API hipMallocManaged to allocate a large chunk of HMM memory, execute kernels on a device, and fetch data between the host and device as needed.
ROCm v5.1 extends support for HIP Graph.
##### Planned Changes for HIP in Future Releases
###### Separation of hiprtc (libhiprtc) library from hip runtime (amdhip64)
On ROCm/Linux, to maintain backward compatibility, the HIP runtime library (amdhip64) will continue to include hiprtc symbols in future releases. This backward-compatible support may be discontinued by removing hiprtc symbols from the HIP runtime library (amdhip64) in the next major release.
###### hipDeviceProp_t Structure Enhancements
Changes to the hipDeviceProp_t structure in the next major release may result in backward incompatibility. More details on these changes will be provided in subsequent releases.
#### ROCDebugger Enhancements
##### Multi-language Source Level Debugger
The compiler now generates source-level variable and function argument debug information.
Accuracy is guaranteed when the compiler options `-g -O0` are used; this applies only to HIP.
This enhancement enables ROCDebugger users to interact with the HIP source-level variables and function arguments.
> **Note**
>
> In a HIP application, it is recommended to do a capability check before calling the managed memory APIs. For example,
>
> ```cpp
> int managed_memory = 0;
> HIPCHECK(hipDeviceGetAttribute(&managed_memory,
> hipDeviceAttributeManagedMemory,p_gpuDevice));
> if (!managed_memory ) {
> printf ("info: managed memory access not supported on the device %d\n Skipped\n", p_gpuDevice);
> }
> else {
> HIPCHECK(hipSetDevice(p_gpuDevice));
> HIPCHECK(hipMallocManaged(&Hmm, N * sizeof(T)));
> . . .
> }
> ```
> The newly suggested compiler `-g` option must be used instead of the previously suggested `-ggdb` option. Although the effect of these two options is currently equivalent, this is not guaranteed for the future and might be changed by the upstream LLVM community.
##### Machine Interface Lanes Support
ROCDebugger Machine Interface (MI) extends support to lanes. The following enhancements are made:
- Added a new `-lane-info` command, listing the current thread's lanes.
- The `-thread-select` command now supports a lane switch to switch to a specific lane of a thread:
```sh
-thread-select -l LANE THREAD
```
- The `=thread-selected` notification gained a `lane-id` attribute. This enables the frontend to know which lane of the thread was selected.
- The `*stopped` asynchronous record gained `lane-id` and `hit-lanes` attributes. The former indicates which lane is selected, and the latter indicates which lanes explain the stop.
- MI commands now accept a global `--lane` option, similar to the global `--thread` and `--frame` options.
- MI varobjs are now lane-aware.
For more information, refer to the ROC Debugger User Guide at
{doc}`ROCgdb <rocgdb:index>`.
##### Enhanced - clone-inferior Command
The clone-inferior command now ensures that the TTY, CMD, ARGS, and AMDGPU PRECISE-MEMORY settings are copied from the original inferior to the new one. All modifications to the environment variables done using the 'set environment' or 'unset environment' commands are also copied to the new inferior.
#### MIOpen Support for RDNA GPUs
This release includes support for AMD Radeon™ Pro W6800, in addition to other bug fixes and performance improvements as listed below:
- MIOpen now supports RDNA GPUs!! (via MIOpen PRs 973, 780, 764, 740, 739, 677, 660, 653, 493, 498)
- Fixed a correctness issue with ImplicitGemm algorithm
- Updated the performance data for new kernel versions
- Improved MIOpen build time by splitting large kernel header files
- Fixed an issue in reduction kernels for padded tensors
- Various other bug fixes and performance improvements
For more information, see {doc}`Documentation <miopen:index>`.
#### Checkpoint Restore Support With CRIU
The new Checkpoint Restore in Userspace (CRIU) functionality is implemented to support AMD GPU and ROCm applications.
CRIU is a userspace tool to Checkpoint and Restore an application.
CRIU previously lacked support for checkpointing and restoring applications that use device files such as a GPU. With this ROCm release, CRIU is enhanced with a new plugin to support AMD GPUs, which includes:
- Single and Multi GPU systems (Gfx9)
- Checkpoint / Restore on a different system
- Checkpoint / Restore inside a docker container
- PyTorch
- Tensorflow
- Using CRIU Image Streamer
For more information, refer to <https://github.com/checkpoint-restore/criu/tree/criu-dev/plugins/amdgpu>
> **Note**
>
> The managed memory capability check may not be necessary; however, if HMM is not supported, managed malloc will fall back to using system memory. Other managed memory API calls will, then, have
> The CRIU plugin (amdgpu_plugin) is merged upstream with the CRIU repository. The KFD kernel patches are also available upstream with the amd-staging-drm-next branch (public) and the ROCm 5.1 release branch.
Refer to the HIP API documentation for more details on managed memory APIs.
> **Note**
>
> This is a Beta release of the Checkpoint and Restore functionality, and some features are not available in this release.
For the application, see
For more information, refer to the following websites:
<https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-4.5.x/tests/src/runtimeApi/memory/hipMallocManaged.cpp>
- <https://github.com/RadeonOpenCompute/criu/blob/amdgpu_plugin-03252022/Documentation/amdgpu_plugin.txt>
#### New Environment Variable
- <https://criu.org/Main_Page>
The following new environment variable is added in this release:
### Fixed Defects
| Environment Variable | Value | Description |
|----------------------|-----------------------|-------------|
| HSA_COOP_CU_COUNT | 0 or 1 (default is 0) | Some processors support more CUs than can reliably be used in a cooperative dispatch. Setting the environment variable HSA_COOP_CU_COUNT to 1 will cause ROCr to return the correct CU count for cooperative groups through the HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT attribute of hsa_agent_get_info(). Setting HSA_COOP_CU_COUNT to other values, or leaving it unset, will cause ROCr to return the same CU count for the attributes HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT and HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT. Future ROCm releases will make HSA_COOP_CU_COUNT=1 the default. |
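For example, a single run can opt in to the corrected cooperative CU count by exporting the variable at launch (the application name is a placeholder):
```sh
HSA_COOP_CU_COUNT=1 ./my_rocm_app
```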
The following defects are fixed in this release.
### Breaking Changes
#### Driver Fails To Load after Installation
#### Runtime Breaking Change
The issue with the driver failing to load after ROCm installation is now fixed.
Re-ordering of the enumerated type in hip_runtime_api.h to better match NV. See below for the difference in enumerated types.
The driver installs successfully, and the server reboots with working rocminfo and clinfo.
ROCm software will be affected if any of the defined enums listed below are used in the code. Applications built with ROCm v5.0 enumerated types will work with a ROCm 4.5.2 driver. However, an undefined behavior error will occur with a ROCm v4.5.2 application that uses these enumerated types with a ROCm 5.0 runtime.
#### ROCDebugger Fixed Defects
```diff
typedef enum hipDeviceAttribute_t {
- hipDeviceAttributeMaxThreadsPerBlock, ///< Maximum number of threads per block.
- hipDeviceAttributeMaxBlockDimX, ///< Maximum x-dimension of a block.
- hipDeviceAttributeMaxBlockDimY, ///< Maximum y-dimension of a block.
- hipDeviceAttributeMaxBlockDimZ, ///< Maximum z-dimension of a block.
- hipDeviceAttributeMaxGridDimX, ///< Maximum x-dimension of a grid.
- hipDeviceAttributeMaxGridDimY, ///< Maximum y-dimension of a grid.
- hipDeviceAttributeMaxGridDimZ, ///< Maximum z-dimension of a grid.
- hipDeviceAttributeMaxSharedMemoryPerBlock, ///< Maximum shared memory available per block in
- ///< bytes.
- hipDeviceAttributeTotalConstantMemory, ///< Constant memory size in bytes.
- hipDeviceAttributeWarpSize, ///< Warp size in threads.
- hipDeviceAttributeMaxRegistersPerBlock, ///< Maximum number of 32-bit registers available to a
- ///< thread block. This number is shared by all thread
- ///< blocks simultaneously resident on a
- ///< multiprocessor.
- hipDeviceAttributeClockRate, ///< Peak clock frequency in kilohertz.
- hipDeviceAttributeMemoryClockRate, ///< Peak memory clock frequency in kilohertz.
- hipDeviceAttributeMemoryBusWidth, ///< Global memory bus width in bits.
- hipDeviceAttributeMultiprocessorCount, ///< Number of multiprocessors on the device.
- hipDeviceAttributeComputeMode, ///< Compute mode that device is currently in.
- hipDeviceAttributeL2CacheSize, ///< Size of L2 cache in bytes. 0 if the device doesn't have L2
- ///< cache.
- hipDeviceAttributeMaxThreadsPerMultiProcessor, ///< Maximum resident threads per
- ///< multiprocessor.
- hipDeviceAttributeComputeCapabilityMajor, ///< Major compute capability version number.
- hipDeviceAttributeComputeCapabilityMinor, ///< Minor compute capability version number.
- hipDeviceAttributeConcurrentKernels, ///< Device can possibly execute multiple kernels
- ///< concurrently.
- hipDeviceAttributePciBusId, ///< PCI Bus ID.
- hipDeviceAttributePciDeviceId, ///< PCI Device ID.
- hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, ///< Maximum Shared Memory Per
- ///< Multiprocessor.
- hipDeviceAttributeIsMultiGpuBoard, ///< Multiple GPU devices.
- hipDeviceAttributeIntegrated, ///< iGPU
- hipDeviceAttributeCooperativeLaunch, ///< Support cooperative launch
- hipDeviceAttributeCooperativeMultiDeviceLaunch, ///< Support cooperative launch on multiple devices
- hipDeviceAttributeMaxTexture1DWidth, ///< Maximum number of elements in 1D images
- hipDeviceAttributeMaxTexture2DWidth, ///< Maximum dimension width of 2D images in image elements
- hipDeviceAttributeMaxTexture2DHeight, ///< Maximum dimension height of 2D images in image elements
- hipDeviceAttributeMaxTexture3DWidth, ///< Maximum dimension width of 3D images in image elements
- hipDeviceAttributeMaxTexture3DHeight, ///< Maximum dimensions height of 3D images in image elements
- hipDeviceAttributeMaxTexture3DDepth, ///< Maximum dimensions depth of 3D images in image elements
+ hipDeviceAttributeCudaCompatibleBegin = 0,
##### Breakpoints in GPU kernel code Before Kernel Is Loaded
- hipDeviceAttributeHdpMemFlushCntl, ///< Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
- hipDeviceAttributeHdpRegFlushCntl, ///< Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
+ hipDeviceAttributeEccEnabled = hipDeviceAttributeCudaCompatibleBegin, ///< Whether ECC support is enabled.
+ hipDeviceAttributeAccessPolicyMaxWindowSize, ///< Cuda only. The maximum size of the window policy in bytes.
+ hipDeviceAttributeAsyncEngineCount, ///< Cuda only. Asynchronous engines number.
+ hipDeviceAttributeCanMapHostMemory, ///< Whether host memory can be mapped into device address space
+ hipDeviceAttributeCanUseHostPointerForRegisteredMem,///< Cuda only. Device can access host registered memory
+ ///< at the same virtual address as the CPU
+ hipDeviceAttributeClockRate, ///< Peak clock frequency in kilohertz.
+ hipDeviceAttributeComputeMode, ///< Compute mode that device is currently in.
+ hipDeviceAttributeComputePreemptionSupported, ///< Cuda only. Device supports Compute Preemption.
+ hipDeviceAttributeConcurrentKernels, ///< Device can possibly execute multiple kernels concurrently.
+ hipDeviceAttributeConcurrentManagedAccess, ///< Device can coherently access managed memory concurrently with the CPU
+ hipDeviceAttributeCooperativeLaunch, ///< Support cooperative launch
+ hipDeviceAttributeCooperativeMultiDeviceLaunch, ///< Support cooperative launch on multiple devices
+ hipDeviceAttributeDeviceOverlap, ///< Cuda only. Device can concurrently copy memory and execute a kernel.
+ ///< Deprecated. Use instead asyncEngineCount.
+ hipDeviceAttributeDirectManagedMemAccessFromHost, ///< Host can directly access managed memory on
+ ///< the device without migration
+ hipDeviceAttributeGlobalL1CacheSupported, ///< Cuda only. Device supports caching globals in L1
+ hipDeviceAttributeHostNativeAtomicSupported, ///< Cuda only. Link between the device and the host supports native atomic operations
+ hipDeviceAttributeIntegrated, ///< Device is integrated GPU
+ hipDeviceAttributeIsMultiGpuBoard, ///< Multiple GPU devices.
+ hipDeviceAttributeKernelExecTimeout, ///< Run time limit for kernels executed on the device
+ hipDeviceAttributeL2CacheSize, ///< Size of L2 cache in bytes. 0 if the device doesn't have L2 cache.
+ hipDeviceAttributeLocalL1CacheSupported, ///< caching locals in L1 is supported
+ hipDeviceAttributeLuid, ///< Cuda only. 8-byte locally unique identifier in 8 bytes. Undefined on TCC and non-Windows platforms
+ hipDeviceAttributeLuidDeviceNodeMask, ///< Cuda only. Luid device node mask. Undefined on TCC and non-Windows platforms
+ hipDeviceAttributeComputeCapabilityMajor, ///< Major compute capability version number.
+ hipDeviceAttributeManagedMemory, ///< Device supports allocating managed memory on this system
+ hipDeviceAttributeMaxBlocksPerMultiProcessor, ///< Cuda only. Max block size per multiprocessor
+ hipDeviceAttributeMaxBlockDimX, ///< Max block size in width.
+ hipDeviceAttributeMaxBlockDimY, ///< Max block size in height.
+ hipDeviceAttributeMaxBlockDimZ, ///< Max block size in depth.
+ hipDeviceAttributeMaxGridDimX, ///< Max grid size in width.
+ hipDeviceAttributeMaxGridDimY, ///< Max grid size in height.
+ hipDeviceAttributeMaxGridDimZ, ///< Max grid size in depth.
+ hipDeviceAttributeMaxSurface1D, ///< Maximum size of 1D surface.
+ hipDeviceAttributeMaxSurface1DLayered, ///< Cuda only. Maximum dimensions of 1D layered surface.
+ hipDeviceAttributeMaxSurface2D, ///< Maximum dimension (width, height) of 2D surface.
+ hipDeviceAttributeMaxSurface2DLayered, ///< Cuda only. Maximum dimensions of 2D layered surface.
+ hipDeviceAttributeMaxSurface3D, ///< Maximum dimension (width, height, depth) of 3D surface.
+ hipDeviceAttributeMaxSurfaceCubemap, ///< Cuda only. Maximum dimensions of Cubemap surface.
+ hipDeviceAttributeMaxSurfaceCubemapLayered, ///< Cuda only. Maximum dimension of Cubemap layered surface.
+ hipDeviceAttributeMaxTexture1DWidth, ///< Maximum size of 1D texture.
+ hipDeviceAttributeMaxTexture1DLayered, ///< Cuda only. Maximum dimensions of 1D layered texture.
+ hipDeviceAttributeMaxTexture1DLinear, ///< Maximum number of elements allocatable in a 1D linear texture.
+ ///< Use cudaDeviceGetTexture1DLinearMaxWidth() instead on Cuda.
+ hipDeviceAttributeMaxTexture1DMipmap, ///< Cuda only. Maximum size of 1D mipmapped texture.
+ hipDeviceAttributeMaxTexture2DWidth, ///< Maximum dimension width of 2D texture.
+ hipDeviceAttributeMaxTexture2DHeight, ///< Maximum dimension hight of 2D texture.
+ hipDeviceAttributeMaxTexture2DGather, ///< Cuda only. Maximum dimensions of 2D texture if gather operations performed.
+ hipDeviceAttributeMaxTexture2DLayered, ///< Cuda only. Maximum dimensions of 2D layered texture.
+ hipDeviceAttributeMaxTexture2DLinear, ///< Cuda only. Maximum dimensions (width, height, pitch) of 2D textures bound to pitched memory.
+ hipDeviceAttributeMaxTexture2DMipmap, ///< Cuda only. Maximum dimensions of 2D mipmapped texture.
+ hipDeviceAttributeMaxTexture3DWidth, ///< Maximum dimension width of 3D texture.
+ hipDeviceAttributeMaxTexture3DHeight, ///< Maximum dimension height of 3D texture.
+ hipDeviceAttributeMaxTexture3DDepth, ///< Maximum dimension depth of 3D texture.
+ hipDeviceAttributeMaxTexture3DAlt, ///< Cuda only. Maximum dimensions of alternate 3D texture.
+ hipDeviceAttributeMaxTextureCubemap, ///< Cuda only. Maximum dimensions of Cubemap texture
+ hipDeviceAttributeMaxTextureCubemapLayered, ///< Cuda only. Maximum dimensions of Cubemap layered texture.
+ hipDeviceAttributeMaxThreadsDim, ///< Maximum dimension of a block
+ hipDeviceAttributeMaxThreadsPerBlock, ///< Maximum number of threads per block.
+ hipDeviceAttributeMaxThreadsPerMultiProcessor, ///< Maximum resident threads per multiprocessor.
+ hipDeviceAttributeMaxPitch, ///< Maximum pitch in bytes allowed by memory copies
+ hipDeviceAttributeMemoryBusWidth, ///< Global memory bus width in bits.
+ hipDeviceAttributeMemoryClockRate, ///< Peak memory clock frequency in kilohertz.
+ hipDeviceAttributeComputeCapabilityMinor, ///< Minor compute capability version number.
+ hipDeviceAttributeMultiGpuBoardGroupID, ///< Cuda only. Unique ID of device group on the same multi-GPU board
+ hipDeviceAttributeMultiprocessorCount, ///< Number of multiprocessors on the device.
+ hipDeviceAttributeName, ///< Device name.
+ hipDeviceAttributePageableMemoryAccess, ///< Device supports coherently accessing pageable memory
+ ///< without calling hipHostRegister on it
+ hipDeviceAttributePageableMemoryAccessUsesHostPageTables, ///< Device accesses pageable memory via the host's page tables
+ hipDeviceAttributePciBusId, ///< PCI Bus ID.
+ hipDeviceAttributePciDeviceId, ///< PCI Device ID.
+ hipDeviceAttributePciDomainID, ///< PCI Domain ID.
+ hipDeviceAttributePersistingL2CacheMaxSize, ///< Cuda11 only. Maximum l2 persisting lines capacity in bytes
+ hipDeviceAttributeMaxRegistersPerBlock, ///< 32-bit registers available to a thread block. This number is shared
+ ///< by all thread blocks simultaneously resident on a multiprocessor.
+ hipDeviceAttributeMaxRegistersPerMultiprocessor, ///< 32-bit registers available per block.
+ hipDeviceAttributeReservedSharedMemPerBlock, ///< Cuda11 only. Shared memory reserved by CUDA driver per block.
+ hipDeviceAttributeMaxSharedMemoryPerBlock, ///< Maximum shared memory available per block in bytes.
+ hipDeviceAttributeSharedMemPerBlockOptin, ///< Cuda only. Maximum shared memory per block usable by special opt in.
+ hipDeviceAttributeSharedMemPerMultiprocessor, ///< Cuda only. Shared memory available per multiprocessor.
+ hipDeviceAttributeSingleToDoublePrecisionPerfRatio, ///< Cuda only. Performance ratio of single precision to double precision.
+ hipDeviceAttributeStreamPrioritiesSupported, ///< Cuda only. Whether to support stream priorities.
+ hipDeviceAttributeSurfaceAlignment, ///< Cuda only. Alignment requirement for surfaces
+ hipDeviceAttributeTccDriver, ///< Cuda only. Whether device is a Tesla device using TCC driver
+ hipDeviceAttributeTextureAlignment, ///< Alignment requirement for textures
+ hipDeviceAttributeTexturePitchAlignment, ///< Pitch alignment requirement for 2D texture references bound to pitched memory;
+ hipDeviceAttributeTotalConstantMemory, ///< Constant memory size in bytes.
+ hipDeviceAttributeTotalGlobalMem, ///< Global memory available on devicice.
+ hipDeviceAttributeUnifiedAddressing, ///< Cuda only. An unified address space shared with the host.
+ hipDeviceAttributeUuid, ///< Cuda only. Unique ID in 16 byte.
+ hipDeviceAttributeWarpSize, ///< Warp size in threads.
Previously, setting a breakpoint in device code by line number before the device code was loaded into the program resulted in ROCgdb incorrectly moving the breakpoint to the first following line that contains host code.
- hipDeviceAttributeMaxPitch, ///< Maximum pitch in bytes allowed by memory copies
- hipDeviceAttributeTextureAlignment, ///<Alignment requirement for textures
- hipDeviceAttributeTexturePitchAlignment, ///<Pitch alignment requirement for 2D texture references bound to pitched memory;
- hipDeviceAttributeKernelExecTimeout, ///<Run time limit for kernels executed on the device
- hipDeviceAttributeCanMapHostMemory, ///<Device can map host memory into device address space
- hipDeviceAttributeEccEnabled, ///<Device has ECC support enabled
+ hipDeviceAttributeCudaCompatibleEnd = 9999,
+ hipDeviceAttributeAmdSpecificBegin = 10000,
Now, the breakpoint is left pending. When the GPU kernel gets loaded, the breakpoint resolves to a location in the kernel.
- hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, ///< Supports cooperative launch on multiple
- ///devices with unmatched functions
- hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, ///< Supports cooperative launch on multiple
- ///devices with unmatched grid dimensions
- hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, ///< Supports cooperative launch on multiple
- ///devices with unmatched block dimensions
- hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, ///< Supports cooperative launch on multiple
- ///devices with unmatched shared memories
- hipDeviceAttributeAsicRevision, ///< Revision of the GPU in this device
- hipDeviceAttributeManagedMemory, ///< Device supports allocating managed memory on this system
- hipDeviceAttributeDirectManagedMemAccessFromHost, ///< Host can directly access managed memory on
- /// the device without migration
- hipDeviceAttributeConcurrentManagedAccess, ///< Device can coherently access managed memory
- /// concurrently with the CPU
- hipDeviceAttributePageableMemoryAccess, ///< Device supports coherently accessing pageable memory
- /// without calling hipHostRegister on it
- hipDeviceAttributePageableMemoryAccessUsesHostPageTables, ///< Device accesses pageable memory via
- /// the host's page tables
- hipDeviceAttributeCanUseStreamWaitValue ///< '1' if Device supports hipStreamWaitValue32() and
- ///< hipStreamWaitValue64() , '0' otherwise.
+ hipDeviceAttributeClockInstructionRate = hipDeviceAttributeAmdSpecificBegin, ///< Frequency in khz of the timer used by the device-side "clock*"
+ hipDeviceAttributeArch, ///< Device architecture
+ hipDeviceAttributeMaxSharedMemoryPerMultiprocessor, ///< Maximum Shared Memory PerMultiprocessor.
+ hipDeviceAttributeGcnArch, ///< Device gcn architecture
+ hipDeviceAttributeGcnArchName, ///< Device gcnArch name in 256 bytes
+ hipDeviceAttributeHdpMemFlushCntl, ///< Address of the HDP_MEM_COHERENCY_FLUSH_CNTL register
+ hipDeviceAttributeHdpRegFlushCntl, ///< Address of the HDP_REG_COHERENCY_FLUSH_CNTL register
+ hipDeviceAttributeCooperativeMultiDeviceUnmatchedFunc, ///< Supports cooperative launch on multiple
+ ///< devices with unmatched functions
+ hipDeviceAttributeCooperativeMultiDeviceUnmatchedGridDim, ///< Supports cooperative launch on multiple
+ ///< devices with unmatched grid dimensions
+ hipDeviceAttributeCooperativeMultiDeviceUnmatchedBlockDim, ///< Supports cooperative launch on multiple
+ ///< devices with unmatched block dimensions
+ hipDeviceAttributeCooperativeMultiDeviceUnmatchedSharedMem, ///< Supports cooperative launch on multiple
+ ///< devices with unmatched shared memories
+ hipDeviceAttributeIsLargeBar, ///< Whether it is LargeBar
+ hipDeviceAttributeAsicRevision, ///< Revision of the GPU in this device
+ hipDeviceAttributeCanUseStreamWaitValue, ///< '1' if Device supports hipStreamWaitValue32() and
+ ///< hipStreamWaitValue64() , '0' otherwise.
##### Registers Invalidated After Write
+ hipDeviceAttributeAmdSpecificEnd = 19999,
+ hipDeviceAttributeVendorSpecificBegin = 20000,
+ // Extended attributes for vendors
} hipDeviceAttribute_t;
Previously, the stale just-written value was presented as a current value.
enum hipComputeMode {
```
ROCgdb now invalidates the cached values of registers whose content might differ after being written. For example, registers with read-only bits.
ROCgdb also invalidates all volatile registers when a volatile register is written. For example, writing VCC invalidates the content of STATUS as STATUS.VCCZ may change.
##### Scheduler-locking and GPU Wavefronts
When scheduler-locking is in effect (for example, via the `set scheduler-locking` command), new wavefronts created by a resumed thread or wavefront, whether CPU or GPU, are held in the halt state.
##### ROCDebugger Fails Before Completion of Kernel Execution
It was possible (although erroneous) for a debugger to load GPU code in memory, send it to the device, start executing a kernel on the device, and dispose of the original code before the kernel had finished execution. If a breakpoint was hit after this point, the debugger failed with an internal error while trying to access the debug information.
This issue is now fixed by ensuring that the debugger keeps a local copy of the original code and debug information.
### Known Issues
#### Incorrect dGPU Behavior When Using AMDVBFlash Tool
#### Random Memory Access Fault Errors Observed While Running Math Libraries Unit Tests
The AMDVBFlash tool, used for flashing the VBIOS image to dGPU, does not communicate with the ROM Controller specifically when the driver is present. This is because the driver, as part of its runtime power management feature, puts the dGPU to a sleep state.
**Issue:** Random memory access fault issues are observed while running Math libraries unit tests. This issue is encountered in ROCm v5.0, ROCm v5.0.1, and ROCm v5.0.2.
As a workaround, users can run amdgpu.runpm=0, which temporarily disables the runtime power management feature from the driver and dynamically changes some power control-related sysfs files.
Note, the faults only occur in the SRIOV environment.
#### Issue with START Timestamp in ROCProfiler
**Workaround:** Use SDMA to update the page table. The Guest set up steps are as follows:
Users may encounter an issue with the enabled timestamp functionality for monitoring one or multiple counters. ROCProfiler outputs the following four timestamps for each kernel:
```sh
sudo modprobe amdgpu vm_update_mode=0
```
- Dispatch
To verify, use
- Start
**Guest:**
- End
```sh
cat /sys/module/amdgpu/parameters/vm_update_mode
```
- Complete
The expected output is 0.
##### Issue
#### CU Masking Causes Application to Freeze
This defect is related to the Start timestamp functionality, which incorrectly shows an earlier time than the Dispatch timestamp.
Using CU masking results in an application freeze or exceptionally slow execution. This issue is observed only in the GFX10 suite of products.
To reproduce the issue,
This issue is under active investigation at this time.
1. Enable timing using the --timestamp on flag.
#### Failed Checkpoint in Docker Containers
2. Use the -i option with the input filename that contains the name of the counter(s) to monitor.
A defect with Ubuntu images kernel-5.13-30-generic and kernel-5.13-35-generic with Overlay FS results in incorrect reporting of the mount ID.
3. Run the program.
This issue with Ubuntu causes CRIU checkpointing to fail in Docker containers.
4. Check the output result file.
As a workaround, use an older version of the kernel. For example, Ubuntu 5.11.0-46-generic.
##### Current behavior
#### Issue with Restoring Workloads Using Cooperative Groups Feature
BeginNS is lower than DispatchNS, which is incorrect.
##### Expected behavior
The correct order is:
Dispatch < Start < End < Complete
Users cannot use ROCProfiler to measure the time spent on each kernel because of the incorrect timestamp with counter collection enabled.
##### Recommended Workaround
Users are recommended to collect kernel execution timestamps without monitoring counters, as follows:
1. Enable timing using the --timestamp on flag, and run the application.
2. Rerun the application using the -i option with the input filename that contains the name of the counter(s) to monitor, and save this to a different output file using the -o flag.
3. Check the output result file from step 1.
4. The order of timestamps correctly displays as:
DispatchNS < BeginNS < EndNS < CompleteNS
5. Users can find the values of the collected counters in the output file generated in step 2.
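A sketch of the two-pass collection described in the steps above, assuming the `rocprof` command-line front end with the `--timestamp`, `-i`, and `-o` options (file and application names are placeholders):
```sh
# Pass 1: kernel timestamps only, without counters.
rocprof --timestamp on -o timestamps.csv ./my_app

# Pass 2: counter collection from an input file, written to a separate output file.
rocprof --timestamp on -i counters.txt -o counters.csv ./my_app
```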
#### Issue with Restoring Workloads Using Cooperative Groups Feature

Workloads that use the cooperative groups function to ensure all waves can be resident at the same time may fail to restore correctly.

This issue is under investigation and will be fixed in a future release.

#### Radeon Pro V620 and W6800 Workstation GPUs

##### No Support for SMI and ROCDebugger on SRIOV

System Management Interface (SMI) and ROCDebugger are not supported in the SRIOV environment on any GPU. For more information, refer to the Systems Management Interface documentation.

This is a known issue and will be fixed in a future release.

#### Random Error Messages in ROCm SMI for SR-IOV

Random error messages are generated by unsupported functions or commands.

This is a known issue and will be fixed in a future release.

### Deprecations and Warnings

#### ROCm Libraries Changes: Deprecations and Deprecation Removal

- The hipFFT.h header is now provided only by the hipFFT package. Up to ROCm 5.0, users would get hipFFT.h in the rocFFT package too.
- The GlobalPairwiseAMG class is now entirely removed; users should use the PairwiseAMG class instead.
- The rocsparse_spmm signature in 5.0 was changed to match that of rocsparse_spmm_ex. In 5.0, rocsparse_spmm_ex is still present but deprecated. Signature diff for rocsparse_spmm:
rocsparse_spmm in 5.0
```h
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
rocsparse_operation trans_A,
rocsparse_operation trans_B,
const void* alpha,
const rocsparse_spmat_descr mat_A,
const rocsparse_dnmat_descr mat_B,
const void* beta,
const rocsparse_dnmat_descr mat_C,
rocsparse_datatype compute_type,
rocsparse_spmm_alg alg,
rocsparse_spmm_stage stage,
size_t* buffer_size,
void* temp_buffer);
```
rocsparse_spmm in 4.0
```h
rocsparse_status rocsparse_spmm(rocsparse_handle handle,
rocsparse_operation trans_A,
rocsparse_operation trans_B,
const void* alpha,
const rocsparse_spmat_descr mat_A,
const rocsparse_dnmat_descr mat_B,
const void* beta,
const rocsparse_dnmat_descr mat_C,
rocsparse_datatype compute_type,
rocsparse_spmm_alg alg,
size_t* buffer_size,
void* temp_buffer);
```
#### HIP API Deprecations and Warnings
##### Warning - Arithmetic Operators of HIP Complex and Vector Types
In this release, arithmetic operators of HIP complex and vector types are deprecated.
- As alternatives to arithmetic operators of HIP complex types, users can use arithmetic operators of `std::complex` types.
- As alternatives to arithmetic operators of HIP vector types, users can use the operators of the native clang vector type associated with the data member of HIP vector types.
During the deprecation, two macros `_HIP_ENABLE_COMPLEX_OPERATORS` and `_HIP_ENABLE_VECTOR_OPERATORS` are provided to allow users to conditionally enable arithmetic operators of HIP complex or vector types.
Note that the two macros are mutually exclusive and are turned off by default.
The arithmetic operators of HIP complex and vector types will be removed in a future release.
Refer to the HIP API Guide for more information.
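For example, a project that still relies on the deprecated operators during the transition can opt back in by defining one of the macros at build time; the snippet below is a minimal sketch, and the source file name is a placeholder:

```sh
# Minimal sketch: define the opt-in macro on the compile line to keep using
# the deprecated HIP complex-type operators during the transition.
hipcc -D_HIP_ENABLE_COMPLEX_OPERATORS my_complex_math.cpp -o my_complex_math
```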
#### Warning - Compiler-Generated Code Object Version 4 Deprecation
Support for loading compiler-generated code object version 4 will be deprecated in a future release with no release announcement and replaced with code object 5 as the default version.
The current default is code object version 4.
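If a build needs to pin the code object version explicitly while both versions are supported, recent Clang-based HIP compilers expose a flag for it; the spelling below is an assumption taken from the upstream AMDGPU Clang documentation, and `my_app.cpp` is a placeholder, so verify against your installed toolchain:

```sh
# Assumed Clang/HIP flag (verify with `hipcc --help` on your installation):
# pin the generated code object format to version 4 explicitly.
hipcc -mcode-object-version=4 my_app.cpp -o my_app
```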
#### Warning - MIOpenTensile Deprecation
MIOpenTensile will be deprecated in a future release.
View File
@@ -18,8 +18,8 @@ shutil.copy2('../CHANGELOG.md','./CHANGELOG.md')
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = "5.0.0"
release = "5.0.0"
version = "5.1.0"
release = "5.1.0"
setting_all_article_info = True
all_article_info_os = ["linux"]
@@ -64,7 +64,7 @@ article_pages = [
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs("ROCm 5.0.0 Documentation Home")
docs_core = ROCmDocs("ROCm 5.1.0 Documentation Home")
docs_core.setup()
external_projects_current_project = "rocm"
View File
@@ -18,8 +18,8 @@ following commands based on your distribution.
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/21.50/ubuntu/bionic/amdgpu-install_21.50.50000-1_all.deb
sudo apt install ./amdgpu-install_21.50.50000-1_all.deb
wget https://repo.radeon.com/amdgpu-install/22.10/ubuntu/bionic/amdgpu-install_22.10.50100-1_all.deb
sudo apt install ./amdgpu-install_22.10.50100-1_all.deb
```
:::
@@ -28,8 +28,8 @@ sudo apt install ./amdgpu-install_21.50.50000-1_all.deb
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/21.50/ubuntu/focal/amdgpu-install_21.50.50000-1_all.deb
sudo apt install ./amdgpu-install_21.50.50000-1_all.deb
wget https://repo.radeon.com/amdgpu-install/22.10/ubuntu/focal/amdgpu-install_22.10.50100-1_all.deb
sudo apt install ./amdgpu-install_22.10.50100-1_all.deb
```
:::
@@ -44,16 +44,7 @@ sudo apt install ./amdgpu-install_21.50.50000-1_all.deb
:sync: RHEL-7
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/21.50/rhel/7.9/amdgpu-install-21.50.50000-1.el7.noarch.rpm
```
:::
:::{tab-item} RHEL 8.4
:sync: RHEL-8.4
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/21.50/rhel/8.4/amdgpu-install-21.50.50000-1.el8.noarch.rpm
sudo yum install https://repo.radeon.com/amdgpu-install/22.20/rhel/7.9/amdgpu-install-22.20.50200-1.el7.noarch.rpm
```
:::
@@ -62,7 +53,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/21.50/rhel/8.4/amdgpu-in
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/21.50/rhel/8.5/amdgpu-install-21.50.50000-1.el8.noarch.rpm
sudo yum install https://repo.radeon.com/amdgpu-install/22.20/rhel/8.5/amdgpu-install-22.20.50200-1.el8.noarch.rpm
```
:::
@@ -76,7 +67,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/21.50/rhel/8.5/amdgpu-in
:sync: SLES15-SP3
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/21.50/sle/15/amdgpu-install-21.50.50000-1.noarch.rpm
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/22.10/sle/15/amdgpu-install-22.10.50100-1.noarch.rpm
```
:::
@@ -155,9 +146,9 @@ the installer script will install packages in the single-version layout.
For the multi-version ROCm installation you must use the installer script from
the latest release of ROCm that you wish to install.
**Example:** If you want to install ROCm releases 4.5.2 and 5.0.0
**Example:** If you want to install ROCm releases 5.0.2 and 5.1
simultaneously, you are required to download the installer from the latest ROCm
release v5.0.0.
release v5.1.
### Add Required Repositories
@@ -176,12 +167,10 @@ Run the following commands based on your distribution to add the repositories:
:sync: ubuntu-18.04
```shell
for ver in 5.0; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver bionic main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
for ver in 5.0.2; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver bionic main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
@@ -190,12 +179,10 @@ sudo apt update
:sync: ubuntu-20.04
```shell
for ver in 5.0; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
for ver in 5.0.2; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
@@ -210,7 +197,7 @@ sudo apt update
:sync: RHEL-7
```shell
for ver in 5.0; do
for ver in 5.0.2; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -229,7 +216,7 @@ sudo yum clean all
:sync: RHEL-8
```shell
for ver in 5.0; do
for ver in 5.0.2; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -243,6 +230,7 @@ done
sudo yum clean all
```
:::
::::
:::::
:::::{tab-item} SUSE Linux Enterprise Server 15
@@ -253,7 +241,7 @@ sudo yum clean all
:sync: SLES15-SP3
```shell
for ver in 5.0; do
for ver in 5.0.2; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
name=rocm
baseurl=https://repo.radeon.com/rocm/$ver/sle/15/main/x86_64
@@ -282,12 +270,12 @@ sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-3>
```
Following are examples of ROCm multi-version installation. The kernel-mode
driver, associated with the ROCm release v5.0, will be installed as its latest
driver, associated with the ROCm release v5.3, will be installed as its latest
release in the list.
```none
sudo amdgpu-install --usecase=rocm --rocmrelease=4.5.2
sudo amdgpu-install --usecase=rocm --rocmrelease=5.0.0
sudo amdgpu-install --usecase=rocm --rocmrelease=5.0.2
sudo amdgpu-install --usecase=rocm --rocmrelease=5.1.0
```
## Additional options
View File
@@ -53,7 +53,18 @@ To add the AMDGPU repository, follow these steps:
```shell
# amdgpu repository for bionic
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/21.50/ubuntu bionic main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.20/ubuntu bionic main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
:::
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```shell
# amdgpu repository for bionic
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu bionic main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -64,7 +75,7 @@ sudo apt update
```shell
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/21.50/ubuntu focal main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu focal main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -91,7 +102,7 @@ To add the ROCm repository, use the following steps:
```shell
# ROCm repositories for bionic
for ver in 5.0; do
for ver in 5.0.2 5.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver bionic main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
@@ -106,7 +117,7 @@ sudo apt update
```shell
# ROCm repositories for focal
for ver in 5.0; do
for ver in 5.0.2 5.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
@@ -136,7 +147,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo apt install rocm-hip-sdk5.0.0
sudo apt install rocm-hip-sdk5.1.0 rocm-hip-sdk5.0.2
```
:::::
@@ -160,26 +171,7 @@ section.
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/rhel/7.9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.4
:sync: RHEL-8.4
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/rhel/8.4/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/7.9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -198,7 +190,7 @@ sudo yum clean all
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/rhel/8.5/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/8.5/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -228,7 +220,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
:sync: RHEL-7
```shell
for ver in 5.0; do
for ver in 5.0.2 5.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -247,7 +239,7 @@ sudo yum clean all
:sync: RHEL-8
```shell
for ver in 5.0; do
for ver in 5.0.2 5.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -282,7 +274,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo yum install rocm-hip-sdk5.0.0
sudo yum install rocm-hip-sdk5.1.0 rocm-hip-sdk5.0.2
```
:::::
@@ -305,7 +297,7 @@ section.
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/sle/15.3/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/22.10/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -330,7 +322,7 @@ sudo reboot
To add the ROCm repository, use the following steps:
```shell
for ver in 5.0; do
for ver in 5.0.2 5.1; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -362,7 +354,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.0.0
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.1.0 rocm-hip-sdk5.0.2
```
:::::
@@ -399,7 +391,7 @@ but are generally useful. Verification of the install is advised.
2. Add binary paths to the `PATH` environment variable.
```shell
export PATH=$PATH:/opt/rocm-5.0.0/bin:/opt/rocm-5.0.0/opencl/bin
export PATH=$PATH:/opt/rocm-5.1.0/bin:/opt/rocm-5.1.0/opencl/bin
```
```{attention}
View File
@@ -26,7 +26,7 @@ repository to the new release.
```shell
# amdgpu repository for bionic
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/21.50/ubuntu bionic main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu bionic main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -37,11 +37,12 @@ sudo apt update
```shell
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/21.50/ubuntu focal main' \
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/22.10/ubuntu focal main' \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
:::
::::
:::::
:::::{tab-item} Red Hat Enterprise Linux
@@ -56,25 +57,7 @@ sudo apt update
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/rhel/7.9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.4
:sync: RHEL-8.4
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/rhel/8.4/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/7.9/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -92,7 +75,7 @@ sudo yum clean all
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/rhel/8.5/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/22.10/rhel/8.5/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -116,7 +99,7 @@ sudo yum clean all
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/21.50/sle/15.3/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/22.10/sle/15.3/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -179,7 +162,7 @@ repository to the new release.
:sync: ubuntu-18.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.0 bionic main" \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.1 bionic main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -191,7 +174,7 @@ sudo apt update
:sync: ubuntu-20.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.0 focal main" \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.1 focal main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -210,9 +193,9 @@ sudo apt update
```shell
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.0]
name=ROCm5.0
baseurl=https://repo.radeon.com/rocm/yum/5.0/main
[ROCm-5.1]
name=ROCm5.1
baseurl=https://repo.radeon.com/rocm/yum/5.1/main
enabled=1
priority=50
gpgcheck=1
@@ -227,9 +210,9 @@ sudo yum clean all
```shell
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.0.1]
name=ROCm5.0.1
baseurl=https://repo.radeon.com/rocm/rhel8/5.0/main
[ROCm-5.1]
name=ROCm5.1
baseurl=https://repo.radeon.com/rocm/rhel8/5.1/main
enabled=1
priority=50
gpgcheck=1
@@ -246,10 +229,10 @@ sudo yum clean all
```shell
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-5.0]
name=ROCm5.0
[ROCm-5.1]
name=ROCm5.1
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/5.0/main
baseurl=https://repo.radeon.com/rocm/zyp/5.1/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
View File
@@ -18,6 +18,7 @@ Detailed walkthroughs of specific use-cases driven by frameworks using ROCm
acceleration.
- [Implementing Inception V3 on ROCm with PyTorch](machine_learning/pytorch_inception.md)
- [Optimizing Inference with MIGraphX](machine_learning/migraphx_optimization.md)
:::
View File
@@ -10,4 +10,11 @@ A collection of detailed and guided examples for working with Inception V3 with
:::
:::{grid-item-card} Optimizing Inference with MIGraphX
:link: migraphx_optimization
:link-type: doc
Walkthroughs of optimizing inference using MIGraphX.
:::
:::::
View File
@@ -83,6 +83,10 @@ TensorFlow, \[Online image\]. [https://www.tensorflow.org/extras/tensorflow_bran
MAGMA, \[Online image\]. [https://bitbucket.org/icl/magma/src/master/docs/](https://bitbucket.org/icl/magma/src/master/docs/)
Advanced Micro Devices, Inc., \[Online\]. Available: [https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/](https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/)
Advanced Micro Devices, Inc., \[Online\]. Available: [https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki)
Docker, \[Online\]. [https://docs.docker.com/get-started/overview/](https://docs.docker.com/get-started/overview/)
Torchvision, \[Online\]. Available [https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision](https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision)
View File
@@ -425,6 +425,10 @@ TensorFlow, \[Online image\]. [https://www.tensorflow.org/extras/tensorflow_bran
MAGMA, \[Online image\]. [https://bitbucket.org/icl/magma/src/master/docs/](https://bitbucket.org/icl/magma/src/master/docs/)
Advanced Micro Devices, Inc., \[Online\]. Available: [https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/](https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/)
Advanced Micro Devices, Inc., \[Online\]. Available: [https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki)
Docker, \[Online\]. [https://docs.docker.com/get-started/overview/](https://docs.docker.com/get-started/overview/)
Torchvision, \[Online\]. Available [https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision](https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision)
View File
@@ -197,6 +197,10 @@ TensorFlow, \[Online image\]. [https://www.tensorflow.org/extras/tensorflow_bran
MAGMA, \[Online image\]. [https://bitbucket.org/icl/magma/src/master/docs/](https://bitbucket.org/icl/magma/src/master/docs/)
Advanced Micro Devices, Inc., \[Online\]. Available: [https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/](https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/)
Advanced Micro Devices, Inc., \[Online\]. Available: [https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki)
Docker, \[Online\]. [https://docs.docker.com/get-started/overview/](https://docs.docker.com/get-started/overview/)
Torchvision, \[Online\]. Available [https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision](https://pytorch.org/vision/master/index.html?highlight=torchvision#module-torchvision)

View File
- [Examples](https://github.com/amd/rocm-examples)
- [ML, DL, and AI](examples/machine_learning/all)
- [](examples/machine_learning/pytorch_inception)
- [](examples/machine_learning/migraphx_optimization)
:::
::::
View File
@@ -17,4 +17,11 @@ Composable Kernel: Performance Portable Programming Model for Machine Learning T
:::
:::{grid-item-card} {doc}`MIGraphX <amdmigraphx:index>`
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
- {doc}`Documentation <amdmigraphx:index>`
:::
:::::
View File
@@ -43,6 +43,7 @@ Libraries related to AI.
- {doc}`MIOpen <miopen:index>`
- {doc}`Composable Kernel <composable_kernel:index>`
- {doc}`MIGraphX <amdmigraphx:index>`
:::
View File
@@ -8,12 +8,12 @@ AMD ROCm™ Platform supports the following Linux distributions.
| Distribution |Processor Architectures| Validated Kernel |
|--------------------|-----------------------|--------------------|
| CentOS 8.3 | x86-64 | 4.18 |
| CentOS 8.4 | x86-64 | 4.18 |
| CentOS 7.9 | x86-64 | 3.10 |
| RHEL 8.5, 8.4 | x86-64 | 4.18 |
| RHEL 8.5 | x86-64 | 4.18 |
| RHEL 7.9 | x86-64 | 3.10 |
| SLES 15 SP3 | x86-64 | 5.3.18 |
| Ubuntu 20.04.3 LTS | x86-64 | 5.8 |
| Ubuntu 20.04.4 LTS | x86-64 | 5.13 |
| Ubuntu 18.04.5 LTS | x86-64 | 5.4.0 |
## Virtualization Support
View File
@@ -58,6 +58,7 @@ The table is ordered to follow ROCm's manifest file.
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
@@ -120,4 +121,4 @@ following location: `/opt/rocm/share/doc/<component-name>/`
For example, you can fetch the licensing information of the `_amd_comgr_`
component (Code Object Manager) from the `amd_comgr` folder. A file named
`LICENSE.txt` contains the license details at:
`/opt/rocm-5.0.0/share/doc/amd_comgr/LICENSE.txt`
`/opt/rocm-5.1.0/share/doc/amd_comgr/LICENSE.txt`
View File
@@ -147,6 +147,8 @@ subtrees:
url: ${project:miopen}
- title: Composable Kernel
url: ${project:composable_kernel}
- title: MIGraphX - Graph Optimization
url: ${project:amdmigraphx}
- file: reference/computer_vision
subtrees:
- entries:
@@ -221,6 +223,7 @@ subtrees:
subtrees:
- entries:
- file: examples/machine_learning/pytorch_inception
- file: examples/machine_learning/migraphx_optimization
- caption: About
entries:
View File
@@ -224,6 +224,10 @@ ROCm CMake Packages
+-----------+----------+--------------------------------------------------------+
| MIOpen | miopen | ``MIOpen`` |
+-----------+----------+--------------------------------------------------------+
| MIGraphX | migraphx | ``migraphx::migraphx``, ``migraphx::migraphx_c``, |
| | | ``migraphx::migraphx_cpu``, ``migraphx::migraphx_gpu``,|
| | | ``migraphx::migraphx_onnx``, ``migraphx::migraphx_tf`` |
+-----------+----------+--------------------------------------------------------+
Using CMake Presets
===================