mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
Merge branch 'docs/5.6.0' into docs/5.5.1
2
.github/workflows/linting.yml
vendored
@@ -6,11 +6,13 @@ on:
      - develop
      - main
      - 'docs/*'
      - 'roc**'
  pull_request:
    branches:
      - develop
      - main
      - 'docs/*'
      - 'roc**'

concurrency:
  group: ${{ github.ref }}-${{ github.workflow }}
@@ -1,5 +1,9 @@
# isv_deployment_win
ABI
# file_reorg
FHS
Filesystem
filesystem
incrementing
rocm
# gpu_aware_mpi
DMA
GDR
@@ -15,12 +19,17 @@ PeerDirect
RDMA
UCX
ib_core
# isv_deployment_win
ABI
# linear algebra
LAPACK
MMA
backends
cuSOLVER
cuSPARSE
# openmp
ICV
Multithreaded
# tuning_guides
BMC
DGEMM
@@ -32,3 +41,9 @@ SKU
SKUs
PowerShell
UAC
# pytorch_install
kdb
precompiled
# gpu_os_support
HWE
el
624
CHANGELOG.md
@@ -15,6 +15,574 @@ The release notes for the ROCm platform.

-------------------

## ROCm 5.6.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable header-increment -->
#### Release Highlights

ROCm 5.6 includes several AI software ecosystem improvements for our fast-growing user base. A few examples include:

- New documentation portal at https://rocm.docs.amd.com
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
- Improved ROCm deployment and development tools, including the CPU-GPU debugger (ROCgdb), profiler, and Docker containers
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.

#### OS and GPU Support Changes

- SLES15 SP5 support was added in this release. SLES15 SP3 support was dropped.
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will enter maintenance mode starting in Q3 2023, aligned with the ROCm 5.7 GA release date.
  - No new features or performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
  - Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs until Q2 2024 (End of Maintenance [EOM]), aligned with the closest ROCm release
  - Bug fixes during the maintenance period will be made to the next ROCm point release
  - Bug fixes will not be backported to older ROCm releases for this SKU
  - Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs until EOM.

#### AMDSMI CLI 23.0.0.4

##### Added

- AMDSMI CLI tool enabled for Linux Bare Metal & Guest

- Package: amd-smi-lib

##### Known Issues

- Not all Error Correction Code (ECC) fields are currently supported

- RHEL 8 & SLES 15 have extra install steps

#### Kernel Modules (DKMS)

##### Fixes

- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).

#### HIP 5.6 (For ROCm 5.6)

##### Optimizations

- Consolidation of hipamd, rocclr and OpenCL projects in clr
- Optimized lock for graph global capture mode

##### Added

- Added hipRTC support for amd_hip_fp16
- Added hipStreamGetDevice implementation to get the device associated with the stream
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
- hipArrayGetInfo for getting information about the specified array
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
- hipArray3DGetDescriptor to get 3D array descriptor

##### Changed

- hipMallocAsync to return success for zero size allocation to match hipMalloc
- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
- Removed hipBusBandwidth and hipCommander samples from hip-tests

##### Fixed

- Fixed regression in hipMemCpyParam3D when offset is applied

##### Known Issues

- Limited testing on xnack+ configuration
  - Multiple HIP test failures (gpuvm fault or hangs)
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice on a system without a GPU
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release

##### Upcoming changes in future release

- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
  - maxTexture1D
  - maxTexture2D
  - maxTexture1DLayered
  - maxTexture2DLayered
  - sharedMemPerMultiprocessor
  - deviceOverlap
  - asyncEngineCount
  - surfaceAlignment
  - unifiedAddressing
  - computePreemptionSupported
  - uuid
- Removal of deprecated code
  - hip-hcc codes from hip code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'

#### ROCgdb-13 (For ROCm 5.6.0)

##### Optimized

- Improved performance when handling the end of a process with a large number of threads.

##### Known Issues

- On certain configurations, ROCgdb can show the following warning message:

  `warning: Probes-based dynamic linker interface failed. Reverting to original interface.`

  This does not affect ROCgdb's functionality.

#### ROCprofiler (For ROCm 5.6.0)

In ROCm 5.6, the `rocprofilerv1` and `rocprofilerv2` include and library files of
ROCm 5.5 are split into separate files. The `rocmtools` files that were
deprecated in ROCm 5.5 have been removed.

| ROCm 5.6        | rocprofilerv1                       | rocprofilerv2                          |
|-----------------|-------------------------------------|----------------------------------------|
| **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                        |
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
| **API library** | `lib/librocprofiler.so.1`           | `lib/librocprofiler.so.2`              |

The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
following command:

```sh
$ rocprof …
```

To write a custom tool based on the `rocprofilerV1` API, do the following:

```C
// main.c
#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API
int main() {
  // Use the rocprofilerV1 API
  return 0;
}
```

This can be built in the following manner:

```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```

The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.

The ROCm Profiler that uses the `rocprofilerV2` API can be invoked using the
following command:

```sh
$ rocprofv2 …
```

To write a custom tool based on the `rocprofilerV2` API, do the following:

```C
// main.c
#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API
int main() {
  // Use the rocprofilerV2 API
  return 0;
}
```

This can be built in the following manner:

```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
```

The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
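
The two build recipes above differ only in the header that is included, the link flag, and the shared library the binary ends up depending on. As a minimal sketch, the mapping could be written down as follows. The helper `rocprofiler_build_command` and the `BUILD_SETTINGS` table are hypothetical names introduced here for illustration; the paths and flags are copied from the text above, and nothing in this block is part of ROCm itself.

```python
# Hypothetical helper (not part of ROCm): maps a rocprofiler API version to
# the build settings quoted in the release notes above.
ROCM_ROOT = "/opt/rocm-5.6.0"

BUILD_SETTINGS = {
    1: {"header": "rocprofiler/rocprofiler.h",
        "link": "-lrocprofiler64",
        "depends_on": "librocprofiler64.so.1"},
    2: {"header": "rocprofiler/v2/rocprofiler.h",
        "link": "-lrocprofiler64-v2",
        "depends_on": "librocprofiler64.so.2"},
}


def rocprofiler_build_command(api_version, source="main.c", root=ROCM_ROOT):
    """Return the gcc command line for the requested API version."""
    settings = BUILD_SETTINGS[api_version]  # raises KeyError for unknown versions
    return f"gcc {source} -I{root}/include -L{root}/lib {settings['link']}"


for version in (1, 2):
    print(rocprofiler_build_command(version))
    print("  expected dependency:", BUILD_SETTINGS[version]["depends_on"])
```

Keeping the per-version settings in one table makes it harder to mix a V1 header with the V2 link flag by accident.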

##### Optimized

- Improved Test Suite

##### Added

- 'end_time' needs to be disabled in roctx_trace.txt

##### Fixed

- rocprof in ROCm/5.4.0 gpu selector broken.
- rocprof in ROCm/5.4.1 fails to generate kernel info.
- rocprof clobbers LD_PRELOAD.
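
The last item concerns a tool overwriting a preload list that the user had already set. A minimal sketch of the general non-clobbering pattern follows. The helper `with_preload` and the library names are hypothetical placeholders; this illustrates the technique only and is not rocprof's actual code.

```python
def with_preload(env, library):
    """Return a copy of env with library added to LD_PRELOAD, keeping existing entries."""
    new_env = dict(env)
    existing = new_env.get("LD_PRELOAD", "")
    # The dynamic linker accepts colons or spaces as separators, so split on both.
    entries = [entry for entry in existing.replace(":", " ").split() if entry]
    if library not in entries:
        entries.append(library)
    new_env["LD_PRELOAD"] = ":".join(entries)
    return new_env


# Placeholder library names, for illustration only.
env = with_preload({"LD_PRELOAD": "libuser.so"}, "libtool.so")
print(env["LD_PRELOAD"])
```

Appending instead of assigning preserves whatever the user had preloaded before the tool was launched.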

### Library Changes in ROCm 5.6.0

| Library | Version |
|---------|---------|
| hipBLAS | ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
| hipCUB | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
| hipFFT | ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
| hipSPARSE | ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |

#### hipBLAS 1.0.0

hipBLAS 1.0.0 for ROCm 5.6.0

##### Changed

- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing

##### Removed

- removed support for deprecated hipblasInt8Datatype_t enum
- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions

##### Deprecated

- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and out-of-place functionality

#### hipCUB 2.13.1

hipCUB 2.13.1 for ROCm 5.6.0

##### Added

- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.

##### Changed

- CUB backend references CUB and Thrust version 1.17.2.
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).

##### Known Issues

- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.

#### hipFFT 1.0.12

hipFFT 1.0.12 for ROCm 5.6.0

##### Added

- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.

##### Changed

- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.

#### hipSOLVER 1.8.0

hipSOLVER 1.8.0 for ROCm 5.6.0

##### Added

- Added compatibility API with hipsolverRf prefix

#### hipSPARSE 2.3.6

hipSPARSE 2.3.6 for ROCm 5.6.0

##### Added

- Added SpGEMM algorithms

##### Changed

- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE

#### MIOpen 2.19.0

MIOpen 2.19.0 for ROCm 5.6.0

##### Added

- ROCm 5.5 support for gfx1101 (Navi32)

##### Changed

- Tuning results for MLIR on ROCm 5.5
- Bumping MLIR commit to 5.5.0 release tag

##### Fixed

- Fix 3d convolution Host API bug
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.

#### rccl 2.15.5

RCCL 2.15.5 for ROCm 5.6.0

##### Changed

- Compatibility with NCCL 2.15.5
- Unit test executable renamed to rccl-UnitTests

##### Added

- HW-topology aware binary tree implementation
- Experimental support for MSCCL
- New unit tests for hipGraph support
- NPKit integration

##### Fixed

- rocm-smi ID conversion
- Support for HIP_VISIBLE_DEVICES for unit tests
- Support for p2p transfers to non (HIP) visible devices

##### Removed

- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench

#### rocALUTION 2.1.9

rocALUTION 2.1.9 for ROCm 5.6.0

##### Improved

- Fixed synchronization issues in level 1 routines

#### rocBLAS 3.0.0

rocBLAS 3.0.0 for ROCm 5.6.0

##### Optimizations

- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.

##### Added

- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.

##### Deprecated

- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release

##### Removed

- is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.

##### Dependencies

- build-only dependency on python joblib added, as it is used by the Tensile build
- fix for cmake install on some OS when performed by install.sh -d --cmake_install

##### Fixed

- make trsm offset calculations 64-bit safe
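
To see why offset arithmetic needs 64 bits, note that an element offset of the form `row + col * lda` can exceed the signed 32-bit range for large matrices. The sketch below simulates 32-bit wraparound in Python. The function names and matrix sizes are illustrative assumptions and this is not rocBLAS code; in C or C++, signed overflow is undefined behavior, and wraparound is only its most common outcome.

```python
def wrap_int32(value):
    """Simulate storing value in a signed 32-bit integer (two's complement wraparound)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def offset_32bit(row, col, lda):
    # Every intermediate result is truncated to 32 bits, as with int arithmetic.
    return wrap_int32(row + wrap_int32(col * lda))


def offset_64bit(row, col, lda):
    # Python integers do not overflow, which models 64-bit-safe arithmetic
    # for any offset that fits in 64 bits.
    return row + col * lda


lda = 50_000            # leading dimension (illustrative)
row, col = 0, 45_000    # first element of column 45,000
print("64-bit offset:", offset_64bit(row, col, lda))
print("32-bit offset:", offset_32bit(row, col, lda))
```

With these sizes the true offset is 2,250,000,000, which is larger than 2^31 - 1, so the 32-bit version yields a negative offset.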

##### Changed

- refactor rotg test code

#### rocFFT 1.0.23

rocFFT 1.0.23 for ROCm 5.6.0

##### Added

- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.

##### Changed

- Replaced std::complex with hipComplex data types for data generator.
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
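
The dimension-sorting item above can be pictured as reordering (length, stride) pairs so that memory is walked in a consistent order regardless of how the caller listed the dimensions. The sketch below shows one way such a reordering could look. It rests on an assumption: that sorting by stride captures the idea. The release note does not describe rocFFT's internal logic, and `sort_dimensions_by_stride` is a hypothetical name.

```python
def sort_dimensions_by_stride(lengths, strides):
    """Reorder (length, stride) pairs so the smallest stride comes first.

    Illustrative only: this assumes that ordering dimensions by stride is
    what makes a layout consistent. It is not rocFFT's implementation.
    """
    pairs = sorted(zip(lengths, strides), key=lambda pair: pair[1])
    return [length for length, _ in pairs], [stride for _, stride in pairs]


# A 4 x 8 x 16 array whose dimensions were listed slowest-first.
lengths, strides = sort_dimensions_by_stride([4, 8, 16], [128, 16, 1])
print(lengths, strides)
```

After sorting, the unit-stride dimension is listed first, so the same plan results whichever order the caller used.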

##### Fixed

- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.

#### rocm-cmake 0.9.0

rocm-cmake 0.9.0 for ROCm 5.6.0

##### Added

- Added the option ROCM_HEADER_WRAPPER_WERROR
  - Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
  - Configure-time CMake option sets the default for the C macro.

#### rocPRIM 2.13.0

rocPRIM 2.13.0 for ROCm 5.6.0

##### Added

- New block level `radix_rank` primitive.
- New block level `radix_rank_match` primitive.
- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.
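
For readers unfamiliar with the term, a sort is stable when elements that compare equal keep their original relative order. The sketch below is a plain host-side merge sort written only to illustrate that property. It does not use rocPRIM, and `stable_merge_sort` is a hypothetical Python function, not the `block_sort_algorithm::stable_merge_sort` enumerator.

```python
def stable_merge_sort(items, key=lambda item: item):
    """Sort items by key; elements with equal keys keep their input order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = stable_merge_sort(items[:mid], key)
    right = stable_merge_sort(items[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which preserves input order.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
print(stable_merge_sort(records, key=lambda record: record[0]))
```

In the result, `(1, "b")` stays ahead of `(1, "d")` and `(2, "a")` stays ahead of `(2, "c")`, which is the guarantee a stable algorithm provides.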

##### Changed

- Improved the performance of `block_radix_sort` and `device_radix_sort`.
- Improved the performance of `device_merge_sort`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).

##### Known Issues

- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`.

#### rocRAND 2.10.17

rocRAND 2.10.17 for ROCm 5.6.0

##### Added

- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
- experimental HIP-CPU feature
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
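
As background for the MT19937 item above, the following is a compact host-side sketch of the 32-bit Mersenne Twister recurrence described by Matsumoto and Nishimura. It is a sequential reference-style illustration in Python and says nothing about how rocRAND parallelizes the generator on a GPU. The class and method names are hypothetical, and no rocRAND API is called.

```python
class MT19937:
    """32-bit Mersenne Twister, written for illustration only."""

    N, M = 624, 397

    def __init__(self, seed=5489):
        # Standard linear-congruential initialization of the 624-word state.
        self.state = [seed & 0xFFFFFFFF]
        for i in range(1, self.N):
            prev = self.state[-1]
            self.state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
        self.index = self.N  # forces a twist before the first output

    def _twist(self):
        s = self.state
        for i in range(self.N):
            # Combine the top bit of s[i] with the low 31 bits of the next word.
            y = (s[i] & 0x80000000) | (s[(i + 1) % self.N] & 0x7FFFFFFF)
            s[i] = s[(i + self.M) % self.N] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
        self.index = 0

    def next_u32(self):
        if self.index >= self.N:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        # Tempering step.
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF


gen = MT19937()        # default seed 5489
print(gen.next_u32())  # 3499211612 for the default seed
```

A sequential reference like this is useful mainly as a check: any implementation of the same algorithm must reproduce the same stream for the same seed.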

##### Changed

- Python 2.7 is no longer officially supported.

#### rocSOLVER 3.22.0

rocSOLVER 3.22.0 for ROCm 5.6.0

##### Added

- LU refactorization for sparse matrices
  - CSRRF_ANALYSIS
  - CSRRF_SUMLU
  - CSRRF_SPLITLU
  - CSRRF_REFACTLU
- Linear system solver for sparse matrices
  - CSRRF_SOLVE
- Added type `rocsolver_rfinfo` for use with sparse matrix routines

##### Optimized

- Improved the performance of BDSQR and GESVD when singular vectors are requested

##### Fixed

- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`

#### rocSPARSE 2.5.2

rocSPARSE 2.5.2 for ROCm 5.6.0

##### Improved

- Fixed a memory leak in csritsv
- Fixed a bug in csrsm and bsrsm

#### rocThrust 2.18.0

rocThrust 2.18.0 for ROCm 5.6.0

##### Fixed

- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.

##### Changed

- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).

#### rocWMMA 1.1.0

rocWMMA 1.1.0 for ROCm 5.6.0

##### Added

- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
- Added performance gemm samples for half, single and double precision
- Added rocWMMA cmake versioning
- Added vectorized support in coordinate transforms
- Included ROCm smi for runtime clock rate detection
- Added fragment transforms for transpose and change data layout

##### Changed

- Default to GPU rocBLAS validation against rocWMMA
- Re-enabled int8 gemm tests on gfx9
- Upgraded to C++17
- Restructured unit test folder for consistency
- Consolidated rocWMMA samples common code

#### Tensile 4.37.0

Tensile 4.37.0 for ROCm 5.6.0

##### Added

- Added user driven tuning API
- Added decision tree fallback feature
- Added SingleBuffer + AtomicAdd option for GlobalSplitU
- DirectToVgpr support for fp16 and Int8 with TN orientation
- Added new test cases for various functions
- Added SingleBuffer algorithm for ZGEMM/CGEMM
- Added joblib for parallel map calls
- Added support for MFMA + LocalSplitU + DirectToVgprA+B
- Added asmcap check for MIArchVgpr
- Added support for MFMA + LocalSplitU
- Added frequency, power, and temperature data to the output

##### Optimizations

- Improved the performance of GlobalSplitU with SingleBuffer algorithm
- Reduced the running time of the extended and pre_checkin tests
- Optimized the Tailloop section of the assembly kernel
- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
- Improved the performance of the second kernel of MultipleBuffer algorithm

##### Changed

- Updated custom kernels with 64-bit offsets
- Adapted 64-bit offset arguments for assembly kernels
- Improved temporary register re-use to reduce max sgpr usage
- Removed some restrictions on VectorWidth and DirectToVgpr
- Updated the dependency requirements for Tensile
- Changed the range of AssertSummationElementMultiple
- Modified the error messages for more clarity
- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
- Removed dummy vgpr for vectorStaticRemainder
- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
- Removed qReg parameter from vectorStaticRemainder

##### Fixed

- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
- 64-bit offset parameters for post kernels
- Fixed gfx908 CI test failures
- Fixed offset calculation to prevent overflow for large offsets
- Fixed issues when BufferLoad and BufferStore are equal to zero
- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
- Fixed the memory access error related to StaggerU + large stride
- Fixed ZGEMM 4x4 MatrixInst mismatch
- Fixed DGEMM 4x4 MatrixInst mismatch
- Fixed ASEM + GSU + NoTailLoop opt mismatch
- Fixed AssertSummationElementMultiple + GlobalSplitU issues
- Fixed ASEM + GSU + TailLoop inner unroll

-------------------

## ROCm 5.5.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
@@ -50,10 +618,12 @@ The following HIP API is updated in the ROCm 5.5.1 release:
| hipFFT | [1.0.11](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.5.1) |
| hipSOLVER | [1.7.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.5.1) |
| hipSPARSE | [2.3.5](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.5.1) |
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.5.1) |
| rccl | [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.5.1) |
| rocALUTION | [2.1.8](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.5.1) |
| rocBLAS | [2.47.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.5.1) |
| rocFFT | [1.0.22](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.5.1) |
| rocm-cmake | [0.8.1](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.5.1) |
| rocPRIM | [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.5.1) |
| rocRAND | [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.5.1) |
| rocSOLVER | [3.21.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.5.1) |
@@ -384,10 +954,12 @@ Multiple HIP directed tests fail.
| hipFFT | 1.0.10 ⇒ [1.0.11](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.5.0) |
| hipSOLVER | 1.6.0 ⇒ [1.7.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.5.0) |
| hipSPARSE | 2.3.3 ⇒ [2.3.5](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.5.0) |
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.5.0) |
| rccl | 2.13.4 ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.5.0) |
| rocALUTION | 2.1.3 ⇒ [2.1.8](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.5.0) |
| rocBLAS | 2.46.0 ⇒ [2.47.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.5.0) |
| rocFFT | 1.0.21 ⇒ [1.0.22](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.5.0) |
| rocm-cmake | 0.8.0 ⇒ [0.8.1](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.5.0) |
| rocPRIM | 2.12.0 ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.5.0) |
| rocRAND | 2.10.16 ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.5.0) |
| rocSOLVER | 3.20.0 ⇒ [3.21.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.5.0) |
@@ -477,6 +1049,24 @@ hipSPARSE 2.3.5 for ROCm 5.5.0
- Improved documentation
- Fixed a bug with deprecation messages when using gcc9 (Thanks @Maetveis)

#### MIOpen 2.19.0

MIOpen 2.19.0 for ROCm 5.5.0

##### Added

- ROCm 5.5 support for gfx1101 (Navi32)

##### Changed

- Tuning results for MLIR on ROCm 5.5
- Bumping MLIR commit to 5.5.0 release tag

##### Fixed

- Fix 3d convolution Host API bug
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.

#### rccl 2.15.5

RCCL 2.15.5 for ROCm 5.5.0
@@ -588,6 +1178,18 @@ rocFFT 1.0.22 for ROCm 5.5.0
- Removed zero-length twiddle table allocations, which fixes errors from hipMallocManaged.
- Fixed incorrect freeing of HIP stream handles during twiddle computation when multiple devices are present.

#### rocm-cmake 0.8.1

rocm-cmake 0.8.1 for ROCm 5.5.0

##### Fixed

- ROCMInstallTargets: Added compatibility symlinks for included cmake files in `<ROCM>/lib/cmake/<PACKAGE>`.

##### Changed

- ROCMHeaderWrapper: The wrapper header deprecation message is now a deprecation warning.

#### rocPRIM 2.13.0

rocPRIM 2.13.0 for ROCm 5.5.0
@@ -903,6 +1505,7 @@ This issue is under investigation, and the known workaround is not to use -save-
| rocALUTION | [2.1.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.4.3) |
| rocBLAS | [2.46.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.4.3) |
| rocFFT | 1.0.20 ⇒ [1.0.21](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.4.3) |
| rocm-cmake | [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.4.3) |
| rocPRIM | [2.12.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.4.3) |
| rocRAND | [2.10.16](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.4.3) |
| rocSOLVER | [3.20.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.4.3) |
@@ -966,6 +1569,7 @@ This is a known issue and will be fixed in a future release.
| rocALUTION | [2.1.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.4.2) |
| rocBLAS | [2.46.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.4.2) |
| rocFFT | [1.0.20](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.4.2) |
| rocm-cmake | [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.4.2) |
| rocPRIM | [2.12.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.4.2) |
| rocRAND | [2.10.16](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.4.2) |
| rocSOLVER | [3.20.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.4.2) |
@@ -1055,6 +1659,7 @@ Maintenance update #3, combined with ROCm 5.4.1, now provides SRIOV virtualizati
| rocALUTION | [2.1.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.4.1) |
| rocBLAS | [2.46.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.4.1) |
| rocFFT | 1.0.19 ⇒ [1.0.20](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.4.1) |
| rocm-cmake | [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.4.1) |
| rocPRIM | [2.12.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.4.1) |
| rocRAND | [2.10.16](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.4.1) |
| rocSOLVER | [3.20.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.4.1) |
@@ -1315,6 +1920,7 @@ GPU IDs reported by ROCTracer and ROCProfiler or ROCm Tools are HSA Driver Node
| rocALUTION | 2.1.0 ⇒ [2.1.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.4.0) |
| rocBLAS | 2.45.0 ⇒ [2.46.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.4.0) |
| rocFFT | 1.0.18 ⇒ [1.0.19](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.4.0) |
| rocm-cmake | [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.4.0) |
| rocPRIM | 2.11.0 ⇒ [2.12.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.4.0) |
| rocRAND | 2.10.15 ⇒ [2.10.16](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.4.0) |
| rocSOLVER | 3.19.0 ⇒ [3.20.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.4.0) |
@@ -1642,6 +2248,7 @@ This issue is resolved with the following fixes to compilation failures:
| rocALUTION | [2.1.0](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.3.3) |
| rocBLAS | [2.45.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.3.3) |
| rocFFT | [1.0.18](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.3.3) |
| rocm-cmake | [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.3.3) |
| rocPRIM | [2.11.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.3.3) |
| rocRAND | [2.10.15](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.3.3) |
| rocSOLVER | [3.19.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.3.3) |
@@ -1711,6 +2318,7 @@ This issue is currently under investigation and will be resolved in a future rel
|
||||
| rocALUTION | [2.1.0](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.3.2) |
|
||||
| rocBLAS | [2.45.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.3.2) |
|
||||
| rocFFT | [1.0.18](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.3.2) |
|
||||
| rocm-cmake | [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.3.2) |
|
||||
| rocPRIM | [2.11.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.3.2) |
|
||||
| rocRAND | [2.10.15](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.3.2) |
|
||||
| rocSOLVER | [3.19.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.3.2) |
|
||||
@@ -1894,6 +2502,7 @@ Workaround: To avoid the system crash, add `amd_iommu=on iommu=pt` as the kernel
|
||||
| rocALUTION | 2.0.3 ⇒ [2.1.0](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.3.0) |
|
||||
| rocBLAS | 2.44.0 ⇒ [2.45.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.3.0) |
|
||||
| rocFFT | 1.0.17 ⇒ [1.0.18](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.3.0) |
|
||||
| rocm-cmake | ⇒ [0.8.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.3.0) |
|
||||
| rocPRIM | 2.10.14 ⇒ [2.11.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.3.0) |
|
||||
| rocRAND | 2.10.14 ⇒ [2.10.15](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.3.0) |
|
||||
| rocSOLVER | 3.18.0 ⇒ [3.19.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.3.0) |
|
||||
@@ -2076,6 +2685,21 @@ rocFFT 1.0.18 for ROCm 5.3.0
|
||||
An example is 98^3 R2C out-of-place.
|
||||
- Fixed bugs in SBRC_ERC type.
|
||||
|
||||
#### rocm-cmake 0.8.0
|
||||
|
||||
rocm-cmake 0.8.0 for ROCm 5.3.0
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed error in prerm scripts created by `rocm_create_package` that could break uninstall for packages using the `PTH` option.
|
||||
|
||||
##### Changed
|
||||
|
||||
- `ROCM_USE_DEV_COMPONENT` set to on by default for all platforms. This means that Windows will now generate runtime and devel packages by default
|
||||
- ROCMInstallTargets now defaults `CMAKE_INSTALL_LIBDIR` to `lib` if not otherwise specified.
|
||||
- Changed default Debian compression type to xz and enabled multi-threaded package compression.
|
||||
- `rocm_create_package` will no longer warn upon failure to determine version of program rpmbuild.
|
||||
|
||||
#### rocPRIM 2.11.0
|
||||
|
||||
rocPRIM 2.11.0 for ROCm 5.3.0
|
||||
|
||||
587
RELEASE.md
@@ -15,53 +15,568 @@ The release notes for the ROCm platform.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.5.1
|
||||
## ROCm 5.6.0
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
### What's New in This Release
|
||||
<!-- markdownlint-disable header-increment -->
|
||||
#### Release Highlights
|
||||
|
||||
#### HIP SDK for Windows
|
||||
ROCm 5.6 brings several AI software ecosystem improvements for our fast-growing user base. A few examples include:
|
||||
|
||||
AMD is pleased to announce the availability of the HIP SDK for Windows as part
|
||||
of the ROCm platform. The
|
||||
[HIP SDK OS and GPU support page](https://rocm.docs.amd.com/en/docs-5.5.1/release/windows_support.html)
|
||||
lists the versions of Windows and GPUs validated by AMD. HIP SDK features on
|
||||
Windows are described in detail in our
|
||||
[What is ROCm?](https://rocm.docs.amd.com/en/docs-5.5.1/rocm.html#rocm-on-windows)
|
||||
page and differ from the Linux feature set. Visit the
|
||||
[Quick Start](https://rocm.docs.amd.com/en/docs-5.5.1/deploy/windows/quick_start.html#)
|
||||
page to get started. Known issues are tracked on
|
||||
[GitHub](https://github.com/RadeonOpenCompute/ROCm/issues?q=is%3Aopen+label%3A5.5.1+label%3A%22Verified+Issue%22+label%3AWindows).
|
||||
- New documentation portal at https://rocm.docs.amd.com
|
||||
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
|
||||
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
|
||||
- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
|
||||
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
|
||||
|
||||
#### HIP API Change
|
||||
#### OS and GPU Support Changes
|
||||
|
||||
The following HIP API is updated in the ROCm 5.5.1 release:
|
||||
- SLES15 SP5 support was added in this release. SLES15 SP3 support was dropped.
|
||||
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will enter maintenance mode starting Q3 2023, aligned with the ROCm 5.7 GA release date.
|
||||
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
|
||||
- Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2 2024 (End of Maintenance [EOM]), aligned with the closest ROCm release
|
||||
- Bug fixes during the maintenance period will be made to the next ROCm point release
|
||||
- Bug fixes will not be back ported to older ROCm releases for this SKU
|
||||
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
|
||||
|
||||
##### `hipDeviceSetCacheConfig`
|
||||
#### AMDSMI CLI 23.0.0.4
|
||||
|
||||
- The return value for `hipDeviceSetCacheConfig` is updated from `hipErrorNotSupported` to `hipSuccess`
|
||||
##### Added
|
||||
|
||||
### Library Changes in ROCM 5.5.1
|
||||
- AMDSMI CLI tool enabled for Linux Bare Metal & Guest
|
||||
|
||||
- Package: amd-smi-lib
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Not all Error Correction Code (ECC) fields are currently supported
|
||||
|
||||
- RHEL 8 & SLES 15 have extra install steps
|
||||
|
||||
#### Kernel Modules (DKMS)
|
||||
|
||||
##### Fixes
|
||||
|
||||
- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test, as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
|
||||
|
||||
#### HIP 5.6 (For ROCm 5.6)
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Consolidation of hipamd, rocclr and OpenCL projects in clr
|
||||
- Optimized lock for graph global capture mode
|
||||
|
||||
##### Added
|
||||
|
||||
- Added hipRTC support for amd_hip_fp16
|
||||
- Added hipStreamGetDevice implementation to get the device associated with the stream
|
||||
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
|
||||
- hipArrayGetInfo for getting information about the specified array
|
||||
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
|
||||
- hipArray3DGetDescriptor to get 3D array descriptor
|
||||
|
||||
##### Changed
|
||||
|
||||
- hipMallocAsync to return success for zero size allocation to match hipMalloc
|
||||
- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
|
||||
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
|
||||
- Removed hipBusBandwidth and hipCommander samples from hip-tests
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed regression in hipMemCpyParam3D when offset is applied
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Limited testing on xnack+ configuration
|
||||
- Multiple HIP test failures (gpuvm fault or hangs)
|
||||
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU
|
||||
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release
|
||||
|
||||
##### Upcoming changes in future release
|
||||
|
||||
- Removal of gcnarch from hipDeviceProp_t structure
|
||||
- Addition of new fields in hipDeviceProp_t structure
|
||||
- maxTexture1D
|
||||
- maxTexture2D
|
||||
- maxTexture1DLayered
|
||||
- maxTexture2DLayered
|
||||
- sharedMemPerMultiprocessor
|
||||
- deviceOverlap
|
||||
- asyncEngineCount
|
||||
- surfaceAlignment
|
||||
- unifiedAddressing
|
||||
- computePreemptionSupported
|
||||
- uuid
|
||||
- Removal of deprecated code
|
||||
- hip-hcc codes from hip code tree
|
||||
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
|
||||
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
|
||||
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
|
||||
|
||||
#### ROCgdb-13 (For ROCm 5.6.0)
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved performance when handling the end of a process with a large number of threads.
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- On certain configurations, ROCgdb can show the following warning message:
|
||||
|
||||
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
|
||||
|
||||
This does not affect ROCgdb's functionality.
|
||||
|
||||
#### ROCprofiler (For ROCm 5.6.0)
|
||||
|
||||
In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
|
||||
ROCm 5.5 are split into separate files. The `rocmtools` files that were
|
||||
deprecated in ROCm 5.5 have been removed.
|
||||
|
||||
| ROCm 5.6        | rocprofilerv1                       | rocprofilerv2                          |
|-----------------|-------------------------------------|----------------------------------------|
| **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                        |
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
| **API library** | `lib/librocprofiler64.so.1`         | `lib/librocprofiler64.so.2`            |
|
||||
|
||||
The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
|
||||
following command:
|
||||
|
||||
```sh
$ rocprof …
```
|
||||
|
||||
To write a custom tool based on the `rocprofilerV1` API do the following:
|
||||
|
||||
```C
// main.c
#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API
int main() {
    // Use the rocprofilerV1 API
    return 0;
}
```
|
||||
|
||||
This can be built in the following manner:
|
||||
|
||||
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```
|
||||
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
|
||||
|
||||
The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
|
||||
following command:
|
||||
|
||||
```sh
$ rocprofv2 …
```
|
||||
|
||||
To write a custom tool based on the `rocprofilerV2` API do the following:
|
||||
|
||||
```C
// main.c
#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API
int main() {
    // Use the rocprofilerV2 API
    return 0;
}
```
|
||||
|
||||
This can be built in the following manner:
|
||||
|
||||
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
```
|
||||
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved Test Suite
|
||||
|
||||
##### Added
|
||||
|
||||
- 'end_time' needs to be disabled in roctx_trace.txt
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocprof in ROCm/5.4.0 gpu selector broken.
|
||||
- rocprof in ROCm/5.4.1 fails to generate kernel info.
|
||||
- rocprof clobbers LD_PRELOAD.
|
||||
|
||||
### Library Changes in ROCm 5.6.0
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
| hipBLAS | [0.54.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.5.1) |
|
||||
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.5.1) |
|
||||
| hipFFT | [1.0.11](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.5.1) |
|
||||
| hipSOLVER | [1.7.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.5.1) |
|
||||
| hipSPARSE | [2.3.5](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.5.1) |
|
||||
| rccl | [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.5.1) |
|
||||
| rocALUTION | [2.1.8](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.5.1) |
|
||||
| rocBLAS | [2.47.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.5.1) |
|
||||
| rocFFT | [1.0.22](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.5.1) |
|
||||
| rocPRIM | [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.5.1) |
|
||||
| rocRAND | [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.5.1) |
|
||||
| rocSOLVER | [3.21.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.5.1) |
|
||||
| rocSPARSE | [2.5.1](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.5.1) |
|
||||
| rocThrust | [2.17.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.5.1) |
|
||||
| rocWMMA | [1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.5.1) |
|
||||
| Tensile | [4.36.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.5.1) |
|
||||
| hipBLAS | ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
|
||||
| hipCUB | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
|
||||
| hipFFT | ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
|
||||
| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
|
||||
| hipSPARSE | ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
|
||||
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
|
||||
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
|
||||
| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
|
||||
| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
|
||||
| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
|
||||
| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
|
||||
| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
|
||||
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
|
||||
| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
|
||||
| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
|
||||
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
|
||||
| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
|
||||
| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |
|
||||
|
||||
## Older versions
|
||||
#### hipBLAS 1.0.0
|
||||
|
||||
The release notes for older versions can be found in [the changelog](./CHANGELOG.md).
|
||||
hipBLAS 1.0.0 for ROCm 5.6.0
|
||||
|
||||
##### Changed
|
||||
|
||||
- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing
|
||||
|
||||
##### Removed
|
||||
|
||||
- removed support for deprecated hipblasInt8Datatype_t enum
|
||||
- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions
|
||||
|
||||
##### Deprecated
|
||||
|
||||
- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and
|
||||
out-of-place functionality
|
||||
|
||||
#### hipCUB 2.13.1
|
||||
|
||||
hipCUB 2.13.1 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
|
||||
|
||||
##### Changed
|
||||
|
||||
- CUB backend references CUB and Thrust version 1.17.2.
|
||||
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
|
||||
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
|
||||
|
||||
#### hipFFT 1.0.12
|
||||
|
||||
hipFFT 1.0.12 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
|
||||
|
||||
#### hipSOLVER 1.8.0
|
||||
|
||||
hipSOLVER 1.8.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added compatibility API with hipsolverRf prefix
|
||||
|
||||
#### hipSPARSE 2.3.6
|
||||
|
||||
hipSPARSE 2.3.6 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added SpGEMM algorithms
|
||||
|
||||
##### Changed
|
||||
|
||||
- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE
|
||||
|
||||
#### MIOpen 2.19.0
|
||||
|
||||
MIOpen 2.19.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- ROCm 5.5 support for gfx1101 (Navi32)
|
||||
|
||||
##### Changed
|
||||
|
||||
- Tuning results for MLIR on ROCm 5.5
|
||||
- Bumping MLIR commit to 5.5.0 release tag
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed 3D convolution host API bug
|
||||
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.
|
||||
|
||||
#### rccl 2.15.5
|
||||
|
||||
RCCL 2.15.5 for ROCm 5.6.0
|
||||
|
||||
##### Changed
|
||||
|
||||
- Compatibility with NCCL 2.15.5
|
||||
- Unit test executable renamed to rccl-UnitTests
|
||||
|
||||
##### Added
|
||||
|
||||
- HW-topology aware binary tree implementation
|
||||
- Experimental support for MSCCL
|
||||
- New unit tests for hipGraph support
|
||||
- NPKit integration
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocm-smi ID conversion
|
||||
- Support for HIP_VISIBLE_DEVICES for unit tests
|
||||
- Support for p2p transfers to non (HIP) visible devices
|
||||
|
||||
##### Removed
|
||||
|
||||
- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
|
||||
|
||||
#### rocALUTION 2.1.9
|
||||
|
||||
rocALUTION 2.1.9 for ROCm 5.6.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed synchronization issues in level 1 routines
|
||||
|
||||
#### rocBLAS 3.0.0
|
||||
|
||||
rocBLAS 3.0.0 for ROCm 5.6.0
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
|
||||
- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.
|
||||
|
||||
##### Added
|
||||
|
||||
- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
|
||||
|
||||
##### Deprecated
|
||||
|
||||
- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
|
||||
- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
|
||||
- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
|
||||
- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
|
||||
- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release
|
||||
|
||||
##### Removed
|
||||
|
||||
- is_complex helper was deprecated and now removed. Use rocblas_is_complex instead.
|
||||
- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
|
||||
- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
|
||||
- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.
|
||||
|
||||
##### Dependencies
|
||||
|
||||
- Added a build-only dependency on Python joblib, which is used by the Tensile build
|
||||
- Fixed cmake install on some operating systems when performed by install.sh -d --cmake_install
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Made trsm offset calculations 64-bit safe
|
||||
|
||||
##### Changed
|
||||
|
||||
- Refactored rotg test code
|
||||
|
||||
#### rocFFT 1.0.23
|
||||
|
||||
rocFFT 1.0.23 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
|
||||
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
|
||||
- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Replaced std::complex with hipComplex data types for data generator.
|
||||
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
|
||||
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.
|
||||
|
||||
#### rocm-cmake 0.9.0
|
||||
|
||||
rocm-cmake 0.9.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added the option ROCM_HEADER_WRAPPER_WERROR
|
||||
- Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
|
||||
- Configure-time CMake option sets the default for the C macro.
|
||||
|
||||
#### rocPRIM 2.13.0
|
||||
|
||||
rocPRIM 2.13.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- New block level `radix_rank` primitive.
|
||||
- New block level `radix_rank_match` primitive.
|
||||
- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Improved the performance of `block_radix_sort` and `device_radix_sort`.
|
||||
- Improved the performance of `device_merge_sort`.
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
|
||||
- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`.
|
||||
|
||||
#### rocRAND 2.10.17
|
||||
|
||||
rocRAND 2.10.17 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
|
||||
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
|
||||
- experimental HIP-CPU feature
|
||||
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
|
||||
|
||||
##### Changed
|
||||
|
||||
- Python 2.7 is no longer officially supported.
|
||||
|
||||
#### rocSOLVER 3.22.0
|
||||
|
||||
rocSOLVER 3.22.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- LU refactorization for sparse matrices
|
||||
- CSRRF_ANALYSIS
|
||||
- CSRRF_SUMLU
|
||||
- CSRRF_SPLITLU
|
||||
- CSRRF_REFACTLU
|
||||
- Linear system solver for sparse matrices
|
||||
- CSRRF_SOLVE
|
||||
- Added type `rocsolver_rfinfo` for use with sparse matrix routines
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved the performance of BDSQR and GESVD when singular vectors are requested
|
||||
|
||||
##### Fixed
|
||||
|
||||
- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`
|
||||
|
||||
#### rocSPARSE 2.5.2
|
||||
|
||||
rocSPARSE 2.5.2 for ROCm 5.6.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed a memory leak in csritsv
|
||||
- Fixed a bug in csrsm and bsrsm
|
||||
|
||||
#### rocThrust 2.18.0
|
||||
|
||||
rocThrust 2.18.0 for ROCm 5.6.0
|
||||
|
||||
##### Fixed
|
||||
|
||||
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
|
||||
|
||||
#### rocWMMA 1.1.0
|
||||
|
||||
rocWMMA 1.1.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
|
||||
- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
|
||||
- Added performance gemm samples for half, single and double precision
|
||||
- Added rocWMMA cmake versioning
|
||||
- Added vectorized support in coordinate transforms
|
||||
- Included ROCm smi for runtime clock rate detection
|
||||
- Added fragment transforms for transpose and change data layout
|
||||
|
||||
##### Changed
|
||||
|
||||
- Default to GPU rocBLAS validation against rocWMMA
|
||||
- Re-enabled int8 gemm tests on gfx9
|
||||
- Upgraded to C++17
|
||||
- Restructured unit test folder for consistency
|
||||
- Consolidated rocWMMA samples common code
|
||||
|
||||
#### Tensile 4.37.0
|
||||
|
||||
Tensile 4.37.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added user driven tuning API
|
||||
- Added decision tree fallback feature
|
||||
- Added SingleBuffer + AtomicAdd option for GlobalSplitU
|
||||
- DirectToVgpr support for fp16 and Int8 with TN orientation
|
||||
- Added new test cases for various functions
|
||||
- Added SingleBuffer algorithm for ZGEMM/CGEMM
|
||||
- Added joblib for parallel map calls
|
||||
- Added support for MFMA + LocalSplitU + DirectToVgprA+B
|
||||
- Added asmcap check for MIArchVgpr
|
||||
- Added support for MFMA + LocalSplitU
|
||||
- Added frequency, power, and temperature data to the output
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved the performance of GlobalSplitU with SingleBuffer algorithm
|
||||
- Reduced the running time of the extended and pre_checkin tests
|
||||
- Optimized the Tailloop section of the assembly kernel
|
||||
- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
|
||||
- Improved the performance of the second kernel of MultipleBuffer algorithm
|
||||
|
||||
##### Changed
|
||||
|
||||
- Updated custom kernels with 64-bit offsets
|
||||
- Adapted 64-bit offset arguments for assembly kernels
|
||||
- Improved temporary register re-use to reduce max sgpr usage
|
||||
- Removed some restrictions on VectorWidth and DirectToVgpr
|
||||
- Updated the dependency requirements for Tensile
|
||||
- Changed the range of AssertSummationElementMultiple
|
||||
- Modified the error messages for more clarity
|
||||
- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
|
||||
- Removed dummy vgpr for vectorStaticRemainder
|
||||
- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
|
||||
- Removed qReg parameter from vectorStaticRemainder
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
|
||||
- 64-bit offset parameters for post kernels
|
||||
- Fixed gfx908 CI test failures
|
||||
- Fixed offset calculation to prevent overflow for large offsets
|
||||
- Fixed issues when BufferLoad and BufferStore are equal to zero
|
||||
- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
|
||||
- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
|
||||
- Fixed the memory access error related to StaggerU + large stride
|
||||
- Fixed ZGEMM 4x4 MatrixInst mismatch
|
||||
- Fixed DGEMM 4x4 MatrixInst mismatch
|
||||
- Fixed ASEM + GSU + NoTailLoop opt mismatch
|
||||
- Fixed AssertSummationElementMultiple + GlobalSplitU issues
|
||||
- Fixed ASEM + GSU + TailLoop inner unroll
|
||||
|
||||
34
default.xml
@@ -12,44 +12,41 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
|
||||
fetch="https://github.com/GPUOpen-Tools/" />
|
||||
<remote name="KhronosGroup"
|
||||
fetch="https://github.com/KhronosGroup/" />
|
||||
<default revision="refs/tags/rocm-5.5.1"
|
||||
<default revision="refs/tags/rocm-5.6.0"
|
||||
remote="roc-github"
|
||||
sync-c="true"
|
||||
sync-j="4" />
|
||||
<!--list of projects for ROCM-->
|
||||
<project name="ROCK-Kernel-Driver" />
|
||||
<project name="ROCT-Thunk-Interface" />
|
||||
<project name="ROCR-Runtime" />
|
||||
<project name="rocm_smi_lib" />
|
||||
<project name="rocm-core" />
|
||||
<project name="rocm-cmake" />
|
||||
<project name="rocminfo" />
|
||||
<project name="ROCK-Kernel-Driver" remote="roc-github" />
|
||||
<project name="ROCT-Thunk-Interface" remote="roc-github" />
|
||||
<project name="ROCR-Runtime" remote="roc-github" />
|
||||
<project name="rocm_smi_lib" remote="roc-github" />
|
||||
<project name="rocm-core" remote="roc-github" />
|
||||
<project name="rocm-cmake" remote="roc-github" />
|
||||
<project name="rocminfo" remote="roc-github" />
|
||||
<project name="rocprofiler" remote="rocm-devtools" />
|
||||
<project name="roctracer" remote="rocm-devtools" />
|
||||
<project name="ROCm-OpenCL-Runtime" />
|
||||
<project path="ROCm-OpenCL-Runtime/api/opencl/khronos/icd" name="OpenCL-ICD-Loader" remote="KhronosGroup" revision="6c03f8b58fafd9dd693eaac826749a5cfad515f8" />
|
||||
<project name="clang-ocl" />
|
||||
<project name="clang-ocl" remote="roc-github" />
|
||||
<!--HIP Projects-->
|
||||
<project name="HIP" remote="rocm-devtools" />
|
||||
<project name="hipamd" remote="rocm-devtools" />
|
||||
<project name="clr" remote="rocm-devtools" />
|
||||
<project name="HIP-Examples" remote="rocm-devtools" />
|
||||
<project name="ROCclr" remote="rocm-devtools" />
|
||||
<project name="HIPIFY" remote="rocm-devtools" />
|
||||
<project name="HIPCC" remote="rocm-devtools" />
|
||||
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
|
||||
<project name="llvm-project" />
|
||||
<project name="ROCm-Device-Libs" />
|
||||
<project name="atmi" />
|
||||
<project name="ROCm-CompilerSupport" />
|
||||
<project name="llvm-project" remote="roc-github" />
|
||||
<project name="ROCm-Device-Libs" remote="roc-github" />
|
||||
<project name="ROCm-CompilerSupport" remote="roc-github" />
|
||||
<project name="rocr_debug_agent" remote="rocm-devtools" />
|
||||
<project name="rocm_bandwidth_test" />
|
||||
<project name="rocm_bandwidth_test" remote="roc-github" />
|
||||
<project name="half" remote="rocm-swplat" revision="37742ce15b76b44e4b271c1e66d13d2fa7bd003e" />
|
||||
<project name="RCP" remote="gpuopen-tools" revision="3a49405a1500067c49d181844ec90aea606055bb" />
|
||||
<!-- gdb projects -->
|
||||
<project name="ROCgdb" remote="rocm-devtools" />
|
||||
<project name="ROCdbgapi" remote="rocm-devtools" />
|
||||
<!-- ROCm Libraries -->
|
||||
<project name="rdc" />
|
||||
<project name="rdc" remote="roc-github" />
|
||||
<project groups="mathlibs" name="rocBLAS" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="Tensile" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipBLAS" remote="rocm-swplat" />
|
||||
@@ -61,7 +58,6 @@ fetch="https://github.com/KhronosGroup/" />
|
||||
<project groups="mathlibs" name="hipSOLVER" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipSPARSE" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rocALUTION" remote="rocm-swplat" />
|
||||
<project name="MIOpenGEMM" remote="rocm-swplat" />
|
||||
<project name="MIOpen" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rccl" remote="rocm-swplat" />
|
||||
<project name="MIVisionX" remote="gpuopen-libs" />
|
||||
|
||||
@@ -20,8 +20,9 @@ latex_engine = "xelatex"
|
||||
project = "ROCm Documentation"
|
||||
author = "Advanced Micro Devices, Inc."
|
||||
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
|
||||
version = "5.5.1"
|
||||
release = "5.5.1"
|
||||
version = "5.6.0"
|
||||
release = "5.6.0"
|
||||
|
||||
|
||||
setting_all_article_info = True
|
||||
all_article_info_os = ["linux", "windows"]
|
||||
|
||||
@@ -18,8 +18,8 @@ following commands based on your distribution.
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.5.1/ubuntu/focal/amdgpu-install_5.5.50501-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.5.50501-1_all.deb
|
||||
wget https://repo.radeon.com/amdgpu-install/5.6/ubuntu/focal/amdgpu-install_5.6.50600-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -28,8 +28,8 @@ sudo apt install ./amdgpu-install_5.5.50501-1_all.deb
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.5.1/ubuntu/jammy/amdgpu-install_5.5.50501-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.5.50501-1_all.deb
|
||||
wget https://repo.radeon.com/amdgpu-install/5.6/ubuntu/jammy/amdgpu-install_5.6.50600-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -44,7 +44,7 @@ sudo apt install ./amdgpu-install_5.5.50501-1_all.deb
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.6/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.6/amdgpu-install-5.6.50600-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -53,7 +53,16 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.6/amdgpu-in
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.7/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.7/amdgpu-install-5.6.50600-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.8
|
||||
:sync: RHEL-8.8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.8/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -62,21 +71,38 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.7/amdgpu-in
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.1/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/9.1/amdgpu-install-5.6.50600-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9.2
|
||||
:sync: RHEL-9.2
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.2/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} Service Pack 4
|
||||
:sync: SLES15-SP4
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.5.1/sle/15.4/amdgpu-install-5.5.50501-1.noarch.rpm
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/sle/15.4/amdgpu-install-5.6.50600-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/sle/15.5/amdgpu-install-5.6.50600-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
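The installer file names in the commands above appear to follow a pattern: ROCm 5.5.1 uses `amdgpu-install_5.5.50501`, and ROCm 5.6.0 uses `amdgpu-install_5.6.50600`. The following is a minimal sketch of that pattern, assuming it holds in general. It is inferred only from the two releases shown here, and `installer_version` is a hypothetical helper, not part of any ROCm tool.

```shell
# Hypothetical helper: derive the amdgpu-install package version from a ROCm
# release number. Assumption: the pattern
#   MAJOR.MINOR.<MAJOR><MINOR as 2 digits><PATCH as 2 digits>
# is inferred from 5.5.1 -> 5.5.50501 and 5.6.0 -> 5.6.50600 only.
installer_version() {
    release=$1
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%.*}
    case $rest in
        *.*) patch=${rest#*.} ;;  # a patch component was given
        *)   patch=0 ;;           # "5.6" is treated as "5.6.0"
    esac
    printf '%s.%s.%d%02d%02d\n' "$major" "$minor" "$major" "$minor" "$patch"
}

installer_version 5.5.1
installer_version 5.6.0
```

Always check the resulting name against the listing on repo.radeon.com before relying on it, since the directory name (`5.6` versus `5.5.1`) and the distribution suffix vary between releases.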
|
||||
@@ -155,9 +181,9 @@ the installer script will install packages in the single-version layout.
|
||||
For the multi-version ROCm installation you must use the installer script from
|
||||
the latest release of ROCm that you wish to install.
|
||||
|
||||
**Example:** If you want to install ROCm releases 5.3.3 and 5.5.1
|
||||
**Example:** If you want to install ROCm releases 5.3.3 and 5.4.3
|
||||
simultaneously, you are required to download the installer from the latest ROCm
|
||||
release v5.5.1.
|
||||
release v5.4.3.
|
||||
|
||||
### Add Required Repositories
|
||||
|
||||
@@ -242,14 +268,14 @@ sudo yum clean all
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/$ver/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -272,12 +298,12 @@ sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-3>
|
||||
```
|
||||
|
||||
Following are examples of ROCm multi-version installation. The kernel-mode
|
||||
driver, associated with the ROCm release v5.5.1, will be installed as its latest
|
||||
driver, associated with the ROCm release v5.4.3, will be installed as its latest
|
||||
release in the list.
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.3.3
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.5.1
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.3
|
||||
```
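When more than two releases are involved, it can help to review the full command list before running anything. The sketch below only prints the commands. It assumes the `--usecase=rocm` and `--rocmrelease=` options shown above, and `print_multiversion_commands` is a hypothetical helper name.

```shell
# Print (rather than run) one amdgpu-install command per requested release.
# The installer itself must still come from the latest release in the list.
print_multiversion_commands() {
    for release in "$@"; do
        printf 'sudo amdgpu-install --usecase=rocm --rocmrelease=%s\n' "$release"
    done
}

print_multiversion_commands 5.3.3 5.4.3
```

Printing first keeps the step reversible: nothing is installed until you copy the lines into a shell or pipe them to `sh` yourself.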
|
||||
|
||||
## Additional options
|
||||
|
||||
@@ -53,7 +53,7 @@ To add the AMDGPU repository, follow these steps:
|
||||
|
||||
```shell
|
||||
# amdgpu repository for focal
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.5.1/ubuntu focal main' \
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -64,7 +64,7 @@ sudo apt update
|
||||
|
||||
```shell
|
||||
# amdgpu repository for jammy
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.5.1/ubuntu jammy main' \
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -91,7 +91,7 @@ To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
# ROCm repositories for focal
|
||||
for ver in 5.3.3 5.4.3 5.5.1; do
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
|
||||
| sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
@@ -106,7 +106,7 @@ sudo apt update
|
||||
|
||||
```shell
|
||||
# ROCm repositories for jammy
|
||||
for ver in 5.3.3 5.4.3 5.5.1; do
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
|
||||
| sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
@@ -136,7 +136,7 @@ For a comprehensive list of meta-packages, refer to
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo apt install rocm-hip-sdk5.5.1 rocm-hip-sdk5.3.3
|
||||
sudo apt install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
|
||||
```
|
||||
|
||||
:::::
|
||||
@@ -160,7 +160,7 @@ section.
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.6/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -179,7 +179,26 @@ sudo yum clean all
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.7/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.8
|
||||
:sync: RHEL-8.8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -198,7 +217,26 @@ sudo yum clean all
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.1/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 9.2
|
||||
:sync: RHEL-9.2
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -228,7 +266,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3 5.5.1; do
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -247,7 +285,7 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3 5.5.1; do
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -282,12 +320,12 @@ For a comprehensive list of meta-packages, refer to
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo yum install rocm-hip-sdk5.5.1 rocm-hip-sdk5.3.3
|
||||
sudo yum install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
::::{rubric} 1. Add the AMDGPU Repository and Install the Kernel-mode Driver
|
||||
::::
|
||||
@@ -297,11 +335,15 @@ If you have a version of the kernel-mode driver installed, you may skip this
|
||||
section.
|
||||
```
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -309,6 +351,25 @@ EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.5/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
Install the kernel mode driver and reboot the system using the following
|
||||
commands:
|
||||
|
||||
@@ -323,7 +384,7 @@ sudo reboot
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3 5.5.1; do
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -355,7 +416,7 @@ For a comprehensive list of meta-packages, refer to
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.5.1 rocm-hip-sdk5.3.3
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
|
||||
```
|
||||
|
||||
:::::
|
||||
@@ -392,7 +453,7 @@ but are generally useful. Verification of the install is advised.
|
||||
2. Add binary paths to the `PATH` environment variable.
|
||||
|
||||
```shell
|
||||
export PATH=$PATH:/opt/rocm-5.5.1/bin:/opt/rocm-5.5.1/opencl/bin
|
||||
export PATH=$PATH:/opt/rocm/bin:/opt/rocm-5.6/opencl/bin
|
||||
```
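If the `export` line is placed in a shell startup file, `PATH` gains a duplicate entry each time the file is sourced. A minimal sketch of an idempotent alternative follows, assuming a POSIX shell. `path_append` is a hypothetical helper name, and the directory is the one used in the command above.

```shell
# Append a directory to PATH only if it is not already present.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already listed: do nothing
        *) PATH="$PATH:$1" ;;       # otherwise add it at the end
    esac
}

path_append /opt/rocm/bin
path_append /opt/rocm/bin           # second call is a no-op
export PATH
```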
|
||||
|
||||
```{attention}
|
||||
|
||||
@@ -114,8 +114,8 @@ sudo yum autoremove amdgpu-dkms
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
::::{rubric} Uninstalling Specific Meta-packages
|
||||
::::
|
||||
|
||||
@@ -26,7 +26,7 @@ repository to the new release.
|
||||
|
||||
```shell
|
||||
# amdgpu repository for focal
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.5.1/ubuntu focal main' \
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -37,7 +37,7 @@ sudo apt update
|
||||
|
||||
```shell
|
||||
# amdgpu repository for jammy
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.5.1/ubuntu jammy main' \
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -57,7 +57,7 @@ sudo apt update
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.6/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -75,7 +75,25 @@ sudo yum clean all
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.7/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.8
|
||||
:sync: RHEL-8.8
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -93,7 +111,25 @@ sudo yum clean all
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.1/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9.2
|
||||
:sync: RHEL-9.2
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -105,14 +141,18 @@ sudo yum clean all
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -120,6 +160,24 @@ EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.5/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo zypper ref
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
::::::
|
||||
|
||||
@@ -147,8 +205,8 @@ sudo reboot
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install amdgpu-dkms
|
||||
@@ -172,7 +230,7 @@ repository to the new release.
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.5.1 focal main" \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 focal main" \
|
||||
| sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
@@ -184,7 +242,7 @@ sudo apt update
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.5.1 jammy main" \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
@@ -203,9 +261,9 @@ sudo apt update
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.5.1]
|
||||
name=ROCm5.5.1
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.5.1/main
|
||||
[ROCm-5.6]
|
||||
name=ROCm5.6
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.6/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -220,9 +278,9 @@ sudo yum clean all
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.5.1]
|
||||
name=ROCm5.5.1
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.5.1/main
|
||||
[ROCm-5.6]
|
||||
name=ROCm5.6
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.6/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -234,15 +292,15 @@ sudo yum clean all
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.5.1]
|
||||
name=ROCm5.5.1
|
||||
[ROCm-5.6]
|
||||
name=ROCm5.6
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/5.5.1/main
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/5.6/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -275,8 +333,8 @@ sudo yum update rocm-hip-sdk
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} Suse Linux Enterprise Server 15
|
||||
:sync: SLES15
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys update rocm-hip-sdk
|
||||
|
||||
@@ -91,6 +91,7 @@ sudo rpm -ivh epel-release-latest-8.noarch.rpm
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 9
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
|
||||
@@ -110,14 +111,29 @@ sudo crb enable
|
||||
```
|
||||
|
||||
:::::
|
||||
:::::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:::::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
Add the perl languages repository.
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
:::::
|
||||
::::::
|
||||
|
||||
|
||||
@@ -127,6 +127,33 @@ EOF
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.8
|
||||
:sync: RHEL-8.8
|
||||
|
||||
```shell
|
||||
# Add the amdgpu module repository for RHEL 8.8
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.8/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
# Add the rocm repository for RHEL 8
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 9.1
|
||||
:sync: RHEL-9.1
|
||||
|
||||
@@ -152,6 +179,33 @@ gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 9.2
|
||||
:sync: RHEL-9.2
|
||||
|
||||
```shell
|
||||
# Add the amdgpu module repository for RHEL 9.2
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.2/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
# Add the rocm repository for RHEL 9
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
@@ -171,8 +225,8 @@ sudo yum clean all
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15 SP4
|
||||
:sync: SLES15-SP4
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
|
||||
@@ -197,6 +251,33 @@ gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
|
||||
# Add the amdgpu module repository for SLES 15.5
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.5/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
# Add the rocm repository for SLES
|
||||
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/zypper
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
::::
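The repository files above are written with a quoted heredoc delimiter (`<<'EOF'`), which copies the text literally. Snippets that loop over several ROCm versions need an unquoted delimiter instead, so that a variable such as `$ver` is expanded. The sketch below illustrates the difference. It writes to a temporary file rather than to `/etc`, and the section names are illustrative only.

```shell
ver=5.6
repo_file=$(mktemp)

# Unquoted delimiter: $ver is expanded by the shell before tee sees the text.
tee "$repo_file" >/dev/null <<EOF
[ROCm-$ver]
name=ROCm$ver
EOF

# Quoted delimiter: the text is written literally, so nothing is expanded.
tee -a "$repo_file" >/dev/null <<'EOF'
[literal-$ver]
EOF

cat "$repo_file"
```

Quoting the delimiter is the safer default for fixed content, because stray `$` characters in a URL or key cannot be altered by the shell.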
|
||||
|
||||
|
||||
@@ -405,6 +405,22 @@ Follow these steps:
|
||||
python3 main.py
|
||||
```
|
||||
|
||||
## Using MIOpen kdb files with ROCm PyTorch wheels
|
||||
|
||||
PyTorch uses MIOpen for machine learning primitives. These primitives are compiled into kernels at runtime. Runtime compilation causes a small warm-up phase when starting PyTorch. MIOpen kdb files contain precompiled kernels that can speed up the warm-up phase of an application. More information is available in the {doc}`MIOpen installation page <miopen:install>`.
|
||||
|
||||
MIOpen kdb files can be used with ROCm PyTorch wheels. However, the kdb files must be placed in a specific location relative to the PyTorch installation path. A helper script simplifies this task. The script takes the ROCm version and the user's GPU architecture as inputs, and works on Ubuntu and CentOS.
|
||||
|
||||
Helper script: [install_kdb_files_for_pytorch_wheels.sh](https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh)
|
||||
|
||||
Usage:
|
||||
|
||||
After installing ROCm PyTorch wheels:
|
||||
|
||||
1. [Optional] `export GFX_ARCH=gfx90a`
|
||||
2. [Optional] `export ROCM_VERSION=5.5`
|
||||
3. `./install_kdb_files_for_pytorch_wheels.sh`
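If you prefer to set `GFX_ARCH` explicitly in step 1, the architecture name can usually be read from `rocminfo` output. The following is a minimal sketch, assuming `rocminfo` lists the target as a `gfx…` token. `first_gfx_arch` is a hypothetical helper, and the sample line is illustrative rather than captured output.

```shell
# Read rocminfo-style text on stdin and print the first gfx target found.
first_gfx_arch() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | head -n 1
}

# With ROCm installed, this might be used as:
#   export GFX_ARCH=$(rocminfo | first_gfx_arch)

# Illustrative sample line (hypothetical, not captured output):
printf '  Name:                    gfx90a\n' | first_gfx_arch
```

On systems with more than one GPU architecture, the first match may not be the device you intend to use, so check the full `rocminfo` listing in that case.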
|
||||
|
||||
## References
|
||||
|
||||
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," CoRR, p. abs/1512.00567, 2015
|
||||
|
||||
@@ -275,7 +275,7 @@ sudo yum install cpupowerutils
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
@@ -453,7 +453,7 @@ sudo yum install rocm-bandwidth-test
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
|
||||
@@ -258,7 +258,7 @@ sudo yum install cpupowerutils
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
@@ -436,7 +436,7 @@ sudo yum install rocm-bandwidth-test
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} SUSE Linux Enterprise Server 15
|
||||
:::{tab-item} SUSE Linux Enterprise Server
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
|
||||
@@ -7,6 +7,8 @@
|
||||
AMD's library for high performance machine learning primitives.
|
||||
|
||||
- {doc}`Documentation <miopen:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -14,6 +16,8 @@ AMD's library for high performance machine learning primitives.
|
||||
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
|
||||
|
||||
- {doc}`Documentation <composable_kernel:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -21,6 +25,9 @@ Composable Kernel: Performance Portable Programming Model for Machine Learning T
|
||||
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
|
||||
|
||||
- {doc}`Documentation <amdmigraphx:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)
|
||||
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -8,8 +8,9 @@
|
||||
:::{grid-item-card} [HIP](./hip)
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime.
|
||||
|
||||
- {doc}`hip:.doxygen/docBin/html/index`
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
- {doc}`HIP <hip:index>`
|
||||
- [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
- {doc}`HIPIFY <hipify:index>`
|
||||
|
||||
:::
|
||||
|
||||
@@ -64,6 +65,7 @@ Computer vision related projects.
|
||||
:::{grid-item-card} [Compilers and Tools](compilers)
|
||||
|
||||
- [ROCmCC](/reference/rocmcc/rocmcc)
|
||||
- {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
- {doc}`ROCgdb <rocgdb:index>`
|
||||
- {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
- {doc}`ROCTracer <roctracer:index>`
|
||||
@@ -72,9 +74,9 @@ Computer vision related projects.
|
||||
|
||||
:::{grid-item-card} [Management Tools](management_tools)
|
||||
|
||||
- AMD SMI
|
||||
- [ROCm SMI](https://rocmdocs.amd.com/projects/rocm_smi_lib/en/latest/)
|
||||
- {doc}`ROCm Datacenter Tool <rdc:index>`
|
||||
- {doc}`AMD SMI <amdsmi:index>`
|
||||
- {doc}`ROCm SMI <rocm_smi_lib:index>`
|
||||
- {doc}`ROCm Data Center Tool <rdc:index>`
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -3,42 +3,46 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} ROCmCC
|
||||
:link: /reference/rocmcc/rocmcc
|
||||
:link-type: doc
|
||||
:::{grid-item-card} {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
The AMD Debugger API is a library that provides all the support necessary for a
|
||||
debugger and other tools to perform low level control of the execution and
|
||||
inspection of execution state of AMD's commercially available GPU architectures.
|
||||
|
||||
- {doc}`Documentation <rocdbgapi:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCmCC](./rocmcc/rocmcc)
|
||||
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
|
||||
computing on AMD GPUs and CPUs and supports various heterogeneous programming
|
||||
models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
- [Documentation](./rocmcc/rocmcc)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} ROCgdb
|
||||
:link: rocgdb:index
|
||||
:link-type: doc
|
||||
:::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`
|
||||
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
|
||||
|
||||
- {doc}`Documentation <rocgdb:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} ROCProfiler
|
||||
:link: rocprofiler:rocprof
|
||||
:link-type: doc
|
||||
:::{grid-item-card} {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
ROC profiler library. Profiling with performance counters and derived metrics. Library supports GFX8/GFX9. Hardware specific low-level performance analysis interface for profiling of GPU compute applications. The profiling includes hardware performance counters with complex performance metrics.
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} ROCTracer
|
||||
:link: roctracer:index
|
||||
:link-type: doc
|
||||
Callback/Activity Library for Performance tracing AMD GPU's
|
||||
- {doc}`Documentation <rocprofiler:rocprof>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} ROCdbgapi
|
||||
:link: rocdbgapi:index
|
||||
:link-type: doc
|
||||
The AMD Debugger API is a library that provides all the support necessary for a
|
||||
debugger and other tools to perform low level control of the execution and
|
||||
inspection of execution state of AMD's commercially available GPU architectures.
|
||||
:::{grid-item-card} {doc}`ROCTracer <roctracer:index>`
|
||||
Callback/Activity Library for Performance tracing AMD GPUs
|
||||
|
||||
- {doc}`Documentation <roctracer:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -7,6 +7,8 @@
|
||||
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
|
||||
|
||||
- {doc}`Documentation <mivisionx:README>`
|
||||
- [GitHub](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/)
|
||||
- [Changelog](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -11,6 +11,7 @@ transforms, reductions, scans, etc. It also serves as a common back-end for
|
||||
similar libraries found inside ROCm.
|
||||
|
||||
- {doc}`Documentation <rocprim:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
|
||||
|
||||
@@ -22,6 +23,7 @@ interface. Their CPU back-ends are identical, while the GPU back-end calls into
|
||||
rocPRIM.
|
||||
|
||||
- {doc}`Documentation <rocthrust:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
|
||||
|
||||
@@ -32,6 +34,7 @@ hipCUB is a template library of algorithm primitives with a CUB-compatible
|
||||
interface. Its back-end is rocPRIM.
|
||||
|
||||
- {doc}`Documentation <hipcub:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
|
||||
|
||||
|
||||
@@ -10,6 +10,7 @@ The collective operations are implemented using ring and tree algorithms and hav
|
||||
throughput and latency.
|
||||
|
||||
- {doc}`Documentation <rccl:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
|
||||
|
||||
|
||||
@@ -9,6 +9,7 @@ ROCm libraries for FFT are as follows:
|
||||
rocFFT is an AMD GPU optimized library for FFT.
|
||||
|
||||
- {doc}`Documentation <rocfft:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocFFT)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -19,6 +20,7 @@ using rocFFT. hipFFT allows for a common interface for other non AMD GPU
|
||||
FFT libraries.
|
||||
|
||||
- {doc}`Documentation <hipfft:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipFFT)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -9,6 +9,7 @@ ROCm libraries for linear algebra are as follows:
|
||||
`rocBLAS` is an AMD GPU optimized library for BLAS (Basic Linear Algebra Subprograms).
|
||||
|
||||
- {doc}`Documentation <rocblas:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
|
||||
|
||||
@@ -20,6 +21,7 @@ via `rocBLAS` and `rocSOLVER`. `hipBLAS` allows for a common interface for other
|
||||
BLAS libraries.
|
||||
|
||||
- {doc}`Documentation <hipblas:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -31,6 +33,7 @@ flexible API and extends functionalities beyond traditional BLAS library.
|
||||
optimized generator as a back-end kernel provider.
|
||||
|
||||
- {doc}`Documentation <hipblaslt:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -41,6 +44,7 @@ fine-grained parallelism on top of AMD's ROCm runtime and toolchains, targeting
|
||||
modern CPU and GPU platforms.
|
||||
|
||||
- {doc}`Documentation <rocalution:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -50,6 +54,7 @@ modern CPU and GPU platforms.
|
||||
(MMA) problems into fragments and distributes these over GPU wavefronts.
|
||||
|
||||
- {doc}`Documentation <rocwmma:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -58,6 +63,7 @@ modern CPU and GPU platforms.
|
||||
`rocSOLVER` provides a subset of LAPACK (Linear Algebra Package) functionality on the ROCm platform.
|
||||
|
||||
- {doc}`Documentation <rocsolver:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -67,6 +73,7 @@ modern CPU and GPU platforms.
|
||||
as backends whilst exporting a unified interface.
|
||||
|
||||
- {doc}`Documentation <hipsolver:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -75,6 +82,7 @@ as backends whilst exporting a unified interface.
|
||||
`rocSPARSE` is a library to provide BLAS for sparse computations.
|
||||
|
||||
- {doc}`Documentation <rocsparse:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
@@ -84,6 +92,7 @@ as backends whilst exporting a unified interface.
|
||||
supporting both `rocSPARSE` and `cuSPARSE` as backends.
|
||||
|
||||
- {doc}`Documentation <hipsparse:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -7,6 +7,7 @@
|
||||
rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRNG).
|
||||
|
||||
- {doc}`Documentation <rocrand:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
|
||||
|
||||
@@ -18,6 +19,7 @@ generation (PRNG) optimized for AMD GPUs using rocRAND. hipRAND allows for a
|
||||
common interface for other non AMD GPU PRNG libraries.
|
||||
|
||||
- {doc}`Documentation <hiprand:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -12,7 +12,8 @@ page introduces the HIP runtime and other HIP libraries and tools.
|
||||
The HIP Runtime is used to enable GPU acceleration for all HIP language based
|
||||
products.
|
||||
|
||||
- {doc}`hip:.doxygen/docBin/html/index`
|
||||
- {doc}`Documentation <hip:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
|
||||
:::
|
||||
@@ -28,7 +29,9 @@ products.
|
||||
HIPIFY assists with porting CUDA-based applications to the HIP Runtime.
|
||||
Supported CUDA APIs are documented here as well.
|
||||
|
||||
- {doc}`Reference Manual <hipify:index>`
|
||||
- {doc}`Documentation <hipify:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
|
||||
- [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -3,27 +3,30 @@
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} AMD SMI
|
||||
:::{grid-item-card} {doc}`AMD SMI <amdsmi:index>`
|
||||
The AMD System Management Interface Library, or AMD SMI library, is a C library for Linux that provides a user space interface for applications to monitor and control AMD devices.
|
||||
|
||||
- {doc}`Documentation <amdsmi:index>`
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/amdsmi)
|
||||
- [Examples](https://github.com/amd/go_amd_smi#example)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCm SMI](https://rocmdocs.amd.com/projects/rocm_smi_lib/en/latest/)
|
||||
:::{grid-item-card} {doc}`ROCm SMI LIB <rocm_smi_lib:index>`
|
||||
This tool acts as a command line interface for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
|
||||
|
||||
- [Documentation](https://rocmdocs.amd.com/projects/rocm_smi_lib/en/latest/)
|
||||
- {doc}`Documentation <rocm_smi_lib:index>`
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
|
||||
- [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} ROCm Data Center Tool
|
||||
|
||||
:::{grid-item-card} {doc}`ROCm Data Center Tool <rdc:index>`
|
||||
The ROCm™ Data Center Tool simplifies the administration and addresses key infrastructure challenges in AMD GPUs in cluster and data center environments.
|
||||
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/rdc)
|
||||
- [Changelog](https://github.com/RadeonOpenCompute/rdc/blob/master/CHANGELOG.md)
|
||||
- [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
|
||||
|
||||
:::
|
||||
|
||||
@@ -53,11 +53,10 @@ that are required for target offload from an OpenMP program:
|
||||
```
|
||||
|
||||
:::{note}
|
||||
The Makefile in the example above uses a more classical and verbose set of flags
|
||||
which can also be used:
|
||||
The compiler also accepts the alternative offloading notation:
|
||||
|
||||
```bash
|
||||
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa
|
||||
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -158,6 +157,14 @@ implemented in the past releases.
|
||||
|
||||
(openmp_usm)=
|
||||
|
||||
### Asynchronous Behavior in OpenMP Target Regions
|
||||
|
||||
- Multithreaded offloading on the same device
|
||||
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
|
||||
|
||||
- Parallel memory copy invocations
|
||||
Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
|
||||
|
||||
### Unified Shared Memory
|
||||
|
||||
Unified Shared Memory (USM) provides a pointer-based approach to memory
|
||||
@@ -178,39 +185,34 @@ with Xnack capability.
|
||||
When enabled, Xnack capability allows GPU threads to access CPU (system) memory,
|
||||
allocated with OS-allocators, such as `malloc`, `new`, and `mmap`. Xnack must be
|
||||
enabled both at compile- and run-time. To enable Xnack support at compile-time,
|
||||
the programmer should use
|
||||
use:
|
||||
|
||||
```bash
|
||||
--offload-arch=gfx908:xnack+
|
||||
```
|
||||
|
||||
Or, equivalently
|
||||
Or use the functionally equivalent Xnack-any option:
|
||||
|
||||
```bash
|
||||
--offload-arch=gfx908
|
||||
```
|
||||
|
||||
:::{note}
|
||||
The second case is called Xnack-any and it is functionally equivalent to the
|
||||
first case.
|
||||
:::
|
||||
|
||||
At runtime, programmers enable Xnack functionality on a per-application basis
|
||||
using an environment variable:
|
||||
To enable Xnack functionality at runtime on a per-application basis,
|
||||
use the following environment variable:
|
||||
|
||||
```bash
|
||||
HSA_XNACK=1
|
||||
```
|
||||
|
||||
When Xnack support is not needed, then applications can be built to maximize
|
||||
resource utilization using:
|
||||
When Xnack support is not needed:
|
||||
|
||||
- Build the applications to maximize resource utilization using:
|
||||
|
||||
```bash
|
||||
--offload-arch=gfx908:xnack-
|
||||
```
|
||||
|
||||
At runtime, the `HSA_XNACK` environment variable can be set to 0, as Xnack
|
||||
functionality is not needed.
|
||||
- At runtime, set the `HSA_XNACK` environment variable to 0.
|
||||
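The compile-time flag and the runtime variable described above are meant to be set as a matching pair. The following is a minimal sketch of that pairing, not part of ROCm: the helper name `xnack_settings` is hypothetical, and it only assembles the `--offload-arch` and `HSA_XNACK` strings shown above.

```python
from __future__ import annotations


def xnack_settings(gpu_arch: str, enable: bool) -> tuple[str, dict[str, str]]:
    """Return the (compile flag, runtime environment) pair for one Xnack mode.

    Hypothetical helper for illustration only. It builds the strings
    documented above and does not invoke the compiler or the runtime.
    """
    # ":xnack+" requests Xnack support, ":xnack-" builds without it.
    suffix = "xnack+" if enable else "xnack-"
    flag = f"--offload-arch={gpu_arch}:{suffix}"
    # HSA_XNACK is set per application at runtime.
    env = {"HSA_XNACK": "1" if enable else "0"}
    return flag, env


flag_on, env_on = xnack_settings("gfx908", enable=True)
flag_off, env_off = xnack_settings("gfx908", enable=False)
print(flag_on, env_on)
print(flag_off, env_off)
```

A build script could pass the returned flag to the compiler and merge the returned dictionary into the environment of the launched application, so the two settings cannot drift apart.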
|
||||
#### Unified Shared Memory Pragma
|
||||
|
||||
@@ -431,6 +433,18 @@ for(int i=0; i<N; i++){
|
||||
See the complete sample code for global buffer overflow
|
||||
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).
|
||||
|
||||
### Clang Compiler Option for Kernel Optimization
|
||||
|
||||
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
|
||||
|
||||
- `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels including No-loop and Cross-team reductions.
|
||||
|
||||
- `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
|
||||
|
||||
- `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
|
||||
|
||||
- `-O3` if no `-O*` is specified by the user.
|
||||
|
||||
### Specialized Kernels
|
||||
|
||||
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
|
||||
|
||||
@@ -7,6 +7,8 @@
|
||||
The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
|
||||
|
||||
- {doc}`Documentation <rocmvalidationsuite:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
|
||||
- [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -14,6 +16,7 @@ The ROCm Validation Suite is a system administrator’s and cluster manager's to
|
||||
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
|
||||
|
||||
- {doc}`Documentation <transferbench:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
|
||||
- {doc}`transferbench:examples/index`
|
||||
|
||||
|
||||
@@ -19,6 +19,7 @@ TensorFlow
|
||||
| 5.3.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10 | |
|
||||
| 5.4.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10, 2.11 | 2.5.4 |
|
||||
| 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11 | 2.5.4 |
|
||||
| 5.6 | 1.11, 1.12.1, 1.13.1 | 2.12 | 2.5.4 |
|
||||
|
||||
## Communication libraries
|
||||
|
||||
@@ -47,6 +48,7 @@ contemporary CUDA / NVIDIA HPC SDK alternatives.
|
||||
| 5.3.x | 1.16 | 22.7 |
|
||||
| 5.4.x | 1.16 | 22.9 |
|
||||
| 5.5.x | 1.17 | 22.9 |
|
||||
| 5.6 | 1.17.2 | 22.9 |
|
||||
|
||||
For the latest documentation of these libraries, refer to the
|
||||
[associated documentation](../reference/gpu_libraries/c%2B%2B_primitives.md).
|
||||
|
||||
@@ -1,18 +1,52 @@
|
||||
# GPU and OS Support (Linux)
|
||||
# GPU Support and OS Compatibility (Linux)
|
||||
|
||||
(supported_distributions)=
|
||||
|
||||
## Supported Distributions
|
||||
## Supported Linux Distributions
|
||||
|
||||
AMD ROCm™ Platform supports the following Linux distributions.
|
||||
|
||||
| Distribution |Processor Architectures| Validated Kernel |
|
||||
|--------------------|-----------------------|--------------------|
|
||||
| RHEL 9.1 | x86-64 | 5.14 |
|
||||
| RHEL 8.6 to 8.7 | x86-64 | 4.18 |
|
||||
| SLES 15 SP4 | x86-64 | |
|
||||
| Ubuntu 20.04.5 LTS | x86-64 | 5.15 |
|
||||
| Ubuntu 22.04.1 LTS | x86-64 | 5.15, OEM 5.17 |
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Supported
|
||||
|
||||
| Distribution | Processor Architectures | Validated Kernel | Support |
|
||||
| :----------- | :---------------------: | :--------------: | ------: |
|
||||
| RHEL 9.2 | x86-64 | 5.14 (5.14.0-284.11.1.el9_2.x86_64) | ✅ |
|
||||
| RHEL 9.1 | x86-64 | 5.14.0-284.11.1.el9_2.x86_64 | ✅ |
|
||||
| RHEL 8.8 | x86-64 | 4.18.0-477.el8.x86_64 | ✅ |
|
||||
| RHEL 8.7 | x86-64 | 4.18.0-425.10.1.el8_7.x86_64 | ✅ |
|
||||
| SLES 15 SP5 | x86-64 | 5.14.21-150500.53-default | ✅ |
|
||||
| SLES 15 SP4 | x86-64 | 5.14.21-150400.24.63-default | ✅ |
|
||||
| Ubuntu 22.04.2 | x86-64 | 5.19.0-45-generic | ✅ |
|
||||
| Ubuntu 20.04.5 | x86-64 | 5.15.0-75-generic | ✅ |
|
||||
|
||||
:::{versionadded} 5.6
|
||||
|
||||
- RHEL 8.8 and 9.2 support is added.
|
||||
- SLES 15 SP5 support is added.
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Unsupported
|
||||
|
||||
| Distribution | Processor Architectures | Validated Kernel | Support |
|
||||
| :----------- | :---------------------: | :--------------: | ------: |
|
||||
| RHEL 9.0 | x86-64 | 5.14 | ❌ |
|
||||
| RHEL 8.6 | x86-64 | 4.18 | ❌ |
|
||||
| SLES 15 SP3 | x86-64 | 5.3 | ❌ |
|
||||
| Ubuntu 22.04.0 | x86-64 | 5.15 LTS, 5.17 OEM | ❌ |
|
||||
| Ubuntu 20.04.4 | x86-64 | 5.13 HWE, 5.13 OEM | ❌ |
|
||||
| Ubuntu 22.04.1 | x86-64 | 5.15 LTS | ❌ |
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
- ✅: **Supported** - AMD performs full testing of all ROCm components on distro
|
||||
GA image.
|
||||
- ❌: **Unsupported** - AMD no longer performs builds and testing on these
|
||||
previously supported distro GA images.
|
||||
|
||||
## Virtualization Support
|
||||
|
||||
@@ -26,7 +60,9 @@ ROCm supports virtualization for select GPUs only as shown below.
|
||||
|
||||
(supported_gpus)=
|
||||
|
||||
## GPU Support Table
|
||||
## Supported GPUs
|
||||
|
||||
The following table shows the list of GPUs supported on Linux distributions:
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
@@ -59,6 +95,17 @@ Use Driver Shipped with ROCm
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Radeon™
|
||||
:sync: radeonpro
|
||||
|
||||
[Use Radeon Pro Driver](https://www.amd.com/en/support/linux-drivers)
|
||||
|
||||
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|
||||
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|
|
||||
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
### Support Status
|
||||
|
||||
@@ -88,20 +88,6 @@ subtrees:
|
||||
- caption: APIs and Reference
|
||||
entries:
|
||||
- file: reference/all
|
||||
- file: reference/compilers
|
||||
title: Compilers and Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/rocmcc/rocmcc
|
||||
title: ROCmCC
|
||||
- url: ${project:rocgdb}
|
||||
title: ROCgdb
|
||||
- url: ${project:rocprofiler}
|
||||
title: rocprofiler
|
||||
- url: ${project:roctracer}
|
||||
title: roctracer
|
||||
- url: ${project:rocdbgapi}
|
||||
title: ROCdbgapi
|
||||
- file: reference/hip
|
||||
subtrees:
|
||||
- entries:
|
||||
@@ -109,8 +95,6 @@ subtrees:
|
||||
url: ${project:hip}
|
||||
- title: HIPify - Port Your Code
|
||||
url: ${project:hipify}
|
||||
- file: reference/openmp/openmp
|
||||
title: OpenMP
|
||||
- file: reference/gpu_libraries/math
|
||||
title: Math Libraries
|
||||
subtrees:
|
||||
@@ -186,6 +170,22 @@ subtrees:
|
||||
- entries:
|
||||
- url: ${project:rocal}
|
||||
title: rocAL
|
||||
- file: reference/openmp/openmp
|
||||
title: OpenMP
|
||||
- file: reference/compilers
|
||||
title: Compilers and Tools
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: reference/rocmcc/rocmcc
|
||||
title: ROCmCC
|
||||
- url: ${project:rocgdb}
|
||||
title: ROCgdb
|
||||
- url: ${project:rocprofiler}
|
||||
title: rocprofiler
|
||||
- url: ${project:roctracer}
|
||||
title: roctracer
|
||||
- url: ${project:rocdbgapi}
|
||||
title: ROCdbgapi
|
||||
- file: reference/management_tools
|
||||
title: Management Tools
|
||||
subtrees:
|
||||
|
||||
@@ -1,2 +1 @@
|
||||
rocm-docs-core==0.18.3
|
||||
|
||||
|
||||
@@ -1,10 +1,12 @@
|
||||
# Linux Folder Structure Reorganization
|
||||
# ROCm FHS Reorganization
|
||||
|
||||
## Introduction
|
||||
|
||||
ROCm™ packages have adopted the Linux foundation file system hierarchy standard
|
||||
to ensure ROCm components follow open source conventions for Linux-based
|
||||
distributions. Following is the ROCm proposed file structure.
|
||||
The ROCm platform has adopted the Linux Foundation Filesystem Hierarchy Standard (FHS) [https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html](https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html) to ensure ROCm is consistent with standard open source conventions. The following sections specify how current and future releases of ROCm adhere to FHS, how the previous ROCm filesystem is supported, and how improved versioning specifications are applied to ROCm.
|
||||
|
||||
## Adopting the Linux foundation Filesystem Hierarchy Standard (FHS)
|
||||
|
||||
To standardize the ROCm directory structure and directory content layout, ROCm has adopted the [FHS](https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html), adhering to open source conventions for Linux-based distributions. FHS ensures internal consistency within the ROCm stack, as well as external consistency with other systems and distributions. The ROCm proposed file structure is outlined below:
|
||||
|
||||
```none
|
||||
/opt/rocm-<ver>
|
||||
@@ -42,14 +44,13 @@ distributions. Following is the ROCm proposed file structure.
|
||||
| -- architecture independent misc files
|
||||
```
|
||||
|
||||
## Changes from earlier ROCm versions
|
||||
## Changes From Earlier ROCm Versions
|
||||
|
||||
ROCm with the file reorganization is going to have a lean structure. Following
|
||||
table gives the comparison with new and old folder structure.
|
||||
The following table provides a brief overview of the new ROCm FHS layout, compared to the layout of earlier ROCm versions. Note that /opt/ is used to denote the default rocm-installation-path and should be replaced in case of a non-standard installation location of the ROCm distribution.
|
||||
|
||||
```none
|
||||
______________________________________________________
|
||||
| New File Structure | Old File Structure |
|
||||
| New ROCm Layout | Previous ROCm Layout |
|
||||
|_____________________________|________________________|
|
||||
| /opt/rocm-<ver> | /opt/rocm-<ver> |
|
||||
| | -- bin | | -- bin |
|
||||
@@ -72,39 +73,28 @@ table gives the comparison with new and old folder structure.
|
||||
|______________________________________________________|
|
||||
```
|
||||
|
||||
## ROCm File reorganization transition plan
|
||||
## ROCm FHS Reorganization: Backward Compatibility
|
||||
|
||||
New file organization for ROCm was first introduced ROCm v5.2 release. Backward
|
||||
compatibility was in place to make sure users had a chance to change their
|
||||
applications using ROCm. ROCm has moved header files and libraries to its new
|
||||
location as indicated in the above structure and included symbolic-link and
|
||||
wrapper header files in its old location for backward compatibility.
|
||||
The FHS file organization for ROCm was first introduced in the release of ROCm 5.2. Backward compatibility was implemented to make sure users could still run their ROCm applications while transitioning to the new FHS. ROCm has moved header files and libraries to their new locations as indicated in the above structure, and included symbolic links and wrapper header files in their old locations for backward compatibility. The following sections detail the ROCm backward compatibility implementation for wrapper header files, executable files, library files, and CMake config files.
|
||||
|
||||
### Wrapper header files
|
||||
### Wrapper Header Files
|
||||
|
||||
Wrapper header files are placed in the old location (
|
||||
`/opt/rocm-xxx/<component>/include`) with a warning message to include files
|
||||
from the new location (`/opt/rocm-xxx/include`) as shown in the example below.
|
||||
`/opt/rocm-<ver>/<component>/include`) with a warning message to include files
|
||||
from the new location (`/opt/rocm-<ver>/include`) as shown in the example below.
|
||||
|
||||
```cpp
|
||||
#pragma message "This file is deprecated. Use file from include path /opt/rocm-ver/include/ and prefix with hip."
|
||||
#include "hip/hip_runtime.h"
|
||||
#include <hip/hip_runtime.h>
|
||||
```
|
||||
|
||||
The deprecation plan for backward compatibility wrapper header files is as
|
||||
follows
|
||||
- Starting with the ROCm 5.2 release, the backward compatibility wrapper header files signal deprecation through a `#pragma` message announcing `#warning`.
|
||||
- Starting from ROCm 6.0 (tentatively), backward compatibility for wrapper header files will be removed, and the `#pragma` message will announce `#error`.
|
||||
|
||||
- `#pragma` message announcing deprecation – ROCm v5.2 release.
|
||||
- `#pragma` message changed to `#warning` – Future release, tentatively ROCm
|
||||
v5.5.
|
||||
- `#warning` changed to `#error` – Future release, tentatively ROCm v5.6.
|
||||
- Backward compatibility wrappers removed – Future release, tentatively ROCm
|
||||
v6.0.
|
||||
### Executable Files
|
||||
|
||||
### Executable files
|
||||
|
||||
Executable files are available in the `/opt/rocm-xxx/bin` folder. For backward
|
||||
compatibility, the old library location (`/opt/rocm-xxx/<component>/bin`) has a
|
||||
Executable files are available in the `/opt/rocm-<ver>/bin` folder. For backward
|
||||
compatibility, the old executable location (`/opt/rocm-<ver>/<component>/bin`) has a
|
||||
soft link to the executable at the new location. Soft links will be removed in a
|
||||
future release, tentatively ROCm v6.0.
|
||||
|
||||
@@ -113,10 +103,10 @@ $ ls -l /opt/rocm/hip/bin/
|
||||
lrwxrwxrwx 1 root root 24 Jan 1 23:32 hipcc -> ../../bin/hipcc
|
||||
```
|
||||
|
||||
### Library files
|
||||
### Library Files
|
||||
|
||||
Library files are available in the `/opt/rocm-xxx/lib` folder. For backward
|
||||
compatibility, the old library location (`/opt/rocm-xxx/<component>/lib`) has a
|
||||
Library files are available in the `/opt/rocm-<ver>/lib` folder. For backward
|
||||
compatibility, the old library location (`/opt/rocm-<ver>/<component>/lib`) has a
|
||||
soft link to the library at the new location. Soft links will be removed in a
|
||||
future release, tentatively ROCm v6.0.
|
||||
|
||||
@@ -126,11 +116,11 @@ drwxr-xr-x 4 root root 4096 Jan 1 10:45 cmake
|
||||
lrwxrwxrwx 1 root root 24 Jan 1 23:32 libamdhip64.so -> ../../lib/libamdhip64.so
|
||||
```
|
||||
|
||||
### CMake Config files
|
||||
### CMake Config Files
|
||||
|
||||
All CMake configuration files are available in the
|
||||
`/opt/rocm-xxx/lib/cmake/<component>` folder. For backward compatibility, the
|
||||
old CMake locations (`/opt/rocm-xxx/<component>/lib/cmake`) consist of a soft
|
||||
`/opt/rocm-<ver>/lib/cmake/<component>` folder. For backward compatibility, the
|
||||
old CMake locations (`/opt/rocm-<ver>/<component>/lib/cmake`) consist of a soft
|
||||
link to the new CMake config. Soft links will be removed in a future release,
|
||||
tentatively ROCm v6.0.
|
||||
|
||||
@@ -139,10 +129,10 @@ $ ls -l /opt/rocm/hip/lib/cmake/hip/
|
||||
lrwxrwxrwx 1 root root 42 Jan 1 23:32 hip-config.cmake -> ../../../../lib/cmake/hip/hip-config.cmake
|
||||
```
|
||||
|
||||
## Changes required in applications using ROCm
|
||||
## Changes Required in Applications Using ROCm
|
||||
|
||||
Applications using ROCm are advised to use the new file paths. As the old files
|
||||
will be deprecated in a future release. Application have to make sure to include
|
||||
will be deprecated in a future release. Applications have to make sure to include
|
||||
correct header file and use correct search paths.
|
||||
|
||||
1. `#include<header_file.h>` needs to be changed to
|
||||
@@ -158,10 +148,18 @@ correct header file and use correct search paths.
|
||||
`VAR2=/opt/rocm/hsa` needs to be changed to `VAR2=/opt/rocm`
|
||||
|
||||
3. Any reference to `/opt/rocm/<component>/bin` or `/opt/rocm/<component>/lib`
|
||||
needs to be changed to `/opt/rocm/bin` and `/opt/rocm/lib/` respectively.
|
||||
needs to be changed to `/opt/rocm/bin` and `/opt/rocm/lib/`, respectively.
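
As an illustration of item 3 above, here is a minimal sketch of how a build script might rewrite legacy per-component paths to the flattened layout. The helper name `flatten_rocm_path` and the regular expression are hypothetical and not part of ROCm. The sketch only handles the `<component>/bin` and `<component>/lib` cases, so header includes and variables such as `VAR2=/opt/rocm/hsa` still need to be updated by hand.

```python
import re

# Hypothetical pattern: /opt/rocm or /opt/rocm-<ver>, then one component
# directory, then bin or lib, then an optional remainder.
_LEGACY_PATH = re.compile(r"^(/opt/rocm(?:-[^/]+)?)/[^/]+/(bin|lib)(/.*)?$")


def flatten_rocm_path(path: str) -> str:
    """Map /opt/rocm[-<ver>]/<component>/{bin,lib}/... to /opt/rocm[-<ver>]/{bin,lib}/...

    Paths that do not match the legacy layout are returned unchanged.
    """
    match = _LEGACY_PATH.match(path)
    if match is None:
        return path
    prefix, kind, rest = match.group(1), match.group(2), match.group(3) or ""
    return f"{prefix}/{kind}{rest}"


if __name__ == "__main__":
    for legacy in (
        "/opt/rocm/hip/bin/hipcc",
        "/opt/rocm-5.6.0/hip/lib/libamdhip64.so",
        "/opt/rocm/bin/hipcc",  # already in the new layout
    ):
        print(legacy, "->", flatten_rocm_path(legacy))
```

Paths that are already in the new layout pass through untouched, so the helper can be applied repeatedly without harm.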
|
||||
|
||||
## References
|
||||
## Changes in Versioning Specifications
|
||||
|
||||
{ref}`ROCm deprecation warning <5_4_0_filesystem_reorg_deprecation_notice>`
|
||||
In order to better manage ROCm dependencies specification and allow smoother releases of ROCm while avoiding dependency conflicts, the ROCm platform shall adhere to the following scheme when numbering and incrementing ROCm files versions:
|
||||
|
||||
[Linux File System Standard](https://refspecs.linuxfoundation.org/fhs.shtml)
|
||||
rocm-\<ver\>, where \<ver\> = \<x.y.z\>
|
||||
|
||||
x.y.z denote: MAJOR.MINOR.PATCH
|
||||
|
||||
z: PATCH - increment z when implementing backward compatible bug fixes.
|
||||
|
||||
y: MINOR - increment y when implementing minor changes that add functionality but are still backward compatible.
|
||||
|
||||
x: MAJOR - increment x when implementing major changes that are not backward compatible.
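
The MAJOR.MINOR.PATCH rules above can be sketched in a few lines of Python. The names `RocmVersion`, `parse_rocm_name`, and `bump` are hypothetical and exist only for this illustration. Resetting the lower components to zero on a MINOR or MAJOR bump is an assumption borrowed from common semantic-versioning practice; the text above does not state it.

```python
from typing import NamedTuple


class RocmVersion(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_rocm_name(name: str) -> RocmVersion:
    """Parse 'rocm-x.y.z' (or a bare 'x.y.z') into its three components."""
    version = name[len("rocm-"):] if name.startswith("rocm-") else name
    major, minor, patch = (int(part) for part in version.split("."))
    return RocmVersion(major, minor, patch)


def bump(version: RocmVersion, change: str) -> RocmVersion:
    """Return the next version for a 'patch', 'minor', or 'major' change."""
    if change == "patch":   # backward compatible bug fix
        return version._replace(patch=version.patch + 1)
    if change == "minor":   # backward compatible new functionality
        # Assumption: lower components reset to zero.
        return version._replace(minor=version.minor + 1, patch=0)
    if change == "major":   # not backward compatible
        return RocmVersion(version.major + 1, 0, 0)
    raise ValueError(f"unknown change type: {change!r}")


if __name__ == "__main__":
    current = parse_rocm_name("rocm-5.6.0")
    for kind in ("patch", "minor", "major"):
        print(kind, "->", ".".join(map(str, bump(current, kind))))
```

Because `RocmVersion` is a tuple, versions compare in MAJOR, MINOR, PATCH order, which is convenient when checking a minimum required release.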
|
||||
|
||||
@@ -81,7 +81,6 @@ class TaggingArgs(argparse.Namespace):
|
||||
def exclude(self) -> List[str]:
|
||||
"""Get the excluded libraries plus defaults."""
|
||||
defaults = [
|
||||
"AMDMIGraphX",
|
||||
"MIOpenGEMM",
|
||||
"MIOpenKernels",
|
||||
"MIOpenTensile",
|
||||
@@ -236,9 +235,21 @@ def run_tagging():
|
||||
)
|
||||
|
||||
# Find all the math libraries and their remotes.
|
||||
names_and_remotes = list(
|
||||
(entry.get("name"), entry.get("remote")) for entry in manifest_tree.findall(".//project[@groups='mathlibs']")
|
||||
)
|
||||
included_names = [
|
||||
"rocm-cmake",
|
||||
"MIOpen",
|
||||
"AMDMIGraphX",
|
||||
"rocprofiler"
|
||||
]
|
||||
included_groups = [
|
||||
"mathlibs"
|
||||
]
|
||||
projects = [ ]
|
||||
for project in manifest_tree.iterfind(".//project"):
|
||||
include = str(project.get("name")) in included_names
|
||||
if (project.get("name") in included_names) or (project.get("groups") in included_groups):
|
||||
projects.append(project)
|
||||
names_and_remotes = list((entry.get("name"), entry.get("remote")) for entry in projects)
|
||||
|
||||
# Get all the relevant ROCm releases, and only the last version if not doing previous.
|
||||
minimum_version = "5.0.0" if args.previous else args.version
|
||||
@@ -249,12 +260,17 @@ def run_tagging():
|
||||
for (version, release) in releases.items():
|
||||
for (_, library) in release.libraries.items():
|
||||
# Parse the changelog for each library and each version
|
||||
success = PROCESSORS[library.name](
|
||||
library,
|
||||
TEMPLATES[library.name],
|
||||
args.previous,
|
||||
Version(version) < Version(args.version)
|
||||
)
|
||||
try:
|
||||
success = PROCESSORS[library.name](
|
||||
library,
|
||||
TEMPLATES[library.name],
|
||||
args.previous,
|
||||
Version(version) < Version(args.version)
|
||||
)
|
||||
except Exception as e:
|
||||
success = False
|
||||
print(f"Exception parsing {library.name} for ROCm {version}")
|
||||
print(e)
|
||||
if not success:
|
||||
print(f"Error processing {library.name} for ROCm {version}")
|
||||
failed.append((version, library.name))
|
||||
|
||||
@@ -32,15 +32,15 @@ The release notes for the ROCm platform.
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
{%- for lib_name, lib in release.libraries | dictsort %}
|
||||
{%- if rocm_ver_by_lib_ver[lib_name][lib.lib_version] == version %}
|
||||
{%- if rocm_ver_by_lib_ver[lib_name][lib.lib_version] == version and lib.lib_version %}
|
||||
| {{ lib_name }} | {{prev_lib_ver[lib_name][lib.lib_version]}} ⇒ [{{ lib.lib_version }}]({{ lib.release_url }}) |
|
||||
{%- else %}
|
||||
{%- elif lib.lib_version %}
|
||||
| {{ lib_name }} | [{{ lib.lib_version }}]({{ lib.release_url }}) |
|
||||
{%- endif %}
|
||||
{%- endfor %}
|
||||
|
||||
{%- for lib_name, lib in release.libraries | dictsort %}
|
||||
{%- if rocm_ver_by_lib_ver[lib_name][lib.lib_version] == version %}
|
||||
{%- if rocm_ver_by_lib_ver[lib_name][lib.lib_version] == version and lib.lib_version%}
|
||||
|
||||
#### {{lib_name}} {{lib.lib_version}}
|
||||
|
||||
|
||||
@@ -95,8 +95,6 @@ The `hipcc` and `hipconfig` Perl scripts are deprecated. In a future release, co
|
||||
>
|
||||
> There will be a transition period where the Perl scripts and compiled binaries are available before the scripts are removed. There will be no functional difference between the Perl scripts and their compiled binary counterpart. No user action is required. Once these are available, users can optionally switch to `hipcc.bin` and `hipconfig.bin`. The `hipcc`/`hipconfig` soft link will be assimilated to point from `hipcc`/`hipconfig` to the respective compiled binaries as the default option.
|
||||
|
||||
(5_4_0_filesystem_reorg_deprecation_notice)=
|
||||
|
||||
##### Linux Filesystem Hierarchy Standard for ROCm
|
||||
|
||||
ROCm packages have adopted the Linux foundation filesystem hierarchy standard in this release to ensure ROCm components follow open source conventions for Linux-based distributions. While moving to a new filesystem hierarchy, ROCm ensures backward compatibility with its 5.1 version or older filesystem hierarchy. See below for a detailed explanation of the new filesystem hierarchy and backward compatibility.
|
||||
|
||||
@@ -303,7 +303,3 @@ When user applications call `ncclCommAbort` to destruct communicators and then c
|
||||
communicators repeatedly, subsequent communicators may fail to initialize.
|
||||
|
||||
This issue is under investigation and will be resolved in a future release.
|
||||
|
||||
#### Failures In HIP Directed Tests
|
||||
|
||||
Multiple HIP directed tests fail.
|
||||
|
||||
191
tools/autotag/templates/rocm_changes/5.6.0.md
Normal file
@@ -0,0 +1,191 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
<!-- markdownlint-disable header-increment -->
|
||||
#### Release Highlights
|
||||
|
||||
ROCm 5.6 brings several AI software ecosystem improvements to our fast-growing user base. A few examples include:
|
||||
|
||||
- New documentation portal at https://rocm.docs.amd.com
|
||||
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
|
||||
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
|
||||
- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
|
||||
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
|
||||
|
||||
#### OS and GPU Support Changes
|
||||
|
||||
- SLES15 SP5 support was added in this release. SLES15 SP3 support was dropped.
|
||||
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will enter maintenance mode starting Q3 2023. This will be aligned with the ROCm 5.7 GA release date.
|
||||
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
|
||||
- Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs until Q2 2024 (End of Maintenance [EOM]), aligned with the closest ROCm release
|
||||
- Bug fixes during the maintenance will be made to the next ROCm point release
|
||||
- Bug fixes will not be backported to older ROCm releases for this SKU
|
||||
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
|
||||
|
||||
#### AMDSMI CLI 23.0.0.4
|
||||
|
||||
##### Added
|
||||
|
||||
- AMDSMI CLI tool enabled for Linux Bare Metal & Guest
|
||||
|
||||
- Package: amd-smi-lib
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Not all Error Correction Code (ECC) fields are currently supported
|
||||
|
||||
- RHEL 8 & SLES 15 have extra install steps
|
||||
|
||||
#### Kernel Modules (DKMS)
|
||||
|
||||
##### Fixes
|
||||
|
||||
- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
|
||||
|
||||
#### HIP 5.6 (For ROCm 5.6)
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Consolidation of hipamd, rocclr and OpenCL projects in clr
|
||||
- Optimized lock for graph global capture mode
|
||||
|
||||
##### Added
|
||||
|
||||
- Added hipRTC support for amd_hip_fp16
|
||||
- Added hipStreamGetDevice implementation to get the device associated with the stream
|
||||
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
|
||||
- hipArrayGetInfo for getting information about the specified array
|
||||
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
|
||||
- hipArray3DGetDescriptor to get 3D array descriptor
|
||||
|
||||
##### Changed
|
||||
|
||||
- hipMallocAsync to return success for zero size allocation to match hipMalloc
|
||||
- Separation of hipcc Perl binaries from the HIP project into the hipcc project. The hip-devel package depends on the newly added hipcc package
|
||||
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
|
||||
- Removed hipBusBandwidth and hipCommander samples from hip-tests
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed regression in hipMemCpyParam3D when offset is applied
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- Limited testing on xnack+ configuration
|
||||
- Multiple HIP test failures (gpuvm fault or hangs)
|
||||
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice on a system without a GPU
|
||||
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release
|
||||
|
||||
##### Upcoming Changes in Future Releases
|
||||
|
||||
- Removal of gcnarch from hipDeviceProp_t structure
|
||||
- Addition of new fields in hipDeviceProp_t structure
|
||||
- maxTexture1D
|
||||
- maxTexture2D
|
||||
- maxTexture1DLayered
|
||||
- maxTexture2DLayered
|
||||
- sharedMemPerMultiprocessor
|
||||
- deviceOverlap
|
||||
- asyncEngineCount
|
||||
- surfaceAlignment
|
||||
- unifiedAddressing
|
||||
- computePreemptionSupported
|
||||
- uuid
|
||||
- Removal of deprecated code
|
||||
- hip-hcc codes from hip code tree
|
||||
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
|
||||
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
|
||||
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
|
||||
|
||||
#### ROCgdb-13 (For ROCm 5.6.0)
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved performance when handling the end of a process with a large number of threads.
|
||||
|
||||
##### Known Issues
|
||||
|
||||
- On certain configurations, ROCgdb can show the following warning message:
|
||||
|
||||
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
|
||||
|
||||
This does not affect ROCgdb's functionality.
|
||||
|
||||
#### ROCprofiler (For ROCm 5.6.0)
|
||||
|
||||
In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
|
||||
ROCm 5.5 are split into separate files. The `rocmtools` files that were
|
||||
deprecated in ROCm 5.5 have been removed.
|
||||
|
||||
| ROCm 5.6 | rocprofilerv1 | rocprofilerv2 |
|
||||
|-----------------|-------------------------------------|----------------------------------------|
|
||||
| **Tool script** | `bin/rocprof` | `bin/rocprofv2` |
|
||||
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
|
||||
| **API library** | `lib/librocprofiler64.so.1`         | `lib/librocprofiler64.so.2`            |
|
||||
|
||||
The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
|
||||
following command:
|
||||
|
||||
```sh
|
||||
$ rocprof …
|
||||
```
|
||||
|
||||
To write a custom tool based on the `rocprofilerV1` API do the following:
|
||||
|
||||
```C
|
||||
// main.c
|
||||
#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API
|
||||
int main() {
|
||||
// Use the rocprofilerV1 API
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
This can be built in the following manner:
|
||||
|
||||
```sh
|
||||
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
|
||||
```
|
||||
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
|
||||
|
||||
The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
|
||||
following command:
|
||||
|
||||
```sh
|
||||
$ rocprofv2 …
|
||||
```
|
||||
|
||||
To write a custom tool based on the `rocprofilerV2` API do the following:
|
||||
|
||||
```C
|
||||
// main.c
|
||||
#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API
|
||||
int main() {
|
||||
// Use the rocprofilerV2 API
|
||||
return 0;
|
||||
}
|
||||
```
|
||||
|
||||
This can be built in the following manner:
|
||||
|
||||
```sh
|
||||
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
|
||||
```
|
||||
|
||||
The resulting `a.out` will depend on
|
||||
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
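
The two build lines above differ only in the library they link against. As a rough sketch, a build helper could select the flag from the API version. The function name `gcc_command` and its parameters are hypothetical; the include path, library path, and `-l` flags are taken from the commands shown above.

```python
from typing import List

# Link flags from the build commands above, keyed by rocprofiler API version.
_LINK_LIBS = {1: "rocprofiler64", 2: "rocprofiler64-v2"}


def gcc_command(api_version: int,
                rocm_root: str = "/opt/rocm-5.6.0",
                source: str = "main.c") -> List[str]:
    """Return the gcc argument list for a tool built on rocprofiler v1 or v2."""
    if api_version not in _LINK_LIBS:
        raise ValueError(f"unsupported rocprofiler API version: {api_version}")
    return [
        "gcc",
        source,
        f"-I{rocm_root}/include",
        f"-L{rocm_root}/lib",
        f"-l{_LINK_LIBS[api_version]}",
    ]


if __name__ == "__main__":
    for api in (1, 2):
        print(" ".join(gcc_command(api)))
```

The returned list can be passed directly to `subprocess.run` on a machine where ROCm is installed under the given root.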
|
||||
|
||||
##### Optimized
|
||||
|
||||
- Improved Test Suite
|
||||
|
||||
##### Added
|
||||
|
||||
- 'end_time' needs to be disabled in roctx_trace.txt
|
||||
|
||||
##### Fixed
|
||||
|
||||
- rocprof in ROCm/5.4.0 GPU selector broken.
|
||||
- rocprof in ROCm/5.4.1 fails to generate kernel info.
|
||||
- rocprof clobbers LD_PRELOAD.
|
||||