mirror of
https://github.com/ROCm/ROCm.git
synced 2026-04-05 03:01:17 -04:00
ROCm A-Z page & link cleanup (#2450)
@@ -1,53 +1,59 @@
|
||||
# Compilers and tools
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
The AMD Debugger API is a library that provides all the support necessary for a
|
||||
debugger and other tools to perform low level control of the execution and
|
||||
inspection of execution state of AMD's commercially available GPU architectures.
|
||||
|
||||
- {doc}`Documentation <rocdbgapi:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [ROCmCC](../rocmcc/rocmcc)
|
||||
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
|
||||
computing on AMD GPUs and CPUs and supports various heterogeneous programming
|
||||
models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
- [Documentation](../rocmcc/rocmcc)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`
|
||||
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
|
||||
|
||||
- {doc}`Documentation <rocgdb:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
ROCProfiler is a profiling library that provides a hardware-specific, low-level performance analysis interface for GPU compute applications. It collects hardware performance counters and derived performance metrics. The library supports GFX8/GFX9.
|
||||
|
||||
- {doc}`Documentation <rocprofiler:rocprof>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCTracer <roctracer:index>`
|
||||
Callback/activity library for performance tracing on AMD GPUs.
|
||||
|
||||
- {doc}`Documentation <roctracer:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
|
||||
## See Also
|
||||
|
||||
- [Compiler Disambiguation](../../conceptual/compiler_disambiguation.md)
|
||||
# ROCm compilers and tools
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
|
||||
The AMD Debugger API is a library that provides all the support necessary for a
|
||||
debugger and other tools to perform low level control of the execution and
|
||||
inspection of execution state of AMD's commercially available GPU architectures.
|
||||
|
||||
* {doc}`Documentation <rocdbgapi:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[ROCmCC](../rocmcc/rocmcc.md)**
|
||||
|
||||
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
|
||||
computing on AMD GPUs and CPUs and supports various heterogeneous programming
|
||||
models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
* [Documentation](../rocmcc/rocmcc.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`
|
||||
|
||||
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
|
||||
|
||||
* {doc}`Documentation <rocgdb:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
|
||||
ROCProfiler is a profiling library that provides a hardware-specific, low-level performance analysis interface for GPU compute applications. It collects hardware performance counters and derived performance metrics. The library supports GFX8/GFX9.
|
||||
|
||||
* {doc}`Documentation <rocprofiler:rocprof>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCTracer <roctracer:index>`
|
||||
|
||||
Callback/activity library for performance tracing on AMD GPUs.
|
||||
|
||||
* {doc}`Documentation <roctracer:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
|
||||
## See Also
|
||||
|
||||
* [Compiler Disambiguation](../../conceptual/compiler-disambiguation.md)
|
||||
@@ -4,29 +4,31 @@
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`AMD SMI <amdsmi:index>`
|
||||
|
||||
The AMD System Management Interface Library, or AMD SMI library, is a C library for Linux that provides a user space interface for applications to monitor and control AMD devices.
|
||||
|
||||
- {doc}`Documentation <amdsmi:index>`
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/amdsmi)
|
||||
- [Examples](https://github.com/amd/go_amd_smi#example)
|
||||
* [GitHub](https://github.com/RadeonOpenCompute/amdsmi)
|
||||
* [Examples](https://github.com/amd/go_amd_smi#example)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCm SMI LIB <rocm_smi_lib:index>`
|
||||
|
||||
This tool acts as a command line interface for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
|
||||
|
||||
- {doc}`Documentation <rocm_smi_lib:index>`
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
|
||||
- [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
|
||||
* {doc}`Documentation <rocm_smi_lib:index>`
|
||||
* [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
|
||||
* [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCm Data Center Tool <rdc:index>`
|
||||
|
||||
The ROCm™ Data Center Tool simplifies administration and addresses key infrastructure challenges for AMD GPUs in cluster and data center environments.
|
||||
|
||||
- [GitHub](https://github.com/RadeonOpenCompute/rdc)
|
||||
- [Changelog](https://github.com/RadeonOpenCompute/rdc/blob/master/CHANGELOG.md)
|
||||
- [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
|
||||
* [GitHub](https://github.com/RadeonOpenCompute/rdc)
|
||||
* [Changelog](https://github.com/RadeonOpenCompute/rdc/blob/master/CHANGELOG.md)
|
||||
* [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
|
||||
|
||||
:::
|
||||
|
||||
@@ -4,21 +4,23 @@
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`RVS <rocmvalidationsuite:index>`
|
||||
|
||||
The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
|
||||
|
||||
- {doc}`Documentation <rocmvalidationsuite:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
|
||||
- [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)
|
||||
* {doc}`Documentation <rocmvalidationsuite:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
|
||||
* [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`TransferBench <transferbench:index>`
|
||||
|
||||
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
|
||||
|
||||
- {doc}`Documentation <transferbench:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
|
||||
- {doc}`transferbench:examples/index`
|
||||
* {doc}`Documentation <transferbench:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
|
||||
* {doc}`transferbench:examples/index`
|
||||
|
||||
:::
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
# ROCm compilers and tools
|
||||
|
||||
add links...
|
||||
@@ -1,22 +0,0 @@
|
||||
# Computer Vision
|
||||
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`MIVisionX <mivisionx:README>`
|
||||
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
|
||||
|
||||
- {doc}`Documentation <mivisionx:README>`
|
||||
- [GitHub](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/)
|
||||
- [Changelog](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`rocAL <rocal:README>`
|
||||
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.
|
||||
|
||||
- {doc}`Documentation <rocal:README>`
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
@@ -1 +0,0 @@
|
||||
# Docker
|
||||
@@ -9,12 +9,13 @@ page introduces the HIP runtime and other HIP libraries and tools.
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`HIP Runtime <hip:index>`
|
||||
|
||||
The HIP Runtime is used to enable GPU acceleration for all HIP language based
|
||||
products.
|
||||
|
||||
- {doc}`Documentation <hip:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
* {doc}`Documentation <hip:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
|
||||
:::
|
||||
|
||||
@@ -26,12 +27,13 @@ products.
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`HIPIFY <hipify:index>`
|
||||
|
||||
HIPIFY assists with porting CUDA-based applications to the HIP Runtime.
|
||||
Supported CUDA APIs are documented here as well.
|
||||
|
||||
- {doc}`Documentation <hipify:index>`
|
||||
- [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
|
||||
- [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)
|
||||
* {doc}`Documentation <hipify:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
|
||||
* [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -5,93 +5,98 @@
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [HIP](./hip)
|
||||
:::{grid-item-card}
|
||||
**[HIP](./hip.md)**
|
||||
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime.
|
||||
|
||||
- {doc}`HIP <hip:index>`
|
||||
- [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
- {doc}`HIPIFY <hipify:index>`
|
||||
* {doc}`HIP <hip:index>`
|
||||
* [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
* {doc}`HIPIFY <hipify:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Math Libraries](./libraries/gpu_libraries/math)
|
||||
:::{grid-item-card}
|
||||
**[Math Libraries](./libraries/gpu-libraries/math.md)**
|
||||
|
||||
HIP Math Libraries support the following domains:
|
||||
|
||||
- [Linear Algebra Libraries](./libraries/gpu_libraries/linear_algebra)
|
||||
- [Fast Fourier Transforms](./libraries/gpu_libraries/fft)
|
||||
- [Random Numbers](./libraries/gpu_libraries/rand)
|
||||
* [Linear Algebra Libraries](./libraries/gpu-libraries/math-linear-algebra.md)
|
||||
* [Fast Fourier Transforms](./libraries/gpu-libraries/math-fft.md)
|
||||
* [Random Numbers](./libraries/gpu-libraries/rand.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [C++ Primitive Libraries](./libraries/gpu_libraries/c++_primitives)
|
||||
:::{grid-item-card}
|
||||
**[C++ Primitive Libraries](./libraries/gpu-libraries/c++primitives.md)**
|
||||
|
||||
ROCm template libraries for C++ primitives and algorithms are as follows:
|
||||
|
||||
- {doc}`rocPRIM <rocprim:index>`
|
||||
- {doc}`rocThrust <rocthrust:index>`
|
||||
- {doc}`hipCUB <hipcub:index>`
|
||||
- {doc}`hipTensor <hiptensor:index>`
|
||||
* {doc}`rocPRIM <rocprim:index>`
|
||||
* {doc}`rocThrust <rocthrust:index>`
|
||||
* {doc}`hipCUB <hipcub:index>`
|
||||
* {doc}`hipTensor <hiptensor:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Communication Libraries](./libraries/gpu_libraries/communication)
|
||||
:::{grid-item-card} [Communication Libraries](./libraries/gpu-libraries/communication.md)
|
||||
Inter and intra-node communication is supported by the following projects:
|
||||
|
||||
- {doc}`RCCL <rccl:index>`
|
||||
* {doc}`RCCL <rccl:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Artificial intelligence](../rocm_ai/rocm_ai)
|
||||
:::{grid-item-card}
|
||||
**[Artificial intelligence](../rocm-ai.md)**
|
||||
|
||||
Libraries related to AI.
|
||||
|
||||
- {doc}`MIOpen <miopen:index>`
|
||||
- {doc}`Composable Kernel <composable_kernel:index>`
|
||||
- {doc}`MIGraphX <amdmigraphx:index>`
|
||||
* {doc}`MIOpen <miopen:index>`
|
||||
* {doc}`Composable Kernel <composable_kernel:index>`
|
||||
* {doc}`MIGraphX <amdmigraphx:index>`
|
||||
* {doc}`MIVisionX <mivisionx:README>`
|
||||
* {doc}`rocAL <rocal:README>`
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[OpenMP](./openmp/openmp.md)**
|
||||
|
||||
* [OpenMP Support Guide](./openmp/openmp.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Computer Vision](./computer_vision)
|
||||
Computer vision related projects.
|
||||
:::{grid-item-card}
|
||||
**[Compilers and Tools](./compilers-tools/index.md)**
|
||||
|
||||
- {doc}`MIVisionX <mivisionx:README>`
|
||||
- {doc}`rocAL <rocal:README>`
|
||||
* [ROCmCC](./rocmcc/rocmcc.md)
|
||||
* {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
* {doc}`ROCgdb <rocgdb:index>`
|
||||
* {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
* {doc}`ROCTracer <roctracer:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [OpenMP](openmp/openmp)
|
||||
:::{grid-item-card}
|
||||
**[Management Tools](./compilers-tools/management-tools.md)**
|
||||
|
||||
- [OpenMP Support Guide](openmp/openmp)
|
||||
* {doc}`AMD SMI <amdsmi:index>`
|
||||
* {doc}`ROCm SMI <rocm_smi_lib:index>`
|
||||
* {doc}`ROCm Data Center Tool <rdc:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Compilers and Tools](compilers_tools/index)
|
||||
:::{grid-item-card}
|
||||
**[Validation Tools](./compilers-tools/validation-tools.md)**
|
||||
|
||||
- [ROCmCC](./rocmcc/rocmcc)
|
||||
- {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
- {doc}`ROCgdb <rocgdb:index>`
|
||||
- {doc}`ROCProfiler <rocprofiler:rocprof>`
|
||||
- {doc}`ROCTracer <roctracer:index>`
|
||||
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
|
||||
* {doc}`TransferBench <transferbench:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Management Tools](./compilers_tools/management_tools)
|
||||
:::{grid-item-card} **GPU Architectures**
|
||||
|
||||
- {doc}`AMD SMI <amdsmi:index>`
|
||||
- {doc}`ROCm SMI <rocm_smi_lib:index>`
|
||||
- {doc}`ROCm Data Center Tool <rdc:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Validation Tools](./compilers_tools/validation_tools)
|
||||
|
||||
- {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
|
||||
- {doc}`TransferBench <transferbench:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} GPU Architectures
|
||||
|
||||
- [AMD Instinct MI200](../conceptual/gpu_arch/mi250.md)
|
||||
- [AMD Instinct MI100](../conceptual/gpu_arch/mi100.md)
|
||||
* [AMD Instinct MI200](../conceptual/gpu-arch/mi250.md)
|
||||
* [AMD Instinct MI100](../conceptual/gpu-arch/mi100.md)
|
||||
|
||||
:::
|
||||
|
||||
|
||||
@@ -4,29 +4,32 @@
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`MIOpen <miopen:index>`
|
||||
|
||||
AMD's library for high performance machine learning primitives.
|
||||
|
||||
- {doc}`Documentation <miopen:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <miopen:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`Composable Kernel <composable_kernel:index>`
|
||||
|
||||
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
|
||||
|
||||
- {doc}`Documentation <composable_kernel:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <composable_kernel:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`MIGraphX <amdmigraphx:index>`
|
||||
|
||||
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
|
||||
|
||||
- {doc}`Documentation <amdmigraphx:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <amdmigraphx:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -6,47 +6,51 @@ ROCm template libraries for algorithms are as follows:
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`rocPRIM <rocprim:index>`
|
||||
|
||||
rocPRIM is an AMD GPU optimized template library of algorithm primitives, like
|
||||
transforms, reductions, scans, etc. It also serves as a common back-end for
|
||||
similar libraries found inside ROCm.
|
||||
|
||||
- {doc}`Documentation <rocprim:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
|
||||
* {doc}`Documentation <rocprim:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`rocThrust <rocthrust:index>`
|
||||
|
||||
rocThrust is a template library of algorithm primitives with a Thrust-compatible
|
||||
interface. Their CPU back-ends are identical, while the GPU back-end calls into
|
||||
rocPRIM.
|
||||
|
||||
- {doc}`Documentation <rocthrust:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
|
||||
* {doc}`Documentation <rocthrust:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipCUB <hipcub:index>`
|
||||
|
||||
hipCUB is a template library of algorithm primitives with a CUB-compatible
|
||||
interface. Its back-end is rocPRIM.
|
||||
|
||||
- {doc}`Documentation <hipcub:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
|
||||
* {doc}`Documentation <hipcub:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipTensor <hiptensor:index>`
|
||||
|
||||
hipTensor is AMD's C++ library for accelerating tensor primitives
|
||||
based on the composable kernel library,
|
||||
through general purpose kernel languages, like HIP C++.
|
||||
|
||||
- {doc}`Documentation <hiptensor:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)
|
||||
* {doc}`Documentation <hiptensor:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)
|
||||
|
||||
:::
|
||||
|
||||
@@ -4,15 +4,16 @@
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`RCCL <rccl:index>`
|
||||
RCCL (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs,
|
||||
|
||||
RCCL (pronounced "Rickle") is a standalone library of standard collective communication routines for GPUs,
|
||||
implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and all-to-all.
|
||||
The collective operations are implemented using ring and tree algorithms and have been optimized for
|
||||
throughput and latency.
|
||||
|
||||
- {doc}`Documentation <rccl:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
|
||||
* {doc}`Documentation <rccl:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
|
||||
|
||||
:::
|
||||
|
||||
@@ -6,22 +6,24 @@ ROCm libraries for FFT are as follows:
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`rocFFT <rocfft:index>`
|
||||
|
||||
rocFFT is an AMD GPU optimized library for FFT.
|
||||
|
||||
- {doc}`Documentation <rocfft:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocFFT)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <rocfft:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocFFT)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipFFT <hipfft:index>`
|
||||
|
||||
hipFFT is a compatibility layer for GPU accelerated FFT optimized for AMD GPUs
|
||||
using rocFFT. hipFFT allows for a common interface for other non-AMD GPU
|
||||
FFT libraries.
|
||||
|
||||
- {doc}`Documentation <hipfft:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipFFT)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <hipfft:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipFFT)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -6,103 +6,113 @@ ROCm libraries for linear algebra are as follows:
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`rocBLAS <rocblas:index>`
|
||||
|
||||
`rocBLAS` is an AMD GPU optimized library for BLAS (Basic Linear Algebra Subprograms).
|
||||
|
||||
- {doc}`Documentation <rocblas:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
|
||||
* {doc}`Documentation <rocblas:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipBLAS <hipblas:index>`
|
||||
|
||||
`hipBLAS` is a compatibility layer for GPU accelerated BLAS optimized for AMD GPUs
|
||||
via `rocBLAS` and `rocSOLVER`. `hipBLAS` allows for a common interface for other GPU
|
||||
BLAS libraries.
|
||||
|
||||
- {doc}`Documentation <hipblas:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <hipblas:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipBLASLt <hipblaslt:index>`
|
||||
|
||||
`hipBLASLt` is a library that provides general matrix-matrix operations with a
|
||||
flexible API and extends functionality beyond a traditional BLAS library.
|
||||
`hipBLASLt` exposes its APIs in the HIP programming language with an underlying
|
||||
optimized generator as a back-end kernel provider.
|
||||
|
||||
- {doc}`Documentation <hipblaslt:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <hipblaslt:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`rocALUTION <rocalution:index>`
|
||||
|
||||
`rocALUTION` is a sparse linear algebra library with focus on exploring
|
||||
fine-grained parallelism on top of AMD's ROCm runtime and toolchains, targeting
|
||||
modern CPU and GPU platforms.
|
||||
|
||||
- {doc}`Documentation <rocalution:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <rocalution:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`rocWMMA <rocwmma:index>`
|
||||
|
||||
`rocWMMA` provides an API to break down mixed precision matrix multiply-accumulate
|
||||
(MMA) problems into fragments and distributes these over GPU wavefronts.
|
||||
|
||||
- {doc}`Documentation <rocwmma:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <rocwmma:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`rocSOLVER <rocsolver:index>`
|
||||
|
||||
`rocSOLVER` provides a subset of LAPACK (Linear Algebra Package) functionality on the ROCm platform.
|
||||
|
||||
- {doc}`Documentation <rocsolver:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <rocsolver:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipSOLVER <hipsolver:index>`
|
||||
|
||||
`hipSOLVER` is a LAPACK marshalling library supporting both `rocSOLVER` and `cuSOLVER`
|
||||
as backends whilst exporting a unified interface.
|
||||
|
||||
- {doc}`Documentation <hipsolver:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <hipsolver:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`rocSPARSE <rocsparse:index>`
|
||||
|
||||
`rocSPARSE` is a library to provide BLAS for sparse computations.
|
||||
|
||||
- {doc}`Documentation <rocsparse:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <rocsparse:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipSPARSE <hipsparse:index>`
|
||||
|
||||
`hipSPARSE` is a marshalling library to provide sparse BLAS functionality,
|
||||
supporting both `rocSPARSE` and `cuSPARSE` as backends.
|
||||
|
||||
- {doc}`Documentation <hipsparse:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <hipsparse:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipSPARSELt <hipsparselt:index>`
|
||||
|
||||
`hipSPARSELt` is a marshalling library to provide sparse BLAS functionality,
|
||||
supporting both `rocSPARSELt` and `cuSPARSELt` as backends.
|
||||
|
||||
- {doc}`Documentation <hipsparselt:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)
|
||||
* {doc}`Documentation <hipsparselt:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)
|
||||
|
||||
:::
|
||||
|
||||
@@ -15,32 +15,35 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
|
||||
::::{grid} 1 2 3 3
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} [Linear Algebra Libraries](linear_algebra)
|
||||
:::{grid-item-card}
|
||||
**[Linear Algebra Libraries](./math-linear-algebra.md)**
|
||||
|
||||
- {doc}`rocBLAS <rocblas:index>`
|
||||
- {doc}`hipBLAS <hipblas:index>`
|
||||
- {doc}`hipBLASLt <hipblaslt:index>`
|
||||
- {doc}`rocALUTION <rocalution:index>`
|
||||
- {doc}`rocWMMA <rocwmma:index>`
|
||||
- {doc}`rocSOLVER <rocsolver:index>`
|
||||
- {doc}`hipSOLVER <hipsolver:index>`
|
||||
- {doc}`rocSPARSE <rocsparse:index>`
|
||||
- {doc}`hipSPARSE <hipsparse:index>`
|
||||
- {doc}`hipSPARSELt <hipsparselt:index>`
|
||||
* {doc}`rocBLAS <rocblas:index>`
|
||||
* {doc}`hipBLAS <hipblas:index>`
|
||||
* {doc}`hipBLASLt <hipblaslt:index>`
|
||||
* {doc}`rocALUTION <rocalution:index>`
|
||||
* {doc}`rocWMMA <rocwmma:index>`
|
||||
* {doc}`rocSOLVER <rocsolver:index>`
|
||||
* {doc}`hipSOLVER <hipsolver:index>`
|
||||
* {doc}`rocSPARSE <rocsparse:index>`
|
||||
* {doc}`hipSPARSE <hipsparse:index>`
|
||||
* {doc}`hipSPARSELt <hipsparselt:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Fast Fourier Transforms](fft)
|
||||
:::{grid-item-card}
|
||||
**[Fast Fourier Transforms](./math-fft.md)**
|
||||
|
||||
- {doc}`rocFFT <rocfft:index>`
|
||||
- {doc}`hipFFT <hipfft:index>`
|
||||
* {doc}`rocFFT <rocfft:index>`
|
||||
* {doc}`hipFFT <hipfft:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Random Numbers](rand)
|
||||
:::{grid-item-card}
|
||||
**[Random Numbers](./rand.md)**
|
||||
|
||||
- {doc}`rocRAND <rocrand:index>`
|
||||
- {doc}`hipRAND <hiprand:index>`
|
||||
* {doc}`rocRAND <rocrand:index>`
|
||||
* {doc}`hipRAND <hiprand:index>`
|
||||
|
||||
:::
|
||||
|
||||
@@ -4,23 +4,25 @@
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`rocRAND <rocrand:index>`
|
||||
|
||||
rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRNG).
|
||||
|
||||
- {doc}`Documentation <rocrand:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
|
||||
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
|
||||
* {doc}`Documentation <rocrand:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`hipRAND <hiprand:index>`
|
||||
|
||||
hipRAND is a compatibility layer for GPU accelerated pseudo-random number
|
||||
generation (PRNG) optimized for AMD GPUs using rocRAND. hipRAND allows for a
|
||||
common interface for other non-AMD GPU PRNG libraries.
|
||||
|
||||
- {doc}`Documentation <hiprand:index>`
|
||||
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
|
||||
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
|
||||
* {doc}`Documentation <hiprand:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
|
||||
|
||||
:::
|
||||
|
||||
@@ -1,8 +1,20 @@
|
||||
# ROCm libraries
|
||||
|
||||
add links...
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
* Math
|
||||
* C++ primitive
|
||||
* Communication
|
||||
* Artificial intelligence
|
||||
:::{grid-item-card}
|
||||
**[AI libraries](./ai-libraries.md)**
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Math libraries](./gpu-libraries/math.md)**
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Communication libraries](./gpu-libraries/communication.md)**
|
||||
|
||||
:::
|
||||
::::
|
||||
|
||||
@@ -9,12 +9,12 @@ Along with host APIs, the OpenMP compilers support offloading code and data onto
|
||||
GPU devices. This document briefly describes the installation location of the
|
||||
OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
|
||||
with OpenMP applications. The GPUs supported are the same as those supported by
|
||||
this ROCm release. See the list of supported GPUs in {doc}`../../about/release/linux_support`.
|
||||
this ROCm release. See the list of supported GPUs for [Linux](../../about/compatibility/linux-support.md) and [Windows](../../about/compatibility/windows-support.md).
|
||||
|
||||
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
|
||||
The following image illustrates the internal steps taken to translate a user’s application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.
|
||||
|
||||
```{figure} ../../data/reference/openmp/openmp_toolchain.svg
|
||||
```{figure} ../../data/reference/openmp/openmp-toolchain.svg
|
||||
:name: openmp-toolchain
|
||||
```
|
||||
|
||||
@@ -26,13 +26,10 @@ sub-directories are:
|
||||
|
||||
bin: Compilers (`flang` and `clang`) and other binaries.
|
||||
|
||||
- examples: The usage section below shows how to compile and run these programs.
|
||||
|
||||
- include: Header files.
|
||||
|
||||
- lib: Libraries including those required for target offload.
|
||||
|
||||
- lib-debug: Debug versions of the above libraries.
|
||||
* examples: The usage section below shows how to compile and run these programs.
|
||||
* include: Header files.
|
||||
* lib: Libraries including those required for target offload.
|
||||
* lib-debug: Debug versions of the above libraries.
|
||||
|
||||
## OpenMP: Usage
|
||||
|
||||
@@ -127,10 +124,10 @@ program with:
|
||||
|
||||
The following tracing options are widely used to generate useful information:
|
||||
|
||||
- **`--hsa-trace`**: This option is used to get a JSON output file with the HSA
|
||||
* **`--hsa-trace`**: This option is used to get a JSON output file with the HSA
|
||||
API execution traces and a flat profile in a CSV file.
|
||||
|
||||
- **`--sys-trace`**: This allows programmers to trace both HIP and HSA calls.
|
||||
* **`--sys-trace`**: This allows programmers to trace both HIP and HSA calls.
|
||||
Since this option results in loading ``libamdhip64.so``, follow the
|
||||
prerequisite as mentioned above.
|
||||
|
||||
@@ -166,16 +163,16 @@ implemented in the past releases.
|
||||
|
||||
### Asynchronous Behavior in OpenMP Target Regions
|
||||
|
||||
- Controlling Asynchronous Behavior
|
||||
* Controlling Asynchronous Behavior
|
||||
|
||||
The OpenMP offloading runtime executes asynchronously by default, allowing multiple data transfers to start concurrently. However, if the data to be transferred exceeds the default threshold of 1 MB, the runtime falls back to a synchronous data transfer. Buffers that are already locked are always transferred asynchronously.
|
||||
You can override this default behavior by setting `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` and `OMPX_FORCE_SYNC_REGIONS`. See the [Environment Variables](#environment-variables) table for details.
|
||||
|
||||
- Multithreaded Offloading on the Same Device
|
||||
* Multithreaded Offloading on the Same Device
|
||||
|
||||
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
|
||||
|
||||
- Parallel Memory Copy Invocations
|
||||
* Parallel Memory Copy Invocations
|
||||
|
||||
Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
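
As a minimal sketch of parallel memory copy invocations, the single target region below maps three arrays, so the runtime has two host-to-device copies and one device-to-host copy that it may issue concurrently. The function name `vector_add` is hypothetical, and whether the copies actually overlap depends on the runtime and the transfer-size threshold described above. When built without OpenMP offloading enabled, the loop simply runs on the host.

```cpp
#include <vector>

// Hypothetical helper (not part of ROCm): element-wise c = a + b in one target region.
// Assumes b holds at least as many elements as a.
std::vector<double> vector_add(const std::vector<double>& a,
                               const std::vector<double>& b) {
  const int n = static_cast<int>(a.size());
  std::vector<double> c(a.size(), 0.0);
  const double* pa = a.data();
  const double* pb = b.data();
  double* pc = c.data();

  // Two inputs are mapped to the device and one output is mapped back;
  // each mapped array implies its own memory copy.
#pragma omp target teams distribute parallel for \
    map(to: pa[0:n], pb[0:n]) map(from: pc[0:n])
  for (int i = 0; i < n; ++i) {
    pc[i] = pa[i] + pb[i];
  }
  return c;
}
```

Calling `vector_add(a, b)` with two equally sized vectors returns their element-wise sum; the caller does not need to manage device memory explicitly.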
|
||||
|
||||
@@ -187,11 +184,9 @@ with Xnack capability.
|
||||
|
||||
#### Prerequisites
|
||||
|
||||
- Linux Kernel versions above 5.14
|
||||
|
||||
- Latest KFD driver packaged in ROCm stack
|
||||
|
||||
- Xnack, as USM support can only be tested with applications compiled with Xnack
|
||||
* Linux Kernel versions above 5.14
|
||||
* Latest KFD driver packaged in ROCm stack
|
||||
* Xnack, as USM support can only be tested with applications compiled with Xnack
|
||||
capability
|
||||
|
||||
#### Xnack Capability
|
||||
@@ -220,13 +215,13 @@ HSA_XNACK=1
|
||||
|
||||
When Xnack support is not needed:
|
||||
|
||||
- Build the applications to maximize resource utilization using:
|
||||
* Build the applications to maximize resource utilization using:
|
||||
|
||||
```bash
|
||||
--offload-arch=gfx908:xnack-
|
||||
```
|
||||
|
||||
- At runtime, set the `HSA_XNACK` environment variable to 0.
|
||||
* At runtime, set the `HSA_XNACK` environment variable to 0.
|
||||
|
||||
#### Unified Shared Memory Pragma
|
||||
|
||||
@@ -376,27 +371,19 @@ GPUs with applications written in both HIP and OpenMP.
|
||||
|
||||
**Features Supported on Host Platform (Target x86_64):**
|
||||
|
||||
- Use-after-free
|
||||
|
||||
- Buffer overflows
|
||||
|
||||
- Heap buffer overflow
|
||||
|
||||
- Stack buffer overflow
|
||||
|
||||
- Global buffer overflow
|
||||
|
||||
- Use-after-return
|
||||
|
||||
- Use-after-scope
|
||||
|
||||
- Initialization order bugs
|
||||
* Use-after-free
|
||||
* Buffer overflows
|
||||
* Heap buffer overflow
|
||||
* Stack buffer overflow
|
||||
* Global buffer overflow
|
||||
* Use-after-return
|
||||
* Use-after-scope
|
||||
* Initialization order bugs
|
||||
|
||||
**Features Supported on AMDGPU Platform (`amdgcn-amd-amdhsa`):**
|
||||
|
||||
- Heap buffer overflow
|
||||
|
||||
- Global buffer overflow
|
||||
* Heap buffer overflow
|
||||
* Global buffer overflow
|
||||
|
||||
**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack
|
||||
capability. See the section on [Unified Shared Memory](#unified-shared-memory)
|
||||
@@ -404,7 +391,7 @@ for prerequisites and details on Xnack.
|
||||
|
||||
**Example:**
|
||||
|
||||
- Heap buffer overflow
|
||||
* Heap buffer overflow
|
||||
|
||||
```cpp
|
||||
void main() {
|
||||
@@ -424,7 +411,7 @@ void main() {
|
||||
See the complete sample code for heap buffer overflow
|
||||
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/heap_buffer_overflow/openmp/vecadd-HBO.cpp).
|
||||
|
||||
- Global buffer overflow
|
||||
* Global buffer overflow
|
||||
|
||||
```cpp
|
||||
#pragma omp declare target
|
||||
@@ -453,33 +440,31 @@ See the complete sample code for global buffer overflow
|
||||
|
||||
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
|
||||
|
||||
- `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels including No-loop and Cross-team reductions.
|
||||
* `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels including No-loop and Cross-team reductions.
|
||||
|
||||
- `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
|
||||
* `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
|
||||
|
||||
- `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
|
||||
* `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
|
||||
|
||||
- `-O3` if no `-O*` is specified by the user.
|
||||
* `-O3` if no `-O*` is specified by the user.
|
||||
|
||||
### Specialized Kernels
|
||||
|
||||
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
|
||||
|
||||
- No-Loop
|
||||
|
||||
- Big-Jump-Loop
|
||||
|
||||
- Cross-Team (Xteam) Reductions
|
||||
* No-Loop
|
||||
* Big-Jump-Loop
|
||||
* Cross-Team (Xteam) Reductions
|
||||
|
||||
To enable the generation of specialized kernels, follow these guidelines:
|
||||
|
||||
- Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the No-Loop kernel. If the `num_teams` clause is required, clang tries to generate the Big-Jump-Loop kernel instead of the No-Loop kernel.
|
||||
* Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the No-Loop kernel. If the `num_teams` clause is required, clang tries to generate the Big-Jump-Loop kernel instead of the No-Loop kernel.
|
||||
|
||||
- Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
|
||||
* Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
|
||||
|
||||
- To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
|
||||
* To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
|
||||
|
||||
- To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
|
||||
* To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
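
To make the guidelines above concrete, the following sketch shows two loops of the kind these specializations target: a plain data-parallel loop with no `num_teams` clause (a No-Loop candidate) and a sum reduction across teams (a Cross-Team reduction candidate). The function names `saxpy` and `dot` are hypothetical, and whether clang actually emits a specialized kernel for them depends on the compiler version and the options listed above.

```cpp
#include <vector>

// Hypothetical No-Loop candidate: y = alpha * x + y.
// No num_teams or thread_limit clause is given, so the compiler is free to choose.
// Assumes y holds at least as many elements as x.
void saxpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
  const int n = static_cast<int>(x.size());
  const float* px = x.data();
  float* py = y.data();
#pragma omp target teams distribute parallel for \
    map(to: px[0:n]) map(tofrom: py[0:n])
  for (int i = 0; i < n; ++i) {
    py[i] = alpha * px[i] + py[i];
  }
}

// Hypothetical Cross-Team reduction candidate: dot product of x and y.
// Assumes y holds at least as many elements as x.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  const int n = static_cast<int>(x.size());
  const double* px = x.data();
  const double* py = y.data();
  double sum = 0.0;
#pragma omp target teams distribute parallel for reduction(+ : sum) \
    map(to: px[0:n], py[0:n]) map(tofrom: sum)
  for (int i = 0; i < n; ++i) {
    sum += px[i] * py[i];
  }
  return sum;
}
```

Both functions avoid nested parallel regions and do not modify any internal control variable, so they also fit the assumptions that `-fopenmp-target-fast` asks the programmer to uphold.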
|
||||
|
||||
#### No-Loop Kernel Generation
|
||||
|
||||
|
||||
@@ -19,15 +19,15 @@ The differences are listed in [the table below](rocm-llvm-vs-alt).
|
||||
|
||||
For more details, see:
|
||||
|
||||
- AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
|
||||
- Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>
|
||||
* AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
|
||||
* Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>
|
||||
|
||||
### ROCm Compiler Interfaces
|
||||
|
||||
ROCm currently provides two compiler interfaces for compiling HIP programs:
|
||||
|
||||
- `/opt/rocm/bin/hipcc`
|
||||
- `/opt/rocm/bin/amdclang++`
|
||||
* `/opt/rocm/bin/hipcc`
|
||||
* `/opt/rocm/bin/amdclang++`
|
||||
|
||||
Both leverage the same LLVM compiler technology with the AMD GCN GPU support;
|
||||
however, they offer a slightly different user experience. The `hipcc` command-line
|
||||
@@ -237,8 +237,8 @@ minimized if the hoisted condition is executed more often. This heuristic
|
||||
prioritizes the conditions based on the number of times they are used within the
|
||||
loop. The heuristic can be controlled with the following options:
|
||||
|
||||
- `-unswitch-identical-branches-min-count=<n>`
|
||||
- Enables unswitching of a loop with respect to a branch conditional value
|
||||
* `-unswitch-identical-branches-min-count=<n>`
|
||||
* Enables unswitching of a loop with respect to a branch conditional value
|
||||
(B), where B appears in at least `<n>` compares in the loop. This option is
|
||||
enabled with `-aggressive-loop-unswitch`. The default value is 3.
|
||||
|
||||
@@ -246,8 +246,8 @@ loop. The heuristic can be controlled with the following options:
|
||||
|
||||
Where `n` is a positive integer, and a lower value of `<n>` facilitates more
|
||||
unswitching.
|
||||
- `-unswitch-identical-branches-max-count=<n>`
|
||||
- Enables unswitching of a loop with respect to a branch conditional value
|
||||
* `-unswitch-identical-branches-max-count=<n>`
|
||||
* Enables unswitching of a loop with respect to a branch conditional value
|
||||
(B), where B appears in at most `<n>` compares in the loop. This option is
|
||||
enabled with `-aggressive-loop-unswitch`. The default value is 6.
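
The transformation that these options tune can be sketched by hand. In the hypothetical function `sum_original`, the loop-invariant value `mode` is compared three times inside the loop; `sum_unswitched` shows the general shape of the code after unswitching on `mode`, with one specialized loop per case. This illustrates the concept only and is not the compiler's actual output.

```cpp
#include <vector>

// Original form: the loop-invariant value `mode` is tested on every iteration,
// in three separate compares.
long sum_original(const std::vector<int>& v, int mode) {
  long total = 0;
  for (int x : v) {
    if (mode == 0) {
      total += x;
    } else if (mode == 1) {
      total += 2L * x;
    } else if (mode == 2) {
      total -= x;
    }
  }
  return total;
}

// Hand-unswitched form: the compares on `mode` are hoisted out of the loop,
// leaving one branch-free loop per case.
long sum_unswitched(const std::vector<int>& v, int mode) {
  long total = 0;
  if (mode == 0) {
    for (int x : v) total += x;
  } else if (mode == 1) {
    for (int x : v) total += 2L * x;
  } else if (mode == 2) {
    for (int x : v) total -= x;
  }
  return total;
}
```

Both functions return the same result for every `mode`; the unswitched form trades larger code size for fewer branches per iteration, which is the trade-off the min and max counts are meant to bound.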
|
||||
|
||||
@@ -436,19 +436,19 @@ Inline assembly (ASM) statements allow a developer to include assembly
|
||||
instructions directly in either host or device code. While the ROCm compiler
|
||||
supports ASM statements, their use is not recommended for the following reasons:
|
||||
|
||||
- The compiler's ability to produce both correct code and to optimize
|
||||
* The compiler's ability to produce both correct code and to optimize
|
||||
surrounding code is impeded.
|
||||
- The compiler does not parse the content of the ASM statements and so
|
||||
* The compiler does not parse the content of the ASM statements and so
|
||||
cannot "see" its contents.
|
||||
- The compiler must make conservative assumptions in an effort to retain
|
||||
* The compiler must make conservative assumptions in an effort to retain
|
||||
correctness.
|
||||
- The conservative assumptions may yield code that, on the whole, is less
|
||||
* The conservative assumptions may yield code that, on the whole, is less
|
||||
performant compared to code without ASM statements. It is possible that a
|
||||
syntactically correct ASM statement may cause incorrect runtime behavior.
|
||||
- ASM statements are often ASIC-specific; code containing them is less portable
|
||||
* ASM statements are often ASIC-specific; code containing them is less portable
|
||||
and adds a maintenance burden to the developer if different ASICs are
|
||||
targeted.
|
||||
- Writing correct ASM statements is often difficult; we strongly recommend
|
||||
* Writing correct ASM statements is often difficult; we strongly recommend
|
||||
thorough testing of any use of ASM statements.
|
||||
|
||||
:::{note}
|
||||
@@ -608,9 +608,9 @@ architectures.
|
||||
The ROCmCC compiler is enhanced to generate binaries that can contain
|
||||
heterogeneous images. This heterogeneity could be in terms of:
|
||||
|
||||
- Images of different architectures, like AMD GCN and NVPTX
|
||||
- Images of the same architecture but for different GPUs, like gfx906 and gfx908
|
||||
- Images of the same architecture and the same GPU but for different target features,
|
||||
* Images of different architectures, like AMD GCN and NVPTX
|
||||
* Images of the same architecture but for different GPUs, like gfx906 and gfx908
|
||||
* Images of the same architecture and the same GPU but for different target features,
|
||||
like `gfx908:xnack+` and `gfx908:xnack-`
|
||||
|
||||
An appropriate image is selected by the OpenMP device runtime for execution
|
||||
|
||||