ROCm A-Z page & link cleanup (#2450)

Lisa
2023-09-13 13:00:50 -06:00
committed by GitHub
parent dba06fe315
commit 7c5976004f
172 changed files with 7557 additions and 2908 deletions

@@ -1,53 +1,59 @@
# Compilers and tools
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} {doc}`ROCdbgapi <rocdbgapi:index>`
The AMD Debugger API is a library that provides all the support necessary for a
debugger and other tools to perform low level control of the execution and
inspection of execution state of AMD's commercially available GPU architectures.
- {doc}`Documentation <rocdbgapi:index>`
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)
:::
:::{grid-item-card} [ROCmCC](../rocmcc/rocmcc)
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
computing on AMD GPUs and CPUs and supports various heterogeneous programming
models such as HIP, OpenMP, and OpenCL.
- [Documentation](../rocmcc/rocmcc)
:::
:::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
- {doc}`Documentation <rocgdb:index>`
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)
:::
:::{grid-item-card} {doc}`ROCProfiler <rocprofiler:rocprof>`
ROC profiler library. Profiling with performance counters and derived metrics. Library supports GFX8/GFX9. Hardware specific low-level performance analysis interface for profiling of GPU compute applications. The profiling includes hardware performance counters with complex performance metrics.
- {doc}`Documentation <rocprofiler:rocprof>`
- [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)
:::
:::{grid-item-card} {doc}`ROCTracer <roctracer:index>`
Callback/Activity Library for Performance tracing AMD GPUs
- {doc}`Documentation <roctracer:index>`
- [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)
:::
:::::
## See Also
- [Compiler Disambiguation](../../conceptual/compiler_disambiguation.md)
# ROCm compilers and tools
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} {doc}`ROCdbgapi <rocdbgapi:index>`
The AMD Debugger API is a library that provides all the support necessary for a
debugger and other tools to perform low-level control of the execution and
inspection of the execution state of AMD's commercially available GPU architectures.
* {doc}`Documentation <rocdbgapi:index>`
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)
:::
:::{grid-item-card}
**[ROCmCC](../rocmcc/rocmcc.md)**
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
computing on AMD GPUs and CPUs and supports various heterogeneous programming
models such as HIP, OpenMP, and OpenCL.
* [Documentation](../rocmcc/rocmcc.md)
:::
:::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`
ROCgdb is the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
* {doc}`Documentation <rocgdb:index>`
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)
:::
:::{grid-item-card} {doc}`ROCProfiler <rocprofiler:rocprof>`
ROCProfiler is a library for profiling GPU compute applications with hardware performance counters and derived metrics. It provides a hardware-specific, low-level performance analysis interface, supports GFX8/GFX9, and includes complex performance metrics built on the hardware counters.
* {doc}`Documentation <rocprofiler:rocprof>`
* [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)
:::
:::{grid-item-card} {doc}`ROCTracer <roctracer:index>`
Callback/activity library for performance tracing of AMD GPUs.
* {doc}`Documentation <roctracer:index>`
* [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)
:::
:::::
## See Also
* [Compiler Disambiguation](../../conceptual/compiler-disambiguation.md)

@@ -4,29 +4,31 @@
:gutter: 1
:::{grid-item-card} {doc}`AMD SMI <amdsmi:index>`
The AMD System Management Interface Library, or AMD SMI library, is a C library for Linux that provides a user space interface for applications to monitor and control AMD devices.
- {doc}`Documentation <amdsmi:index>`
- [GitHub](https://github.com/RadeonOpenCompute/amdsmi)
- [Examples](https://github.com/amd/go_amd_smi#example)
* [GitHub](https://github.com/RadeonOpenCompute/amdsmi)
* [Examples](https://github.com/amd/go_amd_smi#example)
:::
:::{grid-item-card} {doc}`ROCm SMI LIB <rocm_smi_lib:index>`
This tool acts as a command line interface for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
- {doc}`Documentation <rocm_smi_lib:index>`
- [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
- [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
* {doc}`Documentation <rocm_smi_lib:index>`
* [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
* [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
:::
:::{grid-item-card} {doc}`ROCm Data Center Tool <rdc:index>`
The ROCm™ Data Center Tool simplifies administration and addresses key infrastructure challenges of AMD GPUs in cluster and data center environments.
- [GitHub](https://github.com/RadeonOpenCompute/rdc)
- [Changelog](https://github.com/RadeonOpenCompute/rdc/blob/master/CHANGELOG.md)
- [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
* [GitHub](https://github.com/RadeonOpenCompute/rdc)
* [Changelog](https://github.com/RadeonOpenCompute/rdc/blob/master/CHANGELOG.md)
* [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
:::

@@ -4,21 +4,23 @@
:gutter: 1
:::{grid-item-card} {doc}`RVS <rocmvalidationsuite:index>`
The ROCm Validation Suite is a tool that system administrators and cluster managers can use to detect and troubleshoot common problems affecting AMD GPUs in a high-performance computing environment, enabled by the ROCm software stack on a compatible platform.
- {doc}`Documentation <rocmvalidationsuite:index>`
- [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
- [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)
* {doc}`Documentation <rocmvalidationsuite:index>`
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
* [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`TransferBench <transferbench:index>`
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
- {doc}`Documentation <transferbench:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
- [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
- {doc}`transferbench:examples/index`
* {doc}`Documentation <transferbench:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
* [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
* {doc}`transferbench:examples/index`
:::

@@ -1,3 +0,0 @@
# ROCm compilers and tools
add links...

@@ -1,22 +0,0 @@
# Computer Vision
::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} {doc}`MIVisionX <mivisionx:README>`
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
- {doc}`Documentation <mivisionx:README>`
- [GitHub](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/)
- [Changelog](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`rocAL <rocal:README>`
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.
- {doc}`Documentation <rocal:README>`
:::
:::::

@@ -1 +0,0 @@
# Docker

@@ -9,12 +9,13 @@ page introduces the HIP runtime and other HIP libraries and tools.
:gutter: 1
:::{grid-item-card} {doc}`HIP Runtime <hip:index>`
The HIP Runtime is used to enable GPU acceleration for all HIP language-based
products.
- {doc}`Documentation <hip:index>`
- [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
* {doc}`Documentation <hip:index>`
* [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
* [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
:::
@@ -26,12 +27,13 @@ products.
:gutter: 1
:::{grid-item-card} {doc}`HIPIFY <hipify:index>`
HIPIFY assists with porting CUDA-based applications to the HIP Runtime.
Supported CUDA APIs are documented here as well.
- {doc}`Documentation <hipify:index>`
- [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
- [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)
* {doc}`Documentation <hipify:index>`
* [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
* [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)
:::

@@ -5,93 +5,98 @@
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} [HIP](./hip)
:::{grid-item-card}
**[HIP](./hip.md)**
HIP is both AMD's GPU programming language extension and the GPU runtime.
- {doc}`HIP <hip:index>`
- [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
- {doc}`HIPIFY <hipify:index>`
* {doc}`HIP <hip:index>`
* [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
* {doc}`HIPIFY <hipify:index>`
:::
:::{grid-item-card} [Math Libraries](./libraries/gpu_libraries/math)
:::{grid-item-card}
**[Math Libraries](./libraries/gpu-libraries/math.md)**
HIP Math Libraries support the following domains:
- [Linear Algebra Libraries](./libraries/gpu_libraries/linear_algebra)
- [Fast Fourier Transforms](./libraries/gpu_libraries/fft)
- [Random Numbers](./libraries/gpu_libraries/rand)
* [Linear Algebra Libraries](./libraries/gpu-libraries/math-linear-algebra.md)
* [Fast Fourier Transforms](./libraries/gpu-libraries/math-fft.md)
* [Random Numbers](./libraries/gpu-libraries/rand.md)
:::
:::{grid-item-card} [C++ Primitive Libraries](./libraries/gpu_libraries/c++_primitives)
:::{grid-item-card}
**[C++ Primitive Libraries](./libraries/gpu-libraries/c++primitives.md)**
ROCm template libraries for C++ primitives and algorithms are as follows:
- {doc}`rocPRIM <rocprim:index>`
- {doc}`rocThrust <rocthrust:index>`
- {doc}`hipCUB <hipcub:index>`
- {doc}`hipTensor <hiptensor:index>`
* {doc}`rocPRIM <rocprim:index>`
* {doc}`rocThrust <rocthrust:index>`
* {doc}`hipCUB <hipcub:index>`
* {doc}`hipTensor <hiptensor:index>`
:::
:::{grid-item-card} [Communication Libraries](./libraries/gpu_libraries/communication)
:::{grid-item-card} [Communication Libraries](./libraries/gpu-libraries/communication.md)
Inter- and intra-node communication is supported by the following projects:
- {doc}`RCCL <rccl:index>`
* {doc}`RCCL <rccl:index>`
:::
:::{grid-item-card} [Artificial intelligence](../rocm_ai/rocm_ai)
:::{grid-item-card}
**[Artificial intelligence](../rocm-ai.md)**
Libraries related to AI.
- {doc}`MIOpen <miopen:index>`
- {doc}`Composable Kernel <composable_kernel:index>`
- {doc}`MIGraphX <amdmigraphx:index>`
* {doc}`MIOpen <miopen:index>`
* {doc}`Composable Kernel <composable_kernel:index>`
* {doc}`MIGraphX <amdmigraphx:index>`
* {doc}`MIVisionX <mivisionx:README>`
* {doc}`rocAL <rocal:README>`
:::
:::{grid-item-card}
**[OpenMP](./openmp/openmp.md)**
* [OpenMP Support Guide](./openmp/openmp.md)
:::
:::{grid-item-card} [Computer Vision](./computer_vision)
Computer vision related projects.
:::{grid-item-card}
**[Compilers and Tools](./compilers-tools/index.md)**
- {doc}`MIVisionX <mivisionx:README>`
- {doc}`rocAL <rocal:README>`
* [ROCmCC](./rocmcc/rocmcc.md)
* {doc}`ROCdbgapi <rocdbgapi:index>`
* {doc}`ROCgdb <rocgdb:index>`
* {doc}`ROCProfiler <rocprofiler:rocprof>`
* {doc}`ROCTracer <roctracer:index>`
:::
:::{grid-item-card} [OpenMP](openmp/openmp)
:::{grid-item-card}
**[Management Tools](./compilers-tools/management-tools.md)**
- [OpenMP Support Guide](openmp/openmp)
* {doc}`AMD SMI <amdsmi:index>`
* {doc}`ROCm SMI <rocm_smi_lib:index>`
* {doc}`ROCm Data Center Tool <rdc:index>`
:::
:::{grid-item-card} [Compilers and Tools](compilers_tools/index)
:::{grid-item-card}
**[Validation Tools](./compilers-tools/validation-tools.md)**
- [ROCmCC](./rocmcc/rocmcc)
- {doc}`ROCdbgapi <rocdbgapi:index>`
- {doc}`ROCgdb <rocgdb:index>`
- {doc}`ROCProfiler <rocprofiler:rocprof>`
- {doc}`ROCTracer <roctracer:index>`
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
* {doc}`TransferBench <transferbench:index>`
:::
:::{grid-item-card} [Management Tools](./compilers_tools/management_tools)
:::{grid-item-card} **GPU Architectures**
- {doc}`AMD SMI <amdsmi:index>`
- {doc}`ROCm SMI <rocm_smi_lib:index>`
- {doc}`ROCm Data Center Tool <rdc:index>`
:::
:::{grid-item-card} [Validation Tools](./compilers_tools/validation_tools)
- {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
- {doc}`TransferBench <transferbench:index>`
:::
:::{grid-item-card} GPU Architectures
- [AMD Instinct MI200](../conceptual/gpu_arch/mi250.md)
- [AMD Instinct MI100](../conceptual/gpu_arch/mi100.md)
* [AMD Instinct MI200](../conceptual/gpu-arch/mi250.md)
* [AMD Instinct MI100](../conceptual/gpu-arch/mi100.md)
:::

@@ -4,29 +4,32 @@
:gutter: 1
:::{grid-item-card} {doc}`MIOpen <miopen:index>`
AMD's library for high performance machine learning primitives.
- {doc}`Documentation <miopen:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
- [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)
* {doc}`Documentation <miopen:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
* [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`Composable Kernel <composable_kernel:index>`
Composable Kernel is a performance-portable programming model for machine learning tensor operators.
- {doc}`Documentation <composable_kernel:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
- [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)
* {doc}`Documentation <composable_kernel:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
* [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`MIGraphX <amdmigraphx:index>`
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
- {doc}`Documentation <amdmigraphx:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
- [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)
* {doc}`Documentation <amdmigraphx:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
* [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)
:::

@@ -6,47 +6,51 @@ ROCm template libraries for algorithms are as follows:
:gutter: 1
:::{grid-item-card} {doc}`rocPRIM <rocprim:index>`
rocPRIM is an AMD GPU-optimized template library of algorithm primitives, such as
transforms, reductions, and scans. It also serves as a common back-end for
similar libraries found inside ROCm.
- {doc}`Documentation <rocprim:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
* {doc}`Documentation <rocprim:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
:::
:::{grid-item-card} {doc}`rocThrust <rocthrust:index>`
rocThrust is a template library of algorithm primitives with a Thrust-compatible
interface. rocThrust and Thrust have identical CPU back-ends, while the rocThrust
GPU back-end calls into rocPRIM.
- {doc}`Documentation <rocthrust:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
* {doc}`Documentation <rocthrust:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
:::
:::{grid-item-card} {doc}`hipCUB <hipcub:index>`
hipCUB is a template library of algorithm primitives with a CUB-compatible
interface. Its back-end is rocPRIM.
- {doc}`Documentation <hipcub:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
* {doc}`Documentation <hipcub:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
:::
:::{grid-item-card} {doc}`hipTensor <hiptensor:index>`
hipTensor is AMD's C++ library for accelerating tensor primitives. It is based
on the Composable Kernel library and targets general-purpose kernel languages,
such as HIP C++.
- {doc}`Documentation <hiptensor:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)
* {doc}`Documentation <hiptensor:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)
:::
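To make the primitive categories named in the rocPRIM card concrete, the following host-only C++17 sketch uses the standard library's `std::transform`, `std::reduce`, and `std::inclusive_scan` to show what a transform, a reduction, and a scan each compute. It does not call rocPRIM, rocThrust, or hipCUB. Those libraries provide device-parallel versions of these operations, and their actual APIs are documented on the linked pages. The helper names `square_all`, `sum_all`, and `prefix_sums` are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Transform: apply an element-wise operation (here, squaring) to every element.
std::vector<int> square_all(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](int x) { return x * x; });
    return out;
}

// Reduction: combine all elements into a single value with an associative operator.
int sum_all(const std::vector<int>& in) {
    return std::reduce(in.begin(), in.end(), 0);
}

// Inclusive scan (prefix sum): output element i holds the sum of inputs 0..i.
std::vector<int> prefix_sums(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}
```

A parallel implementation may group the partial results of a reduction or scan in any order, so the combining operator should be associative. Integer addition is associative, but floating-point addition is only approximately so, and results can differ slightly between sequential and parallel runs.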

@@ -4,15 +4,16 @@
:gutter: 1
:::{grid-item-card} {doc}`RCCL <rccl:index>`
RCCL (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs,
RCCL (pronounced "Rickle") is a standalone library of standard collective communication routines for GPUs,
implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and all-to-all.
The collective operations are implemented using ring and tree algorithms and have been optimized for
throughput and latency.
- {doc}`Documentation <rccl:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
- [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
* {doc}`Documentation <rccl:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/CHANGELOG.md)
* [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
:::
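The RCCL card states that the collectives use ring and tree algorithms. The host-side C++ sketch below illustrates only the ring idea. It simulates a textbook ring all-reduce (sum) as a reduce-scatter phase followed by an all-gather phase over `n` simulated ranks, each holding a buffer split into `n` chunks. This is not RCCL code and does not use the RCCL API. The function name `ring_allreduce`, the `long` element type, and the requirement that the buffer length be divisible by the rank count are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Simulates a textbook ring all-reduce (sum) over bufs.size() ranks on the host.
// bufs[r] is rank r's buffer; all buffers must have the same length, and that
// length must be divisible by the number of ranks (one chunk per rank).
// On return, every buffer holds the element-wise sum of all original buffers.
void ring_allreduce(std::vector<std::vector<long>>& bufs) {
    const std::size_t n = bufs.size();
    if (n < 2) return;  // a single rank already holds the result
    const std::size_t chunk = bufs[0].size() / n;

    // Copies chunk c of rank src into a temporary; this models one message.
    auto take = [&](std::size_t src, std::size_t c) {
        return std::vector<long>(bufs[src].begin() + c * chunk,
                                 bufs[src].begin() + (c + 1) * chunk);
    };

    // Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n to
    // rank r + 1, which adds it into its own copy of that chunk. After n - 1
    // steps, rank r holds the fully reduced chunk (r + 1) mod n.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        std::vector<std::vector<long>> msgs(n);  // all sends happen "at once"
        for (std::size_t r = 0; r < n; ++r) msgs[r] = take(r, (r + n - s) % n);
        for (std::size_t r = 0; r < n; ++r) {
            std::size_t dst = (r + 1) % n, c = (r + n - s) % n;
            for (std::size_t i = 0; i < chunk; ++i) bufs[dst][c * chunk + i] += msgs[r][i];
        }
    }

    // Phase 2, all-gather: in step s, rank r forwards the finished chunk
    // (r + 1 - s) mod n to rank r + 1, which overwrites its copy.
    for (std::size_t s = 0; s + 1 < n; ++s) {
        std::vector<std::vector<long>> msgs(n);
        for (std::size_t r = 0; r < n; ++r) msgs[r] = take(r, (r + 1 + n - s) % n);
        for (std::size_t r = 0; r < n; ++r) {
            std::size_t dst = (r + 1) % n, c = (r + 1 + n - s) % n;
            for (std::size_t i = 0; i < chunk; ++i) bufs[dst][c * chunk + i] = msgs[r][i];
        }
    }
}
```

Each rank sends 2(n - 1) messages of L/n elements, where L is the buffer length. The traffic per rank therefore stays close to 2L regardless of the rank count, which is why rings suit large, bandwidth-bound messages. Tree algorithms are typically preferred when latency dominates.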

@@ -6,22 +6,24 @@ ROCm libraries for FFT are as follows:
:gutter: 1
:::{grid-item-card} {doc}`rocFFT <rocfft:index>`
rocFFT is an AMD GPU-optimized library for fast Fourier transforms (FFT).
- {doc}`Documentation <rocfft:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocFFT)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
* {doc}`Documentation <rocfft:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocFFT)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`hipFFT <hipfft:index>`
hipFFT is a compatibility layer for GPU-accelerated FFT, optimized for AMD GPUs
through rocFFT. hipFFT provides a common interface to other, non-AMD GPU
FFT libraries.
- {doc}`Documentation <hipfft:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipFFT)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
* {doc}`Documentation <hipfft:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipFFT)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/CHANGELOG.md)
:::

@@ -6,103 +6,113 @@ ROCm libraries for linear algebra are as follows:
:gutter: 1
:::{grid-item-card} {doc}`rocBLAS <rocblas:index>`
`rocBLAS` is an AMD GPU-optimized library for BLAS (Basic Linear Algebra Subprograms).
- {doc}`Documentation <rocblas:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
* {doc}`Documentation <rocblas:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
:::
:::{grid-item-card} {doc}`hipBLAS <hipblas:index>`
`hipBLAS` is a compatibility layer for GPU-accelerated BLAS, optimized for AMD GPUs
via `rocBLAS` and `rocSOLVER`. `hipBLAS` provides a common interface to other GPU
BLAS libraries.
- {doc}`Documentation <hipblas:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
* {doc}`Documentation <hipblas:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`hipBLASLt <hipblaslt:index>`
`hipBLASLt` is a library that provides general matrix-matrix operations with a
flexible API and extends functionality beyond a traditional BLAS library.
`hipBLASLt` exposes its APIs in the HIP programming language, with an underlying
optimized generator as the back-end kernel provider.
- {doc}`Documentation <hipblaslt:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
* {doc}`Documentation <hipblaslt:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`rocALUTION <rocalution:index>`
`rocALUTION` is a sparse linear algebra library with a focus on exploring
fine-grained parallelism on top of AMD's ROCm runtime and toolchains, targeting
modern CPU and GPU platforms.
- {doc}`Documentation <rocalution:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
* {doc}`Documentation <rocalution:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`rocWMMA <rocwmma:index>`
`rocWMMA` provides an API to break down mixed-precision matrix multiply-accumulate
(MMA) problems into fragments and distribute these over GPU wavefronts.
- {doc}`Documentation <rocwmma:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
* {doc}`Documentation <rocwmma:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`rocSOLVER <rocsolver:index>`
`rocSOLVER` provides a subset of LAPACK (Linear Algebra Package) functionality on the ROCm platform.
- {doc}`Documentation <rocsolver:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
* {doc}`Documentation <rocsolver:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`hipSOLVER <hipsolver:index>`
`hipSOLVER` is a LAPACK marshalling library that supports both `rocSOLVER` and `cuSOLVER`
as backends while exporting a unified interface.
- {doc}`Documentation <hipsolver:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
* {doc}`Documentation <hipsolver:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`rocSPARSE <rocsparse:index>`
`rocSPARSE` is a library that provides BLAS for sparse computations.
- {doc}`Documentation <rocsparse:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
* {doc}`Documentation <rocsparse:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`hipSPARSE <hipsparse:index>`
`hipSPARSE` is a marshalling library to provide sparse BLAS functionality,
supporting both `rocSPARSE` and `cuSPARSE` as backends.
- {doc}`Documentation <hipsparse:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
* {doc}`Documentation <hipsparse:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/CHANGELOG.md)
:::
:::{grid-item-card} {doc}`hipSPARSELt <hipsparselt:index>`
`hipSPARSELt` is a marshalling library that provides sparse BLAS functionality,
supporting both `rocSPARSELt` and `cuSPARSELt` as backends.
- {doc}`Documentation <hipsparselt:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)
* {doc}`Documentation <hipsparselt:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)
:::
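The rocWMMA card describes breaking a matrix multiply-accumulate (D = A·B + C) into fragments. The plain C++ sketch below shows only that decomposition, on the host. The output is partitioned into `T`×`T` tiles, and each tile is accumulated from a sequence of `T`-wide fragments along the shared dimension. On a GPU, this per-tile work is loosely analogous to what one wavefront would own. The sketch does not use the rocWMMA API. The function name `tiled_mma`, the row-major layout, the use of `float` for every operand, and the requirement that all dimensions be multiples of `T` are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Computes D = A * B + C for row-major matrices (A: M x K, B: K x N,
// C and D: M x N). The output is split into T x T tiles, and each tile is
// accumulated from T-wide fragments along K. M, N, and K must be multiples of T.
std::vector<float> tiled_mma(const std::vector<float>& A, const std::vector<float>& B,
                             const std::vector<float>& C, std::size_t M,
                             std::size_t N, std::size_t K, std::size_t T) {
    std::vector<float> D(C);  // start the accumulator from C
    for (std::size_t ti = 0; ti < M; ti += T)          // output tile row
        for (std::size_t tj = 0; tj < N; tj += T)      // output tile column
            for (std::size_t tk = 0; tk < K; tk += T)  // fragment step along K
                // Multiply one T x T fragment of A by one T x T fragment of B
                // and add the product into the current D tile.
                for (std::size_t i = ti; i < ti + T; ++i)
                    for (std::size_t j = tj; j < tj + T; ++j) {
                        float acc = 0.0f;
                        for (std::size_t k = tk; k < tk + T; ++k)
                            acc += A[i * K + k] * B[k * N + j];
                        D[i * N + j] += acc;
                    }
    return D;
}
```

A mixed-precision kernel would typically read lower-precision inputs (for example, half precision) and accumulate in higher precision. This sketch uses `float` throughout so that the tiling stays the only point of interest.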

@@ -15,32 +15,35 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
::::{grid} 1 2 3 3
:gutter: 1
:::{grid-item-card} [Linear Algebra Libraries](linear_algebra)
:::{grid-item-card}
**[Linear Algebra Libraries](./math-linear-algebra.md)**
- {doc}`rocBLAS <rocblas:index>`
- {doc}`hipBLAS <hipblas:index>`
- {doc}`hipBLASLt <hipblaslt:index>`
- {doc}`rocALUTION <rocalution:index>`
- {doc}`rocWMMA <rocwmma:index>`
- {doc}`rocSOLVER <rocsolver:index>`
- {doc}`hipSOLVER <hipsolver:index>`
- {doc}`rocSPARSE <rocsparse:index>`
- {doc}`hipSPARSE <hipsparse:index>`
- {doc}`hipSPARSELt <hipsparselt:index>`
* {doc}`rocBLAS <rocblas:index>`
* {doc}`hipBLAS <hipblas:index>`
* {doc}`hipBLASLt <hipblaslt:index>`
* {doc}`rocALUTION <rocalution:index>`
* {doc}`rocWMMA <rocwmma:index>`
* {doc}`rocSOLVER <rocsolver:index>`
* {doc}`hipSOLVER <hipsolver:index>`
* {doc}`rocSPARSE <rocsparse:index>`
* {doc}`hipSPARSE <hipsparse:index>`
* {doc}`hipSPARSELt <hipsparselt:index>`
:::
:::{grid-item-card} [Fast Fourier Transforms](fft)
:::{grid-item-card}
**[Fast Fourier Transforms](./math-fft.md)**
- {doc}`rocFFT <rocfft:index>`
- {doc}`hipFFT <hipfft:index>`
* {doc}`rocFFT <rocfft:index>`
* {doc}`hipFFT <hipfft:index>`
:::
:::{grid-item-card} [Random Numbers](rand)
:::{grid-item-card}
**[Random Numbers](./rand.md)**
- {doc}`rocRAND <rocrand:index>`
- {doc}`hipRAND <hiprand:index>`
* {doc}`rocRAND <rocrand:index>`
* {doc}`hipRAND <hiprand:index>`
:::

@@ -4,23 +4,25 @@
:gutter: 1
:::{grid-item-card} {doc}`rocRAND <rocrand:index>`
rocRAND is an AMD GPU-optimized library for pseudo-random number generators (PRNG).
- {doc}`Documentation <rocrand:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
- [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
- [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
* {doc}`Documentation <rocrand:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
:::
:::{grid-item-card} {doc}`hipRAND <hiprand:index>`
hipRAND is a compatibility layer for GPU-accelerated pseudo-random number
generation (PRNG), optimized for AMD GPUs using rocRAND. hipRAND provides a
common interface to other, non-AMD GPU PRNG libraries.
- {doc}`Documentation <hiprand:index>`
- [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
- [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
* {doc}`Documentation <hiprand:index>`
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
:::

@@ -1,8 +1,20 @@
# ROCm libraries
add links...
::::{grid} 1 1 2 2
:gutter: 1
* Math
* C++ primitive
* Communication
* Artificial intelligence
:::{grid-item-card}
**[AI libraries](./ai-libraries.md)**
:::
:::{grid-item-card}
**[Math libraries](./gpu-libraries/math.md)**
:::
:::{grid-item-card}
**[Communication libraries](./gpu-libraries/communication.md)**
:::
::::

@@ -9,12 +9,12 @@ Along with host APIs, the OpenMP compilers support offloading code and data onto
GPU devices. This document briefly describes the installation location of the
OpenMP toolchain, example usage of device offloading, and usage of `rocprof`
with OpenMP applications. The GPUs supported are the same as those supported by
this ROCm release. See the list of supported GPUs in {doc}`../../about/release/linux_support`.
this ROCm release. See the list of supported GPUs for [Linux](../../about/compatibility/linux-support.md) and [Windows](../../about/compatibility/windows-support.md).
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
The following image illustrates the internal steps taken to translate a user's application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process: Pass 1 compiles the application to generate the CPU code, and Pass 2 links the CPU code to the AMDGPU device code.
```{figure} ../../data/reference/openmp/openmp_toolchain.svg
```{figure} ../../data/reference/openmp/openmp-toolchain.svg
:name: openmp-toolchain
```
@@ -26,13 +26,10 @@ sub-directories are:
bin: Compilers (`flang` and `clang`) and other binaries.
- examples: The usage section below shows how to compile and run these programs.
- include: Header files.
- lib: Libraries including those required for target offload.
- lib-debug: Debug versions of the above libraries.
* examples: The usage section below shows how to compile and run these programs.
* include: Header files.
* lib: Libraries including those required for target offload.
* lib-debug: Debug versions of the above libraries.
## OpenMP: Usage
@@ -127,10 +124,10 @@ program with:
The following tracing options are widely used to generate useful information:
- **`--hsa-trace`**: This option is used to get a JSON output file with the HSA
* **`--hsa-trace`**: This option is used to get a JSON output file with the HSA
API execution traces and a flat profile in a CSV file.
- **`--sys-trace`**: This allows programmers to trace both HIP and HSA calls.
* **`--sys-trace`**: This allows programmers to trace both HIP and HSA calls.
Since this option results in loading ``libamdhip64.so``, follow the
prerequisite as mentioned above.
@@ -166,16 +163,16 @@ implemented in the past releases.
### Asynchronous Behavior in OpenMP Target Regions
- Controlling Asynchronous Behavior
* Controlling Asynchronous Behavior
  The OpenMP offloading runtime executes in an asynchronous fashion by default, allowing multiple data transfers to start concurrently. However, if the data to be transferred is larger than the default threshold of 1MB, the runtime falls back to a synchronous data transfer. Transfers of buffers that have already been locked are always executed asynchronously.
You can overrule this default behavior by setting `LIBOMPTARGET_AMDGPU_MAX_ASYNC_COPY_BYTES` and `OMPX_FORCE_SYNC_REGIONS`. See the [Environment Variables](#environment-variables) table for details.
- Multithreaded Offloading on the Same Device
* Multithreaded Offloading on the Same Device
The `libomptarget` plugin for GPU offloading allows creation of separate configurable HSA queues per chiplet, which enables two or more threads to concurrently offload to the same device.
- Parallel Memory Copy Invocations
* Parallel Memory Copy Invocations
  Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
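The following sketch illustrates multithreaded offloading to one device. Each host thread offloads its own slice of an array in a separate target region. The function name `scale_in_chunks` is hypothetical. Whether the regions actually overlap on the GPU depends on the runtime's HSA queue configuration. Without offloading support, they simply run on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits `data` into one slice per host thread; every host thread offloads
// its own slice to the default device in its own target region.
void scale_in_chunks(std::vector<double>& data, int host_threads, double factor) {
  assert(host_threads > 0);
  const std::size_t n = data.size();
  double* p = data.data();
  #pragma omp parallel for num_threads(host_threads)
  for (int t = 0; t < host_threads; ++t) {
    const std::size_t begin = n * static_cast<std::size_t>(t) / host_threads;
    const std::size_t end = n * static_cast<std::size_t>(t + 1) / host_threads;
    const std::size_t len = end - begin;
    // Each thread maps only its own slice, so the mapped sections never overlap.
    #pragma omp target teams distribute parallel for map(tofrom: p[begin:len])
    for (std::size_t i = begin; i < end; ++i) {
      p[i] *= factor;
    }
  }
}
```

Using one `target` region per host thread, rather than one large region, is what lets the plugin place the work on separate queues.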
@@ -187,11 +184,9 @@ with Xnack capability.
#### Prerequisites
- Linux Kernel versions above 5.14
- Latest KFD driver packaged in ROCm stack
- Xnack, as USM support can only be tested with applications compiled with Xnack
* Linux kernel versions above 5.14
* Latest KFD driver packaged in the ROCm stack
* Xnack, as USM support can only be tested with applications compiled with Xnack
capability
#### Xnack Capability
@@ -220,13 +215,13 @@ HSA_XNACK=1
When Xnack support is not needed:
- Build the applications to maximize resource utilization using:
* Build the applications to maximize resource utilization using:
```bash
--offload-arch=gfx908:xnack-
```
- At runtime, set the `HSA_XNACK` environment variable to 0.
* At runtime, set the `HSA_XNACK` environment variable to 0.
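A build targeting `gfx908:xnack-` is meant to run with `HSA_XNACK=0`, and an Xnack-enabled build with `HSA_XNACK=1`. An application may therefore want to check the variable at startup. The sketch below does this with `std::getenv`. `XnackEnv`, `read_hsa_xnack`, and `hsa_xnack_matches_build` are hypothetical names. Treating an unset or unrecognized value as a mismatch is our own conservative choice, not documented runtime behavior.

```cpp
#include <cstdlib>
#include <cstring>

enum class XnackEnv { Disabled, Enabled, Unset };

// Reports what HSA_XNACK requests for this process. Values other than
// "0" or "1" are treated like an unset variable in this sketch.
XnackEnv read_hsa_xnack() {
  const char* value = std::getenv("HSA_XNACK");
  if (value == nullptr) return XnackEnv::Unset;
  if (std::strcmp(value, "1") == 0) return XnackEnv::Enabled;
  if (std::strcmp(value, "0") == 0) return XnackEnv::Disabled;
  return XnackEnv::Unset;
}

// True if HSA_XNACK explicitly matches the Xnack mode the binary was built
// for (true for xnack+, false for xnack-). Unset counts as a mismatch here.
bool hsa_xnack_matches_build(bool built_with_xnack_plus) {
  const XnackEnv env = read_hsa_xnack();
  if (env == XnackEnv::Unset) return false;
  return (env == XnackEnv::Enabled) == built_with_xnack_plus;
}
```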
#### Unified Shared Memory Pragma
@@ -376,27 +371,19 @@ GPUs with applications written in both HIP and OpenMP.
**Features Supported on Host Platform (Target x86_64):**
- Use-after-free
- Buffer overflows
- Heap buffer overflow
- Stack buffer overflow
- Global buffer overflow
- Use-after-return
- Use-after-scope
- Initialization order bugs
* Use-after-free
* Buffer overflows
* Heap buffer overflow
* Stack buffer overflow
* Global buffer overflow
* Use-after-return
* Use-after-scope
* Initialization order bugs
**Features Supported on AMDGPU Platform (`amdgcn-amd-amdhsa`):**
- Heap buffer overflow
- Global buffer overflow
* Heap buffer overflow
* Global buffer overflow
**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack
capability. See the section on [Unified Shared Memory](#unified-shared-memory)
@@ -404,7 +391,7 @@ for prerequisites and details on Xnack.
**Example:**
- Heap buffer overflow
* Heap buffer overflow
```cpp
int main() {
@@ -424,7 +411,7 @@ void main() {
See the complete sample code for heap buffer overflow
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/heap_buffer_overflow/openmp/vecadd-HBO.cpp).
- Global buffer overflow
* Global buffer overflow
```cpp
#pragma omp declare target
@@ -453,33 +440,31 @@ See the complete sample code for global buffer overflow
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
- `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels including No-loop and Cross-team reductions.
* `-fopenmp-target-ignore-env-vars`: It enables code generation of specialized kernels, including No-Loop kernels and Cross-Team (Xteam) reductions.
- `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
* `-fopenmp-assume-no-thread-state`: It enables the compiler to assume that no thread in a parallel region modifies an Internal Control Variable (`ICV`), thus potentially reducing the device runtime code execution.
- `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
* `-fopenmp-assume-no-nested-parallelism`: It enables the compiler to assume that no thread in a parallel region encounters a parallel region, thus potentially reducing the device runtime code execution.
- `-O3` if no `-O*` is specified by the user.
* `-O3` if no `-O*` is specified by the user.
### Specialized Kernels
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
- No-Loop
- Big-Jump-Loop
- Cross-Team (Xteam) Reductions
* No-Loop
* Big-Jump-Loop
* Cross-Team (Xteam) Reductions
To enable the generation of specialized kernels, follow these guidelines:
- Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the No-Loop kernel. If the specification of `num_teams` clause is a user requirement then clang tries to generate the Big-Jump-Loop kernel instead of the No-Loop kernel.
* Do not specify teams, threads, and schedule-related environment variables. The `num_teams` clause in an OpenMP target construct acts as an override and prevents the generation of the No-Loop kernel. If the `num_teams` clause is required, clang tries to generate the Big-Jump-Loop kernel instead of the No-Loop kernel.
- Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
* Assert the absence of the teams, threads, and schedule-related environment variables by adding the command-line option `-fopenmp-target-ignore-env-vars`.
- To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
* To automatically enable the specialized kernel generation, use `-Ofast` or `-fopenmp-target-fast` for compilation.
- To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
* To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
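The two functions below are written as candidates for the kernels listed above. `saxpy` is a plain `target teams distribute parallel for` loop with no `num_teams` clause, so it is a possible No-Loop kernel. `sum` is a sum reduction across teams, so it is a possible Xteam reduction. This is a sketch that assumes compilation with `-fopenmp-target-fast` or `-Ofast`. The source alone does not guarantee which kernel clang generates, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// No-Loop candidate: a simple data-parallel loop with no num_teams,
// thread_limit, or schedule clause.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  const std::size_t n = x.size();
  const float* px = x.data();
  float* py = y.data();
  #pragma omp target teams distribute parallel for map(to: px[0:n]) map(tofrom: py[0:n])
  for (std::size_t i = 0; i < n; ++i) {
    py[i] = a * px[i] + py[i];
  }
}

// Xteam reduction candidate: the partial sums of all teams are combined
// into `total`, which is mapped back to the host.
double sum(const std::vector<double>& v) {
  const std::size_t n = v.size();
  const double* pv = v.data();
  double total = 0.0;
  #pragma omp target map(to: pv[0:n]) map(tofrom: total)
  #pragma omp teams distribute parallel for reduction(+: total)
  for (std::size_t i = 0; i < n; ++i) {
    total += pv[i];
  }
  return total;
}
```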
#### No-Loop Kernel Generation

@@ -19,15 +19,15 @@ The differences are listed in [the table below](rocm-llvm-vs-alt).
For more details, see:
- AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
- Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>
* AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
* Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>
### ROCm Compiler Interfaces
ROCm currently provides two compiler interfaces for compiling HIP programs:
- `/opt/rocm/bin/hipcc`
- `/opt/rocm/bin/amdclang++`
* `/opt/rocm/bin/hipcc`
* `/opt/rocm/bin/amdclang++`
Both leverage the same LLVM compiler technology with the AMD GCN GPU support;
however, they offer a slightly different user experience. The `hipcc` command-line
@@ -237,8 +237,8 @@ minimized if the hoisted condition is executed more often. This heuristic
prioritizes the conditions based on the number of times they are used within the
loop. The heuristic can be controlled with the following options:
- `-unswitch-identical-branches-min-count=<n>`
- Enables unswitching of a loop with respect to a branch conditional value
* `-unswitch-identical-branches-min-count=<n>`
* Enables unswitching of a loop with respect to a branch conditional value
(B), where B appears in at least `<n>` compares in the loop. This option is
enabled with `-aggressive-loop-unswitch`. The default value is 3.
@@ -246,8 +246,8 @@ loop. The heuristic can be controlled with the following options:
    Here, `<n>` is a positive integer; a lower value of `<n>` facilitates more
    unswitching.
- `-unswitch-identical-branches-max-count=<n>`
- Enables unswitching of a loop with respect to a branch conditional value
* `-unswitch-identical-branches-max-count=<n>`
* Enables unswitching of a loop with respect to a branch conditional value
(B), where B appears in at most `<n>` compares in the loop. This option is
enabled with `-aggressive-loop-unswitch`. The default value is 6.
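Loop unswitching operates on LLVM IR, but its effect can be sketched at the source level. In `original` below, the loop-invariant condition `flag` is tested by three branches in every iteration, which is the default minimum count above. `unswitched` is written by hand and shows the shape the transformation aims for: the test is hoisted out of the loop and the loop body is duplicated for each outcome. Both functions are hypothetical illustrations, not compiler output, and we do not claim the compiler would transform this exact loop.

```cpp
#include <cstddef>
#include <vector>

// Before: the invariant condition `flag` is re-tested on every iteration.
long original(const std::vector<int>& v, bool flag) {
  long acc = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (flag) acc += v[i];
    if (flag) acc *= 2;
    if (!flag) acc -= v[i];
  }
  return acc;
}

// After (by hand): the condition is hoisted out of the loop and each loop
// copy runs without the repeated branches.
long unswitched(const std::vector<int>& v, bool flag) {
  long acc = 0;
  if (flag) {
    for (std::size_t i = 0; i < v.size(); ++i) {
      acc += v[i];
      acc *= 2;
    }
  } else {
    for (std::size_t i = 0; i < v.size(); ++i) {
      acc -= v[i];
    }
  }
  return acc;
}
```

The trade-off is code size: each hoisted condition duplicates the loop, which is why the pass bounds how many identical branches it considers.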
@@ -436,19 +436,19 @@ Inline assembly (ASM) statements allow a developer to include assembly
instructions directly in either host or device code. While the ROCm compiler
supports ASM statements, their use is not recommended for the following reasons:
- The compiler's ability to produce both correct code and to optimize
* The compiler's ability to produce both correct code and to optimize
surrounding code is impeded.
- The compiler does not parse the content of the ASM statements and so
* The compiler does not parse the content of the ASM statements and so
  cannot "see" their contents.
- The compiler must make conservative assumptions in an effort to retain
* The compiler must make conservative assumptions in an effort to retain
correctness.
- The conservative assumptions may yield code that, on the whole, is less
* The conservative assumptions may yield code that, on the whole, is less
performant compared to code without ASM statements. It is possible that a
syntactically correct ASM statement may cause incorrect runtime behavior.
- ASM statements are often ASIC-specific; code containing them is less portable
* ASM statements are often ASIC-specific; code containing them is less portable
and adds a maintenance burden to the developer if different ASICs are
targeted.
- Writing correct ASM statements is often difficult; we strongly recommend
* Writing correct ASM statements is often difficult; we strongly recommend
thorough testing of any use of ASM statements.
:::{note}
@@ -608,9 +608,9 @@ architectures.
The ROCmCC compiler is enhanced to generate binaries that can contain
heterogeneous images. This heterogeneity could be in terms of:
- Images of different architectures, like AMD GCN and NVPTX
- Images of same architectures but for different GPUs, like gfx906 and gfx908
- Images of same architecture and same GPU but for different target features,
* Images of different architectures, like AMD GCN and NVPTX
* Images of the same architecture but for different GPUs, like gfx906 and gfx908
* Images of the same architecture and the same GPU but for different target features,
like `gfx908:xnack+` and `gfx908:xnack-`
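To make the target-feature case concrete, here is a simplified model of choosing among such images for a device. It rests on three assumptions:

* An image built with `xnack+` or `xnack-` runs only when the device's Xnack mode matches.
* An image without the feature runs in either mode.
* An exact match is preferred.

The names `parse_target_id` and `select_image` are hypothetical, and this is not the OpenMP device runtime's actual selection code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

enum class Xnack { Any, On, Off };

struct TargetId {
  std::string processor;  // e.g. "gfx908"
  Xnack xnack = Xnack::Any;
};

// Parses IDs such as "gfx908", "gfx908:xnack+" or "gfx908:xnack-".
// Other features (for example sramecc) are ignored in this sketch.
TargetId parse_target_id(const std::string& id) {
  TargetId t;
  t.processor = id.substr(0, id.find(':'));
  if (id.find(":xnack+") != std::string::npos) t.xnack = Xnack::On;
  if (id.find(":xnack-") != std::string::npos) t.xnack = Xnack::Off;
  return t;
}

// Returns the index of the image to run on a device, preferring an exact
// Xnack match over an image built for either mode; nullopt if none fits.
std::optional<std::size_t> select_image(const std::vector<std::string>& images,
                                        const std::string& device_processor,
                                        bool device_xnack_enabled) {
  const Xnack wanted = device_xnack_enabled ? Xnack::On : Xnack::Off;
  std::optional<std::size_t> fallback;
  for (std::size_t i = 0; i < images.size(); ++i) {
    const TargetId t = parse_target_id(images[i]);
    if (t.processor != device_processor) continue;
    if (t.xnack == wanted) return i;                       // exact feature match
    if (t.xnack == Xnack::Any && !fallback) fallback = i;  // usable in either mode
  }
  return fallback;
}
```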
An appropriate image is selected by the OpenMP device runtime for execution