mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-09 14:48:06 -05:00
Update titles to sentence case (#2455)
@@ -1 +1 @@
|
||||
# Development Tools
|
||||
# Development tools
|
||||
|
||||
@@ -9,7 +9,6 @@ The AMD Debugger API is a library that provides all the support necessary for a
|
||||
debugger and other tools to perform low level control of the execution and
|
||||
inspection of execution state of AMD's commercially available GPU architectures.
|
||||
|
||||
* {doc}`Documentation <rocdbgapi:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)
|
||||
|
||||
:::
|
||||
@@ -21,15 +20,12 @@ ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
|
||||
computing on AMD GPUs and CPUs and supports various heterogeneous programming
|
||||
models such as HIP, OpenMP, and OpenCL.
|
||||
|
||||
* [Documentation](../rocmcc/rocmcc.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`
|
||||
|
||||
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
|
||||
|
||||
* {doc}`Documentation <rocgdb:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)
|
||||
|
||||
:::
|
||||
@@ -38,7 +34,6 @@ This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU
|
||||
|
||||
ROC profiler library: a hardware-specific, low-level performance analysis interface for profiling GPU compute applications. It supports GFX8/GFX9 and provides profiling with hardware performance counters and derived performance metrics.
|
||||
|
||||
* {doc}`Documentation <rocprofiler:rocprof>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)
|
||||
|
||||
:::
|
||||
@@ -47,13 +42,12 @@ ROC profiler library. Profiling with performance counters and derived metrics. L
|
||||
|
||||
Callback/activity library for performance tracing of AMD GPUs.
|
||||
|
||||
* {doc}`Documentation <roctracer:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
|
||||
## See Also
|
||||
## See also
|
||||
|
||||
* [Compiler Disambiguation](../../conceptual/compiler-disambiguation.md)
|
||||
* [Compiler disambiguation](../../conceptual/compiler-disambiguation.md)
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Management Tools
|
||||
# Management tools
|
||||
|
||||
:::::{grid} 1 1 3 3
|
||||
:gutter: 1
|
||||
@@ -14,9 +14,8 @@ The AMD System Management Interface Library, or AMD SMI library, is a C library
|
||||
|
||||
:::{grid-item-card} {doc}`ROCm SMI LIB <rocm_smi_lib:index>`
|
||||
|
||||
This tool acts as a command line interface for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
|
||||
This tool acts as a command-line interface (CLI) for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
|
||||
|
||||
* {doc}`Documentation <rocm_smi_lib:index>`
|
||||
* [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
|
||||
* [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Validation Tools
|
||||
# Validation tools
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
@@ -7,7 +7,6 @@
|
||||
|
||||
The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
|
||||
|
||||
* {doc}`Documentation <rocmvalidationsuite:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
|
||||
* [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)
|
||||
|
||||
@@ -17,7 +16,6 @@ The ROCm Validation Suite is a system administrator’s and cluster manager's to
|
||||
|
||||
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
|
||||
|
||||
* {doc}`Documentation <transferbench:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
|
||||
* {doc}`transferbench:examples/index`
|
||||
|
||||
@@ -3,17 +3,16 @@
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime. This
|
||||
page introduces the HIP runtime and other HIP libraries and tools.
|
||||
|
||||
## HIP Runtime
|
||||
## HIP runtime
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`HIP Runtime <hip:index>`
|
||||
:::{grid-item-card} {doc}`HIP runtime <hip:index>`
|
||||
|
||||
The HIP Runtime is used to enable GPU acceleration for all HIP language based
|
||||
The HIP runtime is used to enable GPU acceleration for all HIP language based
|
||||
products.
|
||||
|
||||
* {doc}`Documentation <hip:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
|
||||
@@ -28,10 +27,9 @@ products.
|
||||
|
||||
:::{grid-item-card} {doc}`HIPIFY <hipify:index>`
|
||||
|
||||
HIPIFY assists with porting CUDA-based applications to the HIP Runtime.
|
||||
HIPIFY assists with porting CUDA-based applications to the HIP runtime.
|
||||
Supported CUDA APIs are documented here as well.
|
||||
|
||||
* {doc}`Documentation <hipify:index>`
|
||||
* [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
|
||||
* [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Reference material
|
||||
|
||||
## ROCm Software Groups
|
||||
## ROCm software groups
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
@@ -10,14 +10,13 @@
|
||||
|
||||
HIP is both AMD's GPU programming language extension and the GPU runtime.
|
||||
|
||||
* {doc}`HIP <hip:index>`
|
||||
* [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
|
||||
* {doc}`HIPIFY <hipify:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Math Libraries](./libraries/gpu-libraries/math.md)**
|
||||
**[Math libraries](./libraries/gpu-libraries/math.md)**
|
||||
|
||||
HIP Math Libraries support the following domains:
|
||||
|
||||
@@ -28,7 +27,7 @@ HIP Math Libraries support the following domains:
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[C++ Primitive Libraries](./libraries/gpu-libraries/c++primitives.md)**
|
||||
**[C++ primitive libraries](./libraries/gpu-libraries/c++primitives.md)**
|
||||
|
||||
ROCm template libraries for C++ primitives and algorithms are as follows:
|
||||
|
||||
@@ -39,7 +38,7 @@ ROCm template libraries for C++ primitives and algorithms are as follows:
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} [Communication Libraries](./libraries/gpu-libraries/communication.md)
|
||||
:::{grid-item-card} [Communication libraries](./libraries/gpu-libraries/communication.md)
|
||||
Inter- and intra-node communication is supported by the following projects:
|
||||
|
||||
* {doc}`RCCL <rccl:index>`
|
||||
@@ -47,26 +46,26 @@ Inter and intra-node communication is supported by the following projects:
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Artificial intelligence](../rocm-ai.md)**
|
||||
**Artificial intelligence**
|
||||
|
||||
Libraries related to AI.
|
||||
Libraries related to artificial intelligence.
|
||||
|
||||
* {doc}`MIOpen <miopen:index>`
|
||||
* {doc}`Composable Kernel <composable_kernel:index>`
|
||||
* {doc}`MIGraphX <amdmigraphx:index>`
|
||||
* {doc}`MIVisionX <mivisionx:README>`
|
||||
* {doc}`rocAL <rocal:README>`
|
||||
* {doc}`ROCm Augmentation Library (rocAL) <rocal:README>`
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[OpenMP](./openmp/openmp.md)**
|
||||
|
||||
* [OpenMP Support Guide](./openmp/openmp.md)
|
||||
* [OpenMP support guide](./openmp/openmp.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Compilers and Tools](./compilers-tools/index.md)**
|
||||
**[Compilers and tools](./compilers-tools/index.md)**
|
||||
|
||||
* [ROCmCC](./rocmcc/rocmcc.md)
|
||||
* {doc}`ROCdbgapi <rocdbgapi:index>`
|
||||
@@ -77,7 +76,7 @@ Libraries related to AI.
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Management Tools](./compilers-tools/management-tools.md)**
|
||||
**[Management tools](./compilers-tools/management-tools.md)**
|
||||
|
||||
* {doc}`AMD SMI <amdsmi:index>`
|
||||
* {doc}`ROCm SMI <rocm_smi_lib:index>`
|
||||
@@ -86,14 +85,14 @@ Libraries related to AI.
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Validation Tools](./compilers-tools/validation-tools.md)**
|
||||
**[Validation tools](./compilers-tools/validation-tools.md)**
|
||||
|
||||
* {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
|
||||
* {doc}`TransferBench <transferbench:index>`
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card} **GPU Architectures**
|
||||
:::{grid-item-card} **GPU architectures**
|
||||
|
||||
* [AMD Instinct MI200](../conceptual/gpu-arch/mi250.md)
|
||||
* [AMD Instinct MI100](../conceptual/gpu-arch/mi100.md)
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# AI Libraries
|
||||
# AI libraries
|
||||
|
||||
::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
@@ -7,7 +7,6 @@
|
||||
|
||||
AMD's library for high performance machine learning primitives.
|
||||
|
||||
* {doc}`Documentation <miopen:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -17,7 +16,6 @@ AMD's library for high performance machine learning primitives.
|
||||
|
||||
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
|
||||
|
||||
* {doc}`Documentation <composable_kernel:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -27,7 +25,6 @@ Composable Kernel: Performance Portable Programming Model for Machine Learning T
|
||||
|
||||
AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.
|
||||
|
||||
* {doc}`Documentation <amdmigraphx:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# C++ Primitive Libraries
|
||||
# C++ primitive libraries
|
||||
|
||||
ROCm template libraries for algorithms are as follows:
|
||||
|
||||
@@ -11,7 +11,6 @@ rocPRIM is an AMD GPU optimized template library of algorithm primitives, like
|
||||
transforms, reductions, scans, etc. It also serves as a common back-end for
|
||||
similar libraries found inside ROCm.
|
||||
|
||||
* {doc}`Documentation <rocprim:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
|
||||
@@ -24,7 +23,6 @@ rocThrust is a template library of algorithm primitives with a Thrust-compatible
|
||||
interface. Their CPU back-ends are identical, while the GPU back-end calls into
|
||||
rocPRIM.
|
||||
|
||||
* {doc}`Documentation <rocthrust:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
|
||||
@@ -36,7 +34,6 @@ rocPRIM.
|
||||
hipCUB is a template library of algorithm primitives with a CUB-compatible
|
||||
interface. Its back-end is rocPRIM.
|
||||
|
||||
* {doc}`Documentation <hipcub:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
|
||||
@@ -49,7 +46,6 @@ hipTensor is AMD's C++ library for accelerating tensor primitives
|
||||
based on the composable kernel library,
|
||||
through general purpose kernel languages, like HIP C++.
|
||||
|
||||
* {doc}`Documentation <hiptensor:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)
|
||||
|
||||
:::
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Communication Libraries
|
||||
# Communication libraries
|
||||
|
||||
:::::{grid} 1 1 1 1
|
||||
:gutter: 1
|
||||
@@ -10,7 +10,6 @@ implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather,
|
||||
The collective operations are implemented using ring and tree algorithms and have been optimized for
|
||||
throughput and latency.
|
||||
|
||||
* {doc}`Documentation <rccl:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Fast Fourier transforms
|
||||
|
||||
ROCm libraries for FFT are as follows:
|
||||
ROCm libraries for fast Fourier transforms (FFTs) are as follows:
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Linear Algebra Libraries
|
||||
# Linear algebra libraries
|
||||
|
||||
ROCm libraries for linear algebra are as follows:
|
||||
|
||||
@@ -9,7 +9,6 @@ ROCm libraries for linear algebra are as follows:
|
||||
|
||||
`rocBLAS` is an AMD GPU optimized library for BLAS (Basic Linear Algebra Subprograms).
|
||||
|
||||
* {doc}`Documentation <rocblas:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
|
||||
@@ -22,7 +21,6 @@ ROCm libraries for linear algebra are as follows:
|
||||
via `rocBLAS` and `rocSOLVER`. `hipBLAS` allows for a common interface for other GPU
|
||||
BLAS libraries.
|
||||
|
||||
* {doc}`Documentation <hipblas:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -35,7 +33,6 @@ flexible API and extends functionalities beyond traditional BLAS library.
|
||||
`hipBLASLt` exposes its APIs in the HIP programming language, with an underlying
|
||||
optimized generator as a back-end kernel provider.
|
||||
|
||||
* {doc}`Documentation <hipblaslt:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -47,7 +44,6 @@ optimized generator as a back-end kernel provider.
|
||||
fine-grained parallelism on top of AMD's ROCm runtime and toolchains, targeting
|
||||
modern CPU and GPU platforms.
|
||||
|
||||
* {doc}`Documentation <rocalution:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -58,7 +54,6 @@ modern CPU and GPU platforms.
|
||||
`rocWMMA` provides an API to break down mixed precision matrix multiply-accumulate
|
||||
(MMA) problems into fragments and distributes these over GPU wavefronts.
|
||||
|
||||
* {doc}`Documentation <rocwmma:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -68,7 +63,6 @@ modern CPU and GPU platforms.
|
||||
|
||||
`rocSOLVER` provides a subset of LAPACK (Linear Algebra Package) functionality on the ROCm platform.
|
||||
|
||||
* {doc}`Documentation <rocsolver:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -79,7 +73,6 @@ modern CPU and GPU platforms.
|
||||
`hipSOLVER` is a LAPACK marshalling library supporting both `rocSOLVER` and `cuSOLVER`
|
||||
as backends whilst exporting a unified interface.
|
||||
|
||||
* {doc}`Documentation <hipsolver:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -89,7 +82,6 @@ as backends whilst exporting a unified interface.
|
||||
|
||||
`rocSPARSE` is a library to provide BLAS for sparse computations.
|
||||
|
||||
* {doc}`Documentation <rocsparse:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -100,7 +92,6 @@ as backends whilst exporting a unified interface.
|
||||
`hipSPARSE` is a marshalling library to provide sparse BLAS functionality,
|
||||
supporting both `rocSPARSE` and `cuSPARSE` as backends.
|
||||
|
||||
* {doc}`Documentation <hipsparse:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/CHANGELOG.md)
|
||||
|
||||
@@ -111,7 +102,6 @@ supporting both `rocSPARSE` and `cuSPARSE` as backends.
|
||||
`hipSPARSELt` is a marshalling library to provide sparse BLAS functionality,
|
||||
supporting both `rocSPARSELt` and `cuSPARSELt` as backends.
|
||||
|
||||
* {doc}`Documentation <hipsparselt:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)
|
||||
|
||||
:::
|
||||
|
||||
@@ -16,7 +16,7 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Linear Algebra Libraries](./math-linear-algebra.md)**
|
||||
**[Linear algebra libraries](./math-linear-algebra.md)**
|
||||
|
||||
* {doc}`rocBLAS <rocblas:index>`
|
||||
* {doc}`hipBLAS <hipblas:index>`
|
||||
@@ -40,7 +40,7 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**[Random Numbers](./rand.md)**
|
||||
**[Random numbers](./rand.md)**
|
||||
|
||||
* {doc}`rocRAND <rocrand:index>`
|
||||
* {doc}`hipRAND <hiprand:index>`
|
||||
|
||||
@@ -1,13 +1,12 @@
|
||||
# Random Numbers
|
||||
# Random numbers
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card} {doc}`rocRAND <rocrand:index>`
|
||||
|
||||
rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRNG).
|
||||
rocRAND is an AMD GPU optimized library for pseudorandom number generators.
|
||||
|
||||
* {doc}`Documentation <rocrand:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
|
||||
* [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
|
||||
@@ -16,11 +15,10 @@ rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRN
|
||||
|
||||
:::{grid-item-card} {doc}`hipRAND <hiprand:index>`
|
||||
|
||||
hipRAND is a compatibility layer for GPU accelerated pseudo-random number
|
||||
generation (PRNG) optimized for AMD GPUs using rocRAND. hipRAND allows for a
|
||||
common interface for other non AMD GPU PRNG libraries.
|
||||
hipRAND is a compatibility layer for GPU accelerated pseudorandom number generation optimized for
|
||||
AMD GPUs using rocRAND. hipRAND allows for a common interface for other non-AMD GPU
|
||||
pseudorandom number generation libraries.
|
||||
|
||||
* {doc}`Documentation <hiprand:index>`
|
||||
* [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
|
||||
* [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)
|
||||
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# OpenMP Support in ROCm
|
||||
# OpenMP support in ROCm
|
||||
|
||||
## Introduction
|
||||
|
||||
@@ -14,9 +14,7 @@ this ROCm release. See the list of supported GPUs for [Linux](../../about/compat
|
||||
The ROCm OpenMP compiler is implemented using LLVM compiler technology.
|
||||
The following image illustrates the internal steps taken to translate a user’s application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.
|
||||
|
||||
```{figure} ../../data/reference/openmp/openmp-toolchain.svg
|
||||
:name: openmp-toolchain
|
||||
```
|
||||

|
||||
|
||||
### Installation
|
||||
|
||||
@@ -31,7 +29,7 @@ bin: Compilers (`flang` and `clang`) and other binaries.
|
||||
* lib: Libraries including those required for target offload.
|
||||
* lib-debug: Debug versions of the above libraries.
|
||||
|
||||
## OpenMP: Usage
|
||||
## OpenMP: usage
|
||||
|
||||
The example programs can be compiled and run by pointing the environment
|
||||
variable `ROCM_PATH` to the ROCm install directory.
|
||||
@@ -44,10 +42,10 @@ cd $ROCM_PATH/share/openmp-extras/examples/openmp/veccopy
|
||||
sudo make run
|
||||
```
|
||||
|
||||
:::{note}
|
||||
```{note}
|
||||
`sudo` is required since we are building inside the `/opt` directory.
|
||||
Alternatively, copy the files to your home directory first.
|
||||
:::
|
||||
```
|
||||
|
||||
The above invocation of Make compiles and runs the program. Note the options
|
||||
that are required for target offload from an OpenMP program:
|
||||
@@ -56,15 +54,13 @@ that are required for target offload from an OpenMP program:
|
||||
-fopenmp --offload-arch=<gpu-arch>
|
||||
```
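As a minimal sketch of the kind of program these options apply to, the following C function copies an array inside an OpenMP target region. The function name `veccopy` and its signature are hypothetical and are not taken from the shipped example sources. When compiled without `-fopenmp`, the pragma is ignored and the loop runs on the host.

```c
#include <stddef.h>

/* Hypothetical helper: copy n ints from src to dst in a target region.
 * map(to:) copies src to the device before the region runs;
 * map(from:) copies dst back to the host after it completes. */
void veccopy(const int *src, int *dst, int n)
{
    #pragma omp target teams distribute parallel for \
        map(to: src[0:n]) map(from: dst[0:n])
    for (int i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Building a file that contains this function with `-fopenmp --offload-arch=<gpu-arch>` should request offloading of the loop to the selected GPU architecture.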
|
||||
|
||||
:::{note}
|
||||
```{note}
|
||||
The compiler also accepts the alternative offloading notation:
|
||||
|
||||
```bash
|
||||
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
|
||||
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
Obtain the value of `gpu-arch` by running the following command:
|
||||
|
||||
```bash
|
||||
@@ -113,7 +109,7 @@ code compiled with AOMP:
|
||||
|
||||
For more details on `rocprof`, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
|
||||
|
||||
### Using Tracing Options
|
||||
### Using tracing options
|
||||
|
||||
**Prerequisite:** When using the `--sys-trace` option, compile the OpenMP
|
||||
program with:
|
||||
@@ -139,7 +135,7 @@ HSA calls.
|
||||
|
||||
For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.
|
||||
|
||||
### Environment Variables
|
||||
### Environment variables
|
||||
|
||||
:::{table}
|
||||
:widths: auto
|
||||
@@ -154,14 +150,14 @@ For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocpr
|
||||
| `OMPX_FORCE_SYNC_REGIONS` | To force the runtime to execute all operations synchronously, i.e., wait for an operation to complete immediately. This affects data transfers and kernel execution. While it is mainly designed for debugging, it may have a minor positive effect on performance in certain situations. |
|
||||
:::
|
||||
|
||||
## OpenMP: Features
|
||||
## OpenMP: features
|
||||
|
||||
The OpenMP programming model is greatly enhanced with the following new features
|
||||
implemented in the past releases.
|
||||
|
||||
(openmp_usm)=
|
||||
|
||||
### Asynchronous Behavior in OpenMP Target Regions
|
||||
### Asynchronous behavior in OpenMP target regions
|
||||
|
||||
* Controlling Asynchronous Behavior
|
||||
|
||||
@@ -176,7 +172,7 @@ The `libomptarget` plugin for GPU offloading allows creation of separate configu
|
||||
|
||||
Implicit asynchronous execution of a single target region enables parallel memory copy invocations.
|
||||
|
||||
### Unified Shared Memory
|
||||
### Unified shared memory
|
||||
|
||||
Unified Shared Memory (USM) provides a pointer-based approach to memory
|
||||
management. To implement USM, fulfill the following system requirements along
|
||||
@@ -189,7 +185,7 @@ with Xnack capability.
|
||||
* Xnack, as USM support can only be tested with applications compiled with Xnack
|
||||
capability
|
||||
|
||||
#### Xnack Capability
|
||||
#### Xnack capability
|
||||
|
||||
When enabled, Xnack capability allows GPU threads to access CPU (system) memory,
|
||||
allocated with OS-allocators, such as `malloc`, `new`, and `mmap`. Xnack must be
|
||||
@@ -223,7 +219,7 @@ When Xnack support is not needed:
|
||||
|
||||
* At runtime, set the `HSA_XNACK` environment variable to 0.
|
||||
|
||||
#### Unified Shared Memory Pragma
|
||||
#### Unified shared memory pragma
|
||||
|
||||
This OpenMP pragma is available on MI200 through `xnack+` support.
|
||||
|
||||
@@ -277,7 +273,7 @@ to by “b” are in coarse-grain memory during and after the execution of the
|
||||
target region. This is accomplished in the OpenMP runtime library with calls to
|
||||
the ROCr runtime to set the pages pointed by “b” as coarse grain.
|
||||
|
||||
### OMPT Target Support
|
||||
### OMPT target support
|
||||
|
||||
The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as
|
||||
described in the OpenMP specification document. These APIs allow first-party
|
||||
@@ -302,7 +298,7 @@ The file `veccopy-ompt-target-tracing.c` simulates how a tool initiates device
|
||||
activity tracing. The file `callbacks.h` shows the callbacks registered and
|
||||
implemented by the tool.
|
||||
|
||||
### Floating Point Atomic Operations
|
||||
### Floating point atomic operations
|
||||
|
||||
The MI200-series GPUs support the generation of hardware floating-point atomics
|
||||
using the OpenMP atomic pragma. The support includes single- and
|
||||
@@ -326,10 +322,10 @@ double a = 0.0;
|
||||
a = a + 1.0;
|
||||
```
|
||||
|
||||
:::{note}
|
||||
```{note}
|
||||
`AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
|
||||
`AMD_safe_fp_atomics` is implemented with a compare-and-swap loop.
|
||||
:::
|
||||
```
|
||||
|
||||
To disable the generation of fast floating-point atomic instructions at the file
|
||||
level, build using the option `-msafe-fp-atomics` or use a hint clause on a
|
||||
@@ -362,14 +358,14 @@ double b = 0.0;
|
||||
b = b + 1.0;
|
||||
```
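The fragments in this section show single statements. A slightly fuller sketch of a floating-point atomic update inside a target region could look as follows. The function name `sum_atomic` is hypothetical, and the pragma deliberately omits any `hint` clause so that the code stays portable; whether a hardware atomic or a compare-and-swap loop is generated depends on the target and on the options described in this section.

```c
/* Hypothetical helper: sum the n entries of x into a double.
 * Each iteration performs an atomic update of the shared accumulator,
 * so concurrent threads cannot lose updates. */
double sum_atomic(const double *x, int n)
{
    double total = 0.0;
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: total)
    for (int i = 0; i < n; i++) {
        #pragma omp atomic update
        total += x[i];
    }
    return total;
}
```

For plain sums like this one, a `reduction` clause is usually the better fit; the atomic form is shown only to illustrate where the pragma goes.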
|
||||
|
||||
### Address Sanitizer (ASan) Tool
|
||||
### AddressSanitizer tool
|
||||
|
||||
Address Sanitizer is a memory error detector tool utilized by applications to
|
||||
AddressSanitizer (ASan) is a memory error detector tool utilized by applications to
|
||||
detect various errors ranging from spatial issues such as out-of-bound access to
|
||||
temporal issues such as use-after-free. The AOMP compiler supports ASan for AMD
|
||||
GPUs with applications written in both HIP and OpenMP.
|
||||
|
||||
**Features Supported on Host Platform (Target x86_64):**
|
||||
**Features supported on host platform (Target x86_64):**
|
||||
|
||||
* Use-after-free
|
||||
* Buffer overflows
|
||||
@@ -380,12 +376,12 @@ GPUs with applications written in both HIP and OpenMP.
|
||||
* Use-after-scope
|
||||
* Initialization order bugs
|
||||
|
||||
**Features Supported on AMDGPU Platform (`amdgcn-amd-amdhsa`):**
|
||||
**Features supported on AMDGPU platform (`amdgcn-amd-amdhsa`):**
|
||||
|
||||
* Heap buffer overflow
|
||||
* Global buffer overflow
|
||||
|
||||
**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack
|
||||
**Software (kernel/OS) requirements:** Unified Shared Memory support with Xnack
|
||||
capability. See the section on [Unified Shared Memory](#unified-shared-memory)
|
||||
for prerequisites and details on Xnack.
|
||||
|
||||
@@ -436,7 +432,7 @@ for(int i=0; i<N; i++){
|
||||
See the complete sample code for global buffer overflow
|
||||
[here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).
|
||||
|
||||
### Clang Compiler Option for Kernel Optimization
|
||||
### Clang compiler option for kernel optimization
|
||||
|
||||
You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:
|
||||
|
||||
@@ -448,7 +444,7 @@ You can use the clang compiler option `-fopenmp-target-fast` for kernel optimiza
|
||||
|
||||
* `-O3` if no `-O*` is specified by the user.
|
||||
|
||||
### Specialized Kernels
|
||||
### Specialized kernels
|
||||
|
||||
Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:
|
||||
|
||||
@@ -466,14 +462,14 @@ To enable the generation of specialized kernels, follow these guidelines:
|
||||
|
||||
* To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.
|
||||
|
||||
#### No-Loop Kernel Generation
|
||||
#### No-loop kernel generation
|
||||
|
||||
The no-loop kernel generation feature improves the performance of the generated code by producing a specialized kernel for certain OpenMP target constructs, such as `target teams distribute parallel for`. The specialized kernel assumes that every thread executes a single iteration of the user loop, so the runtime launches a total number of GPU threads equal to or greater than the iteration space size of the target region loop. This allows the compiler to generate code for the loop body without an enclosing loop, resulting in reduced control-flow complexity and potentially better performance.
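A loop of the following shape is the kind of construct the paragraph above describes. This is only an illustrative sketch: the function name `scale` is hypothetical, and whether the compiler actually selects a specialized kernel depends on the compiler options in use.

```c
/* Hypothetical helper: scale each element of y in place.
 * Every iteration is independent, so one GPU thread per iteration
 * is sufficient and no enclosing loop is needed in the kernel body. */
void scale(float *y, float a, int n)
{
    #pragma omp target teams distribute parallel for map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * y[i];
    }
}
```

Note that the construct carries no `num_teams` clause, so the runtime is free to choose a launch size that covers the whole iteration space.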
|
||||
|
||||
#### Big-Jump-Loop Kernel Generation
|
||||
#### Big-jump-loop kernel generation
|
||||
|
||||
A No-Loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the Big-Jump-Loop kernel. The compiler launches the kernel with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.
|
||||
A No-loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the Big-Jump-Loop kernel. The compiler launches the kernel with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.
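As a sketch of a construct that fixes the number of teams, consider the function below. The name `saxpy_fixed_teams` and the values 4 and 64 are arbitrary choices for illustration, and the use of `thread_limit` as the clause that sets the block size is an assumption. Because the launch size is fixed, the number of threads can be smaller than the iteration count, so each thread may need to process more than one iteration.

```c
/* Hypothetical helper: y = a * x + y with a fixed number of teams.
 * num_teams(4) and thread_limit(64) are illustrative values only. */
void saxpy_fixed_teams(float *y, const float *x, float a, int n)
{
    #pragma omp target teams distribute parallel for \
        num_teams(4) thread_limit(64) \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```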
|
||||
|
||||
#### Xteam Optimized Reduction Kernel Generation
|
||||
#### Cross-team optimized reduction kernel generation
|
||||
|
||||
If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient Xteam communication. New APIs for Xteam reduction are implemented in the device runtime and are automatically generated by clang.
|
||||
If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient cross-team communication. New APIs for cross-team reduction are implemented in the device runtime and are automatically generated by clang.
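A reduction of the following form is the kind of construct this applies to. The function name `dot` is hypothetical, and the sketch makes no claim about which runtime APIs the compiler emits for it.

```c
/* Hypothetical helper: dot product of two length-n vectors.
 * reduction(+: sum) gives each thread a private partial sum and
 * combines the partial sums across threads and teams at the end. */
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for \
        reduction(+: sum) map(to: a[0:n], b[0:n]) map(tofrom: sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```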
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Compiler Reference Guide
# Compiler reference guide
## Introduction to Compiler Reference Guide
## Introduction to compiler reference guide
ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
computing on AMD GPUs and CPUs and supports various heterogeneous programming
@@ -22,7 +22,7 @@ For more details, see:
* AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
* Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>
### ROCm Compiler Interfaces
### ROCm compiler interfaces
ROCm currently provides two compiler interfaces for compiling HIP programs:
@@ -52,11 +52,11 @@ The major differences between `hipcc` and `amdclang++` are listed below:
| Source code location | <https://github.com/ROCm-Developer-Tools/HIPCC> | <https://github.com/RadeonOpenCompute/llvm-project> |
::::
## Compiler Options and Features
## Compiler options and features
This chapter discusses compiler options and features.
### AMD GPU Compilation
### AMD GPU compilation
This section outlines commonly used compiler flags for `hipcc` and `amdclang++`.
:::{option} -x hip
@@ -111,7 +111,7 @@ This section outlines commonly used compiler flags for `hipcc` and `amdclang++`.
Generates relocatable device code, also known as separate compilation mode.
:::
### AMD Optimizations for Zen Architectures
### AMD optimizations for Zen architectures
The CPU compiler optimizations described in this chapter originate from the AMD
Optimizing C/C++ Compiler (AOCC) compiler. They are available in ROCmCC if the
@@ -134,12 +134,12 @@ The `-famd-opt` flag is useful when a user wants to build with the proprietary
optimization compiler and not have to depend on setting any of the other
proprietary optimization flags.
:::{note}
```{note}
`-famd-opt` can be used in addition to the other proprietary CPU optimization
flags. The table of optimizations below implicitly enables the invocation of the
AMD proprietary optimizations compiler, whereas the `-famd-opt` flag requires
this to be handled explicitly.
:::
```
#### `-fstruct-layout=[1,2,3,4,5,6,7]`
@@ -213,7 +213,7 @@ This is an experimental option to generate non-temporal store instruction for
array accesses in a loop, whose iteration count cannot be determined at compile
time. In this case, the compiler assumes the iteration count to be huge.
#### Optimizations Through Driver `-mllvm <options>`
#### Optimizations through driver `-mllvm <options>`
The following optimization options must be invoked through driver
`-mllvm <options>`:
@@ -255,12 +255,12 @@ loop. The heuristic can be controlled with the following options:
Here, `<n>` is a positive integer; a higher value of `<n>` facilitates more unswitching.
:::{note}
```{note}
These options may facilitate more unswitching under some workloads. Since
loop-unswitching inherently leads to code bloat, facilitating more
unswitching may significantly increase the code size. Hence, it may also lead
to longer compilation times.
:::
```
##### `-enable-strided-vectorization`
@@ -430,7 +430,7 @@ such as loop transformations and other optimizations requiring de-linearized
index expressions should use the Hz option. This option has no impact on any
other aspects of the Flang front end.
### Inline ASM Statements
### Inline ASM statements
Inline assembly (ASM) statements allow a developer to include assembly
instructions directly in either host or device code. While the ROCm compiler
@@ -451,18 +451,18 @@ supports ASM statements, their use is not recommended for the following reasons:
* Writing correct ASM statements is often difficult; we strongly recommend
thorough testing of any use of ASM statements.
:::{note}
```{note}
For developers who choose to include ASM statements in the code, AMD is
interested in understanding the use case and appreciates feedback at
[https://github.com/RadeonOpenCompute/ROCm/issues](https://github.com/RadeonOpenCompute/ROCm/issues)
:::
```
### Miscellaneous OpenMP Compiler Features
### Miscellaneous OpenMP compiler features
This section discusses features that have been added or enhanced in the OpenMP
compiler.
#### Offload-arch Tool
#### Offload-arch tool
`offload-arch` is an LLVM library and tool used to query the execution capability of the
current system as well as to query requirements of a binary file. It is used by
@@ -534,7 +534,7 @@ There are symbolic link aliases `amdgpu-offload-arch` and `nvidia-arch` for
These aliases are useful in determining whether architecture-specific tests
should be run or to conditionally load architecture-specific software.
#### Command-Line Simplification Using `offload-arch` Flag
#### Command-line simplification using `offload-arch` flag
The legacy mechanism for specifying the offloading target for OpenMP involves using three
flags, `-fopenmp-targets`, `-Xopenmp-target`, and `-march`. The first two flags
@@ -563,7 +563,7 @@ clang -fopenmp -target x86_64-linux-gnu \
To ensure backward compatibility, both styles are supported. This option is
compatible with target ID support and multi-image fat binaries.
#### Target ID Support for OpenMP
#### Target ID support for OpenMP
The ROCmCC compiler supports specification of target features along with the GPU
name while specifying a target offload device in the command line, using
@@ -603,7 +603,7 @@ to linker using `-plugin-opt=-mattr` flag. This feature is compatible with
offload-arch command-line option and multi-image binaries for multiple
architectures.
#### Multi-image Fat Binary for OpenMP
#### Multi-image fat binary for OpenMP
The ROCmCC compiler is enhanced to generate binaries that can contain
heterogeneous images. This heterogeneity could be in terms of:
@@ -656,7 +656,7 @@ of target triple and the target GPU (along with the associated target features).
modified to query this structure to identify a compatible image based on the
capability of the current system.
#### Unified Shared Memory (USM)
#### Unified shared memory (USM)
The following OpenMP pragma is available on MI200, and it must be executed with
`xnack+` support.
@@ -668,7 +668,7 @@ omp requires unified_shared_memory
For more details on USM refer to the {ref}`openmp_usm` section of the OpenMP
Guide.
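
As a minimal sketch of how the `requires unified_shared_memory` pragma is typically used, the hypothetical function below lets a target region write directly to host-allocated memory without any `map` clause. The function name `fill_on_device` is an assumption for illustration. On real hardware, this relies on the `xnack+` support mentioned in this section.

```c
#include <stdlib.h>

/* Must appear at file scope, before any target construct in the file. */
#pragma omp requires unified_shared_memory

/* Hypothetical example: fills a host-allocated buffer from a target region
 * and returns its last element, or -1 on invalid size or failed allocation. */
int fill_on_device(int n)
{
    if (n <= 0) {
        return -1;
    }
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) {
        return -1;
    }

    /* No map clause: with unified shared memory the device dereferences
     * the same pointer value that the host allocated. */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i) {
        buf[i] = i;
    }

    int last = buf[n - 1];
    free(buf);
    return last;
}
```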
### Support Status of Other Clang Options
### Support status of other Clang options
The following table lists the other Clang options and their support status.
@@ -679,8 +679,8 @@ The following table lists the other Clang options and their support status.
:widths: auto
:align: center
| **Option** | **Support Status** | **Description** |
|------------------------------------------|:------------------:|--------------------------------------------------------------------------------------------------------------------------------|
| **Option** | **Support Status** | **Description** |
|--------------|:-----------------------:|-------------------------|
| `-###` | Supported | Prints (but does not run) the commands to run for this compilation |
| `--analyzer-output <value>` | Supported | "Static analyzer report output format (`html|plist|plist-multi-file|plist-html|sarif|text`)" |
| `--analyze` | Supported | Runs the static analyzer |
@@ -1184,7 +1184,7 @@ The following table lists the other Clang options and their support status.
|-isysroot \<dir\>|Supported|Sets the system root directory (usually /)|
|-isystem-after \<directory\>|Supported|Adds the directory to end of the SYSTEM include search path|
|-isystem \<directory\>|Supported|Adds the directory to SYSTEM include search path|
|-ivfsoverlay \<value\>|Supported|Overlays the virtual filesystem described by the specified file over the real file system|
|-ivfsoverlay \<value\>|Supported|Overlays the virtual file system described by the specified file over the real file system|
|-iwithprefixbefore \<dir\>|Supported|Sets the directory to include search path with prefix|
|-iwithprefix \<dir\>|Supported|Sets the directory to SYSTEM include search path with prefix|
|-iwithsysroot \<directory\>|Supported|Adds directory to SYSTEM include search path; absolute paths are relative to -isysroot|