Fix some linting issues (#2046)

This commit is contained in:
Nara
2023-04-14 15:17:21 +02:00
committed by GitHub
parent b81a27c2a2
commit 2de2059feb
29 changed files with 263 additions and 235 deletions

@@ -4,14 +4,14 @@
:gutter: 1
:::{grid-item-card} [MIOpen](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)
AMD's library for high-performance machine learning primitives.
AMD's library for high-performance machine learning primitives.
- [Documentation](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)
:::
:::{grid-item-card} [Composable Kernel](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
- [Documentation](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)

@@ -1,6 +1,7 @@
# All Reference Material
## ROCm Software Groups
:::::{grid} 1 1 2 2
:gutter: 1

@@ -11,7 +11,7 @@ ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance comp
:::
:::{grid-item-card} [ROCgdb](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
- [Documentation](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)
@@ -25,7 +25,7 @@ ROC profiler library. Profiling with perf-counters and derived metrics. Library
:::
:::{grid-item-card} [ROCTracer](https://rocmdocs.amd.com/projects/roctracer/en/latest/)
Callback/Activity Library for Performance tracing AMD GPUs
Callback/Activity Library for Performance tracing AMD GPUs
- [Documentation](https://rocmdocs.amd.com/projects/roctracer/en/latest/)

@@ -4,14 +4,14 @@
:gutter: 1
:::{grid-item-card} [MIVisionX](https://rocmdocs.amd.com/projects/MIVisionX/en/latest/)
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
- [Documentation](https://rocmdocs.amd.com/projects/MIVisionX/en/latest/)
:::
:::{grid-item-card} [rocAL](https://rocmdocs.amd.com/projects/rocAL/en/latest/)
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.
- [Documentation](https://rocmdocs.amd.com/projects/rocAL/en/latest/)

@@ -50,34 +50,42 @@ Figure 1: Structure of a single GCD in the AMD Instinct MI250 accelerator.
:header-rows: 1
:name: mi250-perf
* - Computation and Data Type
*
- Computation and Data Type
- FLOPS/CLOCK/CU
- Peak TFLOPS
* - Matrix FP64
*
- Matrix FP64
- 256
- 90.5
* - Vector FP64
*
- Vector FP64
- 128
- 45.3
* - Matrix FP32
*
- Matrix FP32
- 256
- 90.5
* - Packed FP32
*
- Packed FP32
- 256
- 90.5
* - Vector FP32
*
- Vector FP32
- 128
- 45.3
* - Matrix FP16
*
- Matrix FP16
- 1024
- 362.1
* - Matrix BF16
*
- Matrix BF16
- 1024
- 362.1
* - Matrix INT8
*
- Matrix INT8
- 1024
- 362.1
```
{numref}`mi250-perf` summarizes the aggregated peak performance of the AMD

@@ -1,4 +1,5 @@
# C++ Primitive Libraries
ROCm template libraries for algorithms are as follows:
:::::{grid} 1 1 3 3

@@ -25,7 +25,7 @@ This tool acts as a command line interface for manipulating and monitoring the a
:::
:::{grid-item-card} [ROCm Datacenter Tool](https://rocmdocs.amd.com/projects/rdc/en/latest/)
The ROCm™ Data Center Tool simplifies administration and addresses key infrastructure challenges for AMD GPUs in cluster and datacenter environments.
The ROCm™ Data Center Tool simplifies administration and addresses key infrastructure challenges for AMD GPUs in cluster and datacenter environments.
- [Documentation](https://rocmdocs.amd.com/projects/rdc/en/latest/)
- [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)

@@ -77,7 +77,7 @@ For more details on rocprof, refer to the ROCm Profiling Tools document on [http
**Prerequisite:** When using the --sys-trace option, compile the OpenMP program with:
```bash
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
```
The following tracing options are widely used to generate useful information:
@@ -159,25 +159,25 @@ A simple program demonstrating the use of this feature is:
$ cat parallel_for.cpp
#include <stdlib.h>
#include <stdio.h>
#define N 64
#pragma omp requires unified_shared_memory
int main() {
int n = N;
int *a = new int[n];
int *b = new int[n];
for(int i = 0; i < n; i++)
b[i] = i;
#pragma omp target parallel for map(to:b[:n])
for(int i = 0; i < n; i++)
a[i] = b[i];
for(int i = 0; i < n; i++)
if(a[i] != i)
printf("error at %d: expected %d, got %d\n", i, i, a[i]);
return 0;
}
$ clang++ -O2 -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:xnack+ parallel_for.cpp
@@ -231,7 +231,7 @@ See the example below, where the user builds the program using -msafe-fp-atomics
double a = 0.0;
#pragma omp atomic hint(AMD_fast_fp_atomics)
a = a + 1.0;
double b = 0.0;
#pragma omp atomic
b = b + 1.0;
@@ -260,11 +260,12 @@ Address Sanitizer is a memory error detector tool utilized by applications to de
- Initialization order bugs
**Features Supported on AMDGPU Platform (amdgcn-amd-amdhsa):**
- Heap buffer overflow
- Global buffer overflow
**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack capability. See the section on [Unified Shared Memory](#unified-shared-memory) for prerequisites and details on Xnack.
**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack capability. See the section on [Unified Shared Memory](#unified-shared-memory) for prerequisites and details on Xnack.
**Example:**
@@ -276,7 +277,7 @@ void main() {
....... // Some program statements
#pragma omp target map(to : A[0:N], B[0:N]) map(from: C[0:N])
{
#pragma omp parallel for
#pragma omp parallel for
for(int i =0 ; i < N; i++){
C[i+10] = A[i] + B[i];
} // end of for loop
@@ -290,7 +291,7 @@ See the complete sample code for heap buffer overflow [here](https://github.com/
- Global buffer overflow
```cpp
#pragma omp declare target
#pragma omp declare target
int A[N],B[N],C[N];
#pragma omp end declare target
void main(){
@@ -300,7 +301,7 @@ void main(){
{
#pragma omp target update to(A,B)
#pragma omp target parallel for
for(int i=0; i<N; i++){
for(int i=0; i<N; i++){
C[i]=A[i*100]+B[i+22];
} // end of for loop
#pragma omp target update from(C)

@@ -18,6 +18,7 @@ The differences are listed in [the table below](rocm-llvm-vs-alt).
:::
For more details, see:
- AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
- Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>
@@ -153,7 +154,6 @@ to perform this optimization. Users can choose different levels of
aggressiveness with which this optimization can be applied to the application,
with 1 being the least aggressive and 7 being the most aggressive level.
:::{table} -fstruct-layout Values and Their Effects
| -fstruct-layout value | Structure peeling | Pointer size after selective compression of self-referential pointers in structures, wherever safe | Type of structure fields eligible for compression | Whether compression performed under safety check |
| ----------- | ----------- | ----------- | ----------- | ----------- |

@@ -4,14 +4,14 @@
:gutter: 1
:::{grid-item-card} [RVS](https://rocmdocs.amd.com/projects/RVS/en/latest/)
The ROCm Validation Suite is a system administrator's and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
The ROCm Validation Suite is a system administrator's and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
- [Documentation](https://rocmdocs.amd.com/projects/RVS/en/latest/)
:::
:::{grid-item-card} [TransferBench](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
- [Documentation](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
- [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)