mirror of
https://github.com/ROCm/ROCm.git
synced 2026-02-04 11:25:03 -05:00
Fix some linting issues (#2046)
@@ -4,14 +4,14 @@
:gutter: 1

:::{grid-item-card} [MIOpen](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)
AMD's library for high performance machine learning primitives.
AMD's library for high performance machine learning primitives.

- [Documentation](https://rocmdocs.amd.com/projects/MIOpen/en/latest/)

:::

:::{grid-item-card} [Composable Kernel](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

- [Documentation](https://rocmdocs.amd.com/projects/composable_kernel/en/latest/)

@@ -1,6 +1,7 @@
# All Reference Material

## ROCm Software Groups

:::::{grid} 1 1 2 2
:gutter: 1

@@ -11,7 +11,7 @@ ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance comp
:::

:::{grid-item-card} [ROCgdb](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.

- [Documentation](https://rocmdocs.amd.com/projects/ROCgdb/en/latest/)

@@ -25,7 +25,7 @@ ROC profiler library. Profiling with perf-counters and derived metrics. Library
:::

:::{grid-item-card} [ROCTracer](https://rocmdocs.amd.com/projects/roctracer/en/latest/)
Callback/Activity Library for Performance tracing AMD GPUs
Callback/Activity Library for Performance tracing AMD GPUs

- [Documentation](https://rocmdocs.amd.com/projects/roctracer/en/latest/)

@@ -4,14 +4,14 @@
:gutter: 1

:::{grid-item-card} [MIVisionX](https://rocmdocs.amd.com/projects/MIVisionX/en/latest/)
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

- [Documentation](https://rocmdocs.amd.com/projects/MIVisionX/en/latest/)

:::

:::{grid-item-card} [rocAL](https://rocmdocs.amd.com/projects/rocAL/en/latest/)
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.
The AMD ROCm Augmentation Library (rocAL) is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user. rocAL currently provides a C API.

- [Documentation](https://rocmdocs.amd.com/projects/rocAL/en/latest/)

@@ -50,34 +50,42 @@ Figure 1: Structure of a single GCD in the AMD Instinct MI250 accelerator.
:header-rows: 1
:name: mi250-perf

* - Computation and Data Type
*
  - Computation and Data Type
  - FLOPS/CLOCK/CU
  - Peak TFLOPS
* - Matrix FP64
*
  - Matrix FP64
  - 256
  - 90.5
* - Vector FP64
*
  - Vector FP64
  - 128
  - 45.3
* - Matrix FP32
*
  - Matrix FP32
  - 256
  - 90.5
* - Packed FP32
*
  - Packed FP32
  - 256
  - 90.5
* - Vector FP32
*
  - Vector FP32
  - 128
  - 45.3
* - Matrix FP16
*
  - Matrix FP16
  - 1024
  - 362.1
* - Matrix BF16
*
  - Matrix BF16
  - 1024
  - 362.1
* - Matrix INT8
*
  - Matrix INT8
  - 1024
  - 362.1

```

{numref}`mi250-perf` summarizes the aggregated peak performance of the AMD
@@ -1,4 +1,5 @@
# C++ Primitive Libraries

ROCm template libraries for algorithms are as follows:

:::::{grid} 1 1 3 3
@@ -25,7 +25,7 @@ This tool acts as a command line interface for manipulating and monitoring the a
:::

:::{grid-item-card} [ROCm Datacenter Tool](https://rocmdocs.amd.com/projects/rdc/en/latest/)
The ROCm™ Data Center Tool simplifies administration and addresses key infrastructure challenges for AMD GPUs in cluster and datacenter environments.
The ROCm™ Data Center Tool simplifies administration and addresses key infrastructure challenges for AMD GPUs in cluster and datacenter environments.

- [Documentation](https://rocmdocs.amd.com/projects/rdc/en/latest/)
- [Examples](https://github.com/RadeonOpenCompute/rdc/tree/master/example)
@@ -77,7 +77,7 @@ For more details on rocprof, refer to the ROCm Profiling Tools document on [http
**Prerequisite:** When using the --sys-trace option, compile the OpenMP program with:

```bash
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
-Wl,-rpath,/opt/rocm-{version}/lib -lamdhip64
```

The following tracing options are widely used to generate useful information:
@@ -159,25 +159,25 @@ A simple program demonstrating the use of this feature is:
$ cat parallel_for.cpp
#include <stdlib.h>
#include <stdio.h>

#define N 64
#pragma omp requires unified_shared_memory
int main() {
  int n = N;
  int *a = new int[n];
  int *b = new int[n];

  for(int i = 0; i < n; i++)
    b[i] = i;

  #pragma omp target parallel for map(to:b[:n])
  for(int i = 0; i < n; i++)
    a[i] = b[i];

  for(int i = 0; i < n; i++)
    if(a[i] != i)
      printf("error at %d: expected %d, got %d\n", i, i, a[i]);

  return 0;
}
$ clang++ -O2 -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:xnack+ parallel_for.cpp
@@ -231,7 +231,7 @@ See the example below, where the user builds the program using -msafe-fp-atomics
double a = 0.0;
#pragma omp atomic hint(AMD_fast_fp_atomics)
a = a + 1.0;

double b = 0.0;
#pragma omp atomic
b = b + 1.0;
@@ -260,11 +260,12 @@ Address Sanitizer is a memory error detector tool utilized by applications to de
- Initialization order bugs

**Features Supported on AMDGPU Platform (amdgcn-amd-amdhsa):**

- Heap buffer overflow

- Global buffer overflow

**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack capability. See the section on [Unified Shared Memory](#unified-shared-memory) for prerequisites and details on Xnack.
**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack capability. See the section on [Unified Shared Memory](#unified-shared-memory) for prerequisites and details on Xnack.

**Example:**

@@ -276,7 +277,7 @@ void main() {
....... // Some program statements
#pragma omp target map(to : A[0:N], B[0:N]) map(from: C[0:N])
{
#pragma omp parallel for
#pragma omp parallel for
for(int i =0 ; i < N; i++){
  C[i+10] = A[i] + B[i];
} // end of for loop
@@ -290,7 +291,7 @@ See the complete sample code for heap buffer overflow [here](https://github.com/
- Global buffer overflow

```bash
#pragma omp declare target
#pragma omp declare target
int A[N],B[N],C[N];
#pragma omp end declare target
void main(){
@@ -300,7 +301,7 @@ void main(){
{
#pragma omp target update to(A,B)
#pragma omp target parallel for
for(int i=0; i<N; i++){
for(int i=0; i<N; i++){
  C[i]=A[i*100]+B[i+22];
} // end of for loop
#pragma omp target update from(C)
@@ -18,6 +18,7 @@ The differences are listed in [the table below](rocm-llvm-vs-alt).
:::

For more details, see:

- AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
- Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>

@@ -153,7 +154,6 @@ to perform this optimization. Users can choose different levels of
aggressiveness with which this optimization can be applied to the application,
with 1 being the least aggressive and 7 being the most aggressive level.

:::{table} -fstruct-layout Values and Their Effects
| -fstruct-layout value | Structure peeling | Pointer size after selective compression of self-referential pointers in structures, wherever safe | Type of structure fields eligible for compression | Whether compression performed under safety check |
| ----------- | ----------- | ----------- | ----------- | ----------- |
@@ -4,14 +4,14 @@
:gutter: 1

:::{grid-item-card} [RVS](https://rocmdocs.amd.com/projects/RVS/en/latest/)
The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.

- [Documentation](https://rocmdocs.amd.com/projects/RVS/en/latest/)

:::

:::{grid-item-card} [TransferBench](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).
TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).

- [Documentation](https://rocmdocs.amd.com/projects/TransferBench/en/latest/)
- [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)