Update titles to sentence case (#2455)

2026-01-09 14:48:06 -05:00 · 2023-09-18 12:26:31 -06:00
parent 772b51a7d2
commit d0d4eed1a6
128 changed files with 1857 additions and 2019 deletions
--- a/docs/reference/compilers-tools/dev-tools.md
+++ b/docs/reference/compilers-tools/dev-tools.md
@@ -1 +1 @@
-# Development Tools
+# Development tools
--- a/docs/reference/compilers-tools/index.md
+++ b/docs/reference/compilers-tools/index.md
@@ -9,7 +9,6 @@ The AMD Debugger API is a library that provides all the support necessary for a
 debugger and other tools to perform low level control of the execution and
 inspection of execution state of AMD's commercially available GPU architectures.

-* {doc}`Documentation <rocdbgapi:index>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/ROCdbgapi/)

 :::
@@ -21,15 +20,12 @@ ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
 computing on AMD GPUs and CPUs and supports various heterogeneous programming
 models such as HIP, OpenMP, and OpenCL.

-* [Documentation](../rocmcc/rocmcc.md)
-
 :::

 :::{grid-item-card} {doc}`ROCgdb <rocgdb:index>`

 This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.

-* {doc}`Documentation <rocgdb:index>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/ROCgdb/)

 :::
@@ -38,7 +34,6 @@ This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU

 ROC profiler library. Profiling with performance counters and derived metrics. Library supports GFX8/GFX9. Hardware specific low-level performance analysis interface for profiling of GPU compute applications. The profiling includes hardware performance counters with complex performance metrics.

-* {doc}`Documentation <rocprofiler:rocprof>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/rocprofiler/)

 :::
@@ -47,13 +42,12 @@ ROC profiler library. Profiling with performance counters and derived metrics. L

 Callback/Activity Library for Performance tracing AMD GPUs

-* {doc}`Documentation <roctracer:index>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/roctracer)

 :::

 :::::

-## See Also
+## See also

-* [Compiler Disambiguation](../../conceptual/compiler-disambiguation.md)
+* [Compiler disambiguation](../../conceptual/compiler-disambiguation.md)
--- a/docs/reference/compilers-tools/management-tools.md
+++ b/docs/reference/compilers-tools/management-tools.md
@@ -1,4 +1,4 @@
-# Management Tools
+# Management tools

 :::::{grid} 1 1 3 3
 :gutter: 1
@@ -14,9 +14,8 @@ The AMD System Management Interface Library, or AMD SMI library, is a C library

 :::{grid-item-card} {doc}`ROCm SMI LIB <rocm_smi_lib:index>`

-This tool acts as a command line interface for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.
+This tool acts as a command-line interface (CLI) for manipulating and monitoring the AMD GPU kernel, and is intended to replace and deprecate the existing `rocm_smi.py` CLI tool. It uses `ctypes` to call the `rocm_smi_lib` API.

-* {doc}`Documentation <rocm_smi_lib:index>`
 * [GitHub](https://github.com/RadeonOpenCompute/rocm_smi_lib)
 * [Examples](https://github.com/RadeonOpenCompute/rocm_smi_lib/tree/master/python_smi_tools)

--- a/docs/reference/compilers-tools/validation-tools.md
+++ b/docs/reference/compilers-tools/validation-tools.md
@@ -1,4 +1,4 @@
-# Validation Tools
+# Validation tools

 :::::{grid} 1 1 2 2
 :gutter: 1
@@ -7,7 +7,6 @@

 The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.

-* {doc}`Documentation <rocmvalidationsuite:index>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite)
 * [Changelog](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/CHANGELOG.md)

@@ -17,7 +16,6 @@ The ROCm Validation Suite is a system administrator’s and cluster manager's to

 TransferBench is a simple utility capable of benchmarking simultaneous transfers between user-specified devices (CPUs/GPUs).

-* {doc}`Documentation <transferbench:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/TransferBench/)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/TransferBench/blob/develop/CHANGELOG.md)
 * {doc}`transferbench:examples/index`
--- a/docs/reference/hip.md
+++ b/docs/reference/hip.md
@@ -3,17 +3,16 @@
 HIP is both AMD's GPU programming language extension and the GPU runtime. This
 page introduces the HIP runtime and other HIP libraries and tools.

-## HIP Runtime
+## HIP runtime

 :::::{grid} 1 1 2 2
 :gutter: 1

-:::{grid-item-card} {doc}`HIP Runtime <hip:index>`
+:::{grid-item-card} {doc}`HIP runtime <hip:index>`

-The HIP Runtime is used to enable GPU acceleration for all HIP language based
+The HIP runtime is used to enable GPU acceleration for all HIP language based
 products.

-* {doc}`Documentation <hip:index>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/HIP)
 * [Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)

@@ -28,10 +27,9 @@ products.

 :::{grid-item-card} {doc}`HIPIFY <hipify:index>`

-HIPIFY assists with porting applications from based on CUDA to the HIP Runtime.
+HIPIFY assists with porting applications from based on CUDA to the HIP runtime.
 Supported CUDA APIs are documented here as well.

-* {doc}`Documentation <hipify:index>`
 * [GitHub](https://github.com/ROCm-Developer-Tools/HIPIFY/)
 * [Changelog](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/CHANGELOG.md)

--- a/docs/reference/index.md
+++ b/docs/reference/index.md
@@ -1,6 +1,6 @@
 # Reference material

-## ROCm Software Groups
+## ROCm software groups

 :::::{grid} 1 1 2 2
 :gutter: 1
@@ -10,14 +10,13 @@

 HIP is both AMD's GPU programming language extension and the GPU runtime.

-* {doc}`HIP <hip:index>`
 * [HIP Examples](https://github.com/amd/rocm-examples/tree/develop/HIP-Basic)
 * {doc}`HIPIFY <hipify:index>`

 :::

 :::{grid-item-card}
-**[Math Libraries](./libraries/gpu-libraries/math.md)**
+**[Math libraries](./libraries/gpu-libraries/math.md)**

 HIP Math Libraries support the following domains:

@@ -28,7 +27,7 @@ HIP Math Libraries support the following domains:
 :::

 :::{grid-item-card}
-**[C++ Primitive Libraries](./libraries/gpu-libraries/c++primitives.md)**
+**[C++ primitive libraries](./libraries/gpu-libraries/c++primitives.md)**

 ROCm template libraries for C++ primitives and algorithms are as follows:

@@ -39,7 +38,7 @@ ROCm template libraries for C++ primitives and algorithms are as follows:

 :::

-:::{grid-item-card} [Communication Libraries](./libraries/gpu-libraries/communication.md)
+:::{grid-item-card} [Communication libraries](./libraries/gpu-libraries/communication.md)
 Inter and intra-node communication is supported by the following projects:

 * {doc}`RCCL <rccl:index>`
@@ -47,26 +46,26 @@ Inter and intra-node communication is supported by the following projects:
 :::

 :::{grid-item-card}
-**[Artificial intelligence](../rocm-ai.md)**
+**Artificial intelligence**

-Libraries related to AI.
+Libraries related to artificial intelligence.

 * {doc}`MIOpen <miopen:index>`
 * {doc}`Composable Kernel <composable_kernel:index>`
 * {doc}`MIGraphX <amdmigraphx:index>`
 * {doc}`MIVisionX <mivisionx:README>`
-* {doc}`rocAL <rocal:README>`
+* {doc}`ROCm Augmentation Library (rocAL) <rocal:README>`
 :::

 :::{grid-item-card}
 **[OpenMP](./openmp/openmp.md)**

-* [OpenMP Support Guide](./openmp/openmp.md)
+* [OpenMP support guide](./openmp/openmp.md)

 :::

 :::{grid-item-card}
-**[Compilers and Tools](./compilers-tools/index.md)**
+**[Compilers and tools](./compilers-tools/index.md)**

 * [ROCmCC](./rocmcc/rocmcc.md)
 * {doc}`ROCdbgapi <rocdbgapi:index>`
@@ -77,7 +76,7 @@ Libraries related to AI.
 :::

 :::{grid-item-card}
-**[Management Tools](./compilers-tools/management-tools.md)**
+**[Management tools](./compilers-tools/management-tools.md)**

 * {doc}`AMD SMI <amdsmi:index>`
 * {doc}`ROCm SMI <rocm_smi_lib:index>`
@@ -86,14 +85,14 @@ Libraries related to AI.
 :::

 :::{grid-item-card}
-**[Validation Tools](./compilers-tools/validation-tools.md)**
+**[Validation tools](./compilers-tools/validation-tools.md)**

 * {doc}`ROCm Validation Suite <rocmvalidationsuite:index>`
 * {doc}`TransferBench <transferbench:index>`

 :::

-:::{grid-item-card} **GPU Architectures**
+:::{grid-item-card} **GPU architectures**

 * [AMD Instinct MI200](../conceptual/gpu-arch/mi250.md)
 * [AMD Instinct MI100](../conceptual/gpu-arch/mi100.md)
--- a/docs/reference/libraries/ai-libraries.md
+++ b/docs/reference/libraries/ai-libraries.md
@@ -1,4 +1,4 @@
-# AI Libraries
+# AI libraries

 ::::{grid} 1 1 2 2
 :gutter: 1
@@ -7,7 +7,6 @@

 AMD's library for high performance machine learning primitives.

-* {doc}`Documentation <miopen:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/MIOpen)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/CHANGELOG.md)

@@ -17,7 +16,6 @@ AMD's library for high performance machine learning primitives.

 Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

-* {doc}`Documentation <composable_kernel:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/composable_kernel)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/composable_kernel/blob/develop/CHANGELOG.md)

@@ -27,7 +25,6 @@ Composable Kernel: Performance Portable Programming Model for Machine Learning T

 AMD MIGraphX is AMD's graph inference engine that accelerates machine learning model inference.

-* {doc}`Documentation <amdmigraphx:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/CHANGELOG.md)

--- a/docs/reference/libraries/gpu-libraries/c++primitives.md
+++ b/docs/reference/libraries/gpu-libraries/c++primitives.md
@@ -1,4 +1,4 @@
-# C++ Primitive Libraries
+# C++ primitive libraries

 ROCm template libraries for algorithms are as follows:

@@ -11,7 +11,6 @@ rocPRIM is an AMD GPU optimized template library of algorithm primitives, like
 transforms, reductions, scans, etc. It also serves as a common back-end for
 similar libraries found inside ROCm.

-* {doc}`Documentation <rocprim:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocPRIM/)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/CHANGELOG.md)
 * [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocPRIM)
@@ -24,7 +23,6 @@ rocThrust is a template library of algorithm primitives with a Thrust-compatible
 interface. Their CPU back-ends are identical, while the GPU back-end calls into
 rocPRIM.

-* {doc}`Documentation <rocthrust:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocThrust)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/CHANGELOG.md)
 * [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocThrust)
@@ -36,7 +34,6 @@ rocPRIM.
 hipCUB is a template library of algorithm primitives with a CUB-compatible
 interface. It's back-end is rocPRIM.

-* {doc}`Documentation <hipcub:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipCUB)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/CHANGELOG.md)
 * [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/hipCUB)
@@ -49,7 +46,6 @@ hipTensor is AMD's C++ library for accelerating tensor primitives
 based on the composable kernel library,
 through general purpose kernel languages, like HIP C++.

-* {doc}`Documentation <hiptensor:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipTensor)

 :::
--- a/docs/reference/libraries/gpu-libraries/communication.md
+++ b/docs/reference/libraries/gpu-libraries/communication.md
@@ -1,4 +1,4 @@
-# Communication Libraries
+# Communication libraries

 :::::{grid} 1 1 1 1
 :gutter: 1
@@ -10,7 +10,6 @@ implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather,
 The collective operations are implemented using ring and tree algorithms and have been optimized for
 throughput and latency.

-* {doc}`Documentation <rccl:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rccl)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/CHANGELOG.md)
 * [Examples](https://github.com/ROCmSoftwarePlatform/rccl/tree/develop/tools)
--- a/docs/reference/libraries/gpu-libraries/math-fft.md
+++ b/docs/reference/libraries/gpu-libraries/math-fft.md
@@ -1,6 +1,6 @@
 # Fast Fourier transforms

-ROCm libraries for FFT are as follows:
+ROCm libraries for fast Fourier transforms (FFTs) are as follows:

 :::::{grid} 1 1 2 2
 :gutter: 1
--- a/docs/reference/libraries/gpu-libraries/math-linear-algebra.md
+++ b/docs/reference/libraries/gpu-libraries/math-linear-algebra.md
@@ -1,4 +1,4 @@
-# Linear Algebra Libraries
+# Linear algebra libraries

 ROCm libraries for linear algebra are as follows:

@@ -9,7 +9,6 @@ ROCm libraries for linear algebra are as follows:

 `rocBLAS` is an AMD GPU optimized library for BLAS (Basic Linear Algebra Subprograms).

-* {doc}`Documentation <rocblas:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocBLAS)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/CHANGELOG.md)
 * [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocBLAS)
@@ -22,7 +21,6 @@ ROCm libraries for linear algebra are as follows:
 via `rocBLAS` and `rocSOLVER`. `hipBLAS` allows for a common interface for other GPU
 BLAS libraries.

-* {doc}`Documentation <hipblas:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLAS)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/CHANGELOG.md)

@@ -35,7 +33,6 @@ flexible API and extends functionalities beyond traditional BLAS library.
 `hipBLASLt` is exposed APIs in HIP programming language with an underlying
 optimized generator as a back-end kernel provider.

-* {doc}`Documentation <hipblaslt:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipBLASLt)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/hipBLASLt/blob/develop/CHANGELOG.md)

@@ -47,7 +44,6 @@ optimized generator as a back-end kernel provider.
 fine-grained parallelism on top of AMD's ROCm runtime and toolchains, targeting
 modern CPU and GPU platforms.

-* {doc}`Documentation <rocalution:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocALUTION)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/CHANGELOG.md)

@@ -58,7 +54,6 @@ modern CPU and GPU platforms.
 `rocWMMA` provides an API to break down mixed precision matrix multiply-accumulate
 (MMA) problems into fragments and distributes these over GPU wavefronts.

-* {doc}`Documentation <rocwmma:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocWMMA)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/CHANGELOG.md)

@@ -68,7 +63,6 @@ modern CPU and GPU platforms.

 `rocSOLVER` provides a subset of LAPACK (Linear Algebra Package) functionality on the ROCm platform.

-* {doc}`Documentation <rocsolver:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocSOLVER)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)

@@ -79,7 +73,6 @@ modern CPU and GPU platforms.
 `hipSOLVER` is a LAPACK marshalling library supporting both `rocSOLVER` and `cuSOLVER`
 as backends whilst exporting a unified interface.

-* {doc}`Documentation <hipsolver:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipSOLVER)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)

@@ -89,7 +82,6 @@ as backends whilst exporting a unified interface.

 `rocSPARSE` is a library to provide BLAS for sparse computations.

-* {doc}`Documentation <rocsparse:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocSPARSE)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/CHANGELOG.md)

@@ -100,7 +92,6 @@ as backends whilst exporting a unified interface.
 `hipSPARSE` is a marshalling library to provide sparse BLAS functionality,
 supporting both `rocSPARSE` and `cuSPARSE` as backends.

-* {doc}`Documentation <hipsparse:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSE)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CHANGELOG.md)

@@ -111,7 +102,6 @@ supporting both `rocSPARSE` and `cuSPARSE` as backends.
 `hipSPARSE` is a marshalling library to provide sparse BLAS functionality,
 supporting both `rocSPARSELt` and `cuSPARSELt` as backends.

-* {doc}`Documentation <hipsparselt:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipSPARSELt)

 :::
--- a/docs/reference/libraries/gpu-libraries/math.md
+++ b/docs/reference/libraries/gpu-libraries/math.md
@@ -16,7 +16,7 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
 :gutter: 1

 :::{grid-item-card}
-**[Linear Algebra Libraries](./math-linear-algebra.md)**
+**[Linear algebra libraries](./math-linear-algebra.md)**

 * {doc}`rocBLAS <rocblas:index>`
 * {doc}`hipBLAS <hipblas:index>`
@@ -40,7 +40,7 @@ at compile-time of the hipLIB in question. For dynamic dispatch between vendor i
 :::

 :::{grid-item-card}
-**[Random Numbers](./rand.md)**
+**[Random numbers](./rand.md)**

 * {doc}`rocRAND <rocrand:index>`
 * {doc}`hipRAND <hiprand:index>`
--- a/docs/reference/libraries/gpu-libraries/rand.md
+++ b/docs/reference/libraries/gpu-libraries/rand.md
@@ -1,13 +1,12 @@
-# Random Numbers
+# Random numbers

 :::::{grid} 1 1 2 2
 :gutter: 1

 :::{grid-item-card} {doc}`rocRAND <rocrand:index>`

-rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRNG).
+rocRAND is an AMD GPU optimized library for pseudorandom number generators.

-* {doc}`Documentation <rocrand:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/rocRAND/)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/CHANGELOG.md)
 * [Examples](https://github.com/amd/rocm-examples/tree/develop/Libraries/rocRAND)
@@ -16,11 +15,10 @@ rocRAND is an AMD GPU optimized library for pseudo-random number generators (PRN

 :::{grid-item-card} {doc}`hipRAND <hiprand:index>`

-hipRAND is a compatibility layer for GPU accelerated pseudo-random number
-generation (PRNG) optimized for AMD GPUs using rocRAND. hipRAND allows for a
-common interface for other non AMD GPU PRNG libraries.
+hipRAND is a compatibility layer for GPU accelerated pseudorandom number generation optimized for
+AMD GPUs using rocRAND. hipRAND allows for a common interface for other non AMD GPU
+pseudorandom number generation libraries.

-* {doc}`Documentation <hiprand:index>`
 * [GitHub](https://github.com/ROCmSoftwarePlatform/hipRAND/)
 * [Changelog](https://github.com/ROCmSoftwarePlatform/hipRAND/blob/develop/CHANGELOG.md)

--- a/docs/reference/openmp/openmp.md
+++ b/docs/reference/openmp/openmp.md
@@ -1,4 +1,4 @@
-# OpenMP Support in ROCm
+# OpenMP support in ROCm

 ## Introduction

@@ -14,9 +14,7 @@ this ROCm release. See the list of supported GPUs for [Linux](../../about/compat
 The ROCm OpenMP compiler is implemented using LLVM compiler technology.
 The following image illustrates the internal steps taken to translate a user’s application into an executable that can offload computation to the AMDGPU. The compilation is a two-pass process. Pass 1 compiles the application to generate the CPU code and Pass 2 links the CPU code to the AMDGPU device code.

-```{figure} ../../data/reference/openmp/openmp-toolchain.svg
-:name: openmp-toolchain
-```
+![OpenMP toolchain](../../data/reference/openmp/openmp-toolchain.svg "OpenMP toolchain")

 ### Installation

@@ -31,7 +29,7 @@ bin: Compilers (`flang` and `clang`) and other binaries.
 * lib: Libraries including those required for target offload.
 * lib-debug: Debug versions of the above libraries.

-## OpenMP: Usage
+## OpenMP: usage

 The example programs can be compiled and run by pointing the environment
 variable `ROCM_PATH` to the ROCm install directory.
@@ -44,10 +42,10 @@ cd $ROCM_PATH/share/openmp-extras/examples/openmp/veccopy
 sudo make run
 ```

-:::{note}
+```{note}
 `sudo` is required since we are building inside the `/opt` directory.
 Alternatively, copy the files to your home directory first.
-:::
+```

 The above invocation of Make compiles and runs the program. Note the options
 that are required for target offload from an OpenMP program:
@@ -56,15 +54,13 @@ that are required for target offload from an OpenMP program:
 -fopenmp --offload-arch=<gpu-arch>
 ```

-:::{note}
+```{note}
 The compiler also accepts the alternative offloading notation:

 ```bash
-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch> 
+-fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=<gpu-arch>
 ```

-:::
-
 Obtain the value of `gpu-arch` by running the following command:

 ```bash
@@ -113,7 +109,7 @@ code compiled with AOMP:

 For more details on `rocprof`, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.

-### Using Tracing Options
+### Using tracing options

 **Prerequisite:** When using the `--sys-trace` option, compile the OpenMP
 program with:
@@ -139,7 +135,7 @@ HSA calls.

 For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocprofiler:rocprofv1>`.

-### Environment Variables
+### Environment variables

 :::{table}
 :widths: auto
@@ -154,14 +150,14 @@ For more details on tracing, refer to the {doc}`ROCProfilerV1 User Manual <rocpr
 | `OMPX_FORCE_SYNC_REGIONS` | To force the runtime to execute all operations synchronously, i.e., wait for an operation to complete immediately. This affects data transfers and kernel execution. While it is mainly designed for debugging, it may have a minor positive effect on performance in certain situations. |
 :::

-## OpenMP: Features
+## OpenMP: features

 The OpenMP programming model is greatly enhanced with the following new features
 implemented in the past releases.

 (openmp_usm)=

-### Asynchronous Behavior in OpenMP Target Regions
+### Asynchronous behavior in OpenMP target regions

 * Controlling Asynchronous Behavior

@@ -176,7 +172,7 @@ The `libomptarget` plugin for GPU offloading allows creation of separate configu

 Implicit asynchronous execution of single target region enables parallel memory copy invocations.

-### Unified Shared Memory
+### Unified shared memory

 Unified Shared Memory (USM) provides a pointer-based approach to memory
 management. To implement USM, fulfill the following system requirements along
@@ -189,7 +185,7 @@ with Xnack capability.
 * Xnack, as USM support can only be tested with applications compiled with Xnack
  capability

-#### Xnack Capability
+#### Xnack capability

 When enabled, Xnack capability allows GPU threads to access CPU (system) memory,
 allocated with OS-allocators, such as `malloc`, `new`, and `mmap`. Xnack must be
@@ -223,7 +219,7 @@ When Xnack support is not needed:

 * At runtime, set the `HSA_XNACK` environment variable to 0.

-#### Unified Shared Memory Pragma
+#### Unified shared memory pragma

 This OpenMP pragma is available on MI200 through `xnack+` support.

@@ -277,7 +273,7 @@ to by “b” are in coarse-grain memory during and after the execution of the
 target region. This is accomplished in the OpenMP runtime library with calls to
 the ROCr runtime to set the pages pointed by “b” as coarse grain.

-### OMPT Target Support
+### OMPT target support

 The OpenMP runtime in ROCm implements a subset of the OMPT device APIs, as
 described in the OpenMP specification document. These APIs allow first-party
@@ -302,7 +298,7 @@ The file `veccopy-ompt-target-tracing.c` simulates how a tool initiates device
 activity tracing. The file `callbacks.h` shows the callbacks registered and
 implemented by the tool.

-### Floating Point Atomic Operations
+### Floating point atomic operations

 The MI200-series GPUs support the generation of hardware floating-point atomics
 using the OpenMP atomic pragma. The support includes single- and
@@ -326,10 +322,10 @@ double a = 0.0;
 a = a + 1.0;
 ```

-:::{note}
+```{note}
 `AMD_unsafe_fp_atomics` is an alias for `AMD_fast_fp_atomics`, and
 `AMD_safe_fp_atomics` is implemented with a compare-and-swap loop.
-:::
+```

 To disable the generation of fast floating-point atomic instructions at the file
 level, build using the option `-msafe-fp-atomics` or use a hint clause on a
@@ -362,14 +358,14 @@ double b = 0.0;
 b = b + 1.0;
 ```

-### Address Sanitizer (ASan) Tool
+### AddressSanitizer tool

-Address Sanitizer is a memory error detector tool utilized by applications to
+AddressSanitizer (ASan) is a memory error detector tool utilized by applications to
 detect various errors ranging from spatial issues such as out-of-bound access to
 temporal issues such as use-after-free. The AOMP compiler supports ASan for AMD
 GPUs with applications written in both HIP and OpenMP.

-**Features Supported on Host Platform (Target x86_64):**
+**Features supported on host platform (Target x86_64):**

 * Use-after-free
 * Buffer overflows
@@ -380,12 +376,12 @@ GPUs with applications written in both HIP and OpenMP.
 * Use-after-scope
 * Initialization order bugs

-**Features Supported on AMDGPU Platform (`amdgcn-amd-amdhsa`):**
+**Features supported on AMDGPU platform (`amdgcn-amd-amdhsa`):**

 * Heap buffer overflow
 * Global buffer overflow

-**Software (Kernel/OS) Requirements:** Unified Shared Memory support with Xnack
+**Software (kernel/OS) requirements:** Unified Shared Memory support with Xnack
 capability. See the section on [Unified Shared Memory](#unified-shared-memory)
 for prerequisites and details on Xnack.

@@ -436,7 +432,7 @@ for(int i=0; i<N; i++){
 See the complete sample code for global buffer overflow
 [here](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/examples/tools/asan/global_buffer_overflow/openmp/vecadd-GBO.cpp).

-### Clang Compiler Option for Kernel Optimization
+### Clang compiler option for kernel optimization

 You can use the clang compiler option `-fopenmp-target-fast` for kernel optimization if certain constraints implied by its component options are satisfied. `-fopenmp-target-fast` enables the following options:

@@ -448,7 +444,7 @@ You can use the clang compiler option `-fopenmp-target-fast` for kernel optimiza

 * `-O3` if no `-O*` is specified by the user.

-### Specialized Kernels
+### Specialized kernels

 Clang will attempt to generate specialized kernels based on compiler options and OpenMP constructs. The following specialized kernels are supported:

@@ -466,14 +462,14 @@ To enable the generation of specialized kernels, follow these guidelines:

 * To disable specialized kernel generation, use `-fno-openmp-target-ignore-env-vars`.

-#### No-Loop Kernel Generation
+#### No-loop kernel generation

 The No-loop kernel generation feature optimizes the compiler performance by generating a specialized kernel for certain OpenMP target constructs such as target teams distribute parallel for. The specialized kernel generation feature assumes every thread executes a single iteration of the user loop, which leads the runtime to launch a total number of GPU threads equal to or greater than the iteration space size of the target region loop. This allows the compiler to generate code for the loop body without an enclosing loop, resulting in reduced control-flow complexity and potentially better performance.

-#### Big-Jump-Loop Kernel Generation
+#### Big-jump-loop kernel generation

-A No-Loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the Big-Jump-Loop kernel. The compiler launches the kernel with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.
+A No-loop kernel is not generated if the OpenMP teams construct uses a `num_teams` clause. Instead, the compiler attempts to generate a different specialized kernel called the Big-Jump-Loop kernel. The compiler launches the kernel with a grid size determined by the number of teams specified by the OpenMP `num_teams` clause and the `blocksize` chosen either by the compiler or specified by the corresponding OpenMP clause.

-#### Xteam Optimized Reduction Kernel Generation
+#### Cross-team optimized reduction kernel generation

-If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient Xteam communication. New APIs for Xteam reduction are implemented in the device runtime and are automatically generated by clang.
+If the OpenMP construct has a reduction clause, the compiler attempts to generate optimized code by utilizing efficient cross-team communication. New APIs for cross-team reduction are implemented in the device runtime and are automatically generated by clang.
--- a/docs/reference/rocmcc/rocmcc.md
+++ b/docs/reference/rocmcc/rocmcc.md
@@ -1,6 +1,6 @@
-# Compiler Reference Guide
+# Compiler reference guide

-## Introduction to Compiler Reference Guide
+## Introduction to compiler reference guide

 ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance
 computing on AMD GPUs and CPUs and supports various heterogeneous programming
@@ -22,7 +22,7 @@ For more details, see:
 * AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html)
 * Releases and source: <https://github.com/RadeonOpenCompute/llvm-project>

-### ROCm Compiler Interfaces
+### ROCm compiler interfaces

 ROCm currently provides two compiler interfaces for compiling HIP programs:

@@ -52,11 +52,11 @@ The major differences between `hipcc` and `amdclang++` are listed below:
 | Source code location               | <https://github.com/ROCm-Developer-Tools/HIPCC>                                                                          | <https://github.com/RadeonOpenCompute/llvm-project> |
 ::::

-## Compiler Options and Features
+## Compiler options and features

 This chapter discusses compiler options and features.

-### AMD GPU Compilation
+### AMD GPU compilation

 This section outlines commonly used compiler flags for `hipcc` and `amdclang++`.
 :::{option} -x hip
@@ -111,7 +111,7 @@ This section outlines commonly used compiler flags for `hipcc` and `amdclang++`.
  Generates relocatable device code, also known as separate compilation mode.
 :::

-### AMD Optimizations for Zen Architectures
+### AMD optimizations for zen architectures

 The CPU compiler optimizations described in this chapter originate from the AMD
 Optimizing C/C++ Compiler (AOCC) compiler. They are available in ROCmCC if the
@@ -134,12 +134,12 @@ The `-famd-opt` flag is useful when a user wants to build with the proprietary
 optimization compiler and not have to depend on setting any of the other
 proprietary optimization flags.

-:::{note}
+```{note}
 `-famd-opt` can be used in addition to the other proprietary CPU optimization
 flags. The table of optimizations below implicitly enables the invocation of the
 AMD proprietary optimizations compiler, whereas the `-famd-opt` flag requires
 this to be handled explicitly.
-:::
+```

 #### `-fstruct-layout=[1,2,3,4,5,6,7]`

@@ -213,7 +213,7 @@ This is an experimental option to generate non-temporal store instruction for
 array accesses in a loop, whose iteration count cannot be determined at compile
 time. In this case, the compiler assumes the iteration count to be huge.

-#### Optimizations Through Driver `-mllvm <options>`
+#### Optimizations through driver `-mllvm <options>`

 The following optimization options must be invoked through driver
 `-mllvm <options>`:
@@ -255,12 +255,12 @@ loop. The heuristic can be controlled with the following options:

  Where, `n` is a positive integer and higher value of `<n>` facilitates more unswitching.

-   :::{note}
+   ```{note}
   These options may facilitate more unswitching under some workloads. Since
   loop-unswitching inherently leads to code bloat, facilitating more
   unswitching may significantly increase the code size. Hence, it may also lead
   to longer compilation times.
-   :::
+   ```

 ##### `-enable-strided-vectorization`

@@ -430,7 +430,7 @@ such as loop transformations and other optimizations requiring de-linearized
 index expressions should use the Hz option. This option has no impact on any
 other aspects of the Flang front end.

-### Inline ASM Statements
+### Inline ASM statements

 Inline assembly (ASM) statements allow a developer to include assembly
 instructions directly in either host or device code. While the ROCm compiler
@@ -451,18 +451,18 @@ supports ASM statements, their use is not recommended for the following reasons:
 * Writing correct ASM statements is often difficult; we strongly recommend
  thorough testing of any use of ASM statements.

-:::{note}
+```{note}
 For developers who choose to include ASM statements in the code, AMD is
 interested in understanding the use case and appreciates feedback at
 [https://github.com/RadeonOpenCompute/ROCm/issues](https://github.com/RadeonOpenCompute/ROCm/issues)
-:::
+```

-### Miscellaneous OpenMP Compiler Features
+### Miscellaneous OpenMP compiler features

 This section discusses features that have been added or enhanced in the OpenMP
 compiler.

-#### Offload-arch Tool
+#### Offload-arch tool

 An LLVM library and tool that is used to query the execution capability of the
 current system as well as to query requirements of a binary file. It is used by
@@ -534,7 +534,7 @@ There are symbolic link aliases `amdgpu-offload-arch` and `nvidia-arch` for
 These aliases are useful in determining whether architecture-specific tests
 should be run or to conditionally load architecture-specific software.

-#### Command-Line Simplification Using `offload-arch` Flag
+#### Command-line simplification using `offload-arch` flag

 Legacy mechanism of specifying offloading target for OpenMP involves using three
 flags, `-fopenmp-targets`, `-Xopenmp-target`, and `-march`. The first two flags
@@ -563,7 +563,7 @@ clang -fopenmp -target x86_64-linux-gnu \
 To ensure backward compatibility, both styles are supported. This option is
 compatible with target ID support and multi-image fat binaries.

-#### Target ID Support for OpenMP
+#### Target ID support for OpenMP

 The ROCmCC compiler supports specification of target features along with the GPU
 name while specifying a target offload device in the command line, using
@@ -603,7 +603,7 @@ to linker using `-plugin-opt=-mattr` flag. This feature is compatible with
 offload-arch command-line option and multi-image binaries for multiple
 architectures.

-#### Multi-image Fat Binary for OpenMP
+#### Multi-image fat binary for OpenMP

 The ROCmCC compiler is enhanced to generate binaries that can contain
 heterogenous images. This heterogeneity could be in terms of:
@@ -656,7 +656,7 @@ of target triple and the target GPU (along with the associated target features).
 modified to query this structure to identify a compatible image based on the
 capability of the current system.

-#### Unified Shared Memory (USM)
+#### Unified shared memory (USM)

 The following OpenMP pragma is available on MI200, and it must be executed with
 `xnack+` support.
@@ -668,7 +668,7 @@ omp requires unified_shared_memory
 For more details on USM refer to the {ref}`openmp_usm` section of the OpenMP
 Guide.

-### Support Status of Other Clang Options
+### Support status of other Clang options

 The following table lists the other Clang options and their support status.

@@ -679,8 +679,8 @@ The following table lists the other Clang options and their support status.
 :widths: auto
 :align: center

-| **Option**                               | **Support Status** | **Description**                                                                                                                |
-|------------------------------------------|:------------------:|--------------------------------------------------------------------------------------------------------------------------------|
+| **Option** | **Support status** | **Description** |
+|--------------|:-----------------------:|-------------------------|
 | `-###`                                   | Supported          | Prints (but does not run) the commands to run for this compilation                                                             |
 | `--analyzer-output <value>`              | Supported          | "Static analyzer report output format (`html|plist|plist-multi-file|plist-html|sarif|text`)"                                   |
 | `--analyze`                              | Supported          | Runs the static analyzer                                                                                                       |
@@ -1184,7 +1184,7 @@ The following table lists the other Clang options and their support status.
 |-isysroot  \<dir\>|Supported|Sets the system root directory (usually /)|
 |-isystem-after  \<directory\>|Supported|Adds the directory to end of the SYSTEM include search path|
 |-isystem  \<directory\>|Supported|Adds the directory to SYSTEM include search path|
- |-ivfsoverlay  \<value\>|Supported|Overlays the virtual filesystem described by the specified file over the real file system|
+ |-ivfsoverlay  \<value\>|Supported|Overlays the virtual file system described by the specified file over the real file system|
 |-iwithprefixbefore  \<dir\>|Supported|Sets the directory to include search path with prefix|
 |-iwithprefix  \<dir\>|Supported|Sets the directory to SYSTEM include search path with prefix|
 |-iwithsysroot  \<directory\>|Supported|Adds directory to SYSTEM include search path; absolute paths are relative to -isysroot|