diff --git a/docs/hip_sdk_install_win/hip_sdk_install_win.md b/docs/hip_sdk_install_win/hip_sdk_install_win.md index 737c4d125..755d144ad 100644 --- a/docs/hip_sdk_install_win/hip_sdk_install_win.md +++ b/docs/hip_sdk_install_win/hip_sdk_install_win.md @@ -11,16 +11,18 @@ TODO: provide link to supported GPU guide. ## SDK Installation -Installation options are listed in Table 1. +Installation options are listed in {numref}`installation-components`. -| **Table 1. Components for Installation** ||| | -|:------------------------:|:----------------:|:------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +:::{table} Components for Installation +:name: installation-components | **HIP Components** | **Install Type** | **Display Driver** | **Install Options** | +|:------------------------:|:----------------:|:------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:| | **HIP SDK Core** | **Full** | Adrenalin 22.40 | **Full:** Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking your performance on your graphics hardware. | | **HIP Libraries** | **Full** | | **Minimal:** Provides only the basic controls for AMD Software features and does not include advanced features such as performance tweaking or recording and capturing content.| | **HIP Runtime Compiler** | **Full** | | **Driver Only:** Provides no user interface for AMD Software features. | | **Ray Tracing** | **Full** | | **Do Not Install** | | **BitCode Profiler** | **Full** | | | +::: TODO: describe each installation option. @@ -30,26 +32,32 @@ The AMD HIP SDK Installer manages the installation and uninstallation process of HIP SDK for Windows. 
This includes system configuration checks, installing components, and installing the display driver. -To launch the AMD HIP SDK Installer, click the **Setup** icon shown in Figure 1. -The installer will begin to load and detect your system's configuration and -compatibility, as shown in Figure 2. A completely loaded AMD HIP SDK Installer -window will appear, as shown in Figure 3. +To launch the AMD HIP SDK Installer, click the **Setup** icon shown in +{numref}`setup-icon`. The installer will begin to load and detect your system's +configuration and compatibility, as shown in {numref}`loading-window`. A +completely loaded AMD HIP SDK Installer window will appear, as shown in {numref}`installer-window`. -| ![Setup](../data/hip_sdk_install_win/Setup-Icon.png) | -|:----------------------------------------------------:| -| **Figure 1. Setup Icon** | +:::{figure} ../data/hip_sdk_install_win/Setup-Icon.png +:name: setup-icon +:alt: Setup +Setup Icon. +::: -| ![Loading Window](../data/hip_sdk_install_win/Loading-Window.png) | -|:-----------------------------------------------------------------:| -| **Figure 2. AMD HIP SDK Loading Window** | +:::{figure} ../data/hip_sdk_install_win/Loading-Window.png +:name: loading-window +:alt: Loading Window +AMD HIP SDK Loading Window +::: -| ![Installer Window](../data/hip_sdk_install_win/Installer-Window.png) | -|:---------------------------------------------------------------------:| -| **Figure 3. AMD HIP SDK Installer Window** | +:::{figure} ../data/hip_sdk_install_win/Installer-Window.png +:name: installer-window +:alt: Installer Window +AMD HIP SDK Installer Window +::: ### Installation Selections -By default, all components are selected for installation. Refer to Figure 3 for +By default, all components are selected for installation. Refer to {numref}`installer-window` for an instance when the Select All option is turned on. 
**Note** The Select All option only applies to the installation of HIP @@ -72,48 +80,62 @@ to [Installing Components](#installing-components). #### Deselect All To select individual component installs onto your system click **Deselect All** -in the upper right corner of the installer window, as seen in Figure 3. Figure 4 +in the upper right corner of the installer window, as seen in {numref}`installer-window`. {numref}`deselect-all` demonstrates the installer window once the installation components are all deselected. -| ![DeSelect All](../data/hip_sdk_install_win/DeSelectAll.png) | -|:------------------------------------------------------------:| -| **Figure 4. Deselect All Selection** | +:::{figure} ../data/hip_sdk_install_win/DeSelectAll.png +:name: deselect-all +:alt: DeSelect All +Deselect All Selection +::: #### HIP Components By default, each HIP component will be checked off for full installation, -Figures 4 through 8 demonstrate the options available to you when you click +{numref}`deselect-all` through {numref}`bitcode-profiler` demonstrate the options available to you when you click **Additional Options** under each component. -| **Table 2. Custom Selections for Installation** | | -|:------------------------------------------------------------------|:---------------------------------------------------- | +:::{table} Custom Selections for Installation +:name: custom-selections | **If:** | **Then:** | +|:------------------------------------------------------------------|:---------------------------------------------------- | | You intend to make custom selections for this installation | Skip to the section _Deselect All_. | | You do not intend to make custom selections for this installation | Continue to the section _AMD Display Driver_. | +::: **Note** You can manually select installation locations for the HIP SDK Core, as -shown in Figure 5. +shown in {numref}`hip-sdk-core`. 
-| ![HIP SDK Core](../data/hip_sdk_install_win/HIP-SDK-Core.png) | -|:-------------------------------------------------------------:| -| **Figure 5. HIP SDK Core Option** | +:::{figure} ../data/hip_sdk_install_win/HIP-SDK-Core.png +:name: hip-sdk-core +:alt: HIP SDK Core +HIP SDK Core Option +::: -| ![HIP Libraries](../data/hip_sdk_install_win/HIP-Libraries.png) | -|:---------------------------------------------------------------:| -| **Figure 6. HIP Libraries Option** | +:::{figure} ../data/hip_sdk_install_win/HIP-Libraries.png +:name: hip-libraries +:alt: HIP Libraries +HIP Libraries Option +::: -| ![HIP Runtime Compiler](../data/hip_sdk_install_win/HIP-Runtime-Compiler.png) | -|:-----------------------------------------------------------------------------:| -| **Figure 7. HIP Runtime Compiler Option** | +:::{figure} ../data/hip_sdk_install_win/HIP-Runtime-Compiler.png +:name: hip-rtc +:alt: HIP Runtime Compiler +HIP Runtime Compiler Option +::: -| ![HIP Ray Tracing](../data/hip_sdk_install_win/HIP-Ray-Tracing.png) | -|:-------------------------------------------------------------------:| -| **Figure 8. HIP Ray Tracing** | +:::{figure} ../data/hip_sdk_install_win/HIP-Ray-Tracing.png +:name: hip-ray-tracing +:alt: HIP Ray Tracing +HIP Ray Tracing +::: -| ![BitCode Profiler](../data/hip_sdk_install_win/BitCode-Profiler.png) | -|:---------------------------------------------------------------------:| -| **Figure 9. BitCode Profiler** | +:::{figure} ../data/hip_sdk_install_win/BitCode-Profiler.png +:name: bitcode-profiler +:alt: BitCode Profiler +BitCode Profiler +::: #### AMD Display Driver @@ -123,7 +145,8 @@ The AMD Display Driver offers three install types: - Minimal Install - Driver only -Table 3 describes the difference in each option shown in Figure 10. +{numref}`driver-options` describes the difference in each option shown in +{numref}`display-driver`. **Note** You must perform a system restart for a complete installation of the Display Driver. 
@@ -133,33 +156,43 @@ Display Driver. prior versions of AMD HIP SDK and drivers. You will not be able to roll back to previously installed drivers. -| **Table 3. Display Driver Install Options** | | -|:-------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------:| +:::{table} Display Driver Install Options +:name: driver-options + | **Install Option** | **Description** | +|---------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------| | **Full Install** | Provides all AMD Software features and controls for gaming, recording, streaming, and tweaking the performance on your graphics hardware. | | **Minimal Install** | Provides only the basic controls for AMD Software features and does not include advanced features such as performance tweaking or recording and capturing content. | | **Driver Only** | Provides no user interface for AMD Software features. | +::: -| ![Display Driver](../data/hip_sdk_install_win/AMD-Display-Driver.png) | -|:---------------------------------------------------------------------:| -| **Figure 10. AMD Display Driver Options** | +:::{figure} ../data/hip_sdk_install_win/AMD-Display-Driver.png +:name: display-driver +:alt: Display Driver +AMD Display Driver Options +::: ## Installing Components -Please wait for the installation to complete during as shown in Figure 11. +Please wait for the installation to complete, as shown in {numref}`installing`. -| ![Installing](../data/hip_sdk_install_win/Installation.png) | -|:-----------------------------------------------------------:| -| **Figure 11. 
Active Installation** | +:::{figure} ../data/hip_sdk_install_win/Installation.png +:name: installing +:alt: Installing +Active Installation +::: ### Installation Complete Once the installation is complete, the installer window may prompt you for a -system restart. Click **Restart** at the lower right corner, shown in Figure 12. +system restart. Click **Restart** at the lower right corner, shown in +{numref}`installation-complete`. -| ![Installation Complete](../data/hip_sdk_install_win/Installation-Complete.png) | -|:-------------------------------------------------------------------------------:| -| **Figure 12. Installation Complete** | +:::{figure} ../data/hip_sdk_install_win/Installation-Complete.png +:name: installation-complete +:alt: Installation Complete +Installation Complete +::: ## Uninstallation @@ -174,6 +207,8 @@ uninstallation of the HIP SDK Core and drivers repeat the steps in the sections **Note** Selecting **Install** once ROCm has already installed results in its uninstallation. -| ![Uninstall](../data/hip_sdk_install_win/Uninstallation.png) | -|:------------------------------------------------------------:| -| **Figure 13. HIP SDK Uninstalling** | +:::{figure} ../data/hip_sdk_install_win/Uninstallation.png +:name: uninstall +:alt: Uninstall +HIP SDK Uninstalling +::: diff --git a/docs/reference/rocmcc/rocmcc.md b/docs/reference/rocmcc/rocmcc.md index 144f0e135..196c238f1 100644 --- a/docs/reference/rocmcc/rocmcc.md +++ b/docs/reference/rocmcc/rocmcc.md @@ -6,30 +6,27 @@ ROCmCC is a Clang/LLVM-based compiler. It is optimized for high-performance computing on AMD GPUs and CPUs and supports various heterogeneous programming models such as HIP, OpenMP, and OpenCL. -ROCmCC is made available via two packages: rocm-llvm and rocm-llvm-alt. The -differences are shown in this table: +ROCmCC is made available via two packages: `rocm-llvm` and `rocm-llvm-alt`. +The differences are listed in [the table below](rocm-llvm-vs-alt). -| **Table 1. rocm-llvm vs. 
rocm-llvm-alt** | | -|:---------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------:| +:::{table} Differences between `rocm-llvm` and `rocm-llvm-alt` +:name: rocm-llvm-vs-alt | **rocm-llvm** | **rocm-llvm-alt** | +|:---------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------------------:| | Installed by default when ROCm™ itself is installed | An optional package | | Provides an open-source compiler | Provides an additional closed-source compiler for users interested in additional CPU optimizations not available in rocm-llvm | +::: -For more details, follow this table: - -| **Table 2. Details Table** | | -|:---------------------------------------------:|:------------------------------------------------------------------------------------------------------:| -| **For** | **See** | -| The latest usage information for AMD GPU | [https://llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) | -| Usage information for a specific ROCm release | [https://llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) | -| Source code for rocm-llvm | [https://github.com/RadeonOpenCompute/llvm-project](https://github.com/RadeonOpenCompute/llvm-project) | +For more details, see: +- AMD GPU usage: [llvm.org/docs/AMDGPUUsage.html](https://llvm.org/docs/AMDGPUUsage.html) +- Releases and source: ### ROCm Compiler Interfaces ROCm currently provides two compiler interfaces for compiling HIP programs: -- /opt/rocm/bin/hipcc -- /opt/rocm/bin/amdclang++ +- `/opt/rocm/bin/hipcc` +- `/opt/rocm/bin/amdclang++` Both leverage the same LLVM compiler technology with the AMD GCN GPU support; however, they offer a slightly different user experience. The hipcc command-line @@ -42,15 +39,17 @@ build process. 
The major differences between hipcc and amdclang++ are listed below: -| **Table 3. Differences Between hipcc and amdclang++** | | | -|:-----------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:--------------:| -| * | **hipcc** | **amdclang++** | -| Compiling HIP source files | Treats all source files as HIP language source files | Enables the HIP language support for files with the “.hip” extension or through the -x hip compiler option | -| Automatic GPU architecture detection | Auto-detects the GPUs available on the system and generates code for those devices when no GPU architecture is specified | Has AMD GCN gfx803 as the default GPU architecture. The --offload-arch compiler option may be used to target other GPU architectures | -| Finding a HIP installation | Finds the HIP installation based on its own location and its knowledge about the ROCm directory structure | First looks for HIP under the same parent directory as its own LLVM directory and then falls back on /opt/rocm. Users can use the --rocm-path option to instruct the compiler to use HIP from the specified ROCm installation. | -| Linking to the HIP runtime library | Is configured to automatically link to the HIP runtime from the detected HIP installation | Requires the --hip-link flag to be specified to link to the HIP runtime. Alternatively, users can use the -l`` -lamdhip64 option to link to a HIP runtime library. | -| Device function inlining | Inlines all GPU device functions, which provide greater performance and compatibility for codes that contain file scoped or device function scoped `__shared__` variables. However, it may increase compile time. | Relies on inlining heuristics to control inlining. 
Users experiencing performance or compilation issues with code using file scoped or device function scoped `__shared__` variables could try -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false to work around the issue. There are plans to address these issues with future compiler improvements. | -| Source code location | Developed at [https://github.com/ROCm-Developer-Tools/HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC) | Developed at [https://github.com/RadeonOpenCompute/llvm-project](https://github.com/RadeonOpenCompute/llvm-project) | +::::{table} Differences between hipcc and amdclang++ +:name: hipcc-vs-amdclang +| * | **hipcc** | **amdclang++** | +|:----------------------------------:|:------------------------------------------------------------------------------------------------------------------------:|:--------------:| +| Compiling HIP source files | Treats all source files as HIP language source files | Enables the HIP language support for files with the “.hip” extension or through the -x hip compiler option | +| Detecting GPU architecture | Auto-detects the GPUs available on the system and generates code for those devices when no GPU architecture is specified | Has AMD GCN gfx803 as the default GPU architecture. The --offload-arch compiler option may be used to target other GPU architectures | +| Finding a HIP installation | Finds the HIP installation based on its own location and its knowledge about the ROCm directory structure | First looks for HIP under the same parent directory as its own LLVM directory and then falls back on /opt/rocm. Users can use the --rocm-path option to instruct the compiler to use HIP from the specified ROCm installation. | +| Linking to the HIP runtime library | Is configured to automatically link to the HIP runtime from the detected HIP installation | Requires the --hip-link flag to be specified to link to the HIP runtime. 
Alternatively, users can use the -l`` -lamdhip64 option to link to a HIP runtime library. | +| Device function inlining | Inlines all GPU device functions, which provide greater performance and compatibility for codes that contain file scoped or device function scoped `__shared__` variables. However, it may increase compile time. | Relies on inlining heuristics to control inlining. Users experiencing performance or compilation issues with code using file scoped or device function scoped `__shared__` variables could try -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false to work around the issue. There are plans to address these issues with future compiler improvements. | +| Source code location | | | +:::: ## Compiler Options and Features @@ -58,22 +57,58 @@ This chapter discusses compiler options and features. ### AMD GPU Compilation -This table provides the most commonly used compiler options for GPU code. +This section outlines commonly used compiler flags for `hipcc` and `amdclang++`. +:::{option} -x hip + Compiles the source file as a HIP program. +::: -| **Table 4. 
Compiler Options** | | -|:----------------------------------------:|:---------------------------------------------------------------------------:| -| **Option** | **Description** | -| `-x hip` | Compiles the source file as a HIP program | -| `-fopenmp` | Enables the OpenMP support | -| `-fopenmp-targets=` | Enables the OpenMP target offload support of the specified GPU architecture | -| `--gpu-max-threads-per-block=` | Sets default launch bounds for kernels | -| `-munsafe-fp-atomics` | Enables unsafe floating point atomic instructions (AMDGPU only) | -| `-ffast-math` | Allows aggressive, lossy floating-point optimizations | -| `-mwavefrontsize64/-mno-wavefrontsize64` | Sets wavefront size to be 64 or 32 on RDNA architectures | -| `-mcumode` | Switches between CU and WGP modes on RDNA architectures | -| `--offload-arch=` | HIP offloading target ID in the form of a device architecture followed by target ID features delimited by a colon. Each target ID feature is a predefined string followed by a plus or minus sign (e.g. gfx908:xnack+:sramecc-). May be specified more than once | -| `-g` | Generates source-level debug information | -| `-fgpu-rdc`/`-fno-gpu-rdc` | Generates relocatable device code, also known as separate compilation mode | +:::{option} -fopenmp + Enables the OpenMP support. +::: + +:::{option} -fopenmp-targets= + Enables the OpenMP target offload support of the specified GPU architecture. + + :gpu: The GPU architecture. E.g. gfx908. +::: + +:::{option} --gpu-max-threads-per-block=: + Sets the default limit of threads per block. Also referred to as the launch bounds. + + :value: The default maximum amount of threads per block. +::: + +:::{option} -munsafe-fp-atomics + Enables unsafe floating point atomic instructions (AMDGPU only). +::: + +:::{option} -ffast-math + Allows aggressive, lossy floating-point optimizations. +::: + +:::{option} -mwavefrontsize64, -mno-wavefrontsize64 + Sets wavefront size to be 64 or 32 on RDNA architectures. 
+::: + +:::{option} -mcumode + Switches between CU and WGP modes on RDNA architectures. +::: + +:::{option} --offload-arch= + HIP offloading target ID. May be specified more than once. + + :gpu: A device architecture followed by target ID features + delimited by a colon. Each target ID feature is a predefined + string followed by a plus or minus sign (e.g. `gfx908:xnack+:sramecc-`). +::: + +:::{option} -g + Generates source-level debug information. +::: + +:::{option} -fgpu-rdc, -fno-gpu-rdc + Generates relocatable device code, also known as separate compilation mode. +::: ### AMD Optimizations for Zen Architectures @@ -118,11 +153,8 @@ to perform this optimization. Users can choose different levels of aggressiveness with which this optimization can be applied to the application, with 1 being the least aggressive and 7 being the most aggressive level. -|| -|:--:| -| **Table 5. -fstruct-layout Values and Their Effects**| -|| +:::{table} -fstruct-layout Values and Their Effects | -fstruct-layout value | Structure peeling | Pointer size after selective compression of self-referential pointers in structures, wherever safe | Type of structure fields eligible for compression | Whether compression performed under safety check | | ----------- | ----------- | ----------- | ----------- | ----------- | | 1 | Enabled | NA | NA | NA | @@ -132,6 +164,7 @@ with 1 being the least aggressive and 7 being the most aggressive level. | 5 | Enabled | 16-bit | Integer | Yes | | 6 | Enabled | 32-bit | 64-bit signed int or unsigned int. Users must ensure that the values assigned to 64-bit signed int fields are in range -(2^31 - 1) to +(2^31 - 1) and 64-bit unsigned int fields are in the range 0 to +(2^31 - 1). Otherwise, you may obtain incorrect results. | No. Users must ensure the safety based on the program compiled. | | 7 | Enabled | 16-bit | 64-bit signed int or unsigned int. 
Users must ensure that the values assigned to 64-bit signed int fields are in range -(2^31 - 1) to +(2^31 - 1) and 64-bit unsigned int fields are in the range 0 to +(2^31 - 1). Otherwise, you may obtain incorrect results. | No. Users must ensure the safety based on the program compiled. | +::: #### `-fitodcalls` @@ -280,13 +313,14 @@ aggressiveness of heuristics increases with the level (1-4). The default level is 2. Higher levels may lead to code bloat due to expansion of recursive functions at call sites. -| **Table 6. -inline-recursion Values and Their Effects**| | -|:------------------------------------------------------:|:------------------------------------------------------------------------------:| -| `-inline-recursion` **value** | **Inline depth of heuristics used to enable inlining for recursive functions** | -| 1 | 1 | -| 2 | 1 | -| 3 | 1 | -| 4 | 10 | +:::{table} -inline-recursion Values and Their Effects +| `-inline-recursion` **value** | **Inline depth of heuristics used to enable inlining for recursive functions** | +|:-----------------------------:|:------------------------------------------------------------------------------:| +| 1 | 1 | +| 2 | 1 | +| 3 | 1 | +| 4 | 10 | +::: This is more effective with flto as the whole program needs to be analyzed to perform this optimization, which can be invoked as @@ -296,16 +330,13 @@ perform this optimization, which can be invoked as Performs array dataflow analysis and optimizes the unused array computations. -|| -|:--:| -| **Table 7. 
-reduce-array-computations Values and Their Effects**| -|| - +:::{table} -reduce-array-computations Values and Their Effects | -reduce-array-computations value | Array elements eligible for elimination of computations | -| ----------- | ----------- | -| 1 | Unused | -| 2 | Zero valued | -| 3 | Both unused and zero valued | +| -------------------------------- | --------------------------- | +| 1 | Unused | +| 2 | Zero valued | +| 3 | Both unused and zero valued | +::: This optimization is effective with flto as the whole program needs to be analyzed to perform this optimization, which can be invoked as @@ -466,17 +497,38 @@ offload-arch gfx906 -v The options are listed below: -| **Table 8. offload-arch Command-line Options** | | -|:----------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -| **Option** | **Description** | -| `h` | Prints the help message | -| `a` | Prints values for all devices. Do not stop at the first device found. | -| `m` | Prints device code name (often found in pci.ids file) | -| `n` | Prints numeric pci-id | -| `t` | Prints clang offload triple to use for the offload arch | -| `v` | Verbose = `-a -m -n -t`. For: all devices, prints codename, numeric value, and triple | -| `f ` | Prints offload requirements including offload-arch for each compiled offload image built into an application binary file | -| `c` | Prints offload capabilities of the underlying system. This option is used by the language runtime to select an image when multiple images are available. A capability must exist for each requirement of the selected image. | +:::{program} offload-arch +:::{option} -h + Prints the help message. +::: + +:::{option} -a + Prints values for all devices. Do not stop at the first device found. +::: + +:::{option} -m + Prints device code name (often found in pci.ids file). +::: + +:::{option} -n + Prints numeric pci-id. 
+::: + +:::{option} -t + Prints clang offload triple to use for the offload arch. +::: + +:::{option} -v + Verbose. Implies: `-a -m -n -t`. For: all devices, prints codename, numeric value, and triple. +::: + +:::{option} -f + Prints offload requirements including offload-arch for each compiled offload image built into an application binary file. +::: + +:::{option} -c + Prints offload capabilities of the underlying system. This option is used by the language runtime to select an image when multiple images are available. A capability must exist for each requirement of the selected image. +::: There are symbolic link aliases amdgpu-offload-arch and nvidia-arch for offload-arch. These aliases return 1 if no amdgcn GPU or cuda GPU is found. @@ -622,9 +674,13 @@ refer to the OpenMP Support Guide at [https://docs.amd.com](https://docs.amd.com The following table lists the other Clang options and their support status. -| **Table 9. Clang Options** | | | -|:----------------------------------------:|:------------------:|:------------------------------------------------------------------------------------------------------------------------------:| +:::{table} Clang Options +:name: clang-options +:widths: auto +:align: center + | **Option** | **Support Status** | **Description** | +|------------------------------------------|:------------------:|--------------------------------------------------------------------------------------------------------------------------------| | `-###` | Supported | Prints (but does not run) the commands to run for this compilation | | `--analyzer-output ` | Supported | "Static analyzer report output format (`html|plist|plist-multi-file|plist-html|sarif|text`)" | | `--analyze` | Supported | Runs the static analyzer | @@ -1383,3 +1439,4 @@ The following table lists the other Clang options and their support status. 
|-Xpreprocessor \|Supported|Passes \ to the preprocessor| |-x \|Supported|Assumes subsequent input files to have the given type \| |-z \|Supported|Passes -z \ to the linker| +:::