Merge pull request #3169 from ROCm/develop

Merge develop into roc-6.1.x
This commit is contained in:
Sam Wu
2024-05-29 12:25:51 -06:00
committed by GitHub
42 changed files with 891 additions and 205 deletions


@@ -0,0 +1,47 @@
.. meta::
:description: Setting the number of CUs
:keywords: AMD, ROCm, cu, number of cus
.. _env-variables-reference:
*************************************************************
Setting the number of CUs
*************************************************************
When using GPUs to accelerate compute workloads, it can become necessary
to limit how many of the hardware's compute units (CUs) are used. This is an advanced
option, so read this page fully before experimenting.
The GPU driver provides two environment variables to set the number of CUs used:
``HSA_CU_MASK`` and ``ROC_GLOBAL_CU_MASK``. The main difference is that
``ROC_GLOBAL_CU_MASK`` sets the CU mask on queues created by the HIP
or OpenCL runtimes, while ``HSA_CU_MASK`` sets the mask at a lower level of queue
creation in the driver, so it also applies to queues being profiled.
The environment variables have the following syntax:
::

  ID          = [0-9][0-9]*                  e.g. base-10 numbers
  ID_list     = (ID | ID-ID)[, (ID | ID-ID)]*  e.g. 0,2-4,7
  GPU_list    = ID_list                      e.g. 0,2-4,7
  CU_list     = 0x[0-F]* | ID_list           e.g. 0x337F or 0,2-4,7
  CU_Set      = GPU_list : CU_list           e.g. 0,2-4,7:0-15,32-47 or 0,2-4,7:0x337F
  HSA_CU_MASK = CU_Set [; CU_Set]*           e.g. 0,2-4,7:0-15,32-47; 3-9:0x337F
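The decimal ID-list and hexadecimal forms of ``CU_list`` are interchangeable: each listed CU index corresponds to one set bit in the mask. As a minimal sketch of that correspondence (a hypothetical helper, not part of the driver or runtime):

```python
def cu_list_to_hex(cu_list: str) -> str:
    """Convert a CU ID list such as '0-15,32-47' into the equivalent
    hexadecimal bitmask form: CU index n maps to bit n of the mask."""
    mask = 0
    for part in cu_list.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            for cu in range(lo, hi + 1):
                mask |= 1 << cu
        else:
            mask |= 1 << int(part)
    return hex(mask)

# CUs 0-15 occupy the low 16 bits; CUs 32-47 occupy bits 32-47.
print(cu_list_to_hex("0-15,32-47"))  # 0xffff0000ffff
```

So ``0:0-15,32-47`` and ``0:0xFFFF0000FFFF`` describe the same mask for GPU 0.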
GPU indices are taken after ``ROCR_VISIBLE_DEVICES`` reordering. For each GPU listed,
the listed or masked CUs are enabled and the rest are disabled. Unlisted GPUs are not
affected: all of their CUs remain enabled.
Parsing of the variable stops at the first syntax error; the erroneous set
and any sets following it are ignored. Repeated GPU or CU IDs are a syntax error, as is
a mask with no usable CUs (a CU_list of 0x0). To exclude
GPU devices entirely, use ``ROCR_VISIBLE_DEVICES``.
These environment variables only affect ROCm software, not graphics applications.
Note that not all CU configurations are valid on all devices. For
instance, on devices where two CUs combine into a workgroup processor (WGP) for kernels
running in WGP mode, it is not valid to disable only a single CU of a WGP. `This paper
<https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ provides more information
about what to expect when disabling CUs.
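As a sketch of that pairing constraint, assuming adjacent even/odd CUs (2i, 2i+1) combine into one WGP (the actual pairing is hardware-specific; treat this as illustrative only):

```python
def valid_in_wgp_mode(mask: int, num_cus: int = 64) -> bool:
    """Check that a CU bitmask never enables exactly one CU of a WGP
    pair. Assumes (hypothetically) that CUs 2i and 2i+1 form one WGP."""
    for i in range(0, num_cus, 2):
        # Both bits of a pair must match: both enabled or both disabled.
        if ((mask >> i) & 1) != ((mask >> (i + 1)) & 1):
            return False
    return True

print(valid_in_wgp_mode(0b1111))  # True: two complete WGPs enabled
print(valid_in_wgp_mode(0b0111))  # False: second WGP has only one CU enabled
```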


@@ -1,22 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Deep learning using ROCm">
<meta name="keywords" content="deep learning, frameworks, installation, PyTorch, TensorFlow,
MAGMA, AMD, ROCm">
</head>
# Deep learning guide
The following sections cover the different framework installations for ROCm and
deep-learning applications. The following image provides
the sequential flow for the use of each framework. Refer to the ROCm Compatible
Frameworks Release Notes for each framework's most current release notes at
{doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`.
![ROCm Compatible Frameworks Flowchart](../data/how-to/magma005.png "ROCm Compatible Frameworks")
## Frameworks installation
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`


@@ -0,0 +1,69 @@
.. meta::
:description: How to install deep learning frameworks for ROCm
:keywords: deep learning, frameworks, ROCm, install, PyTorch, TensorFlow, JAX, MAGMA, DeepSpeed, ML, AI
********************************************
Installing deep learning frameworks for ROCm
********************************************
ROCm provides a comprehensive ecosystem for deep learning development, including
:ref:`libraries <artificial-intelligence-apis>` for optimized deep learning operations and ROCm-aware versions of popular
deep learning frameworks and libraries such as PyTorch, TensorFlow, JAX, and MAGMA. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.
The following guides cover installation processes for ROCm-aware deep learning frameworks.
.. grid::
.. grid-item::
:columns: 3
:doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
.. grid-item::
:columns: 3
:doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
:doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`
.. grid-item::
:columns: 3
:doc:`MAGMA for ROCm <rocm-install-on-linux:how-to/3rd-party/magma-install>`
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
The following chart steps through typical workflows for installing deep learning frameworks for ROCm.
.. image:: ../data/how-to/framework_install_2024_05_23.png
:alt: Flowchart for installing ROCm-aware machine learning frameworks
:align: center
Find information on version compatibility and framework release notes in :doc:`Third-party support matrix
<rocm-install-on-linux:reference/3rd-party-support-matrix>`.
.. Learn how to take advantage of your ROCm-aware deep learning environment using the following tutorials.
..
.. * :doc:`How to use ROCm for AI <how-to/rocm-for-ai/index>`
..
.. * :doc:`How to fine-tune LLMs with ROCm <how-to/fine-tuning-llms/index>`
..
.. note::
For guidance on installing ROCm itself, refer to :doc:`ROCm installation for Linux <rocm-install-on-linux:index>`.


@@ -1,13 +1,14 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Tuning guides">
<meta name="description" content="AMD hardware optimization for specific workloads">
<meta name="keywords" content="high-performance computing, HPC, Instinct accelerators,
Radeon, tuning, tuning guide, AMD, ROCm">
</head>
# Tuning guides
# System optimization
Use case-specific system setup and tuning guides.
This guide outlines system setup and tuning suggestions for AMD hardware to optimize performance for specific types of
workloads and use cases.
## High-performance computing


@@ -37,12 +37,13 @@ Our documentation is organized into the following categories:
* Windows
* {doc}`Windows install guide<rocm-install-on-windows:how-to/install>`
* {doc}`Application deployment guidelines<rocm-install-on-windows:conceptual/deployment-guidelines>`
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
* [Deep learning frameworks](./how-to/deep-learning-rocm.rst)
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
:::
:::{grid-item-card}
@@ -94,7 +95,6 @@ Our documentation is organized into the following categories:
* [MI100](./how-to/tuning-guides/mi100.md)
* [MI200](./how-to/tuning-guides/mi200.md)
* [RDNA2](./how-to/tuning-guides/w6000-v620.md)
* [Setting up for deep learning with ROCm](./how-to/deep-learning-rocm.md)
* [GPU-enabled MPI](./how-to/gpu-enabled-mpi.rst)
* [Using compiler features](./conceptual/compiler-topics.md)
* [Using AddressSanitizer](./conceptual/using-gpu-sanitizer.md)
@@ -115,6 +115,7 @@ Our documentation is organized into the following categories:
* [MI250](./conceptual/gpu-arch/mi250.md)
* [MI300](./conceptual/gpu-arch/mi300.md)
* [GPU memory](./conceptual/gpu-memory.md)
* [Setting the number of CUs](./conceptual/setting-cus)
* [File structure (Linux FHS)](./conceptual/file-reorg.md)
* [GPU isolation techniques](./conceptual/gpu-isolation.md)
* [Using CMake](./conceptual/cmake-packages.rst)


@@ -22,6 +22,8 @@ subtrees:
title: ROCm on Linux
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
title: HIP SDK on Windows
- file: how-to/deep-learning-rocm.md
title: Deep learning frameworks
- caption: Compatibility
entries:
@@ -48,7 +50,7 @@ subtrees:
- caption: How to
entries:
- file: how-to/tuning-guides.md
title: Tuning guides
title: System optimization
subtrees:
- entries:
- file: how-to/tuning-guides/mi100.md
@@ -57,8 +59,6 @@ subtrees:
title: MI200
- file: how-to/tuning-guides/w6000-v620.md
title: RDNA2
- file: how-to/deep-learning-rocm.md
title: Deep learning
- file: how-to/gpu-enabled-mpi.rst
title: Using MPI
- file: conceptual/compiler-topics.md
@@ -110,6 +110,8 @@ subtrees:
title: White paper
- file: conceptual/gpu-memory.md
title: GPU memory
- file: conceptual/setting-cus
title: Setting the number of CUs
- file: conceptual/file-reorg.md
title: File structure (Linux FHS)
- file: conceptual/gpu-isolation.md