Merge pull request #3169 from ROCm/develop

Merge develop into roc-6.1.x
This commit is contained in:
Sam Wu
2024-05-29 12:25:51 -06:00
committed by GitHub
42 changed files with 891 additions and 205 deletions


@@ -0,0 +1,47 @@
.. meta::
:description: Setting the number of CUs
:keywords: AMD, ROCm, cu, number of cus
.. _env-variables-reference:
*************************************************************
Setting the number of CUs
*************************************************************
When using GPUs to accelerate compute workloads, it can become necessary
to limit how many of the hardware's compute units (CUs) are used. This is an advanced
option, so read this page fully before experimenting.
The GPU driver provides two environment variables to set the number of CUs used:
``HSA_CU_MASK`` and ``ROC_GLOBAL_CU_MASK``. The main difference is that
``ROC_GLOBAL_CU_MASK`` sets the CU mask on queues created by the HIP
or OpenCL runtimes, while ``HSA_CU_MASK`` sets the mask at a lower level of queue
creation in the driver, so it also applies to queues being profiled.
The environment variables have the following syntax:
::

  ID          = [0-9][0-9]*                  e.g. base-10 numbers
  ID_list     = (ID | ID-ID)[, (ID | ID-ID)]*  e.g. 0,2-4,7
  GPU_list    = ID_list                      e.g. 0,2-4,7
  CU_list     = 0x[0-F]* | ID_list           e.g. 0x337F or 0,2-4,7
  CU_Set      = GPU_list : CU_list           e.g. 0,2-4,7:0-15,32-47 or 0,2-4,7:0x337F
  HSA_CU_MASK = CU_Set [; CU_Set]*           e.g. 0,2-4,7:0-15,32-47; 3-9:0x337F
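The decimal ID-list and hexadecimal forms of ``CU_list`` are interchangeable: each listed CU index corresponds to one set bit in the mask. As a minimal sketch of that correspondence (a hypothetical helper, not part of the driver or runtime):

```python
def cu_list_to_hex(cu_list: str) -> str:
    """Convert a CU ID list such as '0-15,32-47' into the equivalent
    hexadecimal bitmask form: CU index n maps to bit n of the mask."""
    mask = 0
    for part in cu_list.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            for cu in range(lo, hi + 1):
                mask |= 1 << cu
        else:
            mask |= 1 << int(part)
    return hex(mask)

# CUs 0-15 occupy the low 16 bits; CUs 32-47 occupy bits 32-47.
print(cu_list_to_hex("0-15,32-47"))  # 0xffff0000ffff
```

So ``0:0-15,32-47`` and ``0:0xFFFF0000FFFF`` describe the same mask for GPU 0.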
GPU indices are taken after ``ROCR_VISIBLE_DEVICES`` reordering. For each GPU listed,
the listed or masked CUs are enabled and the rest are disabled. Unlisted GPUs are not
affected: all of their CUs remain enabled.
Parsing of the variable stops at the first syntax error; the erroneous set
and any sets following it are ignored. Repeated GPU or CU IDs are a syntax error, as is
a mask with no usable CUs (a CU_list of 0x0). To exclude
GPU devices entirely, use ``ROCR_VISIBLE_DEVICES``.
These environment variables only affect ROCm software, not graphics applications.
Note that not all CU configurations are valid on all devices. For
instance, on devices where two CUs combine into a workgroup processor (WGP) for kernels
running in WGP mode, it is not valid to disable only a single CU of a WGP. `This paper
<https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ provides more information
about what to expect when disabling CUs.
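As a sketch of that pairing constraint, assuming adjacent even/odd CUs (2i, 2i+1) combine into one WGP (the actual pairing is hardware-specific; treat this as illustrative only):

```python
def valid_in_wgp_mode(mask: int, num_cus: int = 64) -> bool:
    """Check that a CU bitmask never enables exactly one CU of a WGP
    pair. Assumes (hypothetically) that CUs 2i and 2i+1 form one WGP."""
    for i in range(0, num_cus, 2):
        # Both bits of a pair must match: both enabled or both disabled.
        if ((mask >> i) & 1) != ((mask >> (i + 1)) & 1):
            return False
    return True

print(valid_in_wgp_mode(0b1111))  # True: two complete WGPs enabled
print(valid_in_wgp_mode(0b0111))  # False: second WGP has only one CU enabled
```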


@@ -1,22 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Deep learning using ROCm">
<meta name="keywords" content="deep learning, frameworks, installation, PyTorch, TensorFlow,
MAGMA, AMD, ROCm">
</head>
# Deep learning guide
The following sections cover the different framework installations for ROCm and
deep-learning applications. The following image provides
the sequential flow for the use of each framework. Refer to the ROCm Compatible
Frameworks Release Notes for each framework's most current release notes at
{doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`.
![ROCm Compatible Frameworks Flowchart](../data/how-to/magma005.png "ROCm Compatible Frameworks")
## Frameworks installation
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`


@@ -0,0 +1,69 @@
.. meta::
:description: How to install deep learning frameworks for ROCm
:keywords: deep learning, frameworks, ROCm, install, PyTorch, TensorFlow, JAX, MAGMA, DeepSpeed, ML, AI
********************************************
Installing deep learning frameworks for ROCm
********************************************
ROCm provides a comprehensive ecosystem for deep learning development, including
:ref:`libraries <artificial-intelligence-apis>` for optimized deep learning operations and ROCm-aware versions of popular
deep learning frameworks and libraries such as PyTorch, TensorFlow, JAX, and MAGMA. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.
The following guides cover installation processes for ROCm-aware deep learning frameworks.
.. grid::
.. grid-item::
:columns: 3
:doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
.. grid-item::
:columns: 3
:doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
:doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`
.. grid-item::
:columns: 3
:doc:`MAGMA for ROCm <rocm-install-on-linux:how-to/3rd-party/magma-install>`
.. grid-item::
:columns: 3
.. grid-item::
:columns: 3
The following chart steps through typical workflows for installing deep learning frameworks for ROCm.
.. image:: ../data/how-to/framework_install_2024_05_23.png
:alt: Flowchart for installing ROCm-aware machine learning frameworks
:align: center
Find information on version compatibility and framework release notes in :doc:`Third-party support matrix
<rocm-install-on-linux:reference/3rd-party-support-matrix>`.
.. Learn how to take advantage of your ROCm-aware deep learning environment using the following tutorials.
..
.. * :doc:`How to use ROCm for AI <how-to/rocm-for-ai/index>`
..
.. * :doc:`How to fine-tune LLMs with ROCm <how-to/fine-tuning-llms/index>`
..
.. note::
For guidance on installing ROCm itself, refer to :doc:`ROCm installation for Linux <rocm-install-on-linux:index>`.


@@ -1,13 +1,14 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Tuning guides">
<meta name="description" content="AMD hardware optimization for specific workloads">
<meta name="keywords" content="high-performance computing, HPC, Instinct accelerators,
Radeon, tuning, tuning guide, AMD, ROCm">
</head>
# Tuning guides
# System optimization
Use case-specific system setup and tuning guides.
This guide outlines system setup and tuning suggestions for AMD hardware to optimize performance for specific types of
workloads and use cases.
## High-performance computing


@@ -37,12 +37,13 @@ Our documentation is organized into the following categories:
* Windows
* {doc}`Windows install guide<rocm-install-on-windows:how-to/install>`
* {doc}`Application deployment guidelines<rocm-install-on-windows:conceptual/deployment-guidelines>`
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
* [Deep learning frameworks](./how-to/deep-learning-rocm.rst)
* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
:::
:::{grid-item-card}
@@ -94,7 +95,6 @@ Our documentation is organized into the following categories:
* [MI100](./how-to/tuning-guides/mi100.md)
* [MI200](./how-to/tuning-guides/mi200.md)
* [RDNA2](./how-to/tuning-guides/w6000-v620.md)
* [Setting up for deep learning with ROCm](./how-to/deep-learning-rocm.md)
* [GPU-enabled MPI](./how-to/gpu-enabled-mpi.rst)
* [Using compiler features](./conceptual/compiler-topics.md)
* [Using AddressSanitizer](./conceptual/using-gpu-sanitizer.md)
@@ -115,6 +115,7 @@ Our documentation is organized into the following categories:
* [MI250](./conceptual/gpu-arch/mi250.md)
* [MI300](./conceptual/gpu-arch/mi300.md)
* [GPU memory](./conceptual/gpu-memory.md)
* [Setting the number of CUs](./conceptual/setting-cus)
* [File structure (Linux FHS)](./conceptual/file-reorg.md)
* [GPU isolation techniques](./conceptual/gpu-isolation.md)
* [Using CMake](./conceptual/cmake-packages.rst)


@@ -22,6 +22,8 @@ subtrees:
title: ROCm on Linux
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
title: HIP SDK on Windows
- file: how-to/deep-learning-rocm.md
title: Deep learning frameworks
- caption: Compatibility
entries:
@@ -48,7 +50,7 @@ subtrees:
- caption: How to
entries:
- file: how-to/tuning-guides.md
title: Tuning guides
title: System optimization
subtrees:
- entries:
- file: how-to/tuning-guides/mi100.md
@@ -57,8 +59,6 @@ subtrees:
title: MI200
- file: how-to/tuning-guides/w6000-v620.md
title: RDNA2
- file: how-to/deep-learning-rocm.md
title: Deep learning
- file: how-to/gpu-enabled-mpi.rst
title: Using MPI
- file: conceptual/compiler-topics.md
@@ -110,6 +110,8 @@ subtrees:
title: White paper
- file: conceptual/gpu-memory.md
title: GPU memory
- file: conceptual/setting-cus
title: Setting the number of CUs
- file: conceptual/file-reorg.md
title: File structure (Linux FHS)
- file: conceptual/gpu-isolation.md