mirror of
https://github.com/ROCm/ROCm.git
synced 2026-02-03 02:45:18 -05:00
Merge pull request #3169 from ROCm/develop
Merge develop into roc-6.1.x
47
docs/conceptual/setting-cus.rst
Normal file
@@ -0,0 +1,47 @@
.. meta::
   :description: Setting the number of CUs
   :keywords: AMD, ROCm, cu, number of cus

.. _env-variables-reference:

*************************************************************
Setting the number of CUs
*************************************************************

When using GPUs to accelerate compute workloads, it is sometimes necessary to
configure how the hardware uses its Compute Units (CUs). This is an advanced
option, so read this page before experimenting.

The GPU driver provides two environment variables to set the number of CUs used:
``HSA_CU_MASK`` and ``ROC_GLOBAL_CU_MASK``. The main difference is that
``ROC_GLOBAL_CU_MASK`` sets the CU mask on queues created by the HIP or OpenCL
runtimes, while ``HSA_CU_MASK`` sets the mask at a lower level of queue creation in
the driver. The ``HSA_CU_MASK`` mask is also applied to queues being profiled.

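As a minimal sketch, one of these variables could be applied to a single child
process instead of the whole shell session. The helper name ``run_with_cu_mask`` and
the mask value ``0:0-15`` are illustrative assumptions, not part of ROCm. The command
is a stand-in that only echoes the variable back; a real HIP or OpenCL application
would take its place.

```python
import os
import subprocess
import sys

# The two variable names come from the text above; everything else is illustrative.
CU_MASK_VARIABLES = ("HSA_CU_MASK", "ROC_GLOBAL_CU_MASK")


def run_with_cu_mask(command, variable, value):
    """Run `command` with one CU mask variable set only in the child's environment."""
    if variable not in CU_MASK_VARIABLES:
        raise ValueError(f"unknown CU mask variable: {variable}")
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env[variable] = value
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in for a real ROCm application: a child Python process that prints the variable.
echo_command = [sys.executable, "-c", "import os; print(os.environ['HSA_CU_MASK'])"]
result = run_with_cu_mask(echo_command, "HSA_CU_MASK", "0:0-15")
print(result.stdout.strip())
```

Setting the variable per process keeps the restriction scoped to one workload, which
makes it easier to compare runs with different masks.
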
The environment variables have the following syntax:

::

    ID = [0-9][0-9]*                         ex. base 10 numbers
    ID_list = (ID | ID-ID)[, (ID | ID-ID)]*  ex. 0,2-4,7
    GPU_list = ID_list                       ex. 0,2-4,7
    CU_list = 0x[0-F]* | ID_list             ex. 0x337F OR 0,2-4,7
    CU_Set = GPU_list : CU_list              ex. 0,2-4,7:0-15,32-47 OR 0,2-4,7:0x337F
    HSA_CU_MASK = CU_Set [; CU_Set]*         ex. 0,2-4,7:0-15,32-47; 3-9:0x337F

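The grammar above can be read as a small recursive structure. The following Python
sketch is one possible reading of it, not the driver's parser. The function names are
hypothetical, and it makes three assumptions: bit ``n`` of a hex mask stands for CU
``n``, a hex mask needs at least one digit, and a descending range such as ``4-2`` is
rejected. It does not implement the duplicate-ID or empty-mask rules.

```python
import re

_ID = re.compile(r"[0-9]+")
_HEX = re.compile(r"0x[0-9A-Fa-f]+")


def parse_id_list(text):
    """Expand an ID_list such as '0,2-4,7' into [0, 2, 3, 4, 7]."""
    ids = []
    for item in text.split(","):
        bounds = item.strip().split("-")
        if len(bounds) > 2 or not all(_ID.fullmatch(b) for b in bounds):
            raise ValueError(f"bad ID or range: {item!r}")
        low, high = int(bounds[0]), int(bounds[-1])
        if low > high:
            raise ValueError(f"descending range: {item!r}")
        ids.extend(range(low, high + 1))
    return ids


def parse_cu_list(text):
    """Return CU indices from a hex mask ('0x337F') or an ID_list ('0-15,32-47')."""
    text = text.strip()
    if text.lower().startswith("0x"):
        if not _HEX.fullmatch(text):
            raise ValueError(f"bad hex mask: {text!r}")
        mask = int(text, 16)
        # Assumption: bit n of the mask corresponds to CU n.
        return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]
    return parse_id_list(text)


def parse_cu_mask(value):
    """Parse a full value into {gpu_index: [cu_index, ...]}."""
    result = {}
    for cu_set in value.split(";"):
        gpu_part, sep, cu_part = cu_set.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in set: {cu_set!r}")
        cus = parse_cu_list(cu_part)
        for gpu in parse_id_list(gpu_part):
            result[gpu] = cus
    return result


print(parse_cu_mask("0,2-4,7:0x337F"))
```

Under the bit-order assumption, the hex form ``0x337F`` and an explicit list such as
``0-6,8-9,12-13`` describe the same set of CUs.
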
The GPU indices are taken after ``ROCR_VISIBLE_DEVICES`` reordering. For each listed
GPU, the listed or masked CUs are enabled and the rest are disabled. Unlisted GPUs
are not affected: all of their CUs remain enabled.

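The rule for listed and unlisted GPUs can be sketched as a small function. The name
``effective_cus`` and the CU counts are hypothetical. Silently dropping CU indices
beyond a GPU's CU count is an assumption of this sketch, not documented behavior.

```python
def effective_cus(mask, cu_counts):
    """Return {gpu: enabled CU indices} for every visible GPU.

    mask:      {gpu_index: iterable of CU indices}, as parsed from the variable
    cu_counts: {gpu_index: total CUs on that GPU}, indexed after
               ROCR_VISIBLE_DEVICES reordering (hypothetical values here)
    """
    enabled = {}
    for gpu, total in cu_counts.items():
        if gpu in mask:
            # Listed GPU: only the listed CUs stay enabled (out-of-range ones dropped).
            enabled[gpu] = sorted(cu for cu in set(mask[gpu]) if cu < total)
        else:
            # Unlisted GPU: untouched, every CU stays enabled.
            enabled[gpu] = list(range(total))
    return enabled


# Two hypothetical 8-CU GPUs; only GPU 0 is named in the mask.
print(effective_cus({0: [0, 1, 2, 3]}, {0: 8, 1: 8}))
```
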
Parsing of the variable stops when a syntax error occurs. The erroneous set and all
sets following it are ignored. Repeating a GPU or CU ID is a syntax error. Specifying
a mask with no usable CUs (a ``CU_list`` of ``0x0``) is also a syntax error. To
exclude GPU devices, use ``ROCR_VISIBLE_DEVICES``.

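The stop-at-first-error behavior described above might look like the following
sketch. The names ``expand`` and ``parse_until_error`` are hypothetical, and the
sketch assumes that a GPU ID repeated in a later set counts as a repeat, and that
bit ``n`` of a hex mask stands for CU ``n``. The driver's actual checks may differ.

```python
import re

_ITEM = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def expand(id_list):
    """Expand '0,2-4' into [0, 2, 3, 4]; raise ValueError on bad or repeated IDs."""
    ids = []
    for item in id_list.split(","):
        match = _ITEM.fullmatch(item.strip())
        if not match:
            raise ValueError(f"bad ID or range: {item!r}")
        low = int(match.group(1))
        high = int(match.group(2) or match.group(1))
        if low > high:
            raise ValueError(f"descending range: {item!r}")
        ids.extend(range(low, high + 1))
    if len(ids) != len(set(ids)):
        raise ValueError(f"repeated ID in {id_list!r}")
    return ids


def parse_until_error(value):
    """Return ({gpu: [cu, ...]}, error or None), keeping only sets before the first error."""
    accepted = {}
    for cu_set in value.split(";"):
        try:
            gpu_part, sep, cu_part = cu_set.partition(":")
            if not sep:
                raise ValueError(f"missing ':' in {cu_set!r}")
            cu_part = cu_part.strip()
            if cu_part.startswith("0x"):
                mask = int(cu_part, 16)  # raises ValueError on malformed hex
                cus = [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]
            else:
                cus = expand(cu_part)
            if not cus:
                raise ValueError(f"no usable CUs in {cu_set!r}")
            gpus = expand(gpu_part)
            if any(gpu in accepted for gpu in gpus):
                raise ValueError(f"repeated GPU ID in {cu_set!r}")
        except ValueError as error:
            return accepted, str(error)  # this set and all later ones are ignored
        accepted.update({gpu: cus for gpu in gpus})
    return accepted, None


# The second set has an empty mask, so only the first set is kept.
print(parse_until_error("0:0-3; 1:0x0; 2:0-7"))
```
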
These environment variables only affect ROCm software, not graphics applications.

Not all CU configurations are valid on all devices. For instance, on devices where
two CUs can be combined into a workgroup processor (WGP) for kernels running in WGP
mode, it is not valid to disable only a single CU in a WGP. `This paper
<https://www.cs.unc.edu/~otternes/papers/rtsj2022.pdf>`_ provides more information
about what to expect when disabling CUs.

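A mask can be checked for split WGPs before it is used. The sketch below assumes, for
illustration only, that WGP ``k`` consists of CUs ``2k`` and ``2k + 1``. The actual
pairing depends on the device, and the function name ``split_wgp_pairs`` is
hypothetical.

```python
def split_wgp_pairs(enabled_cus, total_cus):
    """Return the WGP indices where exactly one of the two CUs is enabled.

    Assumption for illustration only: WGP k is made of CUs 2k and 2k + 1.
    Real hardware may pair CUs differently.
    """
    enabled = set(enabled_cus)
    return [
        wgp
        for wgp in range(total_cus // 2)
        if (2 * wgp in enabled) != (2 * wgp + 1 in enabled)
    ]


# CU 2 is enabled without CU 3, so the WGP holding them is reported.
print(split_wgp_pairs([0, 1, 2], total_cus=8))
```

An empty result means every WGP is either fully enabled or fully disabled under the
assumed pairing.
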
BIN
docs/data/how-to/framework_install_2024_05_23.png
Normal file
Binary file not shown.
Size: 108 KiB
@@ -1,22 +0,0 @@
<head>
<meta charset="UTF-8">
<meta name="description" content="Deep learning using ROCm">
<meta name="keywords" content="deep learning, frameworks, installation, PyTorch, TensorFlow,
MAGMA, AMD, ROCm">
</head>

# Deep learning guide

The following sections cover the different framework installations for ROCm and
deep-learning applications. The following image provides
the sequential flow for the use of each framework. Refer to the ROCm Compatible
Frameworks Release Notes for each framework's most current release notes at
{doc}`Third-party support<rocm-install-on-linux:reference/3rd-party-support-matrix>`.

## Frameworks installation

* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
69
docs/how-to/deep-learning-rocm.rst
Normal file
@@ -0,0 +1,69 @@
.. meta::
   :description: How to install deep learning frameworks for ROCm
   :keywords: deep learning, frameworks, ROCm, install, PyTorch, TensorFlow, JAX, MAGMA, DeepSpeed, ML, AI

********************************************
Installing deep learning frameworks for ROCm
********************************************

ROCm provides a comprehensive ecosystem for deep learning development, including
:ref:`libraries <artificial-intelligence-apis>` for optimized deep learning operations and ROCm-aware versions of popular
deep learning frameworks and libraries such as PyTorch, TensorFlow, JAX, and MAGMA. ROCm works closely with these
frameworks to ensure that framework-specific optimizations take advantage of AMD accelerator and GPU architectures.

The following guides cover installation processes for ROCm-aware deep learning frameworks.

.. grid::

   .. grid-item::
      :columns: 3

      :doc:`PyTorch for ROCm <rocm-install-on-linux:how-to/3rd-party/pytorch-install>`

   .. grid-item::
      :columns: 3

      :doc:`TensorFlow for ROCm <rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`

   .. grid-item::
      :columns: 3

   .. grid-item::
      :columns: 3

   .. grid-item::
      :columns: 3

      :doc:`JAX for ROCm <rocm-install-on-linux:how-to/3rd-party/jax-install>`

   .. grid-item::
      :columns: 3

      :doc:`MAGMA for ROCm <rocm-install-on-linux:how-to/3rd-party/magma-install>`

   .. grid-item::
      :columns: 3

   .. grid-item::
      :columns: 3

The following chart steps through typical workflows for installing deep learning frameworks for ROCm.

.. image:: ../data/how-to/framework_install_2024_05_23.png
   :alt: Flowchart for installing ROCm-aware machine learning frameworks
   :align: center

Find information on version compatibility and framework release notes in :doc:`Third-party support matrix
<rocm-install-on-linux:reference/3rd-party-support-matrix>`.

.. Learn how to take advantage of your ROCm-aware deep learning environment using the following tutorials.
..
.. * :doc:`How to use ROCm for AI <how-to/rocm-for-ai/index>`
..
.. * :doc:`How to fine-tune LLMs with ROCm <how-to/fine-tuning-llms/index>`
..

.. note::

   For guidance on installing ROCm itself, refer to :doc:`ROCm installation for Linux <rocm-install-on-linux:index>`.

@@ -1,13 +1,14 @@
 <head>
 <meta charset="UTF-8">
-<meta name="description" content="Tuning guides">
+<meta name="description" content="AMD hardware optimization for specific workloads">
 <meta name="keywords" content="high-performance computing, HPC, Instinct accelerators,
 Radeon, tuning, tuning guide, AMD, ROCm">
 </head>

-# Tuning guides
+# System optimization

-Use case-specific system setup and tuning guides.
+This guide outlines system setup and tuning suggestions for AMD hardware to optimize performance for specific types of
+workloads or use-cases.

 ## High-performance computing
@@ -37,12 +37,13 @@ Our documentation is organized into the following categories:
 * Windows
 * {doc}`Windows install guide<rocm-install-on-windows:how-to/install>`
 * {doc}`Application deployment guidelines<rocm-install-on-windows:conceptual/deployment-guidelines>`
-* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
-* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
-* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
-* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
-* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
-* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
+* [Deep learning frameworks](./how-to/deep-learning-rocm.rst)
+* {doc}`Install Docker containers<rocm-install-on-linux:how-to/docker>`
+* {doc}`PyTorch for ROCm<rocm-install-on-linux:how-to/3rd-party/pytorch-install>`
+* {doc}`TensorFlow for ROCm<rocm-install-on-linux:how-to/3rd-party/tensorflow-install>`
+* {doc}`JAX for ROCm<rocm-install-on-linux:how-to/3rd-party/jax-install>`
+* {doc}`MAGMA for ROCm<rocm-install-on-linux:how-to/3rd-party/magma-install>`
+* {doc}`ROCm & Spack<rocm-install-on-linux:how-to/spack>`
 :::

 :::{grid-item-card}
@@ -94,7 +95,6 @@ Our documentation is organized into the following categories:
 * [MI100](./how-to/tuning-guides/mi100.md)
 * [MI200](./how-to/tuning-guides/mi200.md)
 * [RDNA2](./how-to/tuning-guides/w6000-v620.md)
-* [Setting up for deep learning with ROCm](./how-to/deep-learning-rocm.md)
 * [GPU-enabled MPI](./how-to/gpu-enabled-mpi.rst)
 * [Using compiler features](./conceptual/compiler-topics.md)
 * [Using AddressSanitizer](./conceptual/using-gpu-sanitizer.md)
@@ -115,6 +115,7 @@ Our documentation is organized into the following categories:
 * [MI250](./conceptual/gpu-arch/mi250.md)
 * [MI300](./conceptual/gpu-arch/mi300.md)
 * [GPU memory](./conceptual/gpu-memory.md)
+* [Setting the number of CUs](./conceptual/setting-cus)
 * [File structure (Linux FHS)](./conceptual/file-reorg.md)
 * [GPU isolation techniques](./conceptual/gpu-isolation.md)
 * [Using CMake](./conceptual/cmake-packages.rst)
@@ -22,6 +22,8 @@ subtrees:
   title: ROCm on Linux
 - url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
   title: HIP SDK on Windows
+- file: how-to/deep-learning-rocm.md
+  title: Deep learning frameworks

 - caption: Compatibility
   entries:
@@ -48,7 +50,7 @@ subtrees:
 - caption: How to
   entries:
   - file: how-to/tuning-guides.md
-    title: Tuning guides
+    title: System optimization
     subtrees:
     - entries:
       - file: how-to/tuning-guides/mi100.md
@@ -57,8 +59,6 @@ subtrees:
   title: MI200
 - file: how-to/tuning-guides/w6000-v620.md
   title: RDNA2
-- file: how-to/deep-learning-rocm.md
-  title: Deep learning
 - file: how-to/gpu-enabled-mpi.rst
   title: Using MPI
 - file: conceptual/compiler-topics.md
@@ -110,6 +110,8 @@ subtrees:
   title: White paper
 - file: conceptual/gpu-memory.md
   title: GPU memory
+- file: conceptual/setting-cus
+  title: Setting the number of CUs
 - file: conceptual/file-reorg.md
   title: File structure (Linux FHS)
 - file: conceptual/gpu-isolation.md