Left nav updates (#2647)
* update gpu-enabled-mpi: update the documentation to also include libfabric based network interconnects, not just UCX
* add some technical terms to wordlist
* shorten left nav
* grid updates

Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
@@ -200,6 +200,7 @@ hipSPARSELt
hipTensor
HPC
HPCG
HPE
HPL
HSA
hsa
@@ -245,6 +246,7 @@ KVM
LAPACK
LCLK
LDS
libfabric
libjpeg
libs
linearized
@@ -383,6 +385,7 @@ Rickle
roadmap
roc
ROC
RoCE
rocAL
rocALUTION
rocalution
@@ -451,6 +454,7 @@ SKUs
skylake
sL
SLES
sm
SMEM
SMI
smi

@@ -53,14 +53,14 @@ The following sequences of build commands assume either the ROCmCC or the AOMP
compiler is active in the environment that will execute the commands.
```

## Install UCX
### Installing UCX

The next step is to set up UCX by compiling its source code and installing it:

```shell
export UCX_DIR=$INSTALL_DIR/ucx
cd $BUILD_DIR
git clone https://github.com/openucx/ucx.git -b v1.14.1
git clone https://github.com/openucx/ucx.git -b v1.15.x
cd ucx
./autogen.sh
mkdir build
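# The following configure/install lines are a hedged sketch of a typical UCX
# build with ROCm support; they are an assumption, not part of the original
# instructions shown in this hunk.
cd build
../configure --prefix=$UCX_DIR --with-rocm=/opt/rocm
make -j $(nproc)
make -j $(nproc) install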
@@ -74,7 +74,7 @@ make -j $(nproc) install
The [communication libraries tables](../reference/library-index.md)
document the compatibility of UCX versions with ROCm versions.

## Install Open MPI
### Installing Open MPI

These are the steps to build Open MPI:

@@ -90,12 +90,12 @@ cd build
../configure --prefix=$OMPI_DIR --with-ucx=$UCX_DIR \
    --with-rocm=/opt/rocm
make -j $(nproc)
make -j $(nproc) install
make install
```

## ROCm-enabled OSU
### ROCm-enabled OSU

The OSU Micro Benchmarks v5.9 (OMB) can be used to evaluate the performance of
The OSU Micro Benchmarks (OMB) can be used to evaluate the performance of
various primitives with an AMD GPU device and ROCm support. This functionality
is exposed when configured with the `--enable-rocm` option. We can use the following
steps to compile OMB:
@@ -103,10 +103,10 @@ steps to compile OMB:
```shell
export OSU_DIR=$INSTALL_DIR/osu
cd $BUILD_DIR
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.9.tar.gz
tar xfz osu-micro-benchmarks-5.9.tar.gz
cd osu-micro-benchmarks-5.9
./configure --prefix=$INSTALL_DIR/osu --enable-rocm \
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-7.2.tar.gz
tar xfz osu-micro-benchmarks-7.2.tar.gz
cd osu-micro-benchmarks-7.2
./configure --enable-rocm \
    --with-rocm=/opt/rocm \
    CC=$OMPI_DIR/bin/mpicc CXX=$OMPI_DIR/bin/mpicxx \
    LDFLAGS="-L$OMPI_DIR/lib/ -lmpi -L/opt/rocm/lib/ \
@@ -114,7 +114,7 @@ cd osu-micro-benchmarks-5.9
make -j $(nproc)
```

## Intra-node run
### Intra-node run

Before running an Open MPI job, it is essential to set some environment variables to
ensure that the correct versions of Open MPI and UCX are being used.
@@ -125,31 +125,35 @@ export PATH=$OMPI_DIR/bin:$PATH
```
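
Only the `PATH` export of this environment setup is visible in the hunk above. A
typical complete setup, assuming the `$OMPI_DIR` and `$UCX_DIR` install prefixes
used earlier in this guide, also extends the dynamic linker search path; this is
a sketch, not part of the original change:

```shell
# Make the freshly built Open MPI and UCX, plus the ROCm runtime, visible to
# the dynamic loader before launching any MPI job.
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$UCX_DIR/lib:/opt/rocm/lib:$LD_LIBRARY_PATH
export PATH=$OMPI_DIR/bin:$PATH
```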

The following command runs the OSU bandwidth benchmark between the first two GPU
devices (i.e., GPU 0 and GPU 1, same OAM) by default inside the same node. It
devices (i.e., GPU 0 and GPU 1) by default inside the same node. It
measures the unidirectional bandwidth from the first device to the other.

```shell
$OMPI_DIR/bin/mpirun -np 2 \
    -x UCX_TLS=sm,self,rocm \
    --mca pml ucx mpi/pt2pt/osu_bw -d rocm D D
    --mca pml ucx \
    ./c/mpi/pt2pt/standard/osu_bw D D
```

To select different devices, for example 2 and 3, use the following command:

```shell
export HIP_VISIBLE_DEVICES=2,3
```

To force using a copy kernel instead of a DMA engine for the data transfer, use the following command:

```shell
export HSA_ENABLE_SDMA=0
```

The following output shows the effective transfer bandwidth measured for
inter-die data transfer between GPU devices 2 and 3 (same OAM). For messages
larger than 67MB, an effective utilization of about 150GB/sec is achieved, which
corresponds to 75% of the peak transfer bandwidth of 200GB/sec for that
connection:
inter-die data transfer between GPU devices 2 and 3 on a system with MI250 GPUs. For messages
larger than 67MB, an effective utilization of about 150GB/sec is achieved:

## Collective operations
### Collective operations

Collective operations on GPU buffers are best handled through the
Unified Collective Communication Library (UCC) component in Open MPI.
@@ -164,8 +168,9 @@ is shown below:

```shell
export UCC_DIR=$INSTALL_DIR/ucc
git clone https://github.com/openucx/ucc.git
git clone https://github.com/openucx/ucc.git -b v1.2.x
cd ucc
./autogen.sh
./configure --with-rocm=/opt/rocm \
    --with-ucx=$UCX_DIR \
    --prefix=$UCC_DIR
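# Hedged sketch (assumption, not shown in this hunk): a UCC build typically
# finishes with the steps below, and Open MPI is then configured with
# --with-ucc=$UCC_DIR so that the coll_ucc component becomes available.
make -j $(nproc)
make -j $(nproc) install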
@@ -187,3 +192,92 @@ mpirun --mca pml ucx --mca osc ucx \
    --mca coll_ucc_enable 1 \
    --mca coll_ucc_priority 100 -np 64 ./my_mpi_app
```

## ROCm-aware Open MPI using libfabric

For network interconnects that are not covered by the previous sections,
such as HPE Slingshot, ROCm-aware communication can often be
achieved through the libfabric library. For more details on
libfabric, refer to its
[documentation](https://github.com/ofiwg/libfabric/wiki).

### Installing libfabric

In many instances, libfabric is already pre-installed on a system. Verify the
availability of the libfabric library on your system using, for example:

```shell
module avail libfabric
```
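
If libfabric is present, the `fi_info` utility that ships with it can also be
used to inspect the available providers. This is a sketch; the provider names
are system dependent and given only as examples:

```shell
# List the libfabric providers available on this node. HPE Slingshot systems
# typically expose the cxi provider.
fi_info -l
# Show the details advertised by one provider, here cxi, if it is present.
fi_info -p cxi
```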

Alternatively, you can also download and compile libfabric with ROCm
support. Note, however, that not all components required to
support, for example, HPE Slingshot networks are available in the open-source
repository. Therefore, using a pre-installed libfabric library is strongly
preferred over compiling libfabric yourself.

If a pre-compiled libfabric library is available on your system,
skip the subsequent steps and go to [Installing Open MPI
with libfabric support](#installing-open-mpi-with-libfabric-support).

Compiling libfabric with ROCm support can be achieved with the following
steps:

```shell
export OFI_DIR=$INSTALL_DIR/ofi
cd $BUILD_DIR
git clone https://github.com/ofiwg/libfabric.git -b v1.19.x
cd libfabric
./autogen.sh
./configure --prefix=$OFI_DIR \
    --with-rocr=/opt/rocm
make -j $(nproc)
make install
```

### Installing Open MPI with libfabric support

These are the steps to build Open MPI with libfabric:

```shell
export OMPI_DIR=$INSTALL_DIR/ompi
cd $BUILD_DIR
git clone --recursive https://github.com/open-mpi/ompi.git \
    -b v5.0.x
cd ompi
./autogen.pl
mkdir build
cd build
../configure --prefix=$OMPI_DIR --with-ofi=$OFI_DIR \
    --with-rocm=/opt/rocm
make -j $(nproc)
make install
```
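
A quick way to check that the resulting build picked up both libfabric and ROCm
support is to query `ompi_info`. This is a hedged sketch; the exact component
names are an assumption based on Open MPI v5.0.x conventions:

```shell
# The "ofi" mtl/btl components indicate libfabric support; a "rocm"
# accelerator component indicates ROCm-aware handling of GPU buffers.
$OMPI_DIR/bin/ompi_info | grep -i -E "ofi|accelerator"
```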

### ROCm-aware OSU with Open MPI and libfabric

Compiling a ROCm-aware version of the OSU benchmarks with Open MPI and
libfabric is identical to the steps laid out in the section [ROCm-enabled
OSU](#rocm-enabled-osu).

Running an OSU benchmark using multiple nodes requires the following
steps:

```shell
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$OFI_DIR/lib64:/opt/rocm/lib
$OMPI_DIR/bin/mpirun -np 2 \
    ./c/mpi/pt2pt/standard/osu_bw D D
```
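
When launching outside of a batch scheduler, the two ranks also need to be
placed on the two nodes explicitly. The host names below are placeholders for
illustration only:

```shell
# Hypothetical host names; -x forwards the library path to the remote ranks.
$OMPI_DIR/bin/mpirun -np 2 --host node01,node02 \
    -x LD_LIBRARY_PATH \
    ./c/mpi/pt2pt/standard/osu_bw D D
```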

### Notes

When using Open MPI v5.0.x with libfabric support, shared memory
communication between processes on the same node goes through the
*ob1/sm* component. While this component has basic support for
GPU memory, for ROCm devices it accomplishes this by staging the data
through a host buffer. Consequently, the performance of
device-to-device shared memory communication will be lower than the
theoretical peak performance of the GPU-to-GPU interconnect would
allow.

@@ -1,6 +1,6 @@
# ROCm API libraries & tools

::::{grid} 1 2 2 2
::::{grid} 1 3 3 3
:class-container: rocm-doc-grid

:::{grid-item-card}

@@ -81,74 +81,6 @@ subtrees:
    entries:
      - file: reference/library-index.md
        title: API libraries & tools
        subtrees:
          - entries:
              - url: ${project:composable_kernel}
                title: Composable kernel
              - url: ${project:hipblas}
                title: hipBLAS
              - url: ${project:hipblaslt}
                title: hipBLASLt
              - url: ${project:hipcc}
                title: hipCC
              - url: ${project:hipcub}
                title: hipCUB
              - url: ${project:hipfft}
                title: hipFFT
              - url: ${project:hipify}
                title: HIPIFY
              - url: ${project:hiprand}
                title: hipRAND
              - url: ${project:hip}
                title: HIP runtime
              - url: ${project:hipsolver}
                title: hipSOLVER
              - url: ${project:hipsparse}
                title: hipSPARSE
              - url: ${project:hipsparselt}
                title: hipSPARSELt
              - url: ${project:hiptensor}
                title: hipTensor
              - url: ${project:miopen}
                title: MIOpen
              - url: ${project:amdmigraphx}
                title: MIGraphX
              - url: ${project:rccl}
                title: RCCL
              - url: ${project:rocalution}
                title: rocALUTION
              - url: ${project:rocblas}
                title: rocBLAS
              - url: ${project:rocdbgapi}
                title: ROCdbgapi
              - url: ${project:rocfft}
                title: rocFFT
              - file: reference/rocmcc.md
                title: ROCmCC
              - url: ${project:rdc}
                title: ROCm Data Center Tool
              - url: ${project:rocm_smi_lib}
                title: ROCm SMI LIB
              - url: ${project:rocmvalidationsuite}
                title: ROCm validation suite
              - url: ${project:rocprim}
                title: rocPRIM
              - url: ${project:rocprofiler}
                title: ROCProfiler
              - url: ${project:rocrand}
                title: rocRAND
              - url: ${project:rocsolver}
                title: rocSOLVER
              - url: ${project:rocsparse}
                title: rocSPARSE
              - url: ${project:rocthrust}
                title: rocThrust
              - url: ${project:roctracer}
                title: rocTracer
              - url: ${project:rocwmma}
                title: rocWMMA
              - url: ${project:transferbench}
                title: TransferBench

  - caption: Conceptual
    entries:
