Left nav updates (#2647)

* update gpu-enabled-mpi

update the documentation to also include libfabric based network interconnects,
not just UCX.

* add some technical terms to wordlist

* shorten left nav

* grid updates

---------

Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
Lisa
2023-11-24 07:15:10 -07:00
committed by GitHub
parent 0d6fc80070
commit 4adaff02a6
4 changed files with 118 additions and 88 deletions

View File

@@ -200,6 +200,7 @@ hipSPARSELt
hipTensor
HPC
HPCG
HPE
HPL
HSA
hsa
@@ -245,6 +246,7 @@ KVM
LAPACK
LCLK
LDS
libfabric
libjpeg
libs
linearized
@@ -383,6 +385,7 @@ Rickle
roadmap
roc
ROC
RoCE
rocAL
rocALUTION
rocalution
@@ -451,6 +454,7 @@ SKUs
skylake
sL
SLES
sm
SMEM
SMI
smi

View File

@@ -53,14 +53,14 @@ The following sequences of build commands assume either the ROCmCC or the AOMP
compiler is active in the environment in which the commands are executed.
```
## Install UCX
### Installing UCX
The next step is to set up UCX by compiling its source code and installing it:
```shell
export UCX_DIR=$INSTALL_DIR/ucx
cd $BUILD_DIR
git clone https://github.com/openucx/ucx.git -b v1.14.1
git clone https://github.com/openucx/ucx.git -b v1.15.x
cd ucx
./autogen.sh
mkdir build
@@ -74,7 +74,7 @@ make -j $(nproc) install
The [communication libraries tables](../reference/library-index.md)
document the compatibility of UCX versions with ROCm versions.
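Once UCX is installed, one optional way to confirm that the build picked up ROCm
support is to list the transports it detected; a ROCm-enabled build reports ROCm
transports in the output (a sanity check only, not required for the build):
```shell
# List the transports UCX detected; a ROCm-enabled build includes ROCm entries
$UCX_DIR/bin/ucx_info -d | grep -i rocm
```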
## Install Open MPI
### Installing Open MPI
These are the steps to build Open MPI:
@@ -90,12 +90,12 @@ cd build
../configure --prefix=$OMPI_DIR --with-ucx=$UCX_DIR \
--with-rocm=/opt/rocm
make -j $(nproc)
make -j $(nproc) install
make install
```
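To verify that the resulting Open MPI installation was built with UCX support, you
can optionally inspect the compiled components with `ompi_info`:
```shell
# A UCX-enabled build lists UCX components (for example, the UCX PML)
$OMPI_DIR/bin/ompi_info | grep -i ucx
```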
## ROCm-enabled OSU
### ROCm-enabled OSU
The OSU Micro Benchmarks v5.9 (OMB) can be used to evaluate the performance of
The OSU Micro Benchmarks (OMB) can be used to evaluate the performance of
various primitives with an AMD GPU device and ROCm support. This functionality
is exposed when configured with the `--enable-rocm` option. We can use the following
steps to compile OMB:
@@ -103,10 +103,10 @@ steps to compile OMB:
```shell
export OSU_DIR=$INSTALL_DIR/osu
cd $BUILD_DIR
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.9.tar.gz
tar xfz osu-micro-benchmarks-5.9.tar.gz
cd osu-micro-benchmarks-5.9
./configure --prefix=$INSTALL_DIR/osu --enable-rocm \
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-7.2.tar.gz
tar xfz osu-micro-benchmarks-7.2.tar.gz
cd osu-micro-benchmarks-7.2
./configure --enable-rocm \
--with-rocm=/opt/rocm \
CC=$OMPI_DIR/bin/mpicc CXX=$OMPI_DIR/bin/mpicxx \
LDFLAGS="-L$OMPI_DIR/lib/ -lmpi -L/opt/rocm/lib/ \
@@ -114,7 +114,7 @@ cd osu-micro-benchmarks-5.9
make -j $(nproc)
```
## Intra-node run
### Intra-node run
Before running an Open MPI job, it is essential to set some environment variables to
ensure that the correct versions of Open MPI and UCX are being used.
@@ -125,31 +125,35 @@ export PATH=$OMPI_DIR/bin:$PATH
```
The following command runs the OSU bandwidth benchmark between the first two GPU
devices (i.e., GPU 0 and GPU 1, same OAM) by default inside the same node. It
devices (i.e., GPU 0 and GPU 1) by default inside the same node. It
measures the unidirectional bandwidth from the first device to the other.
```shell
$OMPI_DIR/bin/mpirun -np 2 \
-x UCX_TLS=sm,self,rocm \
--mca pml ucx mpi/pt2pt/osu_bw -d rocm D D
--mca pml ucx \
./c/mpi/pt2pt/standard/osu_bw D D
```
To select different devices, for example 2 and 3, use the following command:
```shell
export HIP_VISIBLE_DEVICES=2,3
```
To force using a copy kernel instead of a DMA engine for the data transfer, use the following command:
```shell
export HSA_ENABLE_SDMA=0
```
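Putting the pieces together, a run that pins the benchmark to devices 2 and 3 and
disables SDMA could look like the following sketch (the environment variables are
forwarded to the ranks with `-x`):
```shell
# Example only: run the bandwidth benchmark on GPUs 2 and 3 with SDMA disabled
$OMPI_DIR/bin/mpirun -np 2 \
    -x UCX_TLS=sm,self,rocm \
    -x HIP_VISIBLE_DEVICES=2,3 \
    -x HSA_ENABLE_SDMA=0 \
    --mca pml ucx \
    ./c/mpi/pt2pt/standard/osu_bw D D
```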
The following output shows the effective transfer bandwidth measured for
inter-die data transfer between GPU device 2 and 3 (same OAM). For messages
larger than 67MB, an effective utilization of about 150GB/sec is achieved, which
corresponds to 75% of the peak transfer bandwidth of 200GB/sec for that
connection:
inter-die data transfer between GPU devices 2 and 3 on a system with MI250 GPUs. For messages
larger than 67MB, an effective utilization of about 150GB/sec is achieved:
![OSU execution showing transfer bandwidth increasing alongside payload increase](../data/how-to/gpu-enabled-mpi-1.png "Inter-GPU bandwidth with various payload sizes")
## Collective operations
### Collective operations
Collective operations on GPU buffers are best handled through the
Unified Collective Communication Library (UCC) component in Open MPI.
@@ -164,8 +168,9 @@ is shown below:
```shell
export UCC_DIR=$INSTALL_DIR/ucc
git clone https://github.com/openucx/ucc.git
git clone https://github.com/openucx/ucc.git -b v1.2.x
cd ucc
./autogen.sh
./configure --with-rocm=/opt/rocm \
--with-ucx=$UCX_DIR \
--prefix=$UCC_DIR
@@ -187,3 +192,92 @@ mpirun --mca pml ucx --mca osc ucx \
--mca coll_ucc_enable 1 \
--mca coll_ucc_priority 100 -np 64 ./my_mpi_app
```
## ROCm-aware Open MPI using libfabric
For network interconnects that are not covered by the UCX-based approach described
above, such as HPE Slingshot, ROCm-aware communication can often be
achieved through the libfabric library. For more details on
libfabric, please refer to its
[documentation](https://github.com/ofiwg/libfabric/wiki).
### Installing libfabric
In many instances, libfabric is already pre-installed on the system. You can check
its availability with, for example:
```shell
module avail libfabric
```
Alternatively, you can also download and compile libfabric with ROCm support
yourself. Note, however, that not all components required to support networks such
as HPE Slingshot are available in the open-source repository. Using a pre-installed
libfabric library is therefore strongly preferred over compiling it yourself.
If a pre-compiled libfabric library is available on your system,
please skip the subsequent steps and go to [Installing Open MPI
with libfabric support](#installing-open-mpi-with-libfabric-support).
Compiling libfabric with ROCm support can be achieved with the following
steps:
```shell
export OFI_DIR=$INSTALL_DIR/ofi
cd $BUILD_DIR
git clone https://github.com/ofiwg/libfabric.git -b v1.19.x
cd libfabric
./autogen.sh
./configure --prefix=$OFI_DIR \
--with-rocr=/opt/rocm
make -j $(nproc)
make install
```
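As an optional check after the build, `fi_info` can list the providers that were
compiled in; providers that can handle device memory advertise the `FI_HMEM`
capability:
```shell
# List the libfabric providers that report FI_HMEM (device memory) support
$OFI_DIR/bin/fi_info -c FI_HMEM
```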
### Installing Open MPI with libfabric support
These are the steps to build Open MPI with libfabric:
```shell
export OMPI_DIR=$INSTALL_DIR/ompi
cd $BUILD_DIR
git clone --recursive https://github.com/open-mpi/ompi.git \
-b v5.0.x
cd ompi
./autogen.pl
mkdir build
cd build
../configure --prefix=$OMPI_DIR --with-ofi=$OFI_DIR \
--with-rocm=/opt/rocm
make -j $(nproc)
make install
```
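As with the UCX-based build, `ompi_info` can optionally be used to confirm that the
OFI (libfabric) components were built:
```shell
# An OFI-enabled build lists OFI components (for example, the OFI MTL)
$OMPI_DIR/bin/ompi_info | grep -i ofi
```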
### ROCm-aware OSU with Open MPI and libfabric
Compiling a ROCm-aware version of the OSU benchmarks with Open MPI and
libfabric is identical to the steps laid out in the section [ROCm-enabled
OSU](#rocm-enabled-osu).
Running an OSU benchmark using multiple nodes requires the following
steps:
```shell
export LD_LIBRARY_PATH=$OMPI_DIR/lib:$OFI_DIR/lib64:/opt/rocm/lib
$OMPI_DIR/bin/mpirun -np 2 \
./c/mpi/pt2pt/standard/osu_bw D D
```
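The command above relies on the launch environment (for example, a batch scheduler)
to place the two ranks on different nodes. Outside of such an environment, you would
typically add an explicit host list; the hostnames below are placeholders:
```shell
# Hypothetical hostnames; replace them with the nodes in your allocation
$OMPI_DIR/bin/mpirun -np 2 --host node01,node02 \
    -x LD_LIBRARY_PATH \
    ./c/mpi/pt2pt/standard/osu_bw D D
```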
### Notes
When using Open MPI v5.0.x with libfabric support, shared-memory
communication between processes on the same node goes through the
*ob1/sm* component. While this component has basic support for GPU
memory, for ROCm devices it stages the data through a host buffer.
Consequently, the performance of device-to-device shared-memory
communication will be lower than the theoretical peak performance of
the GPU-to-GPU interconnect would allow.
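To see which PML and BTL components are selected at run time, for example to confirm
that on-node traffic goes through *ob1/sm*, you can raise the MCA verbosity levels
(a diagnostic sketch, not needed for normal runs):
```shell
# Print component selection details for the PML and BTL frameworks
$OMPI_DIR/bin/mpirun -np 2 \
    --mca pml_base_verbose 10 \
    --mca btl_base_verbose 10 \
    ./c/mpi/pt2pt/standard/osu_bw D D
```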

View File

@@ -1,6 +1,6 @@
# ROCm API libraries & tools
::::{grid} 1 2 2 2
::::{grid} 1 3 3 3
:class-container: rocm-doc-grid
:::{grid-item-card}

View File

@@ -81,74 +81,6 @@ subtrees:
entries:
- file: reference/library-index.md
title: API libraries & tools
subtrees:
- entries:
- url: ${project:composable_kernel}
title: Composable kernel
- url: ${project:hipblas}
title: hipBLAS
- url: ${project:hipblaslt}
title: hipBLASLt
- url: ${project:hipcc}
title: hipCC
- url: ${project:hipcub}
title: hipCUB
- url: ${project:hipfft}
title: hipFFT
- url: ${project:hipify}
title: HIPIFY
- url: ${project:hiprand}
title: hipRAND
- url: ${project:hip}
title: HIP runtime
- url: ${project:hipsolver}
title: hipSOLVER
- url: ${project:hipsparse}
title: hipSPARSE
- url: ${project:hipsparselt}
title: hipSPARSELt
- url: ${project:hiptensor}
title: hipTensor
- url: ${project:miopen}
title: MIOpen
- url: ${project:amdmigraphx}
title: MIGraphX
- url: ${project:rccl}
title: RCCL
- url: ${project:rocalution}
title: rocALUTION
- url: ${project:rocblas}
title: rocBLAS
- url: ${project:rocdbgapi}
title: ROCdbgapi
- url: ${project:rocfft}
title: rocFFT
- file: reference/rocmcc.md
title: ROCmCC
- url: ${project:rdc}
title: ROCm Data Center Tool
- url: ${project:rocm_smi_lib}
title: ROCm SMI LIB
- url: ${project:rocmvalidationsuite}
title: ROCm validation suite
- url: ${project:rocprim}
title: rocPRIM
- url: ${project:rocprofiler}
title: ROCProfiler
- url: ${project:rocrand}
title: rocRAND
- url: ${project:rocsolver}
title: rocSOLVER
- url: ${project:rocsparse}
title: rocSPARSE
- url: ${project:rocthrust}
title: rocThrust
- url: ${project:roctracer}
title: rocTracer
- url: ${project:rocwmma}
title: rocWMMA
- url: ${project:transferbench}
title: TransferBench
- caption: Conceptual
entries: