Mirror of https://github.com/ROCm/ROCm.git, synced 2026-01-09 14:48:06 -05:00
Remove gpu-cluster-networking and 'Using MPI' page due to migration to Instinct Docs (#4201)
* remove 'Using MPI' and 'gpu-cluster-networking' sections due to migration to dcgpu
* remove gpu-cluster-networking from index page
---------
Co-authored-by: Alex Xu <alex.xu@amd.com>
Binary image file removed (13 KiB); content not shown.
@@ -1,264 +0,0 @@
.. meta::
   :description: GPU-enabled Message Passing Interface
   :keywords: Message Passing Interface, MPI, AMD, ROCm

***************************************************************************************************
GPU-enabled Message Passing Interface
***************************************************************************************************

The Message Passing Interface (`MPI <https://www.mpi-forum.org>`_) is a standard API for distributed
and parallel application development that can scale to multi-node clusters. To facilitate the porting
of applications to clusters with GPUs, ROCm enables various technologies. You can use these
technologies to add GPU pointers to MPI calls and enable ROCm-aware MPI libraries to deliver optimal
performance for both intra-node and inter-node GPU-to-GPU communication.

The AMD kernel driver exposes remote direct memory access (RDMA) through *PeerDirect* interfaces.
These allow network interface cards (NICs) to directly read from and write to RDMA-capable GPU device
memory, resulting in high-speed direct memory access (DMA) transfers between GPU and NIC. These
interfaces are used to optimize inter-node MPI message communication.
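As an optional sanity check, assuming the rdma-core utilities are installed, you can confirm that a
node exposes RDMA-capable NICs before setting up MPI:

.. code-block:: shell

   # Lists RDMA devices and their link layer (InfiniBand or Ethernet/RoCE)
   ibv_devinfo | grep -E 'hca_id|link_layer'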
The Open MPI project is an open source implementation of the MPI standard. It's developed and
maintained by a consortium of academic, research, and industry partners. To compile Open MPI with
ROCm support, refer to the following sections:

* :ref:`open-mpi-ucx`
* :ref:`open-mpi-libfabric`

.. _open-mpi-ucx:

ROCm-aware Open MPI on InfiniBand and RoCE networks using UCX
================================================================

The `Unified Communication Framework <https://www.openucx.org/documentation>`_ (UCX) is an
open source, cross-platform framework designed to provide a common set of communication
interfaces for various network programming models and interfaces. UCX uses ROCm technologies to
implement various network operation primitives, and it is the standard communication library for
InfiniBand and RDMA over Converged Ethernet (RoCE) network interconnects. To optimize data
transfer operations, many MPI libraries, including Open MPI, can leverage UCX internally.

Both UCX and Open MPI provide a compile-time option to enable ROCm support. To install and
configure UCX so you can compile Open MPI for ROCm, use the following instructions.

1. Set environment variables to install all software components in the same base directory. This
   example uses the home directory, but you can specify a different location if you want.

   .. code-block:: shell

      export INSTALL_DIR=$HOME/ompi_for_gpu
      export BUILD_DIR=/tmp/ompi_for_gpu_build
      mkdir -p $BUILD_DIR

2. Install UCX. To view UCX and ROCm version compatibility, refer to the
   `communication libraries tables <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html>`_.

   .. code-block:: shell

      export UCX_DIR=$INSTALL_DIR/ucx
      cd $BUILD_DIR
      git clone https://github.com/openucx/ucx.git -b v1.15.x
      cd ucx
      ./autogen.sh
      mkdir build
      cd build
      ../configure --prefix=$UCX_DIR \
          --with-rocm=/opt/rocm
      make -j $(nproc)
      make -j $(nproc) install
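   Optionally, you can confirm that ROCm support made it into the build by listing the transports
   UCX detects. The exact output varies by system, but ROCm-specific entries should appear:

   .. code-block:: shell

      # Entries such as rocm_copy or rocm_ipc indicate a ROCm-enabled build
      $UCX_DIR/bin/ucx_info -d | grep -i rocm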
3. Install Open MPI.

   .. code-block:: shell

      export OMPI_DIR=$INSTALL_DIR/ompi
      cd $BUILD_DIR
      git clone --recursive https://github.com/open-mpi/ompi.git \
          -b v5.0.x
      cd ompi
      ./autogen.pl
      mkdir build
      cd build
      ../configure --prefix=$OMPI_DIR --with-ucx=$UCX_DIR \
          --with-rocm=/opt/rocm
      make -j $(nproc)
      make install
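   Optionally, confirm that the resulting Open MPI build includes the UCX components:

   .. code-block:: shell

      # The pml ucx and osc ucx components should be listed
      $OMPI_DIR/bin/ompi_info | grep -i ucx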
.. _rocm-enabled-osu:

ROCm-enabled OSU benchmarks
---------------------------------------------------------------------------------------------------------------

You can use OSU Micro Benchmarks (OMB) to evaluate the performance of various primitives on
ROCm-supported AMD GPUs. The ``--enable-rocm`` option exposes this functionality.

.. code-block:: shell

   export OSU_DIR=$INSTALL_DIR/osu
   cd $BUILD_DIR
   wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-7.2.tar.gz
   tar xfz osu-micro-benchmarks-7.2.tar.gz
   cd osu-micro-benchmarks-7.2
   ./configure --enable-rocm \
       --with-rocm=/opt/rocm \
       CC=$OMPI_DIR/bin/mpicc CXX=$OMPI_DIR/bin/mpicxx \
       LDFLAGS="-L$OMPI_DIR/lib/ -lmpi -L/opt/rocm/lib/ \
       $(hipconfig -C) -lamdhip64" CXXFLAGS="-std=c++11"
   make -j $(nproc)
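Before testing device buffers, an optional host-to-host run verifies that the benchmarks launch
correctly:

.. code-block:: shell

   # 'H H' selects host (CPU) buffers on both sides
   $OMPI_DIR/bin/mpirun -np 2 ./c/mpi/pt2pt/standard/osu_bw H H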
Intra-node run
----------------------------------------------------------------------------------------------------------------

Before running an Open MPI job, you must set the following environment variables to ensure that
you're using the correct versions of Open MPI and UCX.

.. code-block:: shell

   export LD_LIBRARY_PATH=$OMPI_DIR/lib:$UCX_DIR/lib:/opt/rocm/lib
   export PATH=$OMPI_DIR/bin:$PATH

To run the OSU bandwidth benchmark between the first two GPU devices (``GPU 0`` and ``GPU 1``)
inside the same node, use the following code.

.. code-block:: shell

   $OMPI_DIR/bin/mpirun -np 2 \
     -x UCX_TLS=sm,self,rocm \
     --mca pml ucx \
     ./c/mpi/pt2pt/standard/osu_bw D D

This measures the unidirectional bandwidth from the first device (``GPU 0``) to the second device
(``GPU 1``). To select specific devices, for example ``GPU 2`` and ``GPU 3``, set the following
environment variable:

.. code-block:: shell

   export HIP_VISIBLE_DEVICES=2,3

To force the use of a copy kernel instead of a DMA engine for the data transfer, set the following
environment variable:

.. code-block:: shell

   export HSA_ENABLE_SDMA=0

The following output shows the effective transfer bandwidth measured for inter-die data transfer
between ``GPU 2`` and ``GPU 3`` on a system with MI250 GPUs. For messages larger than 67 MB, an
effective utilization of about 150 GB/sec is achieved:

.. image:: ../data/how-to/gpu-enabled-mpi-1.png
   :width: 400
   :alt: Inter-GPU bandwidth for various payload sizes

Collective operations
----------------------------------------------------------------------------------------------------------------

Collective operations on GPU buffers are best handled through the Unified Collective Communication
(UCC) library component in Open MPI. To accomplish this, you must configure and compile the UCC
library with ROCm support.

.. note::

   You can verify UCC and ROCm version compatibility using the
   `communication libraries tables <https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html>`_.

.. code-block:: shell

   export UCC_DIR=$INSTALL_DIR/ucc
   git clone https://github.com/openucx/ucc.git -b v1.2.x
   cd ucc
   ./autogen.sh
   ./configure --with-rocm=/opt/rocm \
       --with-ucx=$UCX_DIR \
       --prefix=$UCC_DIR
   make -j && make install

   # Configure and compile Open MPI with UCX, UCC, and ROCm support
   cd ompi
   ./configure --with-rocm=/opt/rocm \
       --with-ucx=$UCX_DIR \
       --with-ucc=$UCC_DIR \
       --prefix=$OMPI_DIR
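Optionally, confirm that the rebuilt Open MPI exposes the UCC collective component:

.. code-block:: shell

   # The coll ucc component should be listed
   $OMPI_DIR/bin/ompi_info | grep -i ucc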
To use the UCC component with an MPI application, you must set additional parameters:

.. code-block:: shell

   mpirun --mca pml ucx --mca osc ucx \
     --mca coll_ucc_enable 1 \
     --mca coll_ucc_priority 100 -np 64 ./my_mpi_app
.. _open-mpi-libfabric:

ROCm-aware Open MPI using libfabric
================================================================

For network interconnects that are not covered in the previous section, such as HPE Slingshot,
ROCm-aware communication can often be achieved through the libfabric library. For more information,
refer to the `libfabric documentation <https://github.com/ofiwg/libfabric/wiki>`_.

.. note::

   When using Open MPI v5.0.x with libfabric support, shared memory communication between
   processes on the same node goes through the *ob1/sm* component. This component has only
   fundamental support for GPU memory, which is accomplished by staging transfers through a host
   buffer. Consequently, the performance of device-to-device shared memory communication is lower
   than the theoretical peak performance allowed by the GPU-to-GPU interconnect.

1. Install libfabric. Note that libfabric is often pre-installed. To determine whether it's
   already installed, run:

   .. code-block:: shell

      module avail libfabric
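   If your system doesn't use environment modules, you can instead query an installed libfabric
   directly, assuming its utilities are on your ``PATH``:

   .. code-block:: shell

      # Print the installed libfabric version and list the available providers
      fi_info --version
      fi_info -l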
   Alternatively, you can download and compile libfabric with ROCm support. Note that not all
   components required to support some networks (e.g., HPE Slingshot) are available in the open
   source repository. Therefore, using a pre-installed libfabric library is strongly recommended
   over compiling libfabric manually.

   If a pre-compiled libfabric library is available on your system, you can skip the following step.

2. Compile libfabric with ROCm support.

   .. code-block:: shell

      export OFI_DIR=$INSTALL_DIR/ofi
      cd $BUILD_DIR
      git clone https://github.com/ofiwg/libfabric.git -b v1.19.x
      cd libfabric
      ./autogen.sh
      ./configure --prefix=$OFI_DIR \
          --with-rocr=/opt/rocm
      make -j $(nproc)
      make install
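   Optionally, verify the freshly built library by listing the providers it detects:

   .. code-block:: shell

      # Providers such as verbs, tcp, or shm should appear, depending on the system
      $OFI_DIR/bin/fi_info -l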
Installing Open MPI with libfabric support
----------------------------------------------------------------------------------------------------------------

To build Open MPI with libfabric, use the following code:

.. code-block:: shell

   export OMPI_DIR=$INSTALL_DIR/ompi
   cd $BUILD_DIR
   git clone --recursive https://github.com/open-mpi/ompi.git \
       -b v5.0.x
   cd ompi
   ./autogen.pl
   mkdir build
   cd build
   ../configure --prefix=$OMPI_DIR --with-ofi=$OFI_DIR \
       --with-rocm=/opt/rocm
   make -j $(nproc)
   make install
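As with the UCX build, you can optionally confirm that the libfabric (OFI) components were
compiled in:

.. code-block:: shell

   # The btl ofi and/or mtl ofi components should be listed
   $OMPI_DIR/bin/ompi_info | grep -i ofi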
ROCm-aware OSU with Open MPI and libfabric
----------------------------------------------------------------------------------------------------------------

Compiling a ROCm-aware version of the OSU benchmarks with Open MPI and libfabric uses the same
process described in :ref:`rocm-enabled-osu`.

To run an OSU benchmark using multiple nodes, use the following code:

.. code-block:: shell

   export LD_LIBRARY_PATH=$OMPI_DIR/lib:$OFI_DIR/lib64:/opt/rocm/lib
   $OMPI_DIR/bin/mpirun --mca pml ob1 --mca btl_ofi_mode 2 -np 2 \
     ./c/mpi/pt2pt/standard/osu_bw D D
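To place the two ranks on different nodes, also pass a host list. The following is a minimal
sketch; ``node01`` and ``node02`` are hypothetical hostnames:

.. code-block:: shell

   # One rank on each of two nodes; hostnames are placeholders
   $OMPI_DIR/bin/mpirun --host node01,node02 -np 2 \
     --mca pml ob1 --mca btl_ofi_mode 2 \
     ./c/mpi/pt2pt/standard/osu_bw D D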
@@ -42,7 +42,6 @@ ROCm documentation is organized into the following categories:
 * [Fine-tune LLMs and inference optimization](./how-to/llm-fine-tuning-optimization/index.rst)
 * [System optimization](./how-to/system-optimization/index.rst)
 * [AMD Instinct MI300X performance validation and tuning](./how-to/tuning-guides/mi300x/index.rst)
-* [GPU cluster networking](https://dcgpu.docs.amd.com/projects/gpu-cluster-networking/en/latest/index.html)
 * [System debugging](./how-to/system-debugging.md)
 * [Use MPI](./how-to/gpu-enabled-mpi.rst)
 * [Use advanced compiler features](./conceptual/compiler-topics.md)
@@ -94,10 +94,6 @@ subtrees:
       title: System tuning
     - file: how-to/tuning-guides/mi300x/workload.rst
       title: Workload tuning
-    - url: https://dcgpu.docs.amd.com/projects/gpu-cluster-networking/en/latest/index.html
-      title: GPU cluster networking
-    - file: how-to/gpu-enabled-mpi.rst
-      title: Use MPI
     - file: how-to/system-debugging.md
     - file: conceptual/compiler-topics.md
       title: Use advanced compiler features