Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page (#3659)

* Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page

* Add words to wordlist and fix a typo

* Add new sections for Docker and testing

* Incorporate comments from the external review

* Some minor edits and clarifications

* Incorporate further review comments and fix test section

* Add comment to test section

* Change git clone command for FBGEMM repo

* Change Docker command

* Changes from internal review

* Fix linting issue
Jeffrey Novotny
2024-09-09 11:20:50 -04:00
committed by GitHub
parent 35b2822c68
commit 4992db3e6c
2 changed files with 291 additions and 0 deletions

@@ -251,3 +251,287 @@ page describes the options.
Learn more about optimizing kernels with TunableOp in
:ref:`Optimizing Triton kernels <mi300x-tunableop>`.
FBGEMM and FBGEMM_GPU
=====================
FBGEMM (Facebook General Matrix Multiplication) is a low-precision, high-performance CPU kernel library
for matrix-matrix multiplications and convolutions. It is used for server-side inference
and as a back end for PyTorch quantized operators. FBGEMM offers optimized on-CPU performance for reduced precision calculations,
strong performance on native tensor formats, and the ability to generate
high-performance shape- and size-specific kernels at runtime.
FBGEMM_GPU collects several high-performance PyTorch GPU operator libraries
for use in training and inference. It provides efficient table-batched embedding functionality,
data layout transformation, and quantization support.
For more information about FBGEMM and FBGEMM_GPU, see the `PyTorch FBGEMM GitHub <https://github.com/pytorch/FBGEMM>`_
and the `PyTorch FBGEMM documentation <https://pytorch.org/FBGEMM/>`_.
The `Meta blog post about FBGEMM <https://engineering.fb.com/2018/11/07/ml-applications/fbgemm/>`_ provides
additional background about the library.
Installing FBGEMM_GPU
----------------------
Installing FBGEMM_GPU consists of the following steps:
* Set up an isolated Miniconda environment
* Install ROCm using Docker or the :doc:`package manager <rocm-install-on-linux:install/native-install/index>`
* Install the nightly `PyTorch <https://pytorch.org/>`_ build
* Complete the pre-build and build tasks
.. note::
FBGEMM_GPU doesn't require the installation of FBGEMM. To optionally install
FBGEMM, see the `FBGEMM install instructions <https://pytorch.org/FBGEMM/fbgemm-development/BuildInstructions.html>`_.
Set up the Miniconda environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To install Miniconda, use the following commands.
#. Install a `Miniconda environment <https://docs.anaconda.com/miniconda/>`_ for reproducible builds.
All subsequent commands run inside this environment.
.. code-block:: shell
export PLATFORM_NAME="$(uname -s)-$(uname -m)"
# Set the Miniconda prefix directory
miniconda_prefix=$HOME/miniconda
# Download the Miniconda installer
wget -q "https://repo.anaconda.com/miniconda/Miniconda3-latest-${PLATFORM_NAME}.sh" -O miniconda.sh
# Run the installer
bash miniconda.sh -b -p "$miniconda_prefix" -u
# Load the shortcuts
. ~/.bashrc
# Run updates
conda update -n base -c defaults -y conda
#. Create a Miniconda environment with Python 3.12:
.. code-block:: shell
env_name=<ENV NAME>
python_version=3.12
# Create the environment
conda create -y --name ${env_name} python="${python_version}"
# Upgrade pip and the pyOpenSSL package
conda run -n ${env_name} pip install --upgrade pip
# Quote the requirement so the shell does not treat ">" as a redirection
conda run -n ${env_name} python -m pip install "pyOpenSSL>22.1.0"
#. Install additional build tools:
.. code-block:: shell
conda install -n ${env_name} -y \
click \
cmake \
hypothesis \
jinja2 \
make \
ncurses \
ninja \
numpy \
scikit-build \
wheel
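As a quick sanity check after the steps above, the following sketch confirms that a few of the build tools resolve inside the environment. The fallback name ``fbgemm_env`` is a hypothetical placeholder, and probing with ``which`` through ``conda run`` is one possible approach rather than part of the official instructions.

.. code-block:: shell

   # Hypothetical fallback name; set env_name to the environment created earlier
   env_name="${env_name:-fbgemm_env}"
   missing=0
   for tool in cmake ninja make; do
       # conda run executes the command inside the named environment
       if conda run -n "${env_name}" which "${tool}" >/dev/null 2>&1; then
           echo "found: ${tool}"
       else
           echo "missing: ${tool}"
           missing=$((missing + 1))
       fi
   done
   echo "missing tools: ${missing}"

A nonzero count suggests rerunning the ``conda install`` step before continuing.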
Install the ROCm components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FBGEMM_GPU can run in a ROCm Docker container or with a full ROCm installation on the host.
The Docker method is recommended because it requires fewer steps and provides a stable environment.
To run FBGEMM_GPU in the Docker container, pull the `Minimal Docker image for ROCm <https://hub.docker.com/r/rocm/rocm-terminal>`_.
This image includes all preinstalled ROCm packages required to integrate FBGEMM. To pull
and run the ROCm Docker image, use this command:
.. code-block:: shell
# Run for ROCm 6.2.0
docker run -it --network=host --shm-size 16G --device=/dev/kfd --device=/dev/dri --group-add video \
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ipc=host rocm/rocm-terminal:6.2 /bin/bash
.. note::
The `Full Docker image for ROCm <https://hub.docker.com/r/rocm/dev-ubuntu-20.04>`_, which includes all
ROCm packages, can also be used. However, it results in a very large container, so the minimal
Docker image is recommended.
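The ``docker run`` command above passes ``/dev/kfd`` and ``/dev/dri`` through to the container. A minimal pre-flight sketch, assuming a Linux host, checks that both device nodes exist before the container is started:

.. code-block:: shell

   missing_devices=0
   for dev in /dev/kfd /dev/dri; do
       if [ -e "${dev}" ]; then
           echo "present: ${dev}"
       else
           echo "missing: ${dev} (is the amdgpu driver loaded?)"
           missing_devices=$((missing_devices + 1))
       fi
   done
   echo "missing devices: ${missing_devices}"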
You can also install ROCm using the package manager. FBGEMM_GPU requires the installation of the full ROCm package.
For more information, see :doc:`the ROCm installation guide <rocm-install-on-linux:install/detailed-install>`.
The ROCm package also requires the :doc:`MIOpen <miopen:index>` component as a dependency.
To install MIOpen, use the ``apt install`` command.
.. code-block:: shell
apt install hipify-clang miopen-hip miopen-hip-dev
Install PyTorch
^^^^^^^^^^^^^^^^^^^^^^^
Install `PyTorch <https://pytorch.org/>`_ using ``pip`` for the most reliable and consistent results.
#. Install the nightly PyTorch build using ``pip``.
.. code-block:: shell
# Install the latest nightly, ROCm variant
conda run -n ${env_name} pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.2/
#. Ensure PyTorch loads correctly. Verify the version and variant of the installation using an ``import`` test.
.. code-block:: shell
# Ensure that the package loads properly
conda run -n ${env_name} python -c "import torch.distributed"
# Verify the version and variant of the installation
conda run -n ${env_name} python -c "import torch; print(torch.__version__)"
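To confirm that the installed wheel is the ROCm variant rather than a CPU or CUDA build, ``torch.version.hip`` can be inspected as well; it is ``None`` when PyTorch was not built against ROCm. The sketch below is illustrative only, and the fallback environment name ``fbgemm_env`` is a hypothetical placeholder.

.. code-block:: shell

   # Hypothetical fallback name; set env_name to the environment created earlier
   env_name="${env_name:-fbgemm_env}"
   # Fall back to "unavailable" when conda or torch cannot be loaded
   hip_version=$(conda run -n "${env_name}" python -c "import torch; print(torch.version.hip)" 2>/dev/null || echo "unavailable")
   case "${hip_version}" in
       ""|unavailable) echo "PyTorch could not be imported in ${env_name}" ;;
       None)           echo "PyTorch is installed, but it is not a ROCm build" ;;
       *)              echo "PyTorch ROCm build, HIP version ${hip_version}" ;;
   esac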
Perform the prebuild and build
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. Clone the FBGEMM repository and the relevant submodules. Use ``pip`` to install the
components in ``requirements.txt``. Run the following commands inside the Miniconda environment.
.. code-block:: shell
# Select a version tag
FBGEMM_VERSION=v0.8.0
# Clone the repo along with its submodules
git clone https://github.com/pytorch/FBGEMM.git --branch=${FBGEMM_VERSION} --recursive fbgemm_${FBGEMM_VERSION}
# Install additional required packages for building and testing
cd fbgemm_${FBGEMM_VERSION}/fbgemm_gpu
pip install -r requirements.txt
#. Clear the build cache to remove stale build information.
.. code-block:: shell
# !! Run in fbgemm_gpu/ directory inside the Conda environment !!
python setup.py clean
#. Set the wheel build variables, including the package name, Python version tag, and Python platform name.
.. code-block:: shell
# Set the package name depending on the build variant
export package_name=fbgemm_gpu_rocm
# Set the Python version tag. It should follow the convention `py<major><minor>`,
# for example, Python 3.12 --> py312
export python_tag=py312
# Determine the processor architecture
export ARCH=$(uname -m)
# Set the Python platform name for the Linux case
export python_plat_name="manylinux2014_${ARCH}"
#. Build FBGEMM_GPU for the ROCm platform. Set ``ROCM_PATH`` to the path to your ROCm installation.
Run these commands from the ``fbgemm_gpu/`` directory inside the Miniconda environment.
.. code-block:: shell
# !! Run in the fbgemm_gpu/ directory inside the Conda environment !!
export ROCM_PATH=</path/to/rocm>
# Build for the target architecture of the ROCm device installed on the machine (for example, 'gfx942;gfx90a')
# See the Linux system requirements page (../../reference/system-requirements) for a list of supported GPUs.
export PYTORCH_ROCM_ARCH=$(${ROCM_PATH}/bin/rocminfo | grep -o -m 1 'gfx.*')
# Build the wheel artifact only
python setup.py bdist_wheel \
--package_variant=rocm \
--python-tag="${python_tag}" \
--plat-name="${python_plat_name}" \
-DHIP_ROOT_DIR="${ROCM_PATH}" \
-DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
-DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
# Build and install the library into the Conda environment
python setup.py install \
--package_variant=rocm \
-DHIP_ROOT_DIR="${ROCM_PATH}" \
-DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
-DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
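The wheel variables in the steps above are set by hand. As a sketch of an alternative, the Python tag can be derived from the interpreter itself so that it always matches the environment. The second half shows what the ``grep -o -m 1 'gfx.*'`` filter extracts, using two made-up lines shaped like ``rocminfo`` output; the sample text is an assumption for illustration, not captured from a real device.

.. code-block:: shell

   # Derive the tag from the interpreter (use `python` inside the Conda environment)
   python_tag=$(python3 -c "import sys; print(f'py{sys.version_info.major}{sys.version_info.minor}')")
   python_plat_name="manylinux2014_$(uname -m)"
   echo "python_tag=${python_tag} python_plat_name=${python_plat_name}"

   # Illustrative sample only: -m 1 stops at the first matching line,
   # and -o prints just the matched text from that line
   detected_arch=$(printf '  Name:    gfx90a\n  Name:    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-\n' | grep -o -m 1 'gfx.*')
   echo "detected_arch=${detected_arch}"

Because the filter keeps only the first match, a machine with mixed GPU models would need ``PYTORCH_ROCM_ARCH`` set explicitly, for example to a semicolon-separated list.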
Post-build validation
----------------------
After building FBGEMM_GPU, run some verification checks to ensure the build is correct. Continue
to run all commands inside the ``fbgemm_gpu/`` directory inside the Miniconda environment.
#. The build process generates many build artifacts and C++ templates, so
it is important to confirm that none of the remaining undefined symbols are functions that FBGEMM_GPU itself defines.
.. code-block:: shell
# !! Run in fbgemm_gpu/ directory inside the Conda environment !!
# Locate the built .SO file
fbgemm_gpu_lib_path=$(find . -name fbgemm_gpu_py.so)
# Check that the undefined symbols don't include fbgemm_gpu-defined functions
nm -gDCu "${fbgemm_gpu_lib_path}" | sort
#. Verify the referenced version number of ``GLIBCXX`` and the presence of certain function symbols:
.. code-block:: shell
# !! Run in fbgemm_gpu/ directory inside the Conda environment !!
# Locate the built .SO file
fbgemm_gpu_lib_path=$(find . -name fbgemm_gpu_py.so)
# Note the versions of GLIBCXX referenced by the .SO
# The libstdc++.so.6 available on the install target must support these versions
objdump -TC "${fbgemm_gpu_lib_path}" | grep GLIBCXX | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu | cat
# Test for the existence of a given function symbol in the .SO
nm -gDC "${fbgemm_gpu_lib_path}" | grep " fbgemm_gpu::merge_pooled_embeddings("
nm -gDC "${fbgemm_gpu_lib_path}" | grep " fbgemm_gpu::jagged_2d_to_dense("
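The first check above prints every undefined symbol, including those that are expected to resolve from other libraries at load time. A minimal sketch, assuming FBGEMM_GPU symbols appear under the ``fbgemm_gpu::`` namespace as in the ``grep`` patterns above, narrows the listing to a single pass or fail status:

.. code-block:: shell

   # !! Run in fbgemm_gpu/ directory inside the Conda environment !!
   fbgemm_gpu_lib_path=$(find . -name fbgemm_gpu_py.so 2>/dev/null | head -n 1)
   if [ -z "${fbgemm_gpu_lib_path}" ]; then
       check_status="library-not-found"
   else
       # grep -c prints the number of matching lines; "|| true" tolerates a zero count
       unresolved=$(nm -gDCu "${fbgemm_gpu_lib_path}" | grep -c "fbgemm_gpu::" || true)
       if [ "${unresolved}" -eq 0 ]; then
           check_status="ok"
       else
           check_status="unresolved-symbols:${unresolved}"
       fi
   fi
   echo "undefined symbol check: ${check_status}"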
Testing FBGEMM
----------------------
FBGEMM includes tests and benchmarks to validate performance. Running them requires
ROCm 5.7 or later on both the host and the container. To run the FBGEMM tests,
use these commands:
.. code-block:: shell
# !! Run inside the Conda environment !!
# From the /fbgemm_gpu/ directory
cd test
export FBGEMM_TEST_WITH_ROCM=1
# Enable for debugging failed kernel executions
export HIP_LAUNCH_BLOCKING=1
# Run the test
python -m pytest -v -rsx -s -W ignore::pytest.PytestCollectionWarning split_table_batched_embeddings_test.py
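The ROCm 5.7 minimum can be checked before launching the tests. The sketch below compares version strings with ``sort -V``; the version file location ``/opt/rocm/.info/version`` is an assumption about a default installation and may differ on your system.

.. code-block:: shell

   # Return success when version $1 is greater than or equal to version $2
   version_ge() {
       [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
   }

   # Assumed location of the version file; adjust ROCM_PATH for your installation
   rocm_version_file="${ROCM_PATH:-/opt/rocm}/.info/version"
   if [ -r "${rocm_version_file}" ]; then
       # Drop any build suffix that follows a hyphen
       rocm_version=$(cut -d- -f1 "${rocm_version_file}")
       if version_ge "${rocm_version}" 5.7; then
           echo "ROCm ${rocm_version} meets the 5.7 minimum"
       else
           echo "ROCm ${rocm_version} is older than 5.7"
       fi
   else
       echo "No ROCm version file found at ${rocm_version_file}"
   fi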
To run the FBGEMM_GPU ``uvm`` test, use these commands. These tests only support the AMD MI210 and
more recent accelerators.
.. code-block:: shell
# Run this inside the Conda environment from the /fbgemm_gpu/ directory
export HSA_XNACK=1
cd test
python -m pytest -v -rsx -s -W ignore::pytest.PytestCollectionWarning ./uvm/uvm_test.py