mirror of
https://github.com/ROCm/ROCm.git
synced 2026-02-04 03:15:28 -05:00
Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page (#3659)
* Add FBGEMM/FBGEMM_GPU to the Model acceleration libraries page * Add words to wordlist and fix a typo * Add new sections for Docker and testing * Incorporate comments from the external review * Some minor edits and clarifications * Incorporate further review comments and fix test section * Add comment to test section * Change git clone command for FBGEMM repo * Change Docker command * Changes from internal review * Fix linting issue
This commit is contained in:
@@ -251,3 +251,287 @@ page describes the options.
Learn more about optimizing kernels with TunableOp in
:ref:`Optimizing Triton kernels <mi300x-tunableop>`.

FBGEMM and FBGEMM_GPU
=====================

FBGEMM (Facebook General Matrix Multiplication) is a low-precision, high-performance CPU kernel library
for matrix-matrix multiplications and convolutions. It is used for server-side inference
and as a back end for PyTorch quantized operators. FBGEMM offers optimized on-CPU performance for reduced-precision calculations,
strong performance on native tensor formats, and the ability to generate
high-performance shape- and size-specific kernels at runtime.

FBGEMM_GPU collects several high-performance PyTorch GPU operator libraries
for use in training and inference. It provides efficient table-batched embedding functionality,
data layout transformation, and quantization support.

For more information about FBGEMM and FBGEMM_GPU, see the `PyTorch FBGEMM GitHub <https://github.com/pytorch/FBGEMM>`_
and the `PyTorch FBGEMM documentation <https://pytorch.org/FBGEMM/>`_.
The `Meta blog post about FBGEMM <https://engineering.fb.com/2018/11/07/ml-applications/fbgemm/>`_ provides
additional background about the library.

Installing FBGEMM_GPU
---------------------

Installing FBGEMM_GPU consists of the following steps:

* Set up an isolated Miniconda environment
* Install ROCm using Docker or the :doc:`package manager <rocm-install-on-linux:install/native-install/index>`
* Install the nightly `PyTorch <https://pytorch.org/>`_ build
* Complete the pre-build and build tasks

.. note::

   FBGEMM_GPU doesn't require the installation of FBGEMM. To optionally install
   FBGEMM, see the `FBGEMM install instructions <https://pytorch.org/FBGEMM/fbgemm-development/BuildInstructions.html>`_.

Set up the Miniconda environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To install Miniconda, use the following commands.

#. Install a `Miniconda environment <https://docs.anaconda.com/miniconda/>`_ for reproducible builds.
   All subsequent commands run inside this environment.

   .. code-block:: shell

      export PLATFORM_NAME="$(uname -s)-$(uname -m)"

      # Set the Miniconda prefix directory
      miniconda_prefix=$HOME/miniconda

      # Download the Miniconda installer
      wget -q "https://repo.anaconda.com/miniconda/Miniconda3-latest-${PLATFORM_NAME}.sh" -O miniconda.sh

      # Run the installer
      bash miniconda.sh -b -p "$miniconda_prefix" -u

      # Reload the shell configuration
      . ~/.bashrc

      # Run updates
      conda update -n base -c defaults -y conda
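
   The installer filename above is composed from ``uname`` output, so the exact file fetched
   depends on the host platform. As a quick sanity check, you can print the URL the ``wget``
   command will request before downloading anything (a minimal sketch, no assumptions beyond a
   POSIX shell):

   .. code-block:: shell

      # Compose the platform-specific installer URL, mirroring the wget step above
      PLATFORM_NAME="$(uname -s)-$(uname -m)"
      installer_url="https://repo.anaconda.com/miniconda/Miniconda3-latest-${PLATFORM_NAME}.sh"

      # On a Linux x86_64 host this prints:
      # https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
      echo "$installer_url"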

#. Create a Miniconda environment with Python 3.12:

   .. code-block:: shell

      env_name=<ENV NAME>
      python_version=3.12

      # Create the environment
      conda create -y --name ${env_name} python="${python_version}"

      # Upgrade pip and the pyOpenSSL package
      conda run -n ${env_name} pip install --upgrade pip
      conda run -n ${env_name} python -m pip install "pyOpenSSL>22.1.0"

#. Install additional build tools:

   .. code-block:: shell

      conda install -n ${env_name} -y \
          click \
          cmake \
          hypothesis \
          jinja2 \
          make \
          ncurses \
          ninja \
          numpy \
          scikit-build \
          wheel
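
   Before moving on, it can be worth confirming that the build tools actually resolve on the
   ``PATH`` inside the environment. The ``have`` helper below is a hypothetical convenience,
   not part of FBGEMM or Conda:

   .. code-block:: shell

      # Hypothetical helper: report whether a command resolves on PATH
      have() {
        if command -v "$1" >/dev/null 2>&1; then
          echo "$1: found"
        else
          echo "$1: MISSING"
        fi
      }

      # Inside the Conda environment, check the tools the build needs
      have cmake
      have ninja
      have make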

Install the ROCm components
^^^^^^^^^^^^^^^^^^^^^^^^^^^

FBGEMM_GPU can run in a ROCm Docker container or in conjunction with the full ROCm installation.
The Docker method is recommended because it requires fewer steps and provides a stable environment.

To run FBGEMM_GPU in the Docker container, pull the `Minimal Docker image for ROCm <https://hub.docker.com/r/rocm/rocm-terminal>`_.
This image includes all preinstalled ROCm packages required to integrate FBGEMM. To pull
and run the ROCm Docker image, use this command:

.. code-block:: shell

   # Run for ROCm 6.2.0
   docker run -it --network=host --shm-size 16G --device=/dev/kfd --device=/dev/dri --group-add video \
       --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ipc=host rocm/rocm-terminal:6.2 /bin/bash
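
The ``--device`` flags above assume the ROCm device nodes exist on the host. A small pre-flight
check can confirm this before launching the container; the ``check_dev`` helper is a hypothetical
sketch, not part of ROCm:

.. code-block:: shell

   # Hypothetical pre-flight check: verify a device node exists before
   # passing it to `docker run --device=...`
   check_dev() {
     if [ -e "$1" ]; then
       echo "$1: present"
     else
       echo "$1: missing"
     fi
   }

   check_dev /dev/kfd   # ROCm compute interface
   check_dev /dev/dri   # render nodes

If either node is missing, the container will start but the GPU will not be visible inside it.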

.. note::

   The `Full Docker image for ROCm <https://hub.docker.com/r/rocm/dev-ubuntu-20.04>`_, which includes all
   ROCm packages, can also be used. However, it results in a very large container, so the minimal
   Docker image is recommended.

You can also install ROCm using the package manager. FBGEMM_GPU requires the full ROCm installation.
For more information, see :doc:`the ROCm installation guide <rocm-install-on-linux:install/detailed-install>`.
FBGEMM_GPU also depends on the :doc:`MIOpen <miopen:index>` component. To install MIOpen and its
related packages, use the ``apt install`` command:

.. code-block:: shell

   apt install hipify-clang miopen-hip miopen-hip-dev

Install PyTorch
^^^^^^^^^^^^^^^

Install `PyTorch <https://pytorch.org/>`_ using ``pip`` for the most reliable and consistent results.

#. Install the nightly PyTorch build using ``pip``:

   .. code-block:: shell

      # Install the latest nightly, ROCm variant
      conda run -n ${env_name} pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.2/

#. Ensure PyTorch loads correctly. Verify the version and variant of the installation using an ``import`` test.

   .. code-block:: shell

      # Ensure that the package loads properly
      conda run -n ${env_name} python -c "import torch.distributed"

      # Verify the version and variant of the installation
      conda run -n ${env_name} python -c "import torch; print(torch.__version__)"
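
   A ROCm build of PyTorch carries a ``+rocm<version>`` suffix in its version string, which is a
   quick way to confirm the right variant was installed. A shell-side sketch (the sample version
   string below is illustrative, not taken from a real install):

   .. code-block:: shell

      # Illustrative torch.__version__ output for a ROCm nightly build
      torch_version="2.5.0.dev20240812+rocm6.2"

      # A CUDA or CPU-only wheel would lack the "rocm" marker, indicating
      # the wrong index URL was used during installation
      case "$torch_version" in
        *rocm*) echo "ROCm variant detected" ;;
        *)      echo "WARNING: not a ROCm variant" ;;
      esac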

Perform the pre-build and build
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Clone the FBGEMM repository and the relevant submodules, then use ``pip`` to install the
   components listed in ``requirements.txt``. Run the following commands inside the Miniconda environment.

   .. code-block:: shell

      # Select a version tag
      FBGEMM_VERSION=v0.8.0

      # Clone the repo along with its submodules
      git clone https://github.com/pytorch/FBGEMM.git --branch=${FBGEMM_VERSION} --recursive fbgemm_${FBGEMM_VERSION}

      # Install additional required packages for building and testing
      cd fbgemm_${FBGEMM_VERSION}/fbgemm_gpu
      pip install -r requirements.txt

#. Clear the build cache to remove stale build information:

   .. code-block:: shell

      # !! Run in the fbgemm_gpu/ directory inside the Conda environment !!

      python setup.py clean

#. Set the wheel build variables, including the package name, Python version tag, and Python platform name.

   .. code-block:: shell

      # Set the package name depending on the build variant
      export package_name=fbgemm_gpu_rocm

      # Set the Python version tag. It should follow the convention `py<major><minor>`,
      # for example, Python 3.12 --> py312
      export python_tag=py312

      # Determine the processor architecture
      export ARCH=$(uname -m)

      # Set the Python platform name for the Linux case
      export python_plat_name="manylinux2014_${ARCH}"
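
   Rather than hard-coding the tag, the ``py<major><minor>`` convention can also be derived from
   the ``python_version`` variable set earlier, which keeps the two in sync. A portable sketch:

   .. code-block:: shell

      python_version=3.12

      # Strip the dot to form the py<major><minor> wheel tag, e.g. 3.12 -> py312
      python_tag="py$(printf '%s' "$python_version" | tr -d '.')"

      echo "$python_tag"   # py312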

#. Build FBGEMM_GPU for the ROCm platform. Set ``ROCM_PATH`` to the path of your ROCm installation.
   Run these commands from the ``fbgemm_gpu/`` directory inside the Miniconda environment.

   .. code-block:: shell

      # !! Run in the fbgemm_gpu/ directory inside the Conda environment !!

      export ROCM_PATH=</path/to/rocm>

      # Build for the target architecture of the ROCm device installed on the machine (for example, 'gfx942;gfx90a').
      # See the Linux system requirements page for a list of supported GPUs.
      export PYTORCH_ROCM_ARCH=$(${ROCM_PATH}/bin/rocminfo | grep -o -m 1 'gfx.*')

      # Build the wheel artifact only
      python setup.py bdist_wheel \
          --package_variant=rocm \
          --python-tag="${python_tag}" \
          --plat-name="${python_plat_name}" \
          -DHIP_ROOT_DIR="${ROCM_PATH}" \
          -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
          -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"

      # Build and install the library into the Conda environment
      python setup.py install \
          --package_variant=rocm \
          -DHIP_ROOT_DIR="${ROCM_PATH}" \
          -DCMAKE_C_FLAGS="-DTORCH_USE_HIP_DSA" \
          -DCMAKE_CXX_FLAGS="-DTORCH_USE_HIP_DSA"
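
   The ``PYTORCH_ROCM_ARCH`` line above scrapes the first ``gfx*`` token from ``rocminfo``
   output. Its behavior can be sanity-checked against canned text without a GPU present (the
   sample excerpt below is illustrative, not real ``rocminfo`` output):

   .. code-block:: shell

      # Illustrative rocminfo excerpt for an MI300-class device
      sample_output="  Name:                    gfx942
        Marketing Name:          AMD Instinct"

      # Same extraction as the build step: first match of 'gfx' to end of line
      arch=$(printf '%s\n' "$sample_output" | grep -o -m 1 'gfx.*')

      echo "$arch"   # gfx942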

Post-build validation
---------------------

After building FBGEMM_GPU, run some verification checks to ensure the build is correct. Continue
to run all commands in the ``fbgemm_gpu/`` directory, inside the Miniconda environment.

#. The build process generates many build artifacts and C++ templates, so
   it is important to confirm that no undefined symbols remain:

   .. code-block:: shell

      # !! Run in the fbgemm_gpu/ directory inside the Conda environment !!

      # Locate the built .SO file
      fbgemm_gpu_lib_path=$(find . -name fbgemm_gpu_py.so)

      # Check that the undefined symbols don't include fbgemm_gpu-defined functions
      nm -gDCu "${fbgemm_gpu_lib_path}" | sort

#. Verify the referenced version of ``GLIBCXX`` and the presence of certain function symbols:

   .. code-block:: shell

      # !! Run in the fbgemm_gpu/ directory inside the Conda environment !!

      # Locate the built .SO file
      fbgemm_gpu_lib_path=$(find . -name fbgemm_gpu_py.so)

      # Note the versions of GLIBCXX referenced by the .SO
      # The libstdc++.so.6 available on the install target must support these versions
      objdump -TC "${fbgemm_gpu_lib_path}" | grep GLIBCXX | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu

      # Test for the existence of a given function symbol in the .SO
      nm -gDC "${fbgemm_gpu_lib_path}" | grep " fbgemm_gpu::merge_pooled_embeddings("
      nm -gDC "${fbgemm_gpu_lib_path}" | grep " fbgemm_gpu::jagged_2d_to_dense("
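
   The ``objdump``/``sed`` pipeline above reduces the symbol table to a sorted, de-duplicated
   list of the ``GLIBCXX`` versions the library requires. Its behavior can be checked against
   canned input (the lines below are illustrative, not real ``objdump`` output):

   .. code-block:: shell

      # Illustrative objdump -TC lines referencing GLIBCXX versions
      sample_symbols="0000 DF *UND* GLIBCXX_3.4.21 operator new(unsigned long)
      0000 DF *UND* GLIBCXX_3.4 std::terminate()
      0000 DF *UND* GLIBCXX_3.4.21 std::__throw_bad_alloc()"

      # Same sed/sort pipeline as the validation step: extract each version,
      # then sort version-aware (-V) and de-duplicate (-u)
      versions=$(printf '%s\n' "$sample_symbols" \
        | sed 's/.*GLIBCXX_\([.0-9]*\).*/GLIBCXX_\1/g' | sort -Vu)

      echo "$versions"
      # GLIBCXX_3.4
      # GLIBCXX_3.4.21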

Testing FBGEMM
--------------

FBGEMM includes tests and benchmarks to validate performance. Running these tests
requires ROCm 5.7 or a more recent version on both the host and the container. To run
the FBGEMM tests, follow these instructions:

.. code-block:: shell

   # !! Run inside the Conda environment !!

   # From the fbgemm_gpu/ directory
   cd test

   export FBGEMM_TEST_WITH_ROCM=1
   # Enable for debugging failed kernel executions
   export HIP_LAUNCH_BLOCKING=1

   # Run the test
   python -m pytest -v -rsx -s -W ignore::pytest.PytestCollectionWarning split_table_batched_embeddings_test.py

To run the FBGEMM_GPU ``uvm`` tests, use the following commands. These tests are only supported
on the AMD MI210 and more recent accelerators.

.. code-block:: shell

   # Run this inside the Conda environment from the fbgemm_gpu/ directory
   export HSA_XNACK=1
   cd test

   python -m pytest -v -rsx -s -W ignore::pytest.PytestCollectionWarning ./uvm/uvm_test.py