mirror of
https://github.com/ROCm/ROCm.git
synced 2026-02-04 19:35:02 -05:00
421 lines
12 KiB
Markdown
421 lines
12 KiB
Markdown
# PyTorch Installation for ROCm
|
|
|
|
## PyTorch
|
|
|
|
PyTorch is an open-source machine learning Python library, primarily
|
|
differentiated by Tensor computing with GPU acceleration and a type-based
|
|
automatic differentiation. Other advanced features include:
|
|
|
|
* Support for distributed training
|
|
* Native ONNX support
|
|
* C++ front-end
|
|
* The ability to deploy at scale using TorchServe
|
|
* A production-ready deployment mechanism through TorchScript
|
|
|
|
### Installing PyTorch
|
|
|
|
To install ROCm on bare metal, refer to the sections
|
|
[GPU and OS Support (Linux)](../../about/compatibility/linux-support.md) and
|
|
[Compatibility](../../about/compatibility/index.md) for hardware, software and
|
|
3rd-party framework compatibility between ROCm and PyTorch. The recommended
|
|
option to get a PyTorch environment is through Docker. However, installing the
|
|
PyTorch wheels package on bare metal is also supported.
|
|
|
|
#### Option 1 (Recommended): Use Docker Image with PyTorch Pre-Installed
|
|
|
|
Using Docker gives you portability and access to a prebuilt Docker container
|
|
that has been rigorously tested within AMD. This might also save on the
|
|
compilation time and should perform as it did when tested without facing
|
|
potential installation issues.
|
|
|
|
Follow these steps:
|
|
|
|
1. Pull the latest public PyTorch Docker image.
|
|
|
|
```bash
|
|
docker pull rocm/pytorch:latest
|
|
```
|
|
|
|
Optionally, you may download a specific and supported configuration with
|
|
different user-space ROCm versions, PyTorch versions, and supported operating
|
|
systems. To download the PyTorch Docker image, refer to
|
|
[https://hub.docker.com/r/rocm/pytorch](https://hub.docker.com/r/rocm/pytorch).
|
|
|
|
2. Start a Docker container using the downloaded image.
|
|
|
|
```bash
|
|
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest
|
|
```
|
|
|
|
:::{note}
|
|
This will automatically download the image if it does not exist on the host.
|
|
You can also pass the -v argument to mount any data directories from the host
|
|
onto the container.
|
|
:::
|
|
|
|
(install-pytorch-using-wheels)=
|
|
|
|
#### Option 2: Install PyTorch Using Wheels Package
|
|
|
|
PyTorch supports the ROCm platform by providing tested wheels packages. To
|
|
access this feature, refer to
|
|
[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
|
|
and choose the "ROCm" compute platform. The following image is a matrix from <https://pytorch.org/> that illustrates the installation compatibility between ROCm and the PyTorch build.
|
|
|
|
```{figure} ../../data/tutorials/install/magma-install/magma006.png
|
|
:name: installation-matrix-pytorch
|
|
:align: center
|
|
|
|
Installation Matrix from Pytorch
|
|
```
|
|
|
|
To install PyTorch using the wheels package, follow these installation steps:
|
|
|
|
1. Choose one of the following options:
|
|
a. Obtain a base Docker image with the correct user-space ROCm version
|
|
installed from
|
|
[https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04](https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04).
|
|
|
|
or
|
|
|
|
b. Download a base OS Docker image and install ROCm following the
|
|
installation directions in the [Installation](../../tutorials/install/linux/index) section. ROCm 5.2 is installed in
|
|
this example, as supported by the installation matrix from
|
|
<https://pytorch.org/>.
|
|
|
|
or
|
|
|
|
c. Install on bare metal. Skip to Step 3.
|
|
|
|
```bash
|
|
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest
|
|
```
|
|
|
|
2. Start the Docker container, if not installing on bare metal.
|
|
|
|
```dockerfile
|
|
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest
|
|
```
|
|
|
|
3. Install any dependencies needed for installing the wheels package.
|
|
|
|
```bash
|
|
sudo apt update
|
|
sudo apt install libjpeg-dev python3-dev
|
|
pip3 install wheel setuptools
|
|
```
|
|
|
|
4. Install torch, `torchvision`, and `torchaudio` as specified by the installation
|
|
matrix.
|
|
|
|
:::{note}
|
|
ROCm 5.2 PyTorch wheel in the command below is shown for reference.
|
|
:::
|
|
|
|
```bash
|
|
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.2/
|
|
```
|
|
|
|
#### Option 3: Install PyTorch Using PyTorch ROCm Base Docker Image
|
|
|
|
A prebuilt base Docker image is used to build PyTorch in this option. The base
|
|
Docker has all dependencies installed, including:
|
|
|
|
* ROCm
|
|
* Torchvision
|
|
* Conda packages
|
|
* Compiler toolchain
|
|
|
|
Additionally, a particular environment flag (`BUILD_ENVIRONMENT`) is set, and
|
|
the build scripts utilize that to determine the build environment configuration.
|
|
|
|
Follow these steps:
|
|
|
|
1. Obtain the Docker image.
|
|
|
|
```bash
|
|
docker pull rocm/pytorch:latest-base
|
|
```
|
|
|
|
The above will download the base container, which does not contain PyTorch.
|
|
|
|
2. Start a Docker container using the image.
|
|
|
|
```bash
|
|
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base
|
|
```
|
|
|
|
You can also pass the -v argument to mount any data directories from the host
|
|
onto the container.
|
|
|
|
3. Clone the PyTorch repository.
|
|
|
|
```bash
|
|
cd ~
|
|
git clone https://github.com/pytorch/pytorch.git
|
|
cd pytorch
|
|
git submodule update --init --recursive
|
|
```
|
|
|
|
4. Build PyTorch for ROCm.
|
|
|
|
:::{note}
|
|
By default in the `rocm/pytorch:latest-base`, PyTorch builds for these
|
|
architectures simultaneously:
|
|
* gfx900
|
|
* gfx906
|
|
* gfx908
|
|
* gfx90a
|
|
* gfx1030
|
|
:::
|
|
|
|
5. To determine your AMD uarch, run:
|
|
|
|
```bash
|
|
rocminfo | grep gfx
|
|
```
|
|
|
|
6. In the event you want to compile only for your uarch, use:
|
|
|
|
```bash
|
|
export PYTORCH_ROCM_ARCH=<uarch>
|
|
```
|
|
|
|
`<uarch>` is the architecture reported by the `rocminfo` command.
|
|
|
|
7. Build PyTorch using the following command:
|
|
|
|
```bash
|
|
./.jenkins/pytorch/build.sh
|
|
```
|
|
|
|
This will first convert PyTorch sources for HIP compatibility and build the
|
|
PyTorch framework.
|
|
|
|
8. Alternatively, build PyTorch by issuing the following commands:
|
|
|
|
```bash
|
|
python3 tools/amd_build/build_amd.py
|
|
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
|
|
```
|
|
|
|
#### Option 4: Install Using PyTorch Upstream Docker File
|
|
|
|
Instead of using a prebuilt base Docker image, you can build a custom base
|
|
Docker image using scripts from the PyTorch repository. This will utilize a
|
|
standard Docker image from operating system maintainers and install all the
|
|
dependencies required to build PyTorch, including
|
|
|
|
* ROCm
|
|
* Torchvision
|
|
* Conda packages
|
|
* Compiler toolchain
|
|
|
|
Follow these steps:
|
|
|
|
1. Clone the PyTorch repository on the host.
|
|
|
|
```bash
|
|
cd ~
|
|
git clone https://github.com/pytorch/pytorch.git
|
|
cd pytorch
|
|
git submodule update --init --recursive
|
|
```
|
|
|
|
2. Build the PyTorch Docker image.
|
|
|
|
```bash
|
|
cd.circleci/docker
|
|
./build.sh pytorch-linux-bionic-rocm<version>-py3.7
|
|
# eg. ./build.sh pytorch-linux-bionic-rocm3.10-py3.7
|
|
```
|
|
|
|
This should be complete with a message "Successfully build `<image_id>`."
|
|
|
|
3. Start a Docker container using the image:
|
|
|
|
```bash
|
|
docker run -it --cap-add=SYS_PTRACE --security-opt
|
|
seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add
|
|
video --ipc=host --shm-size 8G <image_id>
|
|
```
|
|
|
|
You can also pass -v argument to mount any data directories from the host
|
|
onto the container.
|
|
|
|
4. Clone the PyTorch repository.
|
|
|
|
```bash
|
|
cd ~
|
|
git clone https://github.com/pytorch/pytorch.git
|
|
cd pytorch
|
|
git submodule update --init --recursive
|
|
```
|
|
|
|
5. Build PyTorch for ROCm.
|
|
|
|
:::{note}
|
|
By default in the `rocm/pytorch:latest-base`, PyTorch builds for these
|
|
architectures simultaneously:
|
|
* gfx900
|
|
* gfx906
|
|
* gfx908
|
|
* gfx90a
|
|
* gfx1030
|
|
:::
|
|
|
|
6. To determine your AMD uarch, run:
|
|
|
|
```bash
|
|
rocminfo | grep gfx
|
|
```
|
|
|
|
7. If you want to compile only for your uarch:
|
|
|
|
```bash
|
|
export PYTORCH_ROCM_ARCH=<uarch>
|
|
```
|
|
|
|
`<uarch>` is the architecture reported by the `rocminfo` command.
|
|
|
|
8. Build PyTorch using:
|
|
|
|
```bash
|
|
./.jenkins/pytorch/build.sh
|
|
```
|
|
|
|
This will first convert PyTorch sources to be HIP compatible and then build the
|
|
PyTorch framework.
|
|
|
|
Alternatively, build PyTorch by issuing the following commands:
|
|
|
|
```bash
|
|
python3 tools/amd_build/build_amd.py
|
|
USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
|
|
```
|
|
|
|
### Test the PyTorch Installation
|
|
|
|
You can use PyTorch unit tests to validate a PyTorch installation. If using a
|
|
prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
|
|
wheels package, these tests are already run on those configurations.
|
|
Alternatively, you can manually run the unit tests to validate the PyTorch
|
|
installation fully.
|
|
|
|
Follow these steps:
|
|
|
|
1. Test if PyTorch is installed and accessible by importing the torch package in
|
|
Python.
|
|
|
|
:::{note}
|
|
Do not run in the PyTorch git folder.
|
|
:::
|
|
|
|
```bash
|
|
python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure'
|
|
```
|
|
|
|
2. Test if the GPU is accessible from PyTorch. In the PyTorch framework,
|
|
`torch.cuda` is a generic mechanism to access the GPU; it will access an AMD
|
|
GPU only if available.
|
|
|
|
```bash
|
|
python3 -c 'import torch; print(torch.cuda.is_available())'
|
|
```
|
|
|
|
3. Run the unit tests to validate the PyTorch installation fully. Run the
|
|
following command from the PyTorch home directory:
|
|
|
|
```bash
|
|
BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT:-rocm} ./.jenkins/pytorch/test.sh
|
|
```
|
|
|
|
This ensures that even for wheel installs in a non-controlled environment,
|
|
the required environment variable will be set to skip certain unit tests for
|
|
ROCm.
|
|
|
|
:::{note}
|
|
Make sure the PyTorch source code is corresponding to the PyTorch wheel or
|
|
installation in the Docker image. Incompatible PyTorch source code might give
|
|
errors when running the unit tests.
|
|
:::
|
|
|
|
This will first install some dependencies, such as a supported `torchvision`
|
|
version for PyTorch. `torchvision` is used in some PyTorch tests for loading
|
|
models. Next, this will run all the unit tests.
|
|
|
|
:::{note}
|
|
Some tests may be skipped, as appropriate, based on your system
|
|
configuration. All features of PyTorch are not supported on ROCm, and the
|
|
tests that evaluate these features are skipped. In addition, depending on the
|
|
host memory, or the number of available GPUs, other tests may be skipped. No
|
|
test should fail if the compilation and installation are correct.
|
|
:::
|
|
|
|
4. Run individual unit tests with the following command:
|
|
|
|
```bash
|
|
PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose
|
|
```
|
|
|
|
`test_nn.py` can be replaced with any other test set.
|
|
|
|
### Run a Basic PyTorch Example
|
|
|
|
The PyTorch examples repository provides basic examples that exercise the
|
|
functionality of the framework. MNIST (Modified National Institute of Standards
|
|
and Technology) database is a collection of handwritten digits that may be used
|
|
to train a Convolutional Neural Network for handwriting recognition.
|
|
Alternatively, ImageNet is a database of images used to train a network for
|
|
visual object recognition.
|
|
|
|
Follow these steps:
|
|
|
|
1. Clone the PyTorch examples repository.
|
|
|
|
```bash
|
|
git clone https://github.com/pytorch/examples.git
|
|
```
|
|
|
|
2. Run the MNIST example.
|
|
|
|
```bash
|
|
cd examples/mnist
|
|
```
|
|
|
|
3. Follow the instructions in the `README` file in this folder. In this case:
|
|
|
|
```bash
|
|
pip3 install -r requirements.txt
|
|
python3 main.py
|
|
```
|
|
|
|
4. Run the ImageNet example.
|
|
|
|
```bash
|
|
cd examples/imagenet
|
|
```
|
|
|
|
5. Follow the instructions in the `README` file in this folder. In this case:
|
|
|
|
```bash
|
|
pip3 install -r requirements.txt
|
|
python3 main.py
|
|
```
|
|
|
|
## Using MIOpen kdb files with ROCm PyTorch wheels
|
|
|
|
PyTorch uses MIOpen for machine learning primitives. These primitives are compiled into kernels at runtime. Runtime compilation causes a small warm-up phase when starting PyTorch. MIOpen kdb files contain precompiled kernels that can speed up the warm-up phase of an application. More information is available in the {doc}`MIOpeninstallation page <miopen:install>`.
|
|
|
|
MIOpen kdb files can be used with ROCm PyTorch wheels. However, the kdb files need to be placed in a specific location with respect to the PyTorch installation path. A helper script simplifies this task for the user. The script takes in the ROCm version and user's GPU architecture as inputs, and works for Ubuntu and CentOS.
|
|
|
|
Helper script: [install_kdb_files_for_pytorch_wheels.sh](https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh)
|
|
|
|
Usage:
|
|
|
|
After installing ROCm PyTorch wheels:
|
|
|
|
1. [Optional] `export GFX_ARCH=gfx90a`
|
|
2. [Optional] `export ROCM_VERSION=5.5`
|
|
3. `./install_kdb_files_for_pytorch_wheels.sh`
|