mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-10 07:08:08 -05:00
446 lines
14 KiB
Markdown
446 lines
14 KiB
Markdown
# Installing PyTorch for ROCm
|
|
|
|
[PyTorch](https://pytorch.org/) is an open-source tensor library designed for deep learning. PyTorch on
|
|
ROCm provides mixed-precision and large-scale training using our
|
|
[MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen) and
|
|
[RCCL](https://github.com/ROCmSoftwarePlatform/rccl) libraries.
|
|
|
|
To install [PyTorch for ROCm](https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/), you have the following options:
|
|
|
|
* [Use a Docker image with PyTorch pre-installed](#using-a-docker-image-with-pytorch-pre-installed)
|
|
(recommended)
|
|
* [Use a wheels package](#using-a-wheels-package)
|
|
* [Use the PyTorch ROCm base Docker image](#using-the-pytorch-rocm-base-docker-image)
|
|
* [Use the PyTorch upstream Docker file](#using-the-pytorch-upstream-docker-file)
|
|
|
|
For hardware, software, and third-party framework compatibility between ROCm and PyTorch, refer to:
|
|
|
|
* [GPU and OS support (Linux)](../../about/compatibility/linux-support.md)
|
|
* [Compatibility](../../about/compatibility/3rd-party-support-matrix.md)
|
|
|
|
## Using a Docker image with PyTorch pre-installed
|
|
|
|
1. Download the latest public PyTorch Docker image
|
|
([https://hub.docker.com/r/rocm/pytorch](https://hub.docker.com/r/rocm/pytorch)).
|
|
|
|
```bash
|
|
docker pull rocm/pytorch:latest
|
|
```
|
|
|
|
You can also download a specific and supported configuration with different user-space ROCm
|
|
versions, PyTorch versions, and operating systems.
|
|
|
|
2. Start a Docker container using the image.
|
|
|
|
```bash
|
|
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
|
|
--device=/dev/kfd --device=/dev/dri --group-add video \
|
|
--ipc=host --shm-size 8G rocm/pytorch:latest
|
|
```
|
|
|
|
:::{note}
|
|
This will automatically download the image if it does not exist on the host. You can also pass the `-v`
|
|
argument to mount any data directories from the host onto the container.
|
|
:::
|
|
|
|
(install_pytorch_wheels)=
|
|
|
|
## Using a wheels package
|
|
|
|
PyTorch supports the ROCm platform by providing tested wheels packages. To access this feature, go
|
|
to [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/). In the interactive
|
|
table, choose ROCm from the _Compute Platform_ row.
|
|
|
|
1. Choose one of the following three options:
|
|
|
|
**Option 1:**
|
|
|
|
a. Download a base Docker image with the correct user-space ROCm version.
|
|
| Base OS | Docker image | Link to Docker image|
|
|
|----------------|-----------------------------|----------------|
|
|
| Ubuntu 20.04 | `rocm/dev-ubuntu-20.04` | [https://hub.docker.com/r/rocm/dev-ubuntu-20.04](https://hub.docker.com/r/rocm/dev-ubuntu-20.04)
|
|
| Ubuntu 22.04 | `rocm/dev-ubuntu-22.04` | [https://hub.docker.com/r/rocm/dev-ubuntu-22.04](https://hub.docker.com/r/rocm/dev-ubuntu-22.04)
|
|
| CentOS 7 | `rocm/dev-centos-7` | [https://hub.docker.com/r/rocm/dev-centos-7](https://hub.docker.com/r/rocm/dev-centos-7)
|
|
|
|
b. Pull the selected image.
|
|
|
|
```bash
|
|
docker pull rocm/dev-ubuntu-20.04:latest
|
|
```
|
|
|
|
c. Start a Docker container using the downloaded image.
|
|
|
|
```bash
|
|
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-20.04:latest
|
|
```
|
|
|
|
**Option 2:**
|
|
|
|
Select a base OS Docker image (Check [OS compatibility](../../about/compatibility/linux-support.md))
|
|
|
|
Pull selected base OS image (Ubuntu 20.04 for example)
|
|
|
|
```docker
|
|
docker pull ubuntu:20.04
|
|
```
|
|
|
|
Start a Docker container using the downloaded image
|
|
|
|
```docker
|
|
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:20.04
|
|
```
|
|
|
|
Install ROCm using the directions in the [Installation section](../install/linux/install-options.md).
|
|
|
|
**Option 3:**
|
|
|
|
Install on bare metal. Check [OS compatibility](../../about/compatibility/linux-support.md) and install ROCm using the
|
|
directions in the [Installation section](../install/linux/install-options.md).
|
|
|
|
2. Install the required dependencies for the wheels package.
|
|
|
|
```bash
|
|
sudo apt update
|
|
sudo apt install libjpeg-dev python3-dev python3-pip
|
|
pip3 install wheel setuptools
|
|
```
|
|
|
|
3. Install `torch`, `torchvision`, and `torchaudio`, as specified in the [installation matrix](https://pytorch.org/get-started/locally/).
|
|
|
|
:::{note}
|
|
The following command uses the ROCm 5.6 PyTorch wheel. If you want a different version of ROCm,
|
|
modify the command accordingly.
|
|
:::
|
|
|
|
```bash
|
|
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6/
|
|
```
|
|
|
|
4. (Optional) Use MIOpen kdb files with ROCm PyTorch wheels.
|
|
|
|
PyTorch uses [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen) for machine learning
|
|
primitives, which are compiled into kernels at runtime. Runtime compilation causes a small warm-up
|
|
phase when starting PyTorch, and MIOpen kdb files contain precompiled kernels that can speed up
|
|
application warm-up phases. For more information, refer to the
|
|
{doc}`MIOpen installation page <miopen:install>`.
|
|
|
|
MIOpen kdb files can be used with ROCm PyTorch wheels. However, the kdb files need to be placed in
|
|
a specific location with respect to the PyTorch installation path. A helper script simplifies this task by
|
|
taking the ROCm version and GPU architecture as inputs. This works for Ubuntu and CentOS.
|
|
|
|
You can download the helper script here:
|
|
[install_kdb_files_for_pytorch_wheels.sh](https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/ install_kdb_files_for_pytorch_wheels.sh), or use:
|
|
|
|
`wget https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/files/install_kdb_files_for_pytorch_wheels.sh`
|
|
|
|
After installing ROCm PyTorch wheels, run the following code:
|
|
|
|
```bash
|
|
#Optional; replace 'gfx90a' with your architecture
|
|
export GFX_ARCH=gfx90a
|
|
|
|
#Optional
|
|
export ROCM_VERSION=5.5
|
|
|
|
./install_kdb_files_for_pytorch_wheels.sh
|
|
```
|
|
|
|
## Using the PyTorch ROCm base Docker image
|
|
|
|
The pre-built base Docker image has all dependencies installed, including:
|
|
|
|
* ROCm
|
|
* Torchvision
|
|
* Conda packages
|
|
* The compiler toolchain
|
|
|
|
Additionally, a particular environment flag (`BUILD_ENVIRONMENT`) is set, which is used by the build
|
|
scripts to determine the configuration of the build environment.
|
|
|
|
1. Download the Docker image. This is the base image, which does not contain PyTorch.
|
|
|
|
```bash
|
|
docker pull rocm/pytorch:latest-base
|
|
```
|
|
|
|
2. Start a Docker container using the downloaded image.
|
|
|
|
```bash
|
|
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base
|
|
```
|
|
|
|
You can also pass the `-v` argument to mount any data directories from the host onto the container.
|
|
|
|
3. Clone the PyTorch repository.
|
|
|
|
```bash
|
|
cd ~
|
|
git clone https://github.com/pytorch/pytorch.git
|
|
cd /pytorch
|
|
git submodule update --init --recursive
|
|
```
|
|
|
|
4. Set ROCm architecture (optional). The Docker image tag is `rocm/pytorch:latest-base`.
|
|
|
|
:::{note}
|
|
By default in the `rocm/pytorch:latest-base` image, PyTorch builds simultaneously for the following
|
|
architectures:
|
|
* gfx900
|
|
* gfx906
|
|
* gfx908
|
|
* gfx90a
|
|
* gfx1030
|
|
:::
|
|
|
|
If you want to compile _only_ for your microarchitecture (uarch), run:
|
|
|
|
```bash
|
|
export PYTORCH_ROCM_ARCH=<uarch>
|
|
```
|
|
|
|
Where `<uarch>` is the architecture reported by the `rocminfo` command.
|
|
|
|
To find your uarch, run:
|
|
|
|
```bash
|
|
rocminfo | grep gfx
|
|
```
|
|
|
|
5. Build PyTorch.
|
|
|
|
```bash
|
|
./.ci/pytorch/build.sh
|
|
```
|
|
|
|
This converts PyTorch sources for
|
|
[HIP compatibility](https://www.amd.com/en/developer/rocm-hub/hip-sdk.html) and builds the
|
|
PyTorch framework.
|
|
|
|
To check if your build is successful, run:
|
|
|
|
```bash
|
|
echo $? # should return 0 if success
|
|
```
|
|
|
|
## Using the PyTorch upstream Docker file
|
|
|
|
If you don't want to use a prebuilt base Docker image, you can build a custom base Docker image
|
|
using scripts from the PyTorch repository. This uses a standard Docker image from operating system
|
|
maintainers and installs all the required dependencies, including:
|
|
|
|
* ROCm
|
|
* Torchvision
|
|
* Conda packages
|
|
* The compiler toolchain
|
|
|
|
1. Clone the PyTorch repository.
|
|
|
|
```bash
|
|
cd ~
|
|
git clone https://github.com/pytorch/pytorch.git
|
|
cd /pytorch
|
|
git submodule update --init --recursive
|
|
```
|
|
|
|
2. Build the PyTorch Docker image.
|
|
|
|
```bash
|
|
cd .ci/docker
|
|
./build.sh pytorch-linux-<os-version>-rocm<rocm-version>-py<python-version> -t rocm/pytorch:build_from_dockerfile
|
|
```
|
|
|
|
Where:
|
|
* `<os-version>`: `ubuntu20.04` (or `focal`), `ubuntu22.04` (or `jammy`), `centos7.5`, or `centos9`
|
|
* `<rocm-version>`: `5.4`, `5.5`, or `5.6`
|
|
* `<python-version>`: `3.8`-`3.11`
|
|
|
|
To verify that your image was successfully created, run:
|
|
|
|
`docker image ls rocm/pytorch:build_from_dockerfile`
|
|
|
|
If successful, the output looks like this:
|
|
|
|
```bash
|
|
REPOSITORY TAG IMAGE ID CREATED SIZE
|
|
rocm/pytorch build_from_dockerfile 17071499be47 2 minutes ago 32.8GB
|
|
```
|
|
|
|
3. Start a Docker container using the image with the mounted PyTorch folder.
|
|
|
|
```bash
|
|
docker run -it --cap-add=SYS_PTRACE --security-opt --user root \
|
|
seccomp=unconfined --device=/dev/kfd --device=/dev/dri \
|
|
--group-add video --ipc=host --shm-size 8G \
|
|
-v ~/pytorch:/pytorch rocm/pytorch:build_from_dockerfile
|
|
```
|
|
|
|
You can also pass the `-v` argument to mount any data directories from the host onto the container.
|
|
|
|
4. Go to the PyTorch directory.
|
|
|
|
```bash
|
|
cd pytorch
|
|
```
|
|
|
|
5. Set ROCm architecture.
|
|
|
|
To determine your AMD architecture, run:
|
|
|
|
```bash
|
|
rocminfo | grep gfx
|
|
```
|
|
|
|
The result looks like this (for `gfx1030` architecture):
|
|
|
|
```bash
|
|
Name: gfx1030
|
|
Name: amdgcn-amd-amdhsa--gfx1030
|
|
```
|
|
|
|
Set the `PYTORCH_ROCM_ARCH` environment variable to specify the architectures you want to
|
|
build PyTorch for.
|
|
|
|
```bash
|
|
export PYTORCH_ROCM_ARCH=<uarch>
|
|
```
|
|
|
|
where `<uarch>` is the architecture reported by the `rocminfo` command.
|
|
|
|
6. Build PyTorch.
|
|
|
|
```bash
|
|
./.ci/pytorch/build.sh
|
|
```
|
|
|
|
This converts PyTorch sources for
|
|
[HIP compatibility](https://www.amd.com/en/developer/rocm-hub/hip-sdk.html) and builds the
|
|
PyTorch framework.
|
|
|
|
To check if your build is successful, run:
|
|
|
|
```bash
|
|
echo $? # should return 0 if success
|
|
```
|
|
|
|
## Testing the PyTorch installation
|
|
|
|
You can use PyTorch unit tests to validate your PyTorch installation. If you used a
|
|
**prebuilt PyTorch Docker image from AMD ROCm DockerHub** or installed an
|
|
**official wheels package**, validation tests are not necessary.
|
|
|
|
If you want to manually run unit tests to validate your PyTorch installation fully, follow these steps:
|
|
|
|
1. Import the torch package in Python to test if PyTorch is installed and accessible.
|
|
|
|
:::{note}
|
|
Do not run the following command in the PyTorch git folder.
|
|
:::
|
|
|
|
```bash
|
|
python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure'
|
|
```
|
|
|
|
2. Check if the GPU is accessible from PyTorch. In the PyTorch framework, `torch.cuda` is a generic way
|
|
to access the GPU. This can only access an AMD GPU if one is available.
|
|
|
|
```bash
|
|
python3 -c 'import torch; print(torch.cuda.is_available())'
|
|
```
|
|
|
|
3. Run unit tests to validate the PyTorch installation fully.
|
|
|
|
:::{note}
|
|
You must run the following command from the PyTorch home directory.
|
|
:::
|
|
|
|
```bash
|
|
PYTORCH_TEST_WITH_ROCM=1 python3 test/run_test.py --verbose \
|
|
--include test_nn test_torch test_cuda test_ops \
|
|
test_unary_ufuncs test_binary_ufuncs test_autograd
|
|
```
|
|
|
|
This command ensures that the required environment variable is set to skip certain unit tests for
|
|
ROCm. This also applies to wheel installs in a non-controlled environment.
|
|
|
|
:::{note}
|
|
Make sure your PyTorch source code corresponds to the PyTorch wheel or the installation in the
|
|
Docker image. Incompatible PyTorch source code can give errors when running unit tests.
|
|
:::
|
|
|
|
Some tests may be skipped, as appropriate, based on your system configuration. ROCm doesn't
|
|
support all PyTorch features; tests that evaluate unsupported features are skipped. Other tests might
|
|
be skipped, depending on the host or GPU memory and the number of available GPUs.
|
|
|
|
If the compilation and installation are correct, all tests will pass.
|
|
|
|
4. Run individual unit tests.
|
|
|
|
```bash
|
|
PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose
|
|
```
|
|
|
|
You can replace `test_nn.py` with any other test set.
|
|
|
|
## Running a basic PyTorch example
|
|
|
|
The PyTorch examples repository provides basic examples that exercise the functionality of your
|
|
framework.
|
|
|
|
Two of our favorite testing databases are:
|
|
|
|
* **MNIST** (Modified National Institute of Standards and Technology): A database of handwritten
|
|
digits that can be used to train a Convolutional Neural Network for **handwriting recognition**.
|
|
* **ImageNet**: A database of images that can be used to train a network for
|
|
**visual object recognition**.
|
|
|
|
### MNIST PyTorch example
|
|
|
|
1. Clone the PyTorch examples repository.
|
|
|
|
```bash
|
|
git clone https://github.com/pytorch/examples.git
|
|
```
|
|
|
|
2. Go to the MNIST example folder.
|
|
|
|
```bash
|
|
cd examples/mnist
|
|
```
|
|
|
|
3. Follow the instructions in the `README.md`` file in this folder to install the requirements. Then run:
|
|
|
|
```bash
|
|
python3 main.py
|
|
```
|
|
|
|
This generates the following output:
|
|
|
|
```bash
|
|
...
|
|
Train Epoch: 14 [58240/60000 (97%)] Loss: 0.010128
|
|
Train Epoch: 14 [58880/60000 (98%)] Loss: 0.001348
|
|
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.005261
|
|
|
|
Test set: Average loss: 0.0252, Accuracy: 9921/10000 (99%)
|
|
```
|
|
|
|
### ImageNet PyTorch example
|
|
|
|
1. Clone the PyTorch examples repository (if you didn't already do this step in the preceding MNIST example).
|
|
|
|
```bash
|
|
git clone https://github.com/pytorch/examples.git
|
|
```
|
|
|
|
2. Go to the ImageNet example folder.
|
|
|
|
```bash
|
|
cd examples/imagenet
|
|
```
|
|
|
|
3. Follow the instructions in the `README.md` file in this folder to install the Requirements. Then run:
|
|
|
|
```bash
|
|
python3 main.py
|
|
```
|