mirror of https://github.com/ROCm/ROCm.git synced 2026-02-04 19:35:02 -05:00

Li Li 983987aab5 Update deep learning guide (#2124)

* add deep learning guide

* separate out optimization, reference, and troubleshooting as standalone sections.

* resolve lint errors

* delete introduction to DL

* correct syntax highlights and filename

* remove out-of-date QAs

* Renaming and cleanup

* Spelling

* Fixup TOC

---------

Co-authored-by: Nara Prasetya <nara@streamhpc.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>

2023-05-24 16:04:30 -06:00

Troubleshooting

Q: What do I do if I get this error when trying to run PyTorch:

hipErrorNoBinaryForGPU: Unable to find code object for all current devices!

Ans: This error means that the installed PyTorch build, or one of its dependencies or libraries, does not contain code compiled for the current GPU.

Workaround:

Follow these steps:

Confirm that the hardware supports the ROCm stack. Refer to the Hardware and Software Support document at https://docs.amd.com.
Determine the gfx target.
```
rocminfo | grep gfx
```

Check whether PyTorch is compiled for that gfx target.

```
# Locate the installed torch package, then list the code objects in its HIP library.
TORCHDIR=$(dirname "$(python3 -c 'import torch; print(torch.__file__)')")
roc-obj-ls -v "$TORCHDIR/lib/libtorch_hip.so" # check for gfx target
```

:::{note}
If you compiled PyTorch from source and the build does not include your GPU's gfx target, recompile PyTorch with the correct gfx target. For wheel or Docker installations, contact ROCm support ¹.
:::
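The comparison in the last two steps can be scripted. The following is a minimal sketch, not part of the ROCm tooling: it assumes that both `rocminfo` and `roc-obj-ls` print target names of the form `gfx` followed by hexadecimal characters (for example `gfx90a`), and the helper names `gfx_targets` and `missing_targets` are hypothetical. The sample strings are illustrative, not real tool output.

```python
import re

# Assumption: target names look like "gfx" plus at least three hex characters.
GFX_PATTERN = re.compile(r"gfx[0-9a-f]{3,}")


def gfx_targets(text):
    """Return the set of gfx target names found in a block of tool output."""
    return set(GFX_PATTERN.findall(text))


def missing_targets(rocminfo_text, code_object_listing):
    """Return device targets that have no matching code object in the library."""
    return gfx_targets(rocminfo_text) - gfx_targets(code_object_listing)


# Illustrative inputs only; paste the real output of the two commands instead.
sample_rocminfo = "  Name:                    gfx90a\n"
sample_listing = (
    "hipv4-amdgcn-amd-amdhsa--gfx906\n"
    "hipv4-amdgcn-amd-amdhsa--gfx908\n"
)

print(sorted(missing_targets(sample_rocminfo, sample_listing)))
```

An empty result suggests that every detected GPU target has a code object in the library. A non-empty result lists the targets that would likely trigger `hipErrorNoBinaryForGPU`.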

Q: Why can't I access Docker or the GPU from a user account?

Ans: Ensure that the user is a member of the docker, video, and render Linux groups, as described in the ROCm Installation Guide at https://docs.amd.com.
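As a quick check before consulting the installation guide, the group membership of the current user can be inspected from Python. This is a minimal sketch using only the standard library on Linux. The helper names `missing_groups` and `current_group_names` are hypothetical, and the required group list is taken from the answer above.

```python
import grp
import os

REQUIRED_GROUPS = ("docker", "video", "render")


def missing_groups(user_groups, required=REQUIRED_GROUPS):
    """Return the required groups that are absent from user_groups, in order."""
    present = set(user_groups)
    return [name for name in required if name not in present]


def current_group_names():
    """Return the names of the groups the current process belongs to."""
    names = []
    for gid in {*os.getgroups(), os.getegid()}:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A group ID without a name entry cannot match a required group.
            continue
    return names


print("Missing groups:", missing_groups(current_group_names()) or "none")
```

Group changes only apply to new login sessions, so a user who was just added may still appear as missing until logging in again.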

Q: Can I install PyTorch directly on bare metal?

Ans: Bare-metal installation of PyTorch is supported through wheels. Refer to Option 2: Install PyTorch Using Wheels Package in the section Installing PyTorch of this guide for more information.

Q: How do I profile PyTorch workloads?

Ans: Use the PyTorch Profiler to profile GPU kernels on ROCm.
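A minimal sketch of such a profiling run follows. It assumes a PyTorch installation that provides `torch.profiler`. The function name `profile_matmul` and the matrix-multiplication workload are illustrative choices, not part of the guide. ROCm builds of PyTorch expose the GPU through the `torch.cuda` API and `ProfilerActivity.CUDA`. The sketch falls back to CPU-only profiling when no GPU is visible and skips the run when PyTorch is not installed.

```python
import importlib.util


def profile_matmul(size=512, steps=5):
    """Profile a few matrix multiplications and return the summary table."""
    import torch
    from torch.profiler import ProfilerActivity, profile

    use_gpu = torch.cuda.is_available()
    device = "cuda" if use_gpu else "cpu"
    activities = [ProfilerActivity.CPU]
    if use_gpu:
        # On ROCm builds this activity records GPU kernel activity.
        activities.append(ProfilerActivity.CUDA)

    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    with profile(activities=activities) as prof:
        for _ in range(steps):
            torch.matmul(a, b)
        if use_gpu:
            # Wait for queued kernels so they are captured by the profiler.
            torch.cuda.synchronize()

    # One row per operator or kernel, aggregated over all steps.
    return prof.key_averages().table(row_limit=10)


if importlib.util.find_spec("torch") is not None:
    print(profile_matmul())
else:
    print("PyTorch is not installed; skipping the profiling run.")
```

The `table` method also accepts a `sort_by` argument for ordering rows by a timing column. Check the profiler documentation for your PyTorch version for the accepted key names.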

¹ AMD, "ROCm issues," [Online]. Available: https://github.com/RadeonOpenCompute/ROCm/issues
