ROCm/docs/how_to/system_debugging.md
2023-04-24 11:25:31 -06:00

# System Debugging Guide
## ROCm Language and System Level Debug, Flags, and Environment Variables
To prevent the Ethernet port from being renamed every time you change graphics cards, add the kernel options `net.ifnames=0 biosdevname=0` to the kernel command line.
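
This is a minimal sketch for systems that boot with GRUB. It assumes a Debian/Ubuntu-style `/etc/default/grub` that contains a double-quoted `GRUB_CMDLINE_LINUX="..."` line, and GNU `sed` (for `sed -i`). The function name `add_ifname_opts` is hypothetical. The file path is a parameter, so you can try the function on a copy first.

```shell
# Hypothetical helper: append net.ifnames=0 biosdevname=0 to GRUB_CMDLINE_LINUX.
# Assumes a Debian/Ubuntu-style GRUB defaults file and GNU sed.
add_ifname_opts() {
  local grub_file="${1:-/etc/default/grub}"
  # Refuse to guess if the expected line is missing.
  if ! grep -q '^GRUB_CMDLINE_LINUX="' "$grub_file"; then
    echo "no GRUB_CMDLINE_LINUX line in $grub_file" >&2
    return 1
  fi
  # Do nothing if the options are already on that line.
  if grep -q '^GRUB_CMDLINE_LINUX=.*net\.ifnames=0' "$grub_file"; then
    return 0
  fi
  # Insert the options just before the closing quote of the value.
  sed -i 's/^\(GRUB_CMDLINE_LINUX="[^"]*\)"/\1 net.ifnames=0 biosdevname=0"/' "$grub_file"
}
```

For example, `cp /etc/default/grub /tmp/grub.test && add_ifname_opts /tmp/grub.test` lets you inspect the result before you edit the real file as root. After that, regenerate the GRUB configuration (for example, with `sudo update-grub` on Ubuntu) and reboot.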
## ROCr Error Codes
- 2 Invalid Dimension
- 4 Invalid Group Memory
- 8 Invalid (or Null) Code
- 32 Invalid Format
- 64 Group is too large
- 128 Out of VGPRs
- 0x80000000 Debug Options
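
The listed values are distinct powers of two, so a reported code may combine several of them. The sketch below assumes the code is such a bitmask, which this guide does not state, and prints the description of every set bit. `decode_rocr_error` is a hypothetical helper name.

```shell
# Hypothetical decoder: treats the ROCr error code as a bitmask of the
# values listed above (an assumption) and prints one line per set bit.
decode_rocr_error() {
  local code entry bit name
  code=$(( $1 ))
  for entry in \
    "2:Invalid Dimension" \
    "4:Invalid Group Memory" \
    "8:Invalid (or Null) Code" \
    "32:Invalid Format" \
    "64:Group is too large" \
    "128:Out of VGPRs" \
    "0x80000000:Debug Options"
  do
    bit=${entry%%:*}   # text before the first colon
    name=${entry#*:}   # text after the first colon
    if [ $(( $code & $bit )) -ne 0 ]; then
      echo "$name"
    fi
  done
}
```

For example, `decode_rocr_error 0x42` prints `Invalid Dimension` and `Group is too large`, because 0x42 = 64 + 2.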
## Commands to Dump the Firmware Version and Get the Linux Kernel Version
- `sudo cat /sys/kernel/debug/dri/1/amdgpu_firmware_info` (the DRI index, `1` here, depends on the system)
- `uname -a`
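
The path above hardcodes DRI index `1`. The following sketch prints `amdgpu_firmware_info` for every DRI node it finds, so you do not have to guess the index. `dump_amdgpu_fw` is a hypothetical name, and the debugfs root is a parameter that defaults to `/sys/kernel/debug/dri`. Because debugfs is normally readable only by root, define and call the function in a root shell.

```shell
# Hypothetical helper: print amdgpu_firmware_info for every DRI node
# instead of assuming index 1. Run as root; debugfs is normally root-only.
dump_amdgpu_fw() {
  local root="${1:-/sys/kernel/debug/dri}"
  local f
  for f in "$root"/*/amdgpu_firmware_info; do
    # Skip the unexpanded glob (no matches) and unreadable files.
    [ -r "$f" ] || continue
    echo "== $f"
    cat "$f"
  done
}
```

In a root shell, `dump_amdgpu_fw; uname -a` collects both pieces of information in one step.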
## Debug Flags
When developing or debugging the base ROCm driver, you can enable debug messages from `libhsakmt.so` by setting the `HSAKMT_DEBUG_LEVEL` environment variable. Available debug levels are 3 to 7; the higher the level, the more messages are printed.
- `export HSAKMT_DEBUG_LEVEL=3`: only `pr_err()` messages print.
- `export HSAKMT_DEBUG_LEVEL=4`: `pr_err()` and `pr_warn()` messages print.
- `export HSAKMT_DEBUG_LEVEL=5`: the "notice" level is not currently implemented, so 5 behaves the same as 4.
- `export HSAKMT_DEBUG_LEVEL=6`: `pr_err()`, `pr_warn()`, and `pr_info()` messages print.
- `export HSAKMT_DEBUG_LEVEL=7`: everything, including `pr_debug()`, prints.
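
This minimal wrapper is for cases where you want the debug level to apply to a single run instead of using `export` for the whole shell. `run_with_hsakmt_debug` is a hypothetical helper. It rejects levels outside 3 to 7 and writes both stdout and stderr of the command to a log file, because this sketch makes no assumption about which stream the messages use.

```shell
# Hypothetical wrapper: run one command with HSAKMT_DEBUG_LEVEL set,
# capturing stdout and stderr in a log file.
run_with_hsakmt_debug() {
  local level="$1" log="$2"
  shift 2
  case "$level" in
    3|4|5|6|7) ;;
    *) echo "HSAKMT_DEBUG_LEVEL must be 3-7, got '$level'" >&2; return 2 ;;
  esac
  # The assignment applies only to this command, not to the calling shell.
  HSAKMT_DEBUG_LEVEL="$level" "$@" > "$log" 2>&1
}
```

For example, `run_with_hsakmt_debug 7 hsakmt.log ./my_app`, where `./my_app` stands in for your application.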
## ROCr Level Environment Variables for Debug
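Set the following ROCr environment variables before launching an application to help isolate a problem:

- `HSA_ENABLE_SDMA=0`
- `HSA_ENABLE_INTERRUPT=0`
- `HSA_SVM_GUARD_PAGES=0`
- `HSA_DISABLE_CACHE=1`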
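
If a failure depends on one of these settings, trying them one at a time can narrow it down. The following sketch defines a hypothetical `bisect_rocr_flags` helper that runs the same command once per variable, adding only that one of the four to the environment, and reports pass or fail from the exit status. The command must be an executable, not a shell function, because `env` launches it.

```shell
# Hypothetical helper: run the same command once per ROCr debug variable,
# adding only that variable to the environment, and report the outcome.
bisect_rocr_flags() {
  local setting
  for setting in HSA_ENABLE_SDMA=0 HSA_ENABLE_INTERRUPT=0 \
                 HSA_SVM_GUARD_PAGES=0 HSA_DISABLE_CACHE=1; do
    # A zero exit status counts as "pass"; the command's output is discarded.
    if env "$setting" "$@" > /dev/null 2>&1; then
      echo "$setting: pass"
    else
      echo "$setting: fail"
    fi
  done
}
```

For example, if only the `HSA_ENABLE_SDMA=0` run passes, the SDMA copy path is worth investigating next.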
## Turn Off Page Retry on GFX9/Vega Devices
Open a root shell, then write `1` to the `noretry` parameter of the `amdkfd` module:

- `sudo su`
- `echo 1 > /sys/module/amdkfd/parameters/noretry`
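
Here is the same step as a hypothetical `set_noretry` function, which writes `1` and reads the value back to confirm it. The parameter path defaults to the one above, and you can override it for testing. The function must run as root to modify the real sysfs file.

```shell
# Hypothetical helper: turn off page retry and confirm the new value.
set_noretry() {
  local param="${1:-/sys/module/amdkfd/parameters/noretry}"
  if [ ! -w "$param" ]; then
    echo "cannot write $param (not root, or parameter not present)" >&2
    return 1
  fi
  echo 1 > "$param"
  # Read the value back; the function fails if the write did not take effect.
  [ "$(cat "$param")" = "1" ]
}
```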
## HIP Environment Variables 3.x
### OpenCL Debug Flags
`AMD_OCL_WAIT_COMMAND=1` (0 = off, 1 = on): when enabled, the runtime waits for each submitted command to complete.
## PCIe-Debug
Refer to ROCm PCIe Debug at <a href="https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug" target="_blank">https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug</a>.
For information on how to debug and profile HIP applications, see <a href="https://rocmdocs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html" target="_blank">https://rocmdocs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html</a>.