mirror of
https://github.com/ROCm/ROCm.git
synced 2026-02-04 11:25:03 -05:00
69 lines
2.0 KiB
Markdown
69 lines
2.0 KiB
Markdown
# System Debugging Guide
|
||
|
||
## ROCm Language and System Level Debug, Flags, and Environment Variables
|
||
|
||
Kernel options to avoid: the Ethernet port getting renamed every time you change graphics cards, `net.ifnames=0 biosdevname=0`
|
||
|
||
## ROCr Error Code
|
||
|
||
- 2 Invalid Dimension
|
||
|
||
- 4 Invalid Group Memory
|
||
|
||
- 8 Invalid (or Null) Code
|
||
|
||
- 32 Invalid Format
|
||
|
||
- 64 Group is too large
|
||
|
||
- 128 Out of VGPRs
|
||
|
||
- 0x80000000 Debug Options
|
||
|
||
## Command to Dump Firmware Version and Get Linux Kernel Version
|
||
|
||
`sudo cat /sys/kernel/debug/dri/1/amdgpu_firmware_info`
|
||
|
||
`uname -a`
|
||
|
||
## Debug Flags
|
||
|
||
Debug messages when developing/debugging base ROCm driver. You could enable the printing from `libhsakmt.so` by setting an environment variable, `HSAKMT_DEBUG_LEVEL`. Available debug levels are 3-7. The higher level you set, the more messages will print.
|
||
|
||
- `export HSAKMT_DEBUG_LEVEL=3` : Only pr_err() prints.
|
||
|
||
- `export HSAKMT_DEBUG_LEVEL=4` : pr_err() and pr_warn() print.
|
||
|
||
- `export HSAKMT_DEBUG_LEVEL=5` : We currently do not implement “notice”. Setting to 5 is same as setting to 4.
|
||
|
||
- `export HSAKMT_DEBUG_LEVEL=6` : pr_err(), pr_warn(), and pr_info print.
|
||
|
||
- `export HSAKMT_DEBUG_LEVEL=7` : Everything including pr_debug prints.
|
||
|
||
## ROCr Level Environment Variables for Debug
|
||
|
||
`HSA_ENABLE_SDMA=0`
|
||
|
||
`HSA_ENABLE_INTERRUPT=0`
|
||
|
||
`HSA_SVM_GUARD_PAGES=0`
|
||
|
||
`HSA_DISABLE_CACHE=1`
|
||
|
||
## Turn Off Page Retry on GFX9/Vega Devices
|
||
|
||
`sudo –s`
|
||
|
||
`echo 1 > /sys/module/amdkfd/parameters/noretry`
|
||
|
||
## HIP Environment Variables 3.x
|
||
|
||
### OpenCL Debug Flags
|
||
|
||
`AMD_OCL_WAIT_COMMAND=1 (0 = OFF, 1 = On)`
|
||
|
||
## PCIe-Debug
|
||
|
||
Refer to ROCm PCIe Debug, <a href="https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug" target="_blank">https://rocmdocs.amd.com/en/latest/Other_Solutions/PCIe-Debug.html#pcie-debug</a>.
|
||
For information on how to debug and profile HIP applications, see <a href="https://rocmdocs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html" target="_blank">https://rocmdocs.amd.com/projects/HIP/en/latest/how_to_guides/debugging.html</a>
|