<head>
  <meta charset="UTF-8">
  <meta name="description" content="System debugging guide">
  <meta name="keywords" content="debug, system-level debug, debug flags, PCIe debug, AMD, ROCm">
</head>
# System debugging guide

## ROCm language and system-level debug, flags, and environment variables
To keep the Ethernet port from being renamed every time you change graphics cards, add the kernel options `net.ifnames=0 biosdevname=0` to the kernel command line.
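On GRUB-based distributions, kernel options like these are typically appended to `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, after which the GRUB configuration is regenerated (for example, with `sudo update-grub` on Ubuntu) and the system is rebooted. The sketch below is a hypothetical POSIX shell helper, `check_ifname_options` (not part of ROCm), that reads `/proc/cmdline` to confirm both options are active after the reboot:

```shell
# Hypothetical check (not part of ROCm): report whether the running kernel
# was booted with both interface-naming options. The optional argument
# overrides the default /proc/cmdline path, which makes the helper testable.
check_ifname_options() {
  local cmdline opt
  cmdline=$(cat "${1:-/proc/cmdline}") || return 2
  for opt in net.ifnames=0 biosdevname=0; do
    case " $cmdline " in
      *" $opt "*) ;;                       # option present as a whole word
      *) echo "missing: $opt"; return 1 ;;
    esac
  done
  echo "ok"
}
```

Run it without arguments on the target system; it prints `ok` when both options are present, or names the first missing option and returns a non-zero status.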
## ROCr error codes

* 2: Invalid Dimension
* 4: Invalid Group Memory
* 8: Invalid (or Null) Code
* 32: Invalid Format
* 64: Group is too large
* 128: Out of VGPRs
* 0x80000000: Debug Options
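The documented values are all powers of two, which suggests they are bit flags. Assuming a reported code can be a bitwise OR of several of them (an assumption; the list does not say so), a hypothetical helper, `decode_rocr_error` (not part of ROCm), can split a code into the names listed above:

```shell
# Hypothetical helper (not part of ROCm): print the name of every documented
# value whose bit is set in CODE, ASSUMING the codes are combinable bit flags.
# CODE may be decimal or 0x-prefixed hexadecimal.
decode_rocr_error() {
  local code known unknown entry bit name
  code=$(( ${1:-0} ))
  known=0
  for entry in "2:Invalid Dimension" "4:Invalid Group Memory" \
      "8:Invalid (or Null) Code" "32:Invalid Format" \
      "64:Group is too large" "128:Out of VGPRs" \
      "0x80000000:Debug Options"; do
    bit=${entry%%:*}              # numeric value before the colon
    name=${entry#*:}              # description after the colon
    known=$(( $known | $bit ))
    if [ $(( $code & $bit )) -ne 0 ]; then
      echo "$name"
    fi
  done
  # Report bits that match none of the documented values instead of dropping them.
  unknown=$(( $code & ~$known ))
  if [ "$unknown" -ne 0 ]; then
    printf 'Unrecognized bits: 0x%X\n' "$unknown"
  fi
}
```

For example, `decode_rocr_error 0x80000080` prints `Out of VGPRs` followed by `Debug Options`.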
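## Commands to dump the firmware version and get the Linux kernel version

Dump the GPU firmware versions (reading debugfs requires root, and the DRI index, `1` here, can differ between systems):

`sudo cat /sys/kernel/debug/dri/1/amdgpu_firmware_info`

Print the Linux kernel version:

`uname -a`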
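Because the DRI index in the path above varies between systems, a hypothetical helper, `dump_gpu_versions` (not part of ROCm), can try every entry under `/sys/kernel/debug/dri` and print the kernel version alongside each firmware table it finds. It assumes debugfs is mounted at `/sys/kernel/debug`, which is usually readable only by root:

```shell
# Hypothetical helper (not part of ROCm): print the kernel version, then the
# firmware table from every DRI debugfs entry that exposes one. The optional
# argument overrides the debugfs DRI directory (useful for testing).
dump_gpu_versions() {
  local dri_root info found
  dri_root=${1:-/sys/kernel/debug/dri}
  found=0
  echo "Kernel: $(uname -a)"
  for info in "$dri_root"/*/amdgpu_firmware_info; do
    [ -r "$info" ] || continue   # also skips the literal pattern if nothing matched
    echo "== $info"
    cat "$info"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no readable amdgpu_firmware_info under $dri_root (try running as root)" >&2
    return 1
  fi
}
```

Run it from a root shell; a non-zero status means no readable firmware table was found.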
## Debug flags

When developing or debugging the base ROCm driver, you can enable debug messages from `libhsakmt.so` by setting the `HSAKMT_DEBUG_LEVEL` environment variable. Available debug levels are 3 to 7; the higher the level, the more messages are printed.

* `export HSAKMT_DEBUG_LEVEL=3`: Only `pr_err()` messages print.
* `export HSAKMT_DEBUG_LEVEL=4`: `pr_err()` and `pr_warn()` messages print.
* `export HSAKMT_DEBUG_LEVEL=5`: The “notice” level is not currently implemented, so level 5 behaves the same as level 4.
* `export HSAKMT_DEBUG_LEVEL=6`: `pr_err()`, `pr_warn()`, and `pr_info()` messages print.
* `export HSAKMT_DEBUG_LEVEL=7`: Everything, including `pr_debug()`, prints.
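To enable these messages for a single run instead of exporting the variable for the whole shell session, a hypothetical wrapper, `run_with_hsakmt_debug` (not part of ROCm), sets `HSAKMT_DEBUG_LEVEL` only in the environment of the command it launches and rejects levels outside 3 to 7:

```shell
# Hypothetical wrapper (not part of ROCm): run one external command with
# libhsakmt debug printing enabled, without exporting HSAKMT_DEBUG_LEVEL
# to the rest of the shell session. Only the documented levels 3-7 are accepted.
run_with_hsakmt_debug() {
  local level
  case "$1" in
    3|4|5|6|7) ;;
    *) echo "HSAKMT_DEBUG_LEVEL must be 3-7, got '$1'" >&2; return 2 ;;
  esac
  level=$1
  shift
  HSAKMT_DEBUG_LEVEL="$level" "$@"
}
```

For example, `run_with_hsakmt_debug 7 rocminfo > rocminfo-debug.log 2>&1` collects the most verbose output of a `rocminfo` run in a single file.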
## ROCr-level environment variables for debugging

* `HSA_ENABLE_SDMA=0`
* `HSA_ENABLE_INTERRUPT=0`
* `HSA_SVM_GUARD_PAGES=0`
* `HSA_DISABLE_CACHE=1`
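When a problem only reproduces with some ROCr features enabled, one way to narrow it down is to apply all four settings above for a single run and then restore them one at a time. The hypothetical wrapper `run_with_rocr_debug` (not part of ROCm) sketches this. Extra `NAME=value` arguments placed before the command override the defaults, assuming `env` applies assignments from left to right, as GNU coreutils `env` does:

```shell
# Hypothetical wrapper (not part of ROCm): run one external command with the
# four ROCr debug settings listed above. Any NAME=value arguments given before
# the command come after the defaults, so env applies them last and they win
# (assumes env processes assignments left to right, as GNU coreutils env does).
run_with_rocr_debug() {
  env \
    HSA_ENABLE_SDMA=0 \
    HSA_ENABLE_INTERRUPT=0 \
    HSA_SVM_GUARD_PAGES=0 \
    HSA_DISABLE_CACHE=1 \
    "$@"
}
```

For example, `run_with_rocr_debug HSA_ENABLE_SDMA=1 rocminfo` keeps the other three settings but leaves SDMA enabled.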
## Turn off page retry on GFX9/Vega devices

Open a root shell, then set the `noretry` parameter of the `amdkfd` module to 1:

`sudo -s`

`echo 1 > /sys/module/amdkfd/parameters/noretry`
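A full root shell is not strictly necessary. The obvious shortcut, `sudo echo 1 > ...`, fails because the redirection is performed by the unprivileged shell, but piping into `sudo tee` works. The hypothetical helper `disable_page_retry` (not part of ROCm) uses that approach and reads the value back. It assumes the parameter exists at the path above and is writable at runtime on your kernel; a different path can be passed as the first argument:

```shell
# Hypothetical helper (not part of ROCm): set noretry to 1 and read it back.
# sudo is used only when the current user cannot write the file directly.
disable_page_retry() {
  local param
  param=${1:-/sys/module/amdkfd/parameters/noretry}
  if [ ! -e "$param" ]; then
    echo "no such parameter: $param" >&2
    return 1
  fi
  if [ -w "$param" ]; then
    echo 1 > "$param" || return 1
  else
    # The write must happen with root privileges, so let sudo run tee.
    echo 1 | sudo tee "$param" > /dev/null || return 1
  fi
  echo "$param = $(cat "$param")"
}
```

A value written this way does not persist across a reboot.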
## HIP environment variables 3.x

### OpenCL debug flags

`AMD_OCL_WAIT_COMMAND=1` (0 = off, 1 = on)
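To compare a program's behavior with this flag on and off without exporting it for the whole session, a hypothetical wrapper, `with_ocl_wait_command` (not part of ROCm), sets `AMD_OCL_WAIT_COMMAND` for one command only:

```shell
# Hypothetical wrapper (not part of ROCm): run one external command with
# AMD_OCL_WAIT_COMMAND set to on (1) or off (0) for that command only.
with_ocl_wait_command() {
  local value
  case "$1" in
    on|1) value=1 ;;
    off|0) value=0 ;;
    *) echo "usage: with_ocl_wait_command on|off COMMAND [ARGS...]" >&2; return 2 ;;
  esac
  shift
  AMD_OCL_WAIT_COMMAND="$value" "$@"
}
```

For example, run `with_ocl_wait_command off ./my_app` and then `with_ocl_wait_command on ./my_app`, where `./my_app` is a placeholder for the application being debugged.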
## PCIe debug

For information on how to debug and profile HIP applications, see {doc}`hip:how_to_guides/debugging`.