Compare commits
41 Commits
roc-5.7.x
...
docs/5.3.3
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
9bfe4fcadf | ||
|
|
880e829a74 | ||
|
|
cb89f1503a | ||
|
|
3d06fb7eed | ||
|
|
5d260c543c | ||
|
|
8b40c808c4 | ||
|
|
0260772fa8 | ||
|
|
05896f9ef7 | ||
|
|
cb25cbd517 | ||
|
|
d2cd9e2141 | ||
|
|
3b7d8b102a | ||
|
|
18d5dd866c | ||
|
|
9829be899e | ||
|
|
5cf7fff338 | ||
|
|
7a16e52d52 | ||
|
|
d4dcd6b9e5 | ||
|
|
71a132b926 | ||
|
|
9b8603d58c | ||
|
|
4cb5510bf0 | ||
|
|
676953fe8c | ||
|
|
15292ddebe | ||
|
|
89986d332d | ||
|
|
de6fc1634a | ||
|
|
52986c3635 | ||
|
|
b86717e454 | ||
|
|
7dbd277203 | ||
|
|
31ee8e712c | ||
|
|
55eda666d5 | ||
|
|
01e24da121 | ||
|
|
f68c47d748 | ||
|
|
2c0a351bbd | ||
|
|
b6509809d3 | ||
|
|
a29205cc5c | ||
|
|
16c4d22099 | ||
|
|
ed3335c3a5 | ||
|
|
30f27c4644 | ||
|
|
f4be54f896 | ||
|
|
9c04aef6a5 | ||
|
|
c7d4e75e95 | ||
|
|
aabbea88f2 | ||
|
|
7747e130b9 |
2
.github/CODEOWNERS
vendored
@@ -1 +1 @@
|
||||
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus @LisaDelaney
|
||||
* @saadrahim @Rmalavally @amd-aakash @zhang2amd @jlgreathouse @samjwu @MathiasMagnus
|
||||
|
||||
76
.github/ISSUE_TEMPLATE/0_issue_report.yml
vendored
@@ -1,76 +0,0 @@
|
||||
name: Issue Report
|
||||
description: File a report for something not working correctly.
|
||||
title: "[Issue]: "
|
||||
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thank you for taking the time to fill out this report!
|
||||
|
||||
On a Linux system, you can acquire your OS, CPU, GPU, and ROCm version (for filling out this report) with the following commands:
|
||||
echo "OS:" && cat /etc/os-release | grep -E "^(NAME=|VERSION=)";
|
||||
echo "CPU: " && cat /proc/cpuinfo | grep "model name" | sort --unique;
|
||||
echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
|
||||
echo "ROCm in /opt:" && ls -1 /opt | grep -E "rocm-";
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: Problem Description
|
||||
description: Describe the issue you encountered.
|
||||
placeholder: "The steps to reproduce can be included here, or in the dedicated section further below."
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
attributes:
|
||||
label: Operating System
|
||||
description: What is the name and version number of the OS?
|
||||
placeholder: "e.g. Ubuntu 22.04.3 LTS (Jammy Jellyfish)"
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
attributes:
|
||||
label: CPU
|
||||
description: What CPU did you encounter the issue on?
|
||||
placeholder: "e.g. AMD Ryzen 9 5900HX with Radeon Graphics"
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
attributes:
|
||||
label: GPU
|
||||
description: What GPU(s) did you encounter the issue on?
|
||||
placeholder: "e.g. MI200"
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
attributes:
|
||||
label: ROCm Version
|
||||
description: What version(s) of ROCm did you encounter the issue on?
|
||||
placeholder: "e.g. 5.7.0"
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
attributes:
|
||||
label: ROCm Component
|
||||
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
|
||||
placeholder: "e.g. rocBLAS"
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: Steps to Reproduce
|
||||
description: (Optional) Detailed steps to reproduce the issue.
|
||||
placeholder: Please also include what you expected to happen, and what actually did, at the failing step(s).
|
||||
validations:
|
||||
required: false
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: Output of /opt/rocm/bin/rocminfo --support
|
||||
description: The output of rocminfo --support will help to better address the problem.
|
||||
placeholder: |
|
||||
ROCk module is loaded
|
||||
=====================
|
||||
HSA System Attributes
|
||||
=====================
|
||||
[...]
|
||||
validations:
|
||||
required: true
|
||||
32
.github/ISSUE_TEMPLATE/1_feature_request.yml
vendored
@@ -1,32 +0,0 @@
|
||||
name: Feature Suggestion
|
||||
description: Suggest an additional functionality, or new way of handling an existing functionality.
|
||||
title: "[Feature]: "
|
||||
|
||||
body:
|
||||
- type: markdown
|
||||
attributes:
|
||||
value: |
|
||||
Thank you for taking the time to make a suggestion!
|
||||
|
||||
- type: textarea
|
||||
attributes:
|
||||
label: Suggestion Description
|
||||
description: Describe your suggestion.
|
||||
validations:
|
||||
required: true
|
||||
- type: input
|
||||
attributes:
|
||||
label: Operating System
|
||||
description: (Optional) If this is for a specific OS, you can mention it here.
|
||||
placeholder: "e.g. Ubuntu"
|
||||
- type: input
|
||||
attributes:
|
||||
label: GPU
|
||||
description: (Optional) If this is for a specific GPU or GPU family, you can mention it here.
|
||||
placeholder: "e.g. MI200"
|
||||
- type: input
|
||||
attributes:
|
||||
label: ROCm Component
|
||||
description: (Optional) If this issue relates to a specific ROCm component, it can be mentioned here.
|
||||
placeholder: "e.g. rocBLAS"
|
||||
|
||||
5
.github/ISSUE_TEMPLATE/config.yml
vendored
@@ -1,5 +0,0 @@
|
||||
blank_issues_enabled: false
|
||||
contact_links:
|
||||
- name: ROCm Community Discussions
|
||||
url: https://github.com/RadeonOpenCompute/ROCm/discussions
|
||||
about: Please ask and answer questions here for anything ROCm.
|
||||
1
.github/dependabot.yml
vendored
@@ -10,4 +10,3 @@ updates:
|
||||
open-pull-requests-limit: 10
|
||||
schedule:
|
||||
interval: "daily"
|
||||
versioning-strategy: increase
|
||||
|
||||
50
.github/workflows/linting.yml
vendored
@@ -5,16 +5,52 @@ on:
|
||||
branches:
|
||||
- develop
|
||||
- main
|
||||
- 'docs/*'
|
||||
- 'roc**'
|
||||
pull_request:
|
||||
branches:
|
||||
- develop
|
||||
- main
|
||||
- 'docs/*'
|
||||
- 'roc**'
|
||||
|
||||
concurrency:
|
||||
group: ${{ github.ref }}-${{ github.workflow }}
|
||||
cancel-in-progress: true
|
||||
|
||||
jobs:
|
||||
call-workflow-passing-data:
|
||||
name: Documentation
|
||||
uses: RadeonOpenCompute/rocm-docs-core/.github/workflows/linting.yml@develop
|
||||
lint-rest:
|
||||
name: "RestructuredText"
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v3
|
||||
- name: Install rst-lint
|
||||
run: pip install restructuredtext-lint
|
||||
- name: Lint ResT files
|
||||
run: rst-lint ${{ join(github.workspace, '/docs') }}
|
||||
|
||||
lint-md:
|
||||
name: "Markdown"
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v3
|
||||
- name: Use markdownlint-cli2
|
||||
uses: DavidAnson/markdownlint-cli2-action@v10.0.1
|
||||
with:
|
||||
globs: '**/*.md'
|
||||
|
||||
spelling:
|
||||
name: "Spelling"
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout code
|
||||
uses: actions/checkout@v3
|
||||
- name: Fetch config
|
||||
shell: sh
|
||||
run: |
|
||||
curl --silent --show-error --fail --location https://raw.github.com/RadeonOpenCompute/rocm-docs-core/develop/.spellcheck.yaml -O
|
||||
curl --silent --show-error --fail --location https://raw.github.com/RadeonOpenCompute/rocm-docs-core/develop/.wordlist.txt >> .wordlist.txt
|
||||
- name: Run spellcheck
|
||||
uses: rojopolis/spellcheck-github-actions@0.30.0
|
||||
- name: On fail
|
||||
if: failure()
|
||||
run: |
|
||||
echo "Please check for spelling mistakes or add them to '.wordlist.txt' in either the root of this project or in rocm-docs-core."
|
||||
|
||||
5
.gitignore
vendored
@@ -13,7 +13,6 @@ _doxygen/
|
||||
_readthedocs/
|
||||
|
||||
# avoid duplicating contributing.md due to conf.py
|
||||
docs/contributing.md
|
||||
docs/release.md
|
||||
docs/CHANGELOG.md
|
||||
docs/contribute/index.md
|
||||
docs/about/release-notes.md
|
||||
docs/about/CHANGELOG.md
|
||||
@@ -1,7 +1,5 @@
|
||||
config:
|
||||
default: true
|
||||
MD004:
|
||||
style: asterisk
|
||||
MD013: false
|
||||
MD026:
|
||||
punctuation: '.,;:!'
|
||||
@@ -10,9 +8,7 @@ config:
|
||||
MD033: false
|
||||
MD034: false
|
||||
MD041: false
|
||||
MD051: false
|
||||
ignores:
|
||||
- CHANGELOG.md
|
||||
- docs/CHANGELOG.md
|
||||
- "{,docs/}{RELEASE,release}.md"
|
||||
- tools/autotag/templates/**/*.md
|
||||
|
||||
@@ -3,16 +3,19 @@
|
||||
|
||||
version: 2
|
||||
|
||||
sphinx:
|
||||
configuration: docs/conf.py
|
||||
|
||||
formats: [htmlzip, pdf]
|
||||
build:
|
||||
os: ubuntu-22.04
|
||||
tools:
|
||||
python: "3.10"
|
||||
apt_packages:
|
||||
- "doxygen"
|
||||
- "graphviz" # For dot graphs in doxygen
|
||||
|
||||
python:
|
||||
install:
|
||||
- requirements: docs/sphinx/requirements.txt
|
||||
|
||||
build:
|
||||
os: ubuntu-20.04
|
||||
tools:
|
||||
python: "3.8"
|
||||
sphinx:
|
||||
configuration: docs/conf.py
|
||||
|
||||
formats: []
|
||||
|
||||
571
.wordlist.txt
@@ -1,558 +1,29 @@
|
||||
# isv_deployment_win
|
||||
ABI
|
||||
activations
|
||||
addr
|
||||
AddressSanitizer
|
||||
AlexNet
|
||||
alloc
|
||||
allocator
|
||||
allocators
|
||||
ALU
|
||||
AMD
|
||||
AMDGPU
|
||||
amdgpu
|
||||
AMDGPUs
|
||||
AMDMIGraphX
|
||||
AMI
|
||||
AOCC
|
||||
AOMP
|
||||
api
|
||||
APIC
|
||||
APIs
|
||||
Arb
|
||||
ASan
|
||||
ASIC
|
||||
ASICs
|
||||
ASm
|
||||
atmi
|
||||
atomics
|
||||
autogenerated
|
||||
avx
|
||||
awk
|
||||
backend
|
||||
# gpu_aware_mpi
|
||||
DMA
|
||||
GDR
|
||||
HCA
|
||||
MPI
|
||||
MVAPICH
|
||||
Mellanox's
|
||||
NIC
|
||||
OFED
|
||||
OSU
|
||||
OpenFabrics
|
||||
PeerDirect
|
||||
RDMA
|
||||
UCX
|
||||
ib_core
|
||||
# linear algebra
|
||||
LAPACK
|
||||
MMA
|
||||
backends
|
||||
benchmarking
|
||||
bilinear
|
||||
BitCode
|
||||
BLAS
|
||||
Blit
|
||||
blit
|
||||
BMC
|
||||
buildable
|
||||
bursty
|
||||
bzip
|
||||
cacheable
|
||||
CCD
|
||||
cd
|
||||
CDNA
|
||||
CentOS
|
||||
centric
|
||||
changelog
|
||||
chiplet
|
||||
CIFAR
|
||||
CLI
|
||||
CMake
|
||||
cmake
|
||||
CMakeLists
|
||||
CMakePackage
|
||||
cmd
|
||||
coalescable
|
||||
codename
|
||||
Codespaces
|
||||
comgr
|
||||
Commitizen
|
||||
CommonMark
|
||||
composable
|
||||
concretization
|
||||
Concretized
|
||||
Conda
|
||||
config
|
||||
conformant
|
||||
convolutional
|
||||
convolves
|
||||
CoRR
|
||||
CP
|
||||
CPC
|
||||
CPF
|
||||
CPP
|
||||
CPU
|
||||
CPUs
|
||||
CSC
|
||||
CSE
|
||||
CSn
|
||||
csn
|
||||
CSV
|
||||
CU
|
||||
cuBLAS
|
||||
CUDA
|
||||
cuFFT
|
||||
cuLIB
|
||||
cuRAND
|
||||
CUs
|
||||
cuSOLVER
|
||||
cuSPARSE
|
||||
dataset
|
||||
datasets
|
||||
dataspace
|
||||
datatype
|
||||
datatypes
|
||||
dbgapi
|
||||
de
|
||||
deallocation
|
||||
denormalize
|
||||
Dependabot
|
||||
deserializers
|
||||
detections
|
||||
dev
|
||||
devicelibs
|
||||
# tuning_guides
|
||||
BMC
|
||||
DGEMM
|
||||
disambiguates
|
||||
distro
|
||||
DL
|
||||
DMA
|
||||
DNN
|
||||
DNNL
|
||||
Dockerfile
|
||||
DockerHub
|
||||
Doxygen
|
||||
DPM
|
||||
DRI
|
||||
DW
|
||||
DWORD
|
||||
el
|
||||
enablement
|
||||
endpgm
|
||||
env
|
||||
epilog
|
||||
EPYC
|
||||
ESXi
|
||||
ethernet
|
||||
exascale
|
||||
executables
|
||||
ffmpeg
|
||||
FFT
|
||||
FFTs
|
||||
FHS
|
||||
filesystem
|
||||
Filesystem
|
||||
Flang
|
||||
FMA
|
||||
Fortran
|
||||
fortran
|
||||
FP
|
||||
galb
|
||||
gcc
|
||||
GCD
|
||||
GCDs
|
||||
GCN
|
||||
GDB
|
||||
gdb
|
||||
GDDR
|
||||
GDR
|
||||
GDS
|
||||
GEMM
|
||||
GEMMs
|
||||
gfortran
|
||||
gfx
|
||||
GIM
|
||||
github
|
||||
Gitpod
|
||||
GL
|
||||
GLXT
|
||||
GMI
|
||||
gnupg
|
||||
GPG
|
||||
GPR
|
||||
GPU
|
||||
GPUs
|
||||
grayscale
|
||||
GRBM
|
||||
gzip
|
||||
Haswell
|
||||
HBM
|
||||
HCA
|
||||
heterogenous
|
||||
hipamd
|
||||
hipBLAS
|
||||
hipblas
|
||||
hipBLASLt
|
||||
HIPCC
|
||||
hipCUB
|
||||
hipcub
|
||||
HIPExtension
|
||||
hipFFT
|
||||
hipfft
|
||||
hipfort
|
||||
HIPIFY
|
||||
hipify
|
||||
hipLIB
|
||||
hipRAND
|
||||
hipSOLVER
|
||||
hipsolver
|
||||
hipSPARSE
|
||||
hipsparse
|
||||
hipSPARSELt
|
||||
hipTensor
|
||||
HPC
|
||||
HPCG
|
||||
HPL
|
||||
HSA
|
||||
hsa
|
||||
hsakmt
|
||||
HWE
|
||||
ib_core
|
||||
ICV
|
||||
ImageNet
|
||||
IMDB
|
||||
inband
|
||||
incrementing
|
||||
inferencing
|
||||
InfiniBand
|
||||
inflight
|
||||
init
|
||||
Inlines
|
||||
inlining
|
||||
installable
|
||||
IntelliSense
|
||||
interprocedural
|
||||
Intersphinx
|
||||
intra
|
||||
invariants
|
||||
invocating
|
||||
Ioffe
|
||||
IOMMU
|
||||
IOP
|
||||
IOPM
|
||||
IOV
|
||||
ipo
|
||||
ISA
|
||||
ISV
|
||||
ISVs
|
||||
JSON
|
||||
Jupyter
|
||||
kdb
|
||||
KFD
|
||||
Khronos
|
||||
KVM
|
||||
LAPACK
|
||||
LCLK
|
||||
LDS
|
||||
libjpeg
|
||||
libs
|
||||
linearized
|
||||
linter
|
||||
linux
|
||||
llvm
|
||||
LLVM
|
||||
localscratch
|
||||
logits
|
||||
lossy
|
||||
LSAN
|
||||
LTS
|
||||
Makefile
|
||||
Makefiles
|
||||
matchers
|
||||
Matplotlib
|
||||
Mellanox's
|
||||
MEM
|
||||
MERCHANTABILITY
|
||||
MFMA
|
||||
microarchitecture
|
||||
MIGraphX
|
||||
migraphx
|
||||
MIOpen
|
||||
miopen
|
||||
MIOpenGEMM
|
||||
miopengemm
|
||||
MIVisionX
|
||||
mivisionx
|
||||
mkdir
|
||||
mlirmiopen
|
||||
MMA
|
||||
MNIST
|
||||
MPI
|
||||
MSVC
|
||||
mtypes
|
||||
Multicore
|
||||
Multithreaded
|
||||
MVAPICH
|
||||
mvffr
|
||||
MyEnvironment
|
||||
MyST
|
||||
namespace
|
||||
namespaces
|
||||
Nano
|
||||
Navi
|
||||
NBIO
|
||||
NBIOs
|
||||
NIC
|
||||
NICs
|
||||
Noncoherently
|
||||
NPS
|
||||
NUMA
|
||||
NumPy
|
||||
numref
|
||||
NVCC
|
||||
NVPTX
|
||||
OAM
|
||||
OAMs
|
||||
ocl
|
||||
OCP
|
||||
OEM
|
||||
OFED
|
||||
OMP
|
||||
OMPT
|
||||
OMPX
|
||||
ONNX
|
||||
OpenCL
|
||||
opencl
|
||||
opencv
|
||||
OpenFabrics
|
||||
OpenGL
|
||||
OpenMP
|
||||
openmp
|
||||
openssl
|
||||
OpenVX
|
||||
optimizers
|
||||
os
|
||||
OSS
|
||||
OSU
|
||||
Pageable
|
||||
pageable
|
||||
passthrough
|
||||
PCI
|
||||
PCIe
|
||||
PeerDirect
|
||||
perfcounter
|
||||
Perfetto
|
||||
performant
|
||||
perl
|
||||
PIL
|
||||
PILImage
|
||||
PowerShell
|
||||
pragma
|
||||
pre
|
||||
prebuilt
|
||||
precompiled
|
||||
prefetch
|
||||
preprocess
|
||||
preprocessing
|
||||
preq
|
||||
prerequisites
|
||||
PRNG
|
||||
profiler
|
||||
protobuf
|
||||
PRs
|
||||
pseudorandom
|
||||
py
|
||||
PyPi
|
||||
PyTorch
|
||||
Qcycles
|
||||
quasirandom
|
||||
Radeon
|
||||
RadeonOpenCompute
|
||||
RCCL
|
||||
rccl
|
||||
RDC
|
||||
rdc
|
||||
RDMA
|
||||
RDNA
|
||||
reformats
|
||||
RelWithDebInfo
|
||||
repos
|
||||
Req
|
||||
req
|
||||
resampling
|
||||
RST
|
||||
reStructuredText
|
||||
RHEL
|
||||
Rickle
|
||||
roadmap
|
||||
roc
|
||||
ROC
|
||||
rocAL
|
||||
rocALUTION
|
||||
rocalution
|
||||
rocBLAS
|
||||
rocblas
|
||||
rocclr
|
||||
ROCdbgapi
|
||||
rocFFT
|
||||
rocfft
|
||||
ROCgdb
|
||||
ROCk
|
||||
rocLIB
|
||||
rocm
|
||||
ROCm
|
||||
ROCmCC
|
||||
rocminfo
|
||||
ROCmSoftwarePlatform
|
||||
ROCmValidationSuite
|
||||
rocPRIM
|
||||
rocprim
|
||||
rocprof
|
||||
ROCProfiler
|
||||
rocprofiler
|
||||
ROCr
|
||||
rocr
|
||||
rocRAND
|
||||
rocrand
|
||||
rocSOLVER
|
||||
rocsolver
|
||||
rocSPARSE
|
||||
rocsparse
|
||||
roct
|
||||
rocThrust
|
||||
rocthrust
|
||||
ROCTracer
|
||||
roctracer
|
||||
rocWMMA
|
||||
RST
|
||||
runtime
|
||||
runtimes
|
||||
RW
|
||||
SALU
|
||||
SBIOS
|
||||
SCA
|
||||
scalability
|
||||
SDK
|
||||
SDMA
|
||||
SDRAM
|
||||
SENDMSG
|
||||
sendmsg
|
||||
SENDMSG
|
||||
sendmsg
|
||||
SerDes
|
||||
serializers
|
||||
SGPR
|
||||
SGPRs
|
||||
SHA
|
||||
shader
|
||||
Shlens
|
||||
sigmoid
|
||||
SIGQUIT
|
||||
SIMD
|
||||
SKU
|
||||
SKUs
|
||||
skylake
|
||||
sL
|
||||
SLES
|
||||
SMEM
|
||||
SMI
|
||||
smi
|
||||
SMT
|
||||
softmax
|
||||
Spack
|
||||
spack
|
||||
SPI
|
||||
SQs
|
||||
SRAM
|
||||
SRAMECC
|
||||
src
|
||||
stochastically
|
||||
strided
|
||||
subdirectory
|
||||
subexpression
|
||||
subfolder
|
||||
subfolders
|
||||
supercomputing
|
||||
SWE
|
||||
Szegedy
|
||||
tagram
|
||||
TCA
|
||||
TCC
|
||||
TCI
|
||||
TCIU
|
||||
TCP
|
||||
TCR
|
||||
TensorBoard
|
||||
TensorFlow
|
||||
TFLOPS
|
||||
tg
|
||||
th
|
||||
tmp
|
||||
ToC
|
||||
tokenize
|
||||
toolchain
|
||||
toolchains
|
||||
toolset
|
||||
toolsets
|
||||
TorchAudio
|
||||
TorchScript
|
||||
TorchServe
|
||||
TorchVision
|
||||
torchvision
|
||||
tracebacks
|
||||
TransferBench
|
||||
TrapStatus
|
||||
txt
|
||||
UAC
|
||||
uarch
|
||||
ubuntu
|
||||
UC
|
||||
UCC
|
||||
UCX
|
||||
UIF
|
||||
Uncached
|
||||
uncached
|
||||
Unhandled
|
||||
uninstallation
|
||||
unsqueeze
|
||||
unstacking
|
||||
unswitching
|
||||
untrusted
|
||||
untuned
|
||||
USM
|
||||
UTCL
|
||||
UTIL
|
||||
utils
|
||||
VALU
|
||||
Vanhoucke
|
||||
VBIOS
|
||||
vdi
|
||||
vectorizable
|
||||
vectorization
|
||||
vectorize
|
||||
vectorized
|
||||
vectorizer
|
||||
vectorizes
|
||||
VGPR
|
||||
VGPRs
|
||||
vjxb
|
||||
vL
|
||||
VM
|
||||
VMEM
|
||||
VMWare
|
||||
VRAM
|
||||
VSIX
|
||||
VSkipped
|
||||
Vulkan
|
||||
walkthrough
|
||||
walkthroughs
|
||||
wavefront
|
||||
wavefronts
|
||||
WGP
|
||||
whitespaces
|
||||
Wojna
|
||||
workgroup
|
||||
Workgroups
|
||||
workgroups
|
||||
writeback
|
||||
Writebacks
|
||||
writebacks
|
||||
wrreq
|
||||
WX
|
||||
wzo
|
||||
Xeon
|
||||
XGMI
|
||||
Xnack
|
||||
XT
|
||||
Xteam
|
||||
XTX
|
||||
xz
|
||||
YAML
|
||||
yaml
|
||||
YML
|
||||
YModel
|
||||
ysvmadyb
|
||||
ZenDNN
|
||||
zypper
|
||||
|
||||
3219
CHANGELOG.md
397
CONTRIBUTING.md
@@ -1,229 +1,246 @@
|
||||
# Contributing to ROCm documentation
|
||||
# Contributing to ROCm Docs
|
||||
|
||||
AMD values and encourages contributions to our code and documentation. If you choose to
|
||||
contribute, we encourage you to be polite and respectful. Improving documentation is a long-term
|
||||
process, to which we are dedicated.
|
||||
AMD values and encourages the ROCm community to contribute to our code and
|
||||
documentation. This repository is focused on ROCm documentation and this
|
||||
contribution guide describes the recommend method for creating and modifying our
|
||||
documentation.
|
||||
|
||||
If you have issues when trying to contribute, refer to the
|
||||
[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) page in our GitHub
|
||||
repository.
|
||||
While interacting with ROCm Documentation, we encourage you to be polite and
|
||||
respectful in your contributions, content or otherwise. Authors, maintainers of
|
||||
these docs act on good intentions and to the best of their knowledge.
|
||||
Keep that in mind while you engage. Should you have issues with contributing
|
||||
itself, refer to
|
||||
[discussions](https://github.com/RadeonOpenCompute/ROCm/discussions) on the
|
||||
GitHub repository.
|
||||
|
||||
## Folder structure and naming convention
|
||||
## Supported Formats
|
||||
|
||||
Our documentation follows the Pitchfork folder structure. Most documentation files are stored in the
|
||||
`/docs` folder. Some special files (such as release, contributing, and changelog) are stored in the root
|
||||
(`/`) folder.
|
||||
Our documentation includes both markdown and rst files. Markdown is encouraged
|
||||
over rst due to the lower barrier to participation. GitHub flavored markdown is preferred
|
||||
for all submissions as it will render accurately on our GitHub repositories. For existing documentation,
|
||||
[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html) markdown
|
||||
is used to implement certain features unsupported in GitHub markdown. This is
|
||||
not encouraged for new documentation. AMD will transition
|
||||
to stricter use of GitHub flavored markdown with a few caveats. ROCm documentation
|
||||
also uses [sphinx-design](https://sphinx-design.readthedocs.io/en/latest/index.html)
|
||||
in our markdown and rst files. We also will use breathe syntax for doxygen documentation
|
||||
in our markdown files. Other design elements for effective HTML rendering of the documents
|
||||
may be added to our markdown files. Please see
|
||||
[GitHub](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github)'s
|
||||
guide on writing and formatting on GitHub as a starting point.
|
||||
|
||||
All images are stored in the `/docs/data` folder. An image's file path mirrors that of the documentation
|
||||
file where it is used.
|
||||
ROCm documentation adds additional requirements to markdown and rst based files
|
||||
as follows:
|
||||
|
||||
Our naming structure uses kebab case; for example, `my-file-name.rst`.
|
||||
- Level one headers are only used for page titles. There must be only one level
|
||||
1 header per file for both Markdown and Restructured Text.
|
||||
- Pass [markdownlint](https://github.com/markdownlint/markdownlint) check via
|
||||
our automated github action on a Pull Request (PR).
|
||||
|
||||
## Supported formats and syntax
|
||||
## Filenames and folder structure
|
||||
|
||||
Our documentation includes both Markdown and RST files. We are gradually transitioning existing
|
||||
Markdown to RST in order to more effectively meet our documentation needs. When contributing,
|
||||
RST is preferred; if you must use Markdown, use GitHub-flavored Markdown.
|
||||
Please use snake case for file names. Our documentation follows pitchfork for
|
||||
folder structure. All documentation is in /docs except for special files like
|
||||
the contributing guide in the / folder. All images used in the documentation are
|
||||
place in the /docs/data folder.
|
||||
|
||||
We use [Sphinx Design](https://sphinx-design.readthedocs.io/en/latest/index.html) syntax and compile
|
||||
our API references using [Doxygen](https://www.doxygen.nl/).
|
||||
## How to provide feedback for for ROCm documentation
|
||||
|
||||
The following table shows some common documentation components and the syntax convention we
|
||||
use for each:
|
||||
There are three standard ways to provide feedback for this repository.
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
<th>Component</th>
|
||||
<th>RST syntax</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Code blocks</td>
|
||||
<td>
|
||||
### Pull Request
|
||||
|
||||
```rst
|
||||
All contributions to ROCm documentation should arrive via the
|
||||
[GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)
|
||||
targetting the develop branch of the repository. If you are unable to contribute
|
||||
via the GitHub Flow, feel free to email us. TODO, confirm email address.
|
||||
|
||||
.. code-block:: language-name
|
||||
### GitHub Issue
|
||||
|
||||
My code block.
|
||||
Issues on existing or absent docs can be filed as [GitHub issues
|
||||
](https://github.com/RadeonOpenCompute/ROCm/issues).
|
||||
|
||||
### Email Feedback
|
||||
|
||||
## Language and Style
|
||||
|
||||
Adopting Microsoft CPP-Docs guidelines for [Voice and Tone
|
||||
](https://github.com/MicrosoftDocs/cpp-docs/blob/main/styleguide/voice-tone.md).
|
||||
|
||||
ROCm documentation templates to be made public shortly. ROCm templates dictate
|
||||
the recommended structure and flow of the documentation. Guidelines on how to
|
||||
integrate figures, equations, and tables are all based off
|
||||
[MyST](https://myst-parser.readthedocs.io/en/latest/intro.html).
|
||||
|
||||
Font size and selection, page layout, white space control, and other formatting
|
||||
details are controlled via rocm-docs-core, sphinx extention. Please raise issues
|
||||
in rocm-docs-core for any formatting concerns and changes requested.
|
||||
|
||||
## Building Documentation
|
||||
|
||||
While contributing, one may build the documentation locally on the command-line
|
||||
or rely on Continuous Integration for previewing the resulting HTML pages in a
|
||||
browser.
|
||||
|
||||
### Command line documentation builds
|
||||
|
||||
Python versions known to build documentation:
|
||||
|
||||
- 3.8
|
||||
|
||||
To build the docs locally using Python Virtual Environment (`venv`), execute the
|
||||
following commands from the project root:
|
||||
|
||||
```sh
|
||||
python3 -mvenv .venv
|
||||
# Windows
|
||||
.venv/Scripts/python -m pip install -r docs/sphinx/requirements.txt
|
||||
.venv/Scripts/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
|
||||
# Linux
|
||||
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
|
||||
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Cross-referencing internal files</td>
|
||||
<td>
|
||||
Then open up `_build/html/index.html` in your favorite browser.
|
||||
|
||||
```rst
|
||||
### Pull Requests documentation builds
|
||||
|
||||
:doc:`Title <../path/to/file/filename>`
|
||||
When opening a PR to the `develop` branch on GitHub, the page corresponding to
|
||||
the PR (`https://github.com/RadeonOpenCompute/ROCm/pull/<pr_number>`) will have
|
||||
a summary at the bottom. This requires the user be logged in to GitHub.
|
||||
|
||||
```
|
||||
- There, click `Show all checks` and `Details` of the Read the Docs pipeline. It
|
||||
will take you to `https://readthedocs.com/projects/advanced-micro-devices-rocm/
|
||||
builds/<some_build_num>/`
|
||||
- The list of commands shown are the exact ones used by CI to produce a render
|
||||
of the documentation.
|
||||
- There, click on the small blue link `View docs` (which is not the same as the
|
||||
bigger button with the same text). It will take you to the built HTML site with
|
||||
a URL of the form `https://
|
||||
advanced-micro-devices-demo--<pr_number>.com.readthedocs.build/projects/alpha/en
|
||||
/<pr_number>/`.
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>External links</td>
|
||||
<td>
|
||||
### Build the docs using VS Code
|
||||
|
||||
```rst
|
||||
One can put together a productive environment to author documentation and also
|
||||
test it locally using VS Code with only a handful of extensions. Even though the
|
||||
extension landscape of VS Code is ever changing, here is one example setup that
|
||||
proved useful at the time of writing. In it, one can change/add content, build a
|
||||
new version of the docs using a single VS Code Task (or hotkey), see all errors/
|
||||
warnings emitted by Sphinx in the Problems pane and immediately see the
|
||||
resulting website show up on a locally serving web server.
|
||||
|
||||
`link name <URL>`_
|
||||
#### Configuring VS Code
|
||||
|
||||
```
|
||||
1. Install the following extensions:
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<tr>
|
||||
<td>Headings</td>
|
||||
<td>
|
||||
- Python (ms-python.python)
|
||||
- Live Server (ritwickdey.LiveServer)
|
||||
|
||||
```rst
|
||||
2. Add the following entries in `.vscode/settings.json`
|
||||
|
||||
******************
|
||||
Chapter title (H1)
|
||||
******************
|
||||
```json
|
||||
{
|
||||
"liveServer.settings.root": "/.vscode/build/html",
|
||||
"liveServer.settings.wait": 1000,
|
||||
"python.terminal.activateEnvInCurrentTerminal": true
|
||||
}
|
||||
```
|
||||
|
||||
Section title (H2)
|
||||
===============
|
||||
The settings in order are set for the following reasons:
|
||||
- Sets the root of the output website for live previews. Must be changed
|
||||
alongside the `tasks.json` command.
|
||||
- Tells live server to wait with the update to give time for Sphinx to
|
||||
regenerate site contents and not refresh before all is don. (Empirical value)
|
||||
- Automatic virtual env activation is a nice touch, should you want to build
|
||||
the site from the integrated terminal.
|
||||
|
||||
Subsection title (H3)
|
||||
---------------------
|
||||
3. Add the following tasks in `.vscode/tasks.json`
|
||||
|
||||
Sub-subsection title (H4)
|
||||
^^^^^^^^^^^^^^^^^^^^
|
||||
```json
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"tasks": [
|
||||
{
|
||||
"label": "Build Docs",
|
||||
"type": "process",
|
||||
"windows": {
|
||||
"command": "${workspaceFolder}/.venv/Scripts/python.exe"
|
||||
},
|
||||
"command": "${workspaceFolder}/.venv/bin/python3",
|
||||
"args": [
|
||||
"-m",
|
||||
"sphinx",
|
||||
"-j",
|
||||
"auto",
|
||||
"-T",
|
||||
"-b",
|
||||
"html",
|
||||
"-d",
|
||||
"${workspaceFolder}/.vscode/build/doctrees",
|
||||
"-D",
|
||||
"language=en",
|
||||
"${workspaceFolder}/docs",
|
||||
"${workspaceFolder}/.vscode/build/html"
|
||||
],
|
||||
"problemMatcher": [
|
||||
{
|
||||
"owner": "sphinx",
|
||||
"fileLocation": "absolute",
|
||||
"pattern": {
|
||||
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):(\\d+):\\s+(WARNING|ERROR):\\s+(.*)$",
|
||||
"file": 1,
|
||||
"line": 2,
|
||||
"severity": 3,
|
||||
"message": 4
|
||||
},
|
||||
},
|
||||
{
|
||||
"owner": "sphinx",
|
||||
"fileLocation": "absolute",
|
||||
"pattern": {
|
||||
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):{1,2}\\s+(WARNING|ERROR):\\s+(.*)$",
|
||||
"file": 1,
|
||||
"severity": 2,
|
||||
"message": 3
|
||||
}
|
||||
}
|
||||
],
|
||||
"group": {
|
||||
"kind": "build",
|
||||
"isDefault": true
|
||||
}
|
||||
},
|
||||
],
|
||||
}
|
||||
```
|
||||
|
||||
> (Implementation detail: two problem matchers were needed to be defined,
|
||||
> because VS Code doesn't tolerate some problem information being potentially
|
||||
> absent. While a single regex could match all types of errors, if a capture
|
||||
> group remains empty (the line number doesn't show up in all warning/error
|
||||
> messages) but the `pattern` references said empty capture group, VS Code
|
||||
> discards the message completely.)
|
||||
|
||||
```
|
||||
4. Configure Python virtual environment (venv)
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Images</td>
|
||||
<td>
|
||||
- From the Command Palette, run `Python: Create Environment`
|
||||
- Select `venv` environment and the `docs/sphinx/requirements.txt` file.
|
||||
_(Simply pressing enter while hovering over the file from the dropdown is
|
||||
insufficient, one has to select the radio button with the 'Space' key if
|
||||
using the keyboard.)_
|
||||
|
||||
```rst
|
||||
5. Build the docs
|
||||
|
||||
.. image:: image1.png
|
||||
- Launch the default build Task using either:
|
||||
- a hotkey _(default is 'Ctrl+Shift+B')_ or
|
||||
- by issuing the `Tasks: Run Build Task` from the Command Palette.
|
||||
|
||||
```
|
||||
6. Open the live preview
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Internal links</td>
|
||||
<td>
|
||||
- Navigate to the output of the site within VS Code, right-click on
|
||||
`.vscode/build/html/index.html` and select `Open with Live Server`. The
|
||||
contents should update on every rebuild without having to refresh the
|
||||
browser.
|
||||
|
||||
```rst
|
||||
|
||||
1. Add a tag to the section you want to reference:
|
||||
|
||||
.. _my-section-tag: section-1
|
||||
|
||||
Section 1
|
||||
==========
|
||||
|
||||
2. Link to your tag:
|
||||
|
||||
As shown in :ref:`section-1`.
|
||||
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<tr>
|
||||
<td>Lists</td>
|
||||
<td>
|
||||
|
||||
```rst
|
||||
|
||||
# Ordered (numbered) list item
|
||||
|
||||
* Unordered (bulleted) list item
|
||||
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<tr>
|
||||
<td>Math (block)</td>
|
||||
<td>
|
||||
|
||||
```rst
|
||||
|
||||
.. math::
|
||||
|
||||
A = \begin{pmatrix}
|
||||
0.0 & 1.0 & 1.0 & 3.0 \\
|
||||
4.0 & 5.0 & 6.0 & 7.0 \\
|
||||
\end{pmatrix}
|
||||
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Math (inline)</td>
|
||||
<td>
|
||||
|
||||
```rst
|
||||
|
||||
:math:`2 \times 2 `
|
||||
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Notes</td>
|
||||
<td>
|
||||
|
||||
```rst
|
||||
|
||||
.. note::
|
||||
|
||||
My note here.
|
||||
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Tables</td>
|
||||
<td>
|
||||
|
||||
```rst
|
||||
|
||||
.. csv-table:: Optional title here
|
||||
:widths: 30, 70 #optional column widths
|
||||
:header: "entry1 header", "entry2 header"
|
||||
|
||||
"entry1", "entry2"
|
||||
|
||||
```
|
||||
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
## Language and style
|
||||
|
||||
We use the
|
||||
[Google developer documentation style guide](https://developers.google.com/style/highlights) to
|
||||
guide our content.
|
||||
|
||||
Font size and type, page layout, white space control, and other formatting
|
||||
details are controlled via
|
||||
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). If you want to notify us
|
||||
of any formatting issues, create a pull request in our
|
||||
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) GitHub repository.
|
||||
|
||||
## Building our documentation
|
||||
|
||||
<!-- % TODO: Fix the link to be able to work at every files -->
|
||||
To learn how to build our documentation, refer to
|
||||
[Building documentation](./building.md).
|
||||
<!-- markdownlint-restore -->
|
||||
|
||||
66
README.md
@@ -1,40 +1,44 @@
|
||||
# AMD ROCm™ platform
|
||||
# AMD ROCm™ Platform
|
||||
|
||||
ROCm is an open-source stack, composed primarily of open-source software, designed for graphics
|
||||
processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and
|
||||
APIs that enable GPU programming from low-level kernel to end-user applications.
|
||||
ROCm™ is an open-source stack for GPU computation. ROCm is primarily Open-Source
|
||||
Software (OSS) that allows developers the freedom to customize and tailor their
|
||||
GPU software for their own needs while collaborating with a community of other
|
||||
developers, and helping each other find solutions in an agile, flexible, rapid
|
||||
and secure manner.
|
||||
|
||||
With ROCm, you can customize your GPU software to meet your specific needs. You can develop,
|
||||
collaborate, test, and deploy your applications in a free, open source, integrated, and secure software
|
||||
ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC),
|
||||
artificial intelligence (AI), scientific computing, and computer aided design (CAD).
|
||||
ROCm is a collection of drivers, development tools and APIs enabling GPU
|
||||
programming from the low-level kernel to end-user applications. ROCm is powered
|
||||
by AMD’s Heterogeneous-computing Interface for Portability (HIP), an OSS C++ GPU
|
||||
programming environment and its corresponding runtime. HIP allows ROCm
|
||||
developers to create portable applications on different platforms by deploying
|
||||
code on a range of platforms, from dedicated gaming GPUs to exascale HPC
|
||||
clusters. ROCm supports programming models such as OpenMP and OpenCL, and
|
||||
includes all the necessary OSS compilers, debuggers and libraries. ROCm is fully
|
||||
integrated into ML frameworks such as PyTorch and TensorFlow. ROCm can be
|
||||
deployed in many ways, including through the use of containers such as Docker,
|
||||
Spack, and your own build from source.
|
||||
|
||||
ROCm is powered by AMD’s
|
||||
[Heterogeneous-computing Interface for Portability (HIP)](https://github.com/ROCm-Developer-Tools/HIP),
|
||||
an open-source software C++ GPU programming environment and its corresponding runtime. HIP
|
||||
allows ROCm developers to create portable applications on different platforms by deploying code on a
|
||||
range of platforms, from dedicated gaming GPUs to exascale HPC clusters.
|
||||
ROCm’s goal is to allow our users to maximize their GPU hardware investment.
|
||||
ROCm is designed to help develop, test and deploy GPU accelerated HPC, AI,
|
||||
scientific computing, CAD, and other applications in a free, open-source,
|
||||
integrated and secure software ecosystem.
|
||||
|
||||
ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open
|
||||
source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning
|
||||
(ML) frameworks, such as PyTorch and TensorFlow.
|
||||
This repository contains the manifest file for ROCm™ releases, changelogs, and
|
||||
release information. The file default.xml contains information for all
|
||||
repositories and the associated commit used to build the current ROCm release.
|
||||
|
||||
## ROCm documentation
|
||||
The default.xml file uses the repo Manifest format.
|
||||
|
||||
This repository contains the manifest file for ROCm releases, changelogs, and release information.
|
||||
The develop branch of this repository contains content for the next
|
||||
ROCm release.
|
||||
|
||||
The `default.xml` file contains information for all repositories and the associated commit used to build
|
||||
the current ROCm release; `default.xml` uses the Manifest Format repository.
|
||||
## ROCm Documentation
|
||||
|
||||
Source code for our documentation is located in the `/docs` folder of most ROCm repositories. The
|
||||
`develop` branch of our repositories contains content for the next ROCm release.
|
||||
ROCm Documentation is available online at
|
||||
[rocm.docs.amd.com](https://rocm.docs.amd.com). Source code for the documenation
|
||||
is located in the docs folder of most repositories that are part of ROCm.
|
||||
|
||||
The ROCm documentation homepage is [rocm.docs.amd.com](https://rocm.docs.amd.com).
|
||||
|
||||
### Building our documentation
|
||||
|
||||
For a quick-start build, use the following code. For more options and detail, refer to
|
||||
[Building documentation](./contribute/building.md).
|
||||
### How to build documentation via Sphinx
|
||||
|
||||
```bash
|
||||
cd docs
|
||||
@@ -44,7 +48,7 @@ pip3 install -r sphinx/requirements.txt
|
||||
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
|
||||
```
|
||||
|
||||
## Older ROCm releases
|
||||
## Older ROCm™ Releases
|
||||
|
||||
For release information for older ROCm releases, refer to
|
||||
[`CHANGELOG`](./CHANGELOG.md).
|
||||
For release information for older ROCm™ releases, refer to
|
||||
[CHANGELOG](./CHANGELOG.md).
|
||||
|
||||
62
RELEASE.md
@@ -11,65 +11,23 @@
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
|
||||
Welcome to the release notes for the ROCm platform.
|
||||
The release notes for the ROCm platform.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.7.1
|
||||
## ROCm 5.3.3
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
### Fixed Defects
|
||||
|
||||
### What's New in This Release
|
||||
#### Issue with rocTHRUST and rocPRIM Libraries
|
||||
|
||||
### ROCm Libraries
|
||||
There was a known issue with rocTHRUST and rocPRIM libraries supporting iterator and types in ROCm v5.3.x releases.
|
||||
|
||||
#### rocBLAS
|
||||
A new functionality rocblas-gemm-tune and an environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH are added to rocBLAS in the ROCm 5.7.1 release.
|
||||
- `thrust::merge` no longer correctly supports different iterator types for `keys_input1` and `keys_input2`.
|
||||
- `rocprim::device_merge` no longer correctly supports using different types for `keys_input1` and `keys_input2`.
|
||||
|
||||
*rocblas-gemm-tune* is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface, which mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, profile logging can be used, by setting the environment variable ROCBLAS_LAYER4.
|
||||
This issue is resolved with the following fixes to compilation failures:
|
||||
|
||||
For more information on rocBLAS logging, see Logging in rocBLAS, in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
|
||||
|
||||
An example input file: Expected output (note selected GEMM idx may differ): Where the far right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code of directly using kernels via their indices.
|
||||
|
||||
If the output is stored in a file, the results can be used to override default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, where points to the stored file.
|
||||
|
||||
For more details, refer to the [rocBLAS Programmer's Guide.](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune)
|
||||
|
||||
#### HIP 5.7.1 (for ROCm 5.7.1)
|
||||
|
||||
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
### Fixed defects
|
||||
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.
|
||||
|
||||
### Library Changes in ROCM 5.7.1
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
| hipBLAS | [1.1.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.7.1) |
|
||||
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.7.1) |
|
||||
| hipFFT | [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.7.1) |
|
||||
| hipSOLVER | 1.8.1 ⇒ [1.8.2](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.7.1) |
|
||||
| hipSPARSE | [2.3.8](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.7.1) |
|
||||
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.7.1) |
|
||||
| rocALUTION | [2.1.11](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.7.1) |
|
||||
| rocBLAS | [3.1.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.7.1) |
|
||||
| rocFFT | [1.0.24](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.7.1) |
|
||||
| rocm-cmake | [0.10.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.7.1) |
|
||||
| rocPRIM | [2.13.1](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.7.1) |
|
||||
| rocRAND | [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.7.1) |
|
||||
| rocSOLVER | [3.23.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.7.1) |
|
||||
| rocSPARSE | [2.5.4](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.7.1) |
|
||||
| rocThrust | [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.7.1) |
|
||||
| rocWMMA | [1.2.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.7.1) |
|
||||
| Tensile | [4.38.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.7.1) |
|
||||
|
||||
#### hipSOLVER 1.8.2
|
||||
|
||||
hipSOLVER 1.8.2 for ROCm 5.7.1
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed conflicts between the hipsolver-dev and -asan packages by excluding
|
||||
hipsolver_module.f90 from the latter
|
||||
- rocPRIM: in device_merge if the two key iterators do not match.
|
||||
- rocTHRUST: in thrust::merge if the two key iterators do not match.
|
||||
26
default.xml
@@ -12,7 +12,7 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
|
||||
fetch="https://github.com/GPUOpen-Tools/" />
|
||||
<remote name="KhronosGroup"
|
||||
fetch="https://github.com/KhronosGroup/" />
|
||||
<default revision="refs/tags/rocm-5.7.1"
|
||||
<default revision="refs/tags/rocm-5.5.1"
|
||||
remote="roc-github"
|
||||
sync-c="true"
|
||||
sync-j="4" />
|
||||
@@ -20,36 +20,38 @@ fetch="https://github.com/KhronosGroup/" />
|
||||
<project name="ROCK-Kernel-Driver" />
|
||||
<project name="ROCT-Thunk-Interface" />
|
||||
<project name="ROCR-Runtime" />
|
||||
<project name="amdsmi" />
|
||||
<project name="rocm_smi_lib" />
|
||||
<project name="rocm-core" />
|
||||
<project name="rocm-cmake" />
|
||||
<project name="rocminfo" />
|
||||
<project name="rocm_bandwidth_test" />
|
||||
<project name="rocprofiler" remote="rocm-devtools" />
|
||||
<project name="roctracer" remote="rocm-devtools" />
|
||||
<project name="ROCm-OpenCL-Runtime" />
|
||||
<project path="ROCm-OpenCL-Runtime/api/opencl/khronos/icd" name="OpenCL-ICD-Loader" remote="KhronosGroup" revision="6c03f8b58fafd9dd693eaac826749a5cfad515f8" />
|
||||
<project name="clang-ocl" />
|
||||
<project name="rdc" />
|
||||
<!--HIP Projects-->
|
||||
<project name="HIP" remote="rocm-devtools" />
|
||||
<project name="hipamd" remote="rocm-devtools" />
|
||||
<project name="HIP-Examples" remote="rocm-devtools" />
|
||||
<project name="clr" remote="rocm-devtools" />
|
||||
<project name="ROCclr" remote="rocm-devtools" />
|
||||
<project name="HIPIFY" remote="rocm-devtools" />
|
||||
<project name="HIPCC" remote="rocm-devtools" />
|
||||
<!-- The following projects are all associated with the AMDGPU LLVM compiler -->
|
||||
<project name="llvm-project" />
|
||||
<project name="ROCm-Device-Libs" />
|
||||
<project name="atmi" />
|
||||
<project name="ROCm-CompilerSupport" />
|
||||
<project name="rocr_debug_agent" remote="rocm-devtools" />
|
||||
<project name="rocm_bandwidth_test" />
|
||||
<project name="half" remote="rocm-swplat" revision="37742ce15b76b44e4b271c1e66d13d2fa7bd003e" />
|
||||
<project name="RCP" remote="gpuopen-tools" revision="3a49405a1500067c49d181844ec90aea606055bb" />
|
||||
<!-- gdb projects -->
|
||||
<project name="ROCgdb" remote="rocm-devtools" />
|
||||
<project name="ROCdbgapi" remote="rocm-devtools" />
|
||||
<project name="rocr_debug_agent" remote="rocm-devtools" />
|
||||
<!-- ROCm Libraries -->
|
||||
<project name="rdc" />
|
||||
<project groups="mathlibs" name="rocBLAS" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="Tensile" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipTensor" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipBLAS" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rocFFT" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipFFT" remote="rocm-swplat" />
|
||||
@@ -59,16 +61,14 @@ fetch="https://github.com/KhronosGroup/" />
|
||||
<project groups="mathlibs" name="hipSOLVER" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipSPARSE" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rocALUTION" remote="rocm-swplat" />
|
||||
<project name="MIOpenGEMM" remote="rocm-swplat" />
|
||||
<project name="MIOpen" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rccl" remote="rocm-swplat" />
|
||||
<project name="MIVisionX" remote="gpuopen-libs" />
|
||||
<project groups="mathlibs" name="rocThrust" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="hipCUB" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rocPRIM" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rocWMMA" remote="rocm-swplat" />
|
||||
<project groups="mathlibs" name="rccl" remote="rocm-swplat" />
|
||||
<project name="rocMLIR" remote="rocm-swplat" />
|
||||
<project name="MIOpen" remote="rocm-swplat" />
|
||||
<project name="composable_kernel" remote="rocm-swplat" />
|
||||
<project name="MIVisionX" remote="gpuopen-libs" />
|
||||
<project name="rpp" remote="gpuopen-libs" />
|
||||
<project name="hipfort" remote="rocm-swplat" />
|
||||
<project name="AMDMIGraphX" remote="rocm-swplat" />
|
||||
<project name="ROCmValidationSuite" remote="rocm-devtools" />
|
||||
|
||||
6
docs/404.md
Normal file
@@ -0,0 +1,6 @@
|
||||
# 404 Page Not Found
|
||||
|
||||
Page could not be found.
|
||||
|
||||
Return to [home](./index) or please use the links from the sidebar to find what
|
||||
you are looking for.
|
||||
74
docs/about.md
Normal file
@@ -0,0 +1,74 @@
|
||||
# About ROCm Documentation
|
||||
|
||||
ROCm documentation is made available under open source [licenses](licensing.md).
|
||||
Documentation is built using open source toolchains. Contributions to our
|
||||
documentation is encouraged and welcome. As a contributor, please familiarize
|
||||
yourself with our documentation toolchain.
|
||||
|
||||
## ReadTheDocs
|
||||
|
||||
[ReadTheDocs](https://docs.readthedocs.io/en/stable/) is our front end for the
|
||||
our documentation. By front end, this is the tool that serves our HTML based
|
||||
documentation to our end users.
|
||||
|
||||
## Doxygen
|
||||
|
||||
[Doxygen](https://www.doxygen.nl/) is the most common inline code documentation
|
||||
standard. ROCm projects are use Doxygen for public API documentation (unless the
|
||||
upstream project is using a different tool).
|
||||
|
||||
## Sphinx
|
||||
|
||||
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
|
||||
originally used for python. It is now widely used in the Open Source community.
|
||||
Originally, sphinx supported RST based documentation. Markdown support is now
|
||||
available. ROCm documentation plans to default to markdown for new projects.
|
||||
Existing projects using RST are under no obligation to convert to markdown. New
|
||||
projects that believe markdown is not suitable should contact the documentation
|
||||
team prior to selecting RST.
|
||||
|
||||
### MyST
|
||||
|
||||
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended
|
||||
flavor of Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and Sphinx.
|
||||
It is integrated via [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
|
||||
A cheat sheet that showcases how to use the MyST syntax is available over at [the Jupyter
|
||||
reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
|
||||
|
||||
### Sphinx Theme
|
||||
|
||||
ROCm is using the
|
||||
[Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/). This
|
||||
theme is used by Jupyter books. ROCm documentation applies some customization
|
||||
include a header and footer on top of the Sphinx Book Theme. A future custom
|
||||
ROCm theme will be part of our documentation goals.
|
||||
|
||||
### Sphinx Design
|
||||
|
||||
Sphinx Design is an extension for sphinx based websites that add design
|
||||
functionality. Please see the documentation
|
||||
[here](https://sphinx-design.readthedocs.io/en/latest/index.html). ROCm
|
||||
documentation uses sphinx design for grids, cards, and synchronized tabs.
|
||||
Other features may be used in the future.
|
||||
|
||||
### Sphinx External TOC
|
||||
|
||||
ROCm uses the
|
||||
[sphinx-external-toc](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
|
||||
for our navigation. This tool allows a YAML file based left navigation menu. This
|
||||
tool was selected due to its flexibility that allows scripts to operate on the
|
||||
YAML file. Please transition to this file for the project's navigation. You can
|
||||
see the `_toc.yml.in` file in this repository in the docs/sphinx folder for an
|
||||
example.
|
||||
|
||||
### Breathe
|
||||
|
||||
Sphinx uses [Breathe](https://www.breathe-doc.org/) to integrate Doxygen
|
||||
content.
|
||||
|
||||
## `rocm-docs-core` pip package
|
||||
|
||||
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD
|
||||
maintained project that applies customization for our documentation. This
|
||||
project is the tool most ROCm repositories will use as part of the documentation
|
||||
build.
|
||||
@@ -1,63 +0,0 @@
|
||||
# Third party support matrix
|
||||
|
||||
ROCm™ supports various 3rd party libraries and frameworks. Supported versions
|
||||
are tested and known to work. Non-supported versions of 3rd parties may also
|
||||
work, but aren't tested.
|
||||
|
||||
## Deep learning
|
||||
|
||||
ROCm releases support the most recent and two prior releases of PyTorch and
|
||||
TensorFlow.
|
||||
|
||||
| ROCm | [PyTorch](https://github.com/pytorch/pytorch/releases/) | [TensorFlow](https://github.com/tensorflow/tensorflow/releases/) |
|
||||
|:------|:--------------------------:|:--------------------:|
|
||||
| 5.0.2 | 1.8, 1.9, 1.10 | 2.6, 2.7, 2.8 |
|
||||
| 5.1.3 | 1.9, 1.10, 1.11 | 2.7, 2.8, 2.9 |
|
||||
| 5.2.x | 1.10, 1.11, 1.12 | 2.8, 2.9, 2.9 |
|
||||
| 5.3.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10 |
|
||||
| 5.4.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.8, 2.9, 2.10, 2.11 |
|
||||
| 5.5.x | 1.10.1, 1.11, 1.12.1, 1.13 | 2.10, 2.11, 2.13 |
|
||||
| 5.6.x | 1.12.1, 1.13, 2.0 | 2.12, 2.13 |
|
||||
| 5.7.x | 1.12.1, 1.13, 2.0 | 2.12, 2.13 |
|
||||
|
||||
(communication-libraries)=
|
||||
|
||||
## Communication libraries
|
||||
|
||||
ROCm supports [OpenUCX](https://openucx.org/), an open-source,
|
||||
production-grade communication framework for data-centric and high performance
|
||||
applications.
|
||||
|
||||
UCX version | ROCm 5.4 and older | ROCm 5.5 and newer |
|
||||
|:----------|:------------------:|:------------------:|
|
||||
| -1.14.0 | COMPATIBLE | INCOMPATIBLE |
|
||||
| 1.14.1+ | COMPATIBLE | COMPATIBLE |
|
||||
|
||||
The Unified Collective Communication ([UCC](https://github.com/openucx/ucc)) library also has
|
||||
support for ROCm devices.
|
||||
|
||||
UCC version | ROCm 5.5 and older | ROCm 5.6 and newer |
|
||||
|:----------|:------------------:|:------------------:|
|
||||
| -1.1.0 | COMPATIBLE | INCOMPATIBLE |
|
||||
| 1.2.0+ | COMPATIBLE | COMPATIBLE |
|
||||
|
||||
## Algorithm libraries
|
||||
|
||||
ROCm releases provide algorithm libraries with interfaces compatible with
|
||||
contemporary CUDA / NVIDIA HPC SDK alternatives.
|
||||
|
||||
* Thrust → rocThrust
|
||||
* CUB → hipCUB
|
||||
|
||||
| ROCm | Thrust / CUB | HPC SDK |
|
||||
|:------|:------------:|:-------:|
|
||||
| 5.0.2 | 1.14 | 21.9 |
|
||||
| 5.1.3 | 1.15 | 22.1 |
|
||||
| 5.2.x | 1.15 | 22.2, 22.3 |
|
||||
| 5.3.x | 1.16 | 22.7 |
|
||||
| 5.4.x | 1.16 | 22.9 |
|
||||
| 5.5.x | 1.17 | 22.9 |
|
||||
| 5.6.x | 1.17.2 | 22.9 |
|
||||
| 5.7.x | 1.17.2 | 22.9 |
|
||||
|
||||
For the latest documentation of these libraries, refer to [API libraries](../../reference/library-index.md).
|
||||
@@ -1,130 +0,0 @@
|
||||
******************************************************************
|
||||
Docker image support matrix
|
||||
******************************************************************
|
||||
|
||||
AMD validates and publishes `PyTorch <https://hub.docker.com/r/rocm/pytorch>`_ and
|
||||
`TensorFlow <https://hub.docker.com/r/rocm/tensorflow>`_ containers on dockerhub. The following
|
||||
tags, and associated inventories, are validated with ROCm 5.7.
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: PyTorch
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Ubuntu 22.04
|
||||
|
||||
Tag: `rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1/images/sha256-21df283b1712f3d73884b9bc4733919374344ceacb694e8fbc2c50bdd3e767ee>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.10 <https://www.python.org/downloads/release/python-31013/>`_
|
||||
* `Torch 2.0.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/2.0>`_
|
||||
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
|
||||
* `Torchvision 0.15.0 <https://github.com/pytorch/vision/tree/release/0.15>`_
|
||||
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
|
||||
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
|
||||
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
|
||||
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
|
||||
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
|
||||
|
||||
.. tab-item:: Ubuntu 20.04
|
||||
|
||||
Tag: `rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_staging <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1/images/sha256-4dd86046e5f777f53ae40a75ecfc76a5e819f01f3b2d40eacbb2db95c2f971d4)>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `Torch 2.1.0 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing>`_
|
||||
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
|
||||
* `Torchvision 0.16.0 <https://github.com/pytorch/vision/tree/release/0.16>`_
|
||||
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
|
||||
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
|
||||
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
|
||||
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
|
||||
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
|
||||
|
||||
|
||||
Tag: `Ubuntu rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.12.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_1.12.1/images/sha256-e67db9373c045a7b6defd43cc3d067e7d49fd5d380f3f8582d2fb219c1756e1f>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `Torch 1.12.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12>`_
|
||||
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
|
||||
* `Torchvision 0.13.1 <https://github.com/pytorch/vision/tree/v0.13.1>`_
|
||||
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
|
||||
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
|
||||
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
|
||||
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
|
||||
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
|
||||
|
||||
Tag: `Ubuntu rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1/images/sha256-ed99d159026093d2aaf5c48c1e4b0911508773430377051372733f75c340a4c1>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `Torch 1.12.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13>`_
|
||||
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
|
||||
* `Torchvision 0.14.0 <https://github.com/pytorch/vision/tree/v0.14.0>`_
|
||||
* `Tensorboard 2.12.0 <https://github.com/tensorflow/tensorboard/tree/2.12.0>`_
|
||||
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
|
||||
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
|
||||
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
|
||||
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
|
||||
|
||||
Tag: `Ubuntu rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1/images/sha256-4dd86046e5f777f53ae40a75ecfc76a5e819f01f3b2d40eacbb2db95c2f971d4>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `Torch 2.0.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/2.0>`_
|
||||
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
|
||||
* `Torchvision 0.15.2 <https://github.com/pytorch/vision/tree/release/0.15>`_
|
||||
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
|
||||
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
|
||||
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
|
||||
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
|
||||
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
|
||||
|
||||
.. tab-item:: CentOS 7
|
||||
|
||||
Tag: `rocm/pytorch:rocm5.7_centos7_py3.9_pytorch_staging <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_centos7_py3.9_pytorch_staging/images/sha256-92240cdf0b4aa7afa76fc78be995caa19ee9c54b5c9f1683bdcac28cedb58d2b>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/yum/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `Torch 2.1.0 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing>`_
|
||||
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
|
||||
* `Torchvision 0.16.0 <https://github.com/pytorch/vision/tree/release/0.16>`_
|
||||
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
|
||||
|
||||
.. tab-item:: TensorFlow
|
||||
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Ubuntu 20.04
|
||||
|
||||
Tag: `rocm5.7-tf2.12-dev <https://hub.docker.com/layers/rocm/tensorflow/rocm5.7-tf2.12-dev/images/sha256-e0ac4d49122702e5167175acaeb98a79b9500f585d5e74df18facf6b52ce3e59>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `tensorflow-rocm 2.12.1 <https://pypi.org/project/tensorflow-rocm/2.12.1.570/>`_
|
||||
* `Tensorboard 2.12.3 <https://github.com/tensorflow/tensorboard/tree/2.12>`_
|
||||
|
||||
Tag: `rocm5.7-tf2.13-dev <https://hub.docker.com/layers/rocm/tensorflow/rocm5.7-tf2.13-dev/images/sha256-6f995539eebc062aac2b53db40e2b545192d8b032d0deada8c24c6651a7ac332>`_
|
||||
|
||||
* Inventory:
|
||||
|
||||
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
|
||||
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
|
||||
* `tensorflow-rocm 2.13.0 <https://pypi.org/project/tensorflow-rocm/2.13.0.570/>`_
|
||||
* `Tensorboard 2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_
|
||||
@@ -1,116 +0,0 @@
|
||||
# GPU and OS support (Linux)
|
||||
|
||||
(linux-support)=
|
||||
|
||||
## Supported Linux distributions
|
||||
|
||||
AMD ROCm™ Platform supports the following Linux distributions.
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Supported
|
||||
|
||||
| Distribution | Processor Architectures | Validated Kernel | Support |
|
||||
| :----------- | :---------------------: | :--------------: | ------: |
|
||||
| RHEL 9.2 | x86-64 | 5.14 (5.14.0-284.11.1.el9_2.x86_64) | ✅ |
|
||||
| RHEL 9.1 | x86-64 | 5.14.0-284.11.1.el9_2.x86_64 | ✅ |
|
||||
| RHEL 8.8 | x86-64 | 4.18.0-477.el8.x86_64 | ✅ |
|
||||
| RHEL 8.7 | x86-64 | 4.18.0-425.10.1.el8_7.x86_64 | ✅ |
|
||||
| SLES 15 SP5 | x86-64 | 5.14.21-150500.53-default | ✅ |
|
||||
| SLES 15 SP4 | x86-64 | 5.14.21-150400.24.63-default | ✅ |
|
||||
| Ubuntu 22.04.2 | x86-64 | 5.19.0-45-generic | ✅ |
|
||||
| Ubuntu 20.04.5 | x86-64 | 5.15.0-75-generic | ✅ |
|
||||
|
||||
:::{versionadded} 5.6
|
||||
|
||||
* RHEL 8.8 and 9.2 support is added.
|
||||
* SLES 15 SP5 support is added
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Unsupported
|
||||
|
||||
| Distribution | Processor Architectures | Validated Kernel | Support |
|
||||
| :----------- | :---------------------: | :--------------: | ------: |
|
||||
| RHEL 9.0 | x86-64 | 5.14 | ❌ |
|
||||
| RHEL 8.6 | x86-64 | 5.14 | ❌ |
|
||||
| SLES 15 SP3 | x86-64 | 5.3 | ❌ |
|
||||
| Ubuntu 22.04.0 | x86-64 | 5.15 LTS, 5.17 OEM | ❌ |
|
||||
| Ubuntu 20.04.4 | x86-64 | 5.13 HWE, 5.13 OEM | ❌ |
|
||||
| Ubuntu 22.04.1 | x86-64 | 5.15 LTS | ❌ |
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
✅: **Supported** - AMD performs full testing of all ROCm components on distro
|
||||
GA image.
|
||||
❌: **Unsupported** - AMD no longer performs builds and testing on these
|
||||
previously supported distro GA images.
|
||||
|
||||
## Virtualization support
|
||||
|
||||
ROCm supports virtualization for select GPUs only as shown below.
|
||||
|
||||
| Hypervisor | Version | GPU | Validated Guest OS (validated kernel) |
|
||||
|----------------|----------|-------|----------------------------------------------------------------------------------|
|
||||
| VMWare | ESXi 8 | MI250 | Ubuntu 20.04 (`5.15.0-56-generic`) |
|
||||
| VMWare | ESXi 8 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
|
||||
| VMWare | ESXi 7 | MI210 | Ubuntu 20.04 (`5.15.0-56-generic`), SLES 15 SP4 (`5.14.21-150400.24.18-default`) |
|
||||
|
||||
## Linux-supported GPUs
|
||||
|
||||
The table below shows supported GPUs for Instinct™, Radeon Pro™ and Radeon™
|
||||
GPUs. Please click the tabs below to switch between GPU product lines. If a GPU
|
||||
is not listed on this table, the GPU is not officially supported by AMD.
|
||||
|
||||
:::::{tab-set}
|
||||
|
||||
::::{tab-item} AMD Instinct™
|
||||
:sync: instinct
|
||||
|
||||
| Product Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) |Support |
|
||||
|:------------:|:------------:|:--------------------------------------------------------------------:|:-------:|
|
||||
| AMD Instinct™ MI250X | CDNA2 | gfx90a | ✅ |
|
||||
| AMD Instinct™ MI250 | CDNA2 | gfx90a | ✅ |
|
||||
| AMD Instinct™ MI210 | CDNA2 | gfx90a | ✅ |
|
||||
| AMD Instinct™ MI100 | CDNA | gfx908 | ✅ |
|
||||
| AMD Instinct™ MI50 | GCN5.1 | gfx906 | ✅ |
|
||||
| AMD Instinct™ MI25 | GCN5.0 | gfx900 | ❌ |
|
||||
|
||||
::::
|
||||
|
||||
::::{tab-item} Radeon Pro™
|
||||
:sync: radeonpro
|
||||
|
||||
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|
||||
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|
|
||||
| AMD Radeon™ Pro W7900 | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
|
||||
| AMD Radeon™ Pro W6800 | RDNA2 | gfx1030 | ✅ |
|
||||
| AMD Radeon™ Pro V620 | RDNA2 | gfx1030 | ✅ |
|
||||
| AMD Radeon™ Pro VII | GCN5.1 | gfx906 | ✅ |
|
||||
::::
|
||||
|
||||
::::{tab-item} Radeon™
|
||||
:sync: radeonpro
|
||||
|
||||
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|
||||
|:----:|:---------------:|:--------------------------------------------------------------------:|:-------:|
|
||||
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
|
||||
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
|
||||
|
||||
::::
|
||||
:::::
|
||||
|
||||
### Support status
|
||||
|
||||
✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
the corresponding ROCm product.
|
||||
⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
❌: **Unsupported** - This configuration is not enabled in our software
|
||||
distributions.
|
||||
|
||||
## CPU support
|
||||
|
||||
ROCm requires CPUs that support PCIe™ atomics. Modern CPUs after the release of
|
||||
1st generation AMD Zen CPU and Intel™ Haswell support PCIe atomics.
|
||||
@@ -1,80 +0,0 @@
|
||||
# GPU and OS support (Windows)
|
||||
|
||||
(windows-support)=
|
||||
|
||||
## Supported SKUs
|
||||
|
||||
AMD HIP SDK supports the following Windows variants.
|
||||
|
||||
| Distribution |Processor Architectures| Validated update |
|
||||
|---------------------|-----------------------|--------------------|
|
||||
| Windows 10 | x86-64 | 22H2 (GA) |
|
||||
| Windows 11 | x86-64 | 22H2 (GA) |
|
||||
| Windows Server 2022 | x86-64 | |
|
||||
|
||||
## Windows-supported GPUs
|
||||
|
||||
The table below shows supported GPUs for Radeon Pro™ and Radeon™ GPUs. Please
|
||||
click the tabs below to switch between GPU product lines. If a GPU is not listed
|
||||
on this table, the GPU is not officially supported by AMD.
|
||||
|
||||
::::{tab-set}
|
||||
|
||||
:::{tab-item} Radeon Pro™
|
||||
:sync: radeonpro
|
||||
|
||||
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|
||||
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
|
||||
| AMD Radeon Pro™ W7900 | RDNA3 | gfx1100 | ✅ | ✅ |
|
||||
| AMD Radeon Pro™ W7800 | RDNA3 | gfx1100 | ✅ | ✅ |
|
||||
| AMD Radeon Pro™ W6800 | RDNA2 | gfx1030 | ✅ | ✅ |
|
||||
| AMD Radeon Pro™ W6600 | RDNA2 | gfx1032 | ✅ | ❌ |
|
||||
| AMD Radeon Pro™ W5500 | RDNA1 | gfx1012 | ❌ | ❌ |
|
||||
| AMD Radeon Pro™ VII | GCN5.1 | gfx906 | ❌ | ❌ |
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} Radeon™
|
||||
:sync: radeon
|
||||
|
||||
| Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|
||||
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
|
||||
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 7600 | RDNA3 | gfx1102 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 6950 XT | RDNA2 | gfx1030 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 6900 XT | RDNA2 | gfx1030 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 6800 XT | RDNA2 | gfx1030 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 6800 | RDNA2 | gfx1030 | ✅ | ✅ |
|
||||
| AMD Radeon™ RX 6750 XT | RDNA2 | gfx1031 | ✅ | ❌ |
|
||||
| AMD Radeon™ RX 6700 XT | RDNA2 | gfx1031 | ✅ | ❌ |
|
||||
| AMD Radeon™ RX 6700 | RDNA2 | gfx1031 | ✅ | ❌ |
|
||||
| AMD Radeon™ RX 6650 XT | RDNA2 | gfx1032 | ✅ | ❌ |
|
||||
| AMD Radeon™ RX 6600 XT | RDNA2 | gfx1032 | ✅ | ❌ |
|
||||
| AMD Radeon™ RX 6600 | RDNA2 | gfx1032 | ✅ | ❌ |
|
||||
|
||||
:::
|
||||
|
||||
::::
|
||||
|
||||
### Component support
|
||||
|
||||
ROCm components are described in [What is ROCm?](../../what-is-rocm.md) Support
|
||||
on Windows is provided with two levels on enablement.
|
||||
|
||||
* **Runtime**: Runtime enables the use of the HIP and OpenCL runtimes only.
|
||||
* **HIP SDK**: Runtime plus additional components are listed in [Libraries](../../reference/library-index.md).
|
||||
Note that some math libraries are Linux exclusive.
|
||||
|
||||
### Support status
|
||||
|
||||
✅: **Supported** - AMD enables these GPUs in our software distributions for
|
||||
the corresponding ROCm product.
|
||||
⚠️: **Deprecated** - Support will be removed in a future release.
|
||||
❌: **Unsupported** - This configuration is not enabled in our software
|
||||
distributions.
|
||||
|
||||
## CPU support
|
||||
|
||||
ROCm requires CPUs that support PCIe™ atomics. Modern CPUs after the release of
|
||||
1st generation AMD Zen CPU and Intel™ Haswell support PCIe atomics.
|
||||
@@ -1,9 +0,0 @@
|
||||
# License
|
||||
|
||||
> Note: This license applies to the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm) that primarily contains documentation. For other licensing information, refer to the [Licensing Terms page](./licensing).
|
||||
|
||||
```{include} ../../LICENSE
|
||||
```
|
||||
|
||||
```{include} ./licensing.md
|
||||
```
|
||||
@@ -1,127 +0,0 @@
|
||||
# ROCm licensing terms
|
||||
|
||||
ROCm™ is released by Advanced Micro Devices, Inc. and is licensed per component separately.
|
||||
The following table is a list of ROCm components with links to their respective license
|
||||
terms. These components may include third party components subject to
|
||||
additional licenses. Please review individual repositories for more information.
|
||||
|
||||
The table shows ROCm components, the name of license, and link to the license terms.
|
||||
The table is ordered to follow the ROCm manifest file.
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
| Component | License |
|
||||
|:---------------------|:-------------------------|
|
||||
| [AMDMIGraphX](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/) | [MIT](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/LICENSE) |
|
||||
| [HIPCC](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) | [MIT](https://github.com/ROCm-Developer-Tools/HIPCC/blob/develop/LICENSE.txt) |
|
||||
| [HIPIFY](https://github.com/ROCm-Developer-Tools/HIPIFY/) | [MIT](https://github.com/ROCm-Developer-Tools/HIPIFY/blob/amd-staging/LICENSE.txt) |
|
||||
| [HIP](https://github.com/ROCm-Developer-Tools/HIP/) | [MIT](https://github.com/ROCm-Developer-Tools/HIP/blob/develop/LICENSE.txt) |
|
||||
| [MIOpenGEMM](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpenGEMM/blob/master/LICENSE.txt) |
|
||||
| [MIOpen](https://github.com/ROCmSoftwarePlatform/MIOpen/) | [MIT](https://github.com/ROCmSoftwarePlatform/MIOpen/blob/master/LICENSE.txt) |
|
||||
| [MIVisionX](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/) | [MIT](https://github.com/GPUOpen-ProfessionalCompute-Libraries/MIVisionX/blob/master/LICENSE.txt) |
|
||||
| [RCP](https://github.com/GPUOpen-Tools/radeon_compute_profiler/) | [MIT](https://github.com/GPUOpen-Tools/radeon_compute_profiler/blob/master/LICENSE) |
|
||||
| [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/) | [GPL 2.0 WITH Linux-syscall-note](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/master/COPYING) |
|
||||
| [ROCR-Runtime](https://github.com/RadeonOpenCompute/ROCR-Runtime/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/LICENSE.txt) |
|
||||
| [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/) | [MIT](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/LICENSE.md) |
|
||||
| [ROCclr](https://github.com/ROCm-Developer-Tools/ROCclr/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCclr/blob/develop/LICENSE.txt) |
|
||||
| [ROCdbgapi](https://github.com/ROCm-Developer-Tools/ROCdbgapi/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCdbgapi/blob/amd-master/LICENSE.txt) |
|
||||
| [ROCgdb](https://github.com/ROCm-Developer-Tools/ROCgdb/) | [GNU General Public License v2.0](https://github.com/ROCm-Developer-Tools/ROCgdb/blob/amd-master/COPYING) |
|
||||
| [ROCm-CompilerSupport](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/blob/amd-stg-open/LICENSE.txt) |
|
||||
| [ROCm-Device-Libs](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/LICENSE.TXT) |
|
||||
| [ROCm-OpenCL-Runtime/api/opencl/khronos/icd](https://github.com/KhronosGroup/OpenCL-ICD-Loader/) | [Apache 2.0](https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/main/LICENSE) |
|
||||
| [ROCm-OpenCL-Runtime](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/) | [MIT](https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/blob/develop/LICENSE.txt) |
|
||||
| [ROCmValidationSuite](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/) | [MIT](https://github.com/ROCm-Developer-Tools/ROCmValidationSuite/blob/master/LICENSE) |
|
||||
| [Tensile](https://github.com/ROCmSoftwarePlatform/Tensile/) | [MIT](https://github.com/ROCmSoftwarePlatform/Tensile/blob/develop/LICENSE.md) |
|
||||
| [aomp-extras](https://github.com/ROCm-Developer-Tools/aomp-extras/) | [MIT](https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/LICENSE) |
|
||||
| [aomp](https://github.com/ROCm-Developer-Tools/aomp/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/aomp/blob/aomp-dev/LICENSE) |
|
||||
| [atmi](https://github.com/RadeonOpenCompute/atmi/) | [MIT](https://github.com/RadeonOpenCompute/atmi/blob/master/LICENSE.txt) |
|
||||
| [clang-ocl](https://github.com/RadeonOpenCompute/clang-ocl/) | [MIT](https://github.com/RadeonOpenCompute/clang-ocl/blob/master/LICENSE) |
|
||||
| [flang](https://github.com/ROCm-Developer-Tools/flang/) | [Apache 2.0](https://github.com/ROCm-Developer-Tools/flang/blob/master/LICENSE.txt) |
|
||||
| [half](https://github.com/ROCmSoftwarePlatform/half/) | [MIT](https://github.com/ROCmSoftwarePlatform/half/blob/master/LICENSE.txt) |
|
||||
| [hipBLAS](https://github.com/ROCmSoftwarePlatform/hipBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipBLAS/blob/develop/LICENSE.md) |
|
||||
| [hipCUB](https://github.com/ROCmSoftwarePlatform/hipCUB/) | [Custom](https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/LICENSE.txt) |
|
||||
| [hipFFT](https://github.com/ROCmSoftwarePlatform/hipFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/LICENSE.md) |
|
||||
| [hipSOLVER](https://github.com/ROCmSoftwarePlatform/hipSOLVER/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/LICENSE.md) |
|
||||
| [hipSPARSELt](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSELt/blob/develop/LICENSE.md) |
|
||||
| [hipSPARSE](https://github.com/ROCmSoftwarePlatform/hipSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipSPARSE/blob/develop/LICENSE.md) |
|
||||
| [hipTensor](https://github.com/ROCmSoftwarePlatform/hipTensor) | [MIT](https://github.com/ROCmSoftwarePlatform/hipTensor/blob/develop/LICENSE) |
|
||||
| [hipamd](https://github.com/ROCm-Developer-Tools/hipamd/) | [MIT](https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/LICENSE.txt) |
|
||||
| [hipfort](https://github.com/ROCmSoftwarePlatform/hipfort/) | [MIT](https://github.com/ROCmSoftwarePlatform/hipfort/blob/master/LICENSE) |
|
||||
| [llvm-project](https://github.com/ROCm-Developer-Tools/llvm-project/) | [Apache](https://github.com/ROCm-Developer-Tools/llvm-project/blob/main/LICENSE.TXT) |
|
||||
| [rccl](https://github.com/ROCmSoftwarePlatform/rccl/) | [Custom](https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/LICENSE.txt) |
|
||||
| [rdc](https://github.com/RadeonOpenCompute/rdc/) | [MIT](https://github.com/RadeonOpenCompute/rdc/blob/master/LICENSE) |
|
||||
| [rocALUTION](https://github.com/ROCmSoftwarePlatform/rocALUTION/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocALUTION/blob/develop/LICENSE.md) |
|
||||
| [rocBLAS](https://github.com/ROCmSoftwarePlatform/rocBLAS/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/develop/LICENSE.md) |
|
||||
| [rocFFT](https://github.com/ROCmSoftwarePlatform/rocFFT/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocFFT/blob/develop/LICENSE.md) |
|
||||
| [rocPRIM](https://github.com/ROCmSoftwarePlatform/rocPRIM/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/LICENSE.txt) |
|
||||
| [rocRAND](https://github.com/ROCmSoftwarePlatform/rocRAND/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocRAND/blob/develop/LICENSE.txt) |
|
||||
| [rocSOLVER](https://github.com/ROCmSoftwarePlatform/rocSOLVER/) | [BSD-2-Clause](https://github.com/ROCmSoftwarePlatform/rocSOLVER/blob/develop/LICENSE.md) |
|
||||
| [rocSPARSE](https://github.com/ROCmSoftwarePlatform/rocSPARSE/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/develop/LICENSE.md) |
|
||||
| [rocThrust](https://github.com/ROCmSoftwarePlatform/rocThrust/) | [Apache 2.0](https://github.com/ROCmSoftwarePlatform/rocThrust/blob/develop/LICENSE) |
|
||||
| [rocWMMA](https://github.com/ROCmSoftwarePlatform/rocWMMA/) | [MIT](https://github.com/ROCmSoftwarePlatform/rocWMMA/blob/develop/LICENSE.md) |
|
||||
| [rocm-cmake](https://github.com/RadeonOpenCompute/rocm-cmake/) | [MIT](https://github.com/RadeonOpenCompute/rocm-cmake/blob/develop/LICENSE) |
|
||||
| [rocm_bandwidth_test](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_bandwidth_test/blob/master/LICENSE.txt) |
|
||||
| [rocm_smi_lib](https://github.com/RadeonOpenCompute/rocm_smi_lib/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocm_smi_lib/blob/master/License.txt) |
|
||||
| [rocminfo](https://github.com/RadeonOpenCompute/rocminfo/) | [The University of Illinois/NCSA](https://github.com/RadeonOpenCompute/rocminfo/blob/master/License.txt) |
|
||||
| [rocprofiler](https://github.com/ROCm-Developer-Tools/rocprofiler/) | [MIT](https://github.com/ROCm-Developer-Tools/rocprofiler/blob/amd-master/LICENSE) |
|
||||
| [rocr_debug_agent](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/) | [The University of Illinois/NCSA](https://github.com/ROCm-Developer-Tools/rocr_debug_agent/blob/master/LICENSE.txt) |
|
||||
| [roctracer](https://github.com/ROCm-Developer-Tools/roctracer/) | [MIT](https://github.com/ROCm-Developer-Tools/roctracer/blob/amd-master/LICENSE) |
|
||||
| rocm-llvm-alt | [AMD Proprietary License](https://www.amd.com/en/support/amd-software-eula)
|
||||
|
||||
Open sourced ROCm components are released via public GitHub
|
||||
repositories, packages on https://repo.radeon.com and other distribution channels.
|
||||
Proprietary products are only available on https://repo.radeon.com. Currently, only
|
||||
one component of ROCm, rocm-llvm-alt is governed by a proprietary license.
|
||||
Proprietary components are organized in a proprietary subdirectory in the package
|
||||
repositories to distinguish from open sourced packages.
|
||||
|
||||
The additional terms and conditions below apply to your use of ROCm technical
|
||||
documentation.
|
||||
|
||||
©2023 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
The information presented in this document is for informational purposes only
|
||||
and may contain technical inaccuracies, omissions, and typographical errors. The
|
||||
information contained herein is subject to change and may be rendered inaccurate
|
||||
for many reasons, including but not limited to product and roadmap changes,
|
||||
component and motherboard version changes, new model and/or product releases,
|
||||
product differences between differing manufacturers, software changes, BIOS
|
||||
flashes, firmware upgrades, or the like. Any computer system has risks of
|
||||
security vulnerabilities that cannot be completely prevented or mitigated. AMD
|
||||
assumes no obligation to update or otherwise correct or revise this information.
|
||||
However, AMD reserves the right to revise this information and to make changes
|
||||
from time to time to the content hereof without obligation of AMD to notify any
|
||||
person of such revisions or changes.
|
||||
|
||||
THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES
|
||||
WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
|
||||
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD
|
||||
SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT,
|
||||
MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
|
||||
LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER
|
||||
CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN,
|
||||
EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
|
||||
|
||||
AMD, the AMD Arrow logo, ROCm, and combinations thereof are trademarks of
|
||||
Advanced Micro Devices, Inc. Other product names used in this publication are
|
||||
for identification purposes only and may be trademarks of their respective
|
||||
companies.
|
||||
|
||||
## Package licensing
|
||||
|
||||
```{attention}
|
||||
AQL Profiler and AOCC CPU optimization are both provided in binary form, each
|
||||
subject to the license agreement enclosed in the directory for the binary and is
|
||||
available here: `/opt/rocm/share/doc/rocm-llvm-alt/EULA`. By using, installing,
|
||||
copying or distributing AQL Profiler and/or AOCC CPU Optimizations, you agree to
|
||||
the terms and conditions of this license agreement. If you do not agree to the
|
||||
terms of this agreement, do not install, copy or use the AQL Profiler and/or the
|
||||
AOCC CPU Optimizations.
|
||||
```
|
||||
|
||||
For the rest of the ROCm packages, you can find the licensing information at the
|
||||
following location: `/opt/rocm/share/doc/<component-name>/`
|
||||
|
||||
For example, you can fetch the licensing information of the `_amd_comgr_`
|
||||
component (Code Object Manager) from the `amd_comgr` folder. A file named
|
||||
`LICENSE.txt` contains the license details at:
|
||||
`/opt/rocm-5.4.3/share/doc/amd_comgr/LICENSE.txt`
|
||||
@@ -1,25 +0,0 @@
|
||||
# ROCm release history
|
||||
|
||||
| Version | Release Date |
|
||||
| ------- | ------------ |
|
||||
| [5.7.1](https://rocm.docs.amd.com/en/docs-5.7.1/) | Oct 13, 2023 |
|
||||
| [5.7.0](https://rocm.docs.amd.com/en/docs-5.7.0/) | Sep 15, 2023 |
|
||||
| [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/) | Jun 28, 2023 |
|
||||
| [5.5.1](https://rocm.docs.amd.com/en/docs-5.5.1/) | May 24, 2023 |
|
||||
| [5.5.0](https://rocm.docs.amd.com/en/docs-5.5.0/) | May 1, 2023 |
|
||||
| [5.4.3](https://rocm.docs.amd.com/en/docs-5.4.3/) | Feb 7, 2023 |
|
||||
| [5.4.2](https://rocm.docs.amd.com/en/docs-5.4.2/) | Jan 13, 2023 |
|
||||
| [5.4.1](https://rocm.docs.amd.com/en/docs-5.4.1/) | Dec 15, 2022 |
|
||||
| [5.4.0](https://rocm.docs.amd.com/en/docs-5.4.0/) | Nov 30, 2022 |
|
||||
| [5.3.3](https://rocm.docs.amd.com/en/docs-5.3.3/) | Nov 17, 2022 |
|
||||
| [5.3.2](https://rocm.docs.amd.com/en/docs-5.3.2/) | Nov 9, 2022 |
|
||||
| [5.3.0](https://rocm.docs.amd.com/en/docs-5.3.0/) | Oct 4, 2022 |
|
||||
| [5.2.3](https://rocm.docs.amd.com/en/docs-5.2.3/) | Aug 18, 2022 |
|
||||
| [5.2.1](https://rocm.docs.amd.com/en/docs-5.2.1/) | Jul 21, 2022 |
|
||||
| [5.2.0](https://rocm.docs.amd.com/en/docs-5.2.0/) | Jun 28, 2022 |
|
||||
| [5.1.3](https://rocm.docs.amd.com/en/docs-5.1.3/) | May 20, 2022 |
|
||||
| [5.1.1](https://rocm.docs.amd.com/en/docs-5.1.1/) | Apr 8, 2022 |
|
||||
| [5.1.0](https://rocm.docs.amd.com/en/docs-5.1.0/) | Mar 30, 2022 |
|
||||
| [5.0.2](https://rocm.docs.amd.com/en/docs-5.0.2/) | Mar 4, 2022 |
|
||||
| [5.0.1](https://rocm.docs.amd.com/en/docs-5.0.1/) | Feb 16, 2022 |
|
||||
| [5.0.0](https://rocm.docs.amd.com/en/docs-5.0.0/) | Feb 9, 2022 |
|
||||
@@ -1,93 +0,0 @@
|
||||
# What's new in ROCm?
|
||||
|
||||
ROCm is now supported on Windows.
|
||||
|
||||
## Windows support
|
||||
|
||||
Starting with ROCm 5.5, the HIP SDK brings a subset of ROCm to developers on Windows.
|
||||
The collection of features enabled on Windows is referred to as the HIP SDK.
|
||||
These features allow developers to use the HIP runtime, HIP math libraries
|
||||
and HIP Primitive libraries. The following table shows the differences
|
||||
between Windows and Linux releases.
|
||||
|
||||
|Component|Linux|Windows|
|
||||
|---------|-----|-------|
|
||||
|Driver|Radeon Software for Linux |AMD Software Pro Edition|
|
||||
|Compiler|`hipcc`/`amdclang++`|`hipcc`/`clang++`|
|
||||
|Debugger|`rocgdb`|no debugger available|
|
||||
|Profiler|`rocprof`|[Radeon GPU Profiler](https://gpuopen.com/rgp/)|
|
||||
|Porting Tools|HIPIFY|Coming Soon|
|
||||
|Runtime|HIP (Open Sourced)|HIP (closed source)|
|
||||
|Math Libraries|Supported|Supported|
|
||||
|Primitives Libraries|Supported|Supported|
|
||||
|Communication Libraries|Supported|Not Available|
|
||||
|AI Libraries|MIOpen, MIGraphX|Not Available|
|
||||
|System Management|`rocm-smi-lib`, RDC, `rocminfo`|`amdsmi`, `hipInfo`|
|
||||
|AI Frameworks|PyTorch, TensorFlow, etc.|Not Available|
|
||||
|CMake HIP Language|Enabled|Unsupported|
|
||||
|Visual Studio| Not applicable| Plugin Available|
|
||||
|HIP Ray Tracing| Supported|Supported|
|
||||
|
||||
AMD is continuing to invest in Windows support and AMD plans to release enhanced
|
||||
features in subsequent revisions.
|
||||
|
||||
```{note}
|
||||
The 5.5 Windows Installer collectively groups the Math and Primitives
|
||||
libraries.
|
||||
```
|
||||
|
||||
```{note}
|
||||
GPU support on Windows and Linux may differ. You must refer to
|
||||
Windows and Linux GPU support tables separately.
|
||||
```
|
||||
|
||||
```{note}
|
||||
HIP Ray Tracing is not distributed via ROCm in Linux.
|
||||
```
|
||||
|
||||
## ROCm release versioning
|
||||
|
||||
Linux OS releases set the canonical version numbers for ROCm. Windows will
|
||||
follow Linux version numbers as Windows releases are based on Linux ROCm
|
||||
releases. However, not all Linux ROCm releases will have a corresponding Windows
|
||||
release. The following table shows the ROCm releases on Windows and Linux. Releases
|
||||
with both Windows and Linux are referred to as a joint release. Releases with
|
||||
only Linux support are referred to as a skipped release from the Windows
|
||||
perspective.
|
||||
|
||||
|Release version|Linux|Windows|
|
||||
|---------------|-----|-------|
|
||||
|5.5|✅|✅|
|
||||
|5.6|✅|❌|
|
||||
|
||||
ROCm Linux releases are versioned with following the Major.Minor.Patch
|
||||
version number system. Windows releases will only be versioned with Major.Minor.
|
||||
|
||||
In general, Windows releases will trail Linux releases. Software developers that
|
||||
wish to support both Linux and Windows using a single ROCm version should
|
||||
refrain from upgrading ROCm unless there is a joint release.
|
||||
|
||||
## Windows documentation implications
|
||||
|
||||
The ROCm documentation website contains both Windows and Linux documentation.
|
||||
Just below each article title, a convenient article information section states
|
||||
whether the page applies to Linux only, Windows only or both OSes. To find the
|
||||
exact Windows documentation for a release of the HIP SDK, please view the ROCm documentation with the same
|
||||
Major.Minor version number while ignoring the Patch version. The Patch version
|
||||
only matters for Linux releases. For convenience,
|
||||
Windows documentation will continue to be included in the overall ROCm
|
||||
documentation for the skipped Windows releases.
|
||||
|
||||
Windows release notes will contain only information pertinent to Windows.
|
||||
The software developer must read all the previous ROCm release notes (including)
|
||||
skipped ROCm versions on Windows for information on all the changes present in
|
||||
the Windows release.
|
||||
|
||||
## Windows builds from source
|
||||
|
||||
Not all source code required to build Windows from source is available under a
|
||||
permissive open source license. Build instructions on Windows is only provided
|
||||
for projects that can be built from source on Windows using a toolchain that
|
||||
has closed source build prerequisites. The ROCm manifest file is not valid for
|
||||
Windows. AMD does not release a manifest or tag our components in Windows.
|
||||
Users may use corresponding Linux tags to build on Windows.
|
||||
@@ -1,51 +0,0 @@
|
||||
# GPU architecture documentation
|
||||
|
||||
:::::{grid} 1 1 2 2
|
||||
:gutter: 1
|
||||
|
||||
:::{grid-item-card}
|
||||
**AMD Instinct MI200 series**
|
||||
|
||||
Review hardware aspects of the AMD Instinct™ MI200 series of GPU
|
||||
accelerators and the CDNA™ 2 architecture.
|
||||
|
||||
* [AMD Instinct™ MI250 microarchitecture](./gpu-arch/mi250.md)
|
||||
* [AMD Instinct MI200/CDNA2 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf)
|
||||
* [White paper](https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf)
|
||||
* [Performance counters](./gpu-arch/mi200-performance-counters.md)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**AMD Instinct MI100**
|
||||
|
||||
Review hardware aspects of the AMD Instinct™ MI100
|
||||
accelerators and the CDNA™ 1 architecture that is the foundation of these GPUs.
|
||||
|
||||
* [AMD Instinct™ MI100 microarchitecture](./gpu-arch/mi100.md)
|
||||
* [AMD Instinct MI100/CDNA1 ISA](https://www.amd.com/system/files/TechDocs/instinct-mi100-cdna1-shader-instruction-set-architecture%C2%A0.pdf)
|
||||
* [White paper](https://www.amd.com/system/files/documents/amd-cdna-whitepaper.pdf)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**RDNA**
|
||||
|
||||
* [AMD RDNA3 ISA](https://www.amd.com/system/files/TechDocs/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf)
|
||||
* [AMD RDNA2 ISA](https://www.amd.com/system/files/TechDocs/rdna2-shader-instruction-set-architecture.pdf)
|
||||
* [AMD RDNA ISA](https://www.amd.com/system/files/TechDocs/rdna-shader-instruction-set-architecture.pdf)
|
||||
* [AMD RDNA Architecture White Paper](https://www.amd.com/system/files/documents/rdna-whitepaper.pdf)
|
||||
|
||||
:::
|
||||
|
||||
:::{grid-item-card}
|
||||
**Older architectures**
|
||||
|
||||
* [AMD Instinct MI50/Vega 7nm ISA](https://www.amd.com/system/files/TechDocs/vega-7nm-shader-instruction-set-architecture.pdf)
|
||||
* [AMD Instinct MI25/Vega ISA](https://www.amd.com/system/files/TechDocs/vega-shader-instruction-set-architecture.pdf)
|
||||
* [AMD GCN3 ISA](https://www.amd.com/system/files/TechDocs/gcn3-instruction-set-architecture.pdf)
|
||||
* [AMD Vega Architecture White Paper](https://en.wikichip.org/w/images/a/a1/vega-whitepaper.pdf)
|
||||
|
||||
:::
|
||||
|
||||
:::::
|
||||
@@ -1,455 +0,0 @@
|
||||
# MI200 performance counters and metrics
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
|
||||
This document lists and describes the hardware performance counters and the derived metrics available on the AMD Instinct™ MI200 GPU. All hardware performance monitors, and the derived performance metrics are accessible via AMD ROCm™ Profiler tool.
|
||||
|
||||
## MI200 performance counters list
|
||||
|
||||
```{note}
|
||||
Preliminary validation of all MI200 performance counters is in progress. Those with “[*]” appended to the names require further evaluation.
|
||||
```
|
||||
|
||||
### GRBM
|
||||
|
||||
#### GRBM counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
|--------------------|--------| ------------------------------------------------------|
|
||||
| `grbm_count` | Cycles | Free-running GPU clock |
|
||||
| `grbm_gui_active` | Cycles | GPU active cycles |
|
||||
| `grbm_cp_busy` | Cycles | Any of the command processor (CPC/CPF) blocks are busy. |
|
||||
| `grbm_spi_busy` | Cycles | Any of the shader processor input (SPI) are busy in the shader engine(s). |
|
||||
| `grbm_ta_busy` | Cycles | Any of the texture addressing unit are busy in the shader engine(s). |
|
||||
| `grbm_tc_busy` | Cycles | Any of the texture cache blocks (TCP/TCI/TCA/TCC) are busy. |
|
||||
| `grbm_cpc_busy` | Cycles | The command processor - compute (CPC) is busy. |
|
||||
| `grbm_cpf_busy` | Cycles | The command processor - fetcher (CPF) is busy. |
|
||||
| `grbm_utcl2_busy` | Cycles | The unified translation cache - level 2 (UTCL2) block is busy. |
|
||||
| `grbm_ea_busy` | Cycles | The efficiency arbiter (EA) block is busy. |
|
||||
|
||||
### Command processor
|
||||
|
||||
The command processor counters are further classified into fetcher and compute.
|
||||
|
||||
#### CPF
|
||||
|
||||
##### CPF counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
|--------------------------------------|--------|--------------------------------------------------------------|
|
||||
| `cpf_cmp_utcl1_stall_on_translation` | Cycles | One of the compute UTCL1s is stalled waiting on translation. |
|
||||
| `cpf_cpf_stat_idle[∗]` | Cycles | CPF idle |
|
||||
| `cpf_cpf_stat_stall` | Cycles | CPF stall |
|
||||
| `cpf_cpf_tciu_busy` | Cycles | CPF TCIU interface busy |
|
||||
| `cpf_cpf_tciu_idle` | Cycles | CPF TCIU interface idle |
|
||||
| `cpf_cpf_tciu_stall[∗]` | Cycles | CPF TCIU interface is stalled waiting on free tags. |
|
||||
|
||||
#### CPC
|
||||
|
||||
##### CPC counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| ---------------------------------| -------| --------------------------------------------------- |
|
||||
| `cpc_me1_busy_for_packet_decode` | Cycles | CPC ME1 busy decoding packets |
|
||||
| `cpc_utcl1_stall_on_translation` | Cycles | One of the UTCL1s is stalled waiting on translation |
|
||||
| `cpc_cpc_stat_busy` | Cycles | CPC busy |
|
||||
| `cpc_cpc_stat_idle` | Cycles | CPC idle |
|
||||
| `cpc_cpc_stat_stall` | Cycles | CPC stalled |
|
||||
| `cpc_cpc_tciu_busy` | Cycles | CPC TCIU interface busy |
|
||||
| `cpc_cpc_tciu_idle` | Cycles | CPC TCIU interface idle |
|
||||
| `cpc_cpc_utcl2iu_busy` | Cycles | CPC UTCL2 interface busy |
|
||||
| `cpc_cpc_utcl2iu_idle` | Cycles | CPC UTCL2 interface idle |
|
||||
| `cpc_cpc_utcl2iu_stall[∗]` | Cycles | CPC UTCL2 interface stalled waiting |
|
||||
| `cpc_me1_dci0_spi_busy` | Cycles | CPC ME1 Processor busy |
|
||||
|
||||
### SPI
|
||||
|
||||
#### SPI counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :----------------------------| :-----------| -----------------------------------------------------------: |
|
||||
| `spi_csn_busy` | Cycles | Number of clocks with outstanding waves |
|
||||
| `spi_csn_window_valid` | Cycles | Clock count enabled by perfcounter_start event |
|
||||
| `spi_csn_num_threadgroups` | Workgroups | Total number of dispatched workgroups |
|
||||
| `spi_csn_wave` | Wavefronts | Total number of dispatched wavefronts |
|
||||
| `spi_ra_req_no_alloc` | Cycles | Arb cycles with requests but no allocation (need to multiply this value by 4) |
|
||||
|`spi_ra_req_no_alloc_csn` | Cycles | Arb cycles with CSn req and no CSn alloc (need to multiply this value by 4) |
|
||||
| `spi_ra_res_stall_csn` | Cycles | Arb cycles with CSn req and no CSn fits (need to multiply this value by 4) |
|
||||
| `spi_ra_tmp_stall_csn[∗]` | Cycles | Cycles where CSn wants to req but does not fit in temp space |
|
||||
| `spi_ra_wave_simd_full_csn` | SIMD-cycles | Sum of SIMD where WAVE cannot take csn wave when not fits |
|
||||
| `spi_ra_vgpr_simd_full_csn[∗]` | SIMD-cycles | Sum of SIMD where VGPR cannot take csn wave when not fits |
|
||||
| `spi_ra_sgpr_simd_full_csn[∗]` | SIMD-cycles | Sum of SIMD where SGPR cannot take csn wave when not fits |
|
||||
| `spi_ra_lds_cu_full_csn` | CUs | Sum of CU where LDS cannot take csn wave when not fits |
|
||||
| `spi_ra_bar_cu_full_csn[∗]` | CUs | Sum of CU where BARRIER cannot take csn wave when not fits |
|
||||
| `spi_ra_bulky_cu_full_csn[∗]` | CUs | Sum of CU where BULKY cannot take csn wave when not fits |
|
||||
| `spi_ra_tglim_cu_full_csn[∗]` | Cycles | Cycles where csn wants to req but all CUs are at tg_limit |
|
||||
| `spi_ra_wvlim_cu_full_csn[∗]` | Cycles | Number of clocks csn is stalled due to WAVE LIMIT |
|
||||
| `spi_vwc_csc_wr` | Cycles | Number of clocks to write CSC waves to VGPRs (need to multiply this value by 4) |
|
||||
| `spi_swc_csc_wr` | Cycles | Number of clocks to write CSC waves to SGPRs (need to multiply this value by 4) |
|
||||
|
||||
### Compute unit
|
||||
|
||||
The compute unit counters are further classified into instruction mix, MFMA operation counters, level counters, wavefront counters, wavefront cycle counters, local data share counters, and others.
|
||||
|
||||
#### Instruction mix
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :-----------------------| :-----:| -----------------------------------------------------------------------: |
|
||||
| `sq_insts` | Instr | Number of instructions issued |
|
||||
| `sq_insts_valu` | Instr | Number of VALU instructions issued, including MFMA |
|
||||
| `sq_insts_valu_add_f16` | Instr | Number of VALU F16 Add instructions issued |
|
||||
| `sq_insts_valu_mul_f16` | Instr | Number of VALU F16 Multiply instructions issued |
|
||||
| `sq_insts_valu_fma_f16` | Instr | Number of VALU F16 FMA instructions issued |
|
||||
| `sq_insts_valu_trans_f16` | Instr | Number of VALU F16 Transcendental instructions issued |
|
||||
| `sq_insts_valu_add_f32` | Instr | Number of VALU F32 Add instructions issued |
|
||||
| `sq_insts_valu_mul_f32` | Instr | Number of VALU F32 Multiply instructions issued |
|
||||
| `sq_insts_valu_fma_f32` | Instr | Number of VALU F32 FMA instructions issued |
|
||||
| `sq_insts_valu_trans_f32` | Instr | Number of VALU F32 Transcendental instructions issued |
|
||||
| `sq_insts_valu_add_f64` | Instr | Number of VALU F64 Add instructions issued |
|
||||
| `sq_insts_valu_mul_f64` | Instr | Number of VALU F64 Multiply instructions issued |
|
||||
| `sq_insts_valu_fma_f64` | Instr | Number of VALU F64 FMA instructions issued |
|
||||
| `sq_insts_valu_trans_f64` | Instr | Number of VALU F64 Transcendental instructions issued |
|
||||
| `sq_insts_valu_int32` | Instr | Number of VALU 32-bit integer instructions issued (signed or unsigned) |
|
||||
| `sq_insts_valu_int64` | Instr | Number of VALU 64-bit integer instructions issued (signed or unsigned) |
|
||||
| `sq_insts_valu_cvt` | Instr | Number of VALU Conversion instructions issued |
|
||||
| `sq_insts_valu_mfma_i8` | Instr | Number of 8-bit Integer MFMA instructions issued |
|
||||
| `sq_insts_valu_mfma_f16` | Instr | Number of F16 MFMA instructions issued |
|
||||
| `sq_insts_valu_mfma_bf16` | Instr | Number of BF16 MFMA instructions issued |
|
||||
| `sq_insts_valu_mfma_f32` | Instr | Number of F32 MFMA instructions issued |
|
||||
| `sq_insts_valu_mfma_f64` | Instr | Number of F64 MFMA instructions issued |
|
||||
| `sq_insts_mfma` | Instr | Number of MFMA instructions issued |
|
||||
| `sq_insts_vmem_wr` | Instr | Number of VMEM write instructions issued |
|
||||
| `sq_insts_vmem_rd` | Instr | Number of VMEM read instructions issued |
|
||||
| `sq_insts_vmem` | Instr | Number of VMEM instructions issued, including both FLAT and buffer instructions |
|
||||
| `sq_insts_salu` | Instr | Number of SALU instructions issued |
|
||||
| `sq_insts_smem` | Instr | Number of SMEM instructions issued |
|
||||
| `sq_insts_smem_norm` | Instr | Number of SMEM instructions issued to normalize to match `smem_level`. Used in measuring SMEM latency |
|
||||
| `sq_insts_flat` | Instr | Number of FLAT instructions issued |
|
||||
| `sq_insts_flat_lds_only` | Instr | Number of FLAT instructions issued that read/write only from/to LDS |
|
||||
| `sq_insts_lds` | Instr | Number of LDS instructions issued |
|
||||
| `sq_insts_gds` | Instr | Number of GDS instructions issued |
|
||||
| `sq_insts_exp_gds` | Instr | Number of EXP and GDS instructions excluding skipped export instructions issued |
|
||||
| `sq_insts_branch` | Instr | Number of Branch instructions issued |
|
||||
| `sq_insts_sendmsg` | Instr | Number of SENDMSG instructions including s_endpgm issued |
|
||||
| `sq_insts_vskipped[∗]` | Instr | Number of VSkipped instructions issued |
|
||||
|
||||
#### MFMA operation counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :----------------------------| :-----| ----------------------------------------------: |
|
||||
| `sq_insts_valu_mfma_mops_I8` | IOP | Number of 8-bit integer MFMA ops in unit of 512 |
|
||||
| `sq_insts_valu_mfma_mops_F16` | FLOP | Number of F16 floating MFMA ops in unit of 512 |
|
||||
| `sq_insts_valu_mfma_mops_BF16` | FLOP | Number of BF16 floating MFMA ops in unit of 512 |
|
||||
| `sq_insts_valu_mfma_mops_F32` | FLOP | Number of F32 floating MFMA ops in unit of 512 |
|
||||
| `sq_insts_valu_mfma_mops_F64` | FLOP | Number of F64 floating MFMA ops in unit of 512 |
|
||||
|
||||
#### Level counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :-------------------| :-----| -------------------------------------: |
|
||||
| `sq_accum_prev` | Count | Accumulated counter sample value where accumulation takes place once every four cycles |
|
||||
| `sq_accum_prev_hires` | Count | Accumulated counter sample value where accumulation takes place once every cycle |
|
||||
| `sq_level_waves` | Waves | Number of inflight waves |
|
||||
| `sq_insts_level_vmem` | Instr | Number of inflight VMEM instructions |
|
||||
| `sq_insts_level_smem` | Instr | Number of inflight SMEM instructions |
|
||||
| `sq_insts_level_lds` | Instr | Number of inflight LDS instructions |
|
||||
| `sq_ifetch_level` | Instr | Number of inflight instruction fetches |
|
||||
|
||||
#### Wavefront counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :--------------------| :-----| ----------------------------------------------------------------: |
|
||||
| `sq_waves` | Waves | Number of wavefronts dispatch to SQs, including both new and restored wavefronts |
|
||||
| `sq_waves_saved[∗]` | Waves | Number of context-saved wavefronts |
|
||||
| `sq_waves_restored[∗]` | Waves | Number of context-restored wavefronts |
|
||||
| `sq_waves_eq_64` | Waves | Number of wavefronts with exactly 64 active threads sent to SQs |
|
||||
| `sq_waves_lt_64` | Waves | Number of wavefronts with less than 64 active threads sent to SQs |
|
||||
| `sq_waves_lt_48` | Waves | Number of wavefronts with less than 48 active threads sent to SQs |
|
||||
| `sq_waves_lt_32` | Waves | Number of wavefronts with less than 32 active threads sent to SQs |
|
||||
| `sq_waves_lt_16` | Waves | Number of wavefronts with less than 16 active threads sent to SQs |
|
||||
|
||||
#### Wavefront cycle counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :------------------------| :-------| --------------------------------------------------------------------: |
|
||||
| `sq_cycles` | Cycles | Free-running SQ clocks |
|
||||
| `sq_busy_cycles` | Cycles | Number of cycles while SQ reports it to be busy |
|
||||
| `sq_busy_cu_cycles` | Qcycles | Number of quad cycles each CU is busy |
|
||||
| `sq_valu_mfma_busy_cycles` | Cycles | Number of cycles the MFMA ALU is busy |
|
||||
| `sq_wave_cycles` | Qcycles | Number of quad cycles spent by waves in the CUs |
|
||||
| `sq_wait_any` | Qcycles | Number of quad cycles spent waiting for anything |
|
||||
| `sq_wait_inst_any` | Qcycles | Number of quad cycles spent waiting for an issued instruction |
|
||||
| `sq_active_inst_any` | Qcycles | Number of quad cycles spent by each wave to work on an instruction |
|
||||
| `sq_active_inst_vmem` | Qcycles | Number of quad cycles spent by each wave to work on a non-FLAT VMEM instruction |
|
||||
| `sq_active_inst_lds` | Qcycles | Number of quad cycles spent by each wave to work on an LDS instruction |
|
||||
| `sq_active_inst_valu` | Qcycles | Number of quad cycles spent by each wave to work on a VALU instruction |
|
||||
| `sq_active_inst_sca` | Qcycles | Number of quad cycles spent by each wave to work on an SCA instruction |
|
||||
| `sq_active_inst_exp_gds` | Qcycles | Number of quad cycles spent by each wave to work on EXP or GDS instruction |
|
||||
| `sq_active_inst_misc` | Qcycles | Number of quad cycles spent by each wave to work on an MISC instruction, including branch and sendmsg |
|
||||
| `sq_active_inst_flat` | Qcycles | Number of quad cycles spent by each wave to work on a FLAT instruction |
|
||||
| `sq_inst_cycles_vmem_wr` | Qcycles | Number of quad cycles spent to send addr and cmd data for VMEM write instructions, including both FLAT and buffer |
|
||||
| `sq_inst_cycles_vmem_rd` | Qcycles | Number of quad cycles spent to send addr and cmd data for VMEM read instructions, including both FLAT and buffer |
|
||||
| `sq_inst_cycles_smem` | Qcycles | Number of quad cycles spent to execute scalar memory reads |
|
||||
| `sq_inst_cycles_salu` | Cycles | Number of cycles spent to execute non-memory read scalar operations |
|
||||
| `sq_thread_cycles_valu` | Cycles | Number of thread cycles spent to execute VALU operations |
|
||||
|
||||
#### Local data share
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :--------------------------| :------| --------------------------------------------------------: |
|
||||
| `sq_lds_atomic_return` | Cycles | Number of atomic return cycles in LDS |
|
||||
| `sq_lds_bank_conflict` | Cycles | Number of cycles LDS is stalled by bank conflicts |
|
||||
| `sq_lds_addr_conflict[∗]` | Cycles | Number of cycles LDS is stalled by address conflicts |
|
||||
| `sq_lds_unaligned_stalls[∗]` | Cycles | Number of cycles LDS is stalled processing flat unaligned load/store ops |
|
||||
| `sq_lds_mem_violations[∗]` | Count | Number of threads that have a memory violation in the LDS |
|
||||
|
||||
#### Miscellaneous
|
||||
|
||||
##### Local data share
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :----------------| :-------| --------------------------------------------------------: |
|
||||
| `sq_ifetch` | Count | Number of fetch requests from L1I cache, in 32-byte width |
|
||||
| `sq_items` | Threads | Number of valid threads |
|
||||
|
||||
### L1I and sL1D caches
|
||||
|
||||
#### L1I and sL1D caches
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :----------------------------| :------| ----------------------------------------------------------------: |
|
||||
| `sqc_icache_req` | Req | Number of L1I cache requests |
|
||||
| `sqc_icache_hits` | Count | Number of L1I cache lookup-hits |
|
||||
| `sqc_icache_misses` | Count | Number of L1I cache non-duplicate lookup-misses |
|
||||
| `sqc_icache_misses_duplicate` | Count | Number of d L1I cache duplicate lookup misses whose previous lookup miss on the same cache line is not fulfilled yet |
|
||||
| `sqc_dcache_req` | Req | Number of sL1D cache requests |
|
||||
| `sqc_dcache_input_valid_readb` | Cycles | Number of cycles while SQ input is valid but sL1D cache is not ready |
|
||||
| `sqc_dcache_hits` | Count | Number of sL1D cache lookup-hits |
|
||||
| `sqc_dcache_misses` | Count | Number of sL1D non-duplicate lookup-misses |
|
||||
| `sqc_dcache_misses_duplicate` | Count | Number of sL1D duplicate lookup-misses |
|
||||
| `sqc_dcache_req_read_1` | Req | Number of read requests in a single 32-bit data word, DWORD (DW) |
|
||||
| `sqc_dcache_req_read_2` | Req | Number of read requests in 2 DW |
|
||||
| `sqc_dcache_req_read_4` | Req | Number of read requests in 4 DW |
|
||||
| `sqc_dcache_req_read_8` | Req | Number of read requests in 8 DW |
|
||||
| `sqc_dcache_req_read_16` | Req | Number of read requests in 16 DW |
|
||||
| `sqc_dcache_atomic[∗]` | Req | Number of atomic requests |
|
||||
| `sqc_tc_req` | Req | Number of L2 cache requests that were issued by instruction and constant caches |
|
||||
| `sqc_tc_inst_req` | Req | Number of instruction cache line requests to L2 cache |
|
||||
| `sqc_tc_data_read_req` | Req | Number of data read requests to the L2 cache |
|
||||
| `sqc_tc_data_write_req[∗]` | Req | Number of data write requests to the L2 cache |
|
||||
| `sqc_tc_data_atomic_req[∗]` | Req | Number of data atomic requests to the L2 cache |
|
||||
| `sqc_tc_stall[∗]` | Cycles | Number of cycles while the valid requests to L2 cache are stalled |
|
||||
|
||||
### Vector L1 cache subsystem
|
||||
|
||||
The vector L1 cache subsystem counters are further classified into texture addressing unit, texture data unit, vector L1D cache, and texture cache arbiter.
|
||||
|
||||
#### Texture addressing unit
|
||||
|
||||
##### Texture addressing unit counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :--------------------------------| :------| ------------------------------------------------: |
|
||||
| `ta_ta_busy` | Cycles | texture addressing unit busy cycles |
|
||||
| `ta_total_wavefronts` | Instr | Number of wavefront instructions |
|
||||
| `ta_buffer_wavefronts` | Instr | Number of buffer wavefront instructions |
|
||||
| `ta_buffer_read_wavefronts` | Instr | Number of buffer read wavefront instructions |
|
||||
| `ta_buffer_write_wavefronts` | Instr | Number of buffer write wavefront instructions |
|
||||
| `ta_buffer_atomic_wavefronts[∗]` | Instr | Number of buffer atomic wavefront instructions |
|
||||
| `ta_buffer_total_cycles` | Cycles | Number of buffer cycles, including read and write |
|
||||
| `ta_buffer_coalesced_read_cycles` | Cycles | Number of coalesced buffer read cycles |
|
||||
| `ta_buffer_coalesced_write_cycles` | Cycles | Number of coalesced buffer write cycles |
|
||||
| `ta_addr_stalled_by_tc` | Cycles | Number of cycles texture addressing unit address is stalled by TCP |
|
||||
| `ta_data_stalled_by_tc` | Cycles | Number of cycles texture addressing unit data is stalled by TCP |
|
||||
| `ta_addr_stalled_by_td_cycles[∗]` | Cycles | Number of cycles texture addressing unit address is stalled by TD |
|
||||
| `ta_flat_wavefronts` | Instr | Number of flat wavefront instructions |
|
||||
| `ta_flat_read_wavefronts` | Instr | Number of flat read wavefront instructions |
|
||||
| `ta_flat_write_wavefronts` | Instr | Number of flat write wavefront instructions |
|
||||
| `ta_flat_atomic_wavefronts` | Instr | Number of flat atomic wavefront instructions |
|
||||
|
||||
#### Texture data unit
|
||||
|
||||
##### Texture data unit counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :------------------------| :-----| ---------------------------------------------------: |
|
||||
| `td_td_busy` | Cycle | TD busy cycles |
|
||||
| `td_tc_stall` | Cycle | Number of cycles TD is stalled by TCP |
|
||||
| `td_spi_stall[∗]` | Cycle | Number of cycles TD is stalled by SPI |
|
||||
| `td_load_wavefront` | Instr | Number of wavefront instructions (read/write/atomic) |
|
||||
| `td_store_wavefront` | Instr | Number of write wavefront instructions |
|
||||
| `td_atomic_wavefront` | Instr | Number of atomic wavefront instructions |
|
||||
| `td_coalescable_wavefront` | Instr | Number of coalescable instructions |
|
||||
|
||||
#### Vector L1D cache
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :-----------------------------------| :------| ----------------------------------------------------------: |
|
||||
| `tcp_gate_en1` | Cycles | Number of cycles/ vL1D interface clocks are turned on |
|
||||
| `tcp_gate_en2` | Cycles | Number of cycles vL1D core clocks are turned on |
|
||||
| `tcp_td_tcp_stall_cycles` | Cycles | Number of cycles TD stalls vL1D |
|
||||
| `tcp_tcr_tcp_stall_cycles` | Cycles | Number of cycles TCR stalls vL1D |
|
||||
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a read |
|
||||
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a write |
|
||||
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on an atomic |
|
||||
| `tcp_pending_stall_cycles` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 cache |
|
||||
| `tcp_ta_tcp_state_read` | Req | Number of wavefront instruction requests to vL1D |
|
||||
| `tcp_volatile[∗]` | Req | Number of L1 volatile pixels/buffers from texture addressing unit |
|
||||
| `tcp_total_accesses` | Req | Number of vL1D accesses |
|
||||
| `tcp_total_read` | Req | Number of vL1D read accesses |
|
||||
| `tcp_total_write` | Req | Number of vL1D write accesses |
|
||||
| `tcp_total_atomic_with_ret` | Req | Number of vL1D atomic with return |
|
||||
| `tcp_total_atomic_without_ret` | Req | Number of vL1D atomic without return |
|
||||
| `tcp_total_writeback_invalidates` | Count | Number of vL1D writebacks and Invalidates |
|
||||
| `tcp_utcl1_request` | Req | Number of address translation requests to UTCL1 |
|
||||
| `tcp_utcl1_translation_hit` | Req | Number of UTCL1 translation hits |
|
||||
| `tcp_utcl1_translation_miss` | Req | Number of UTCL1 translation misses |
|
||||
| `tcp_utcl1_persmission_miss` | Req | Number of UTCL1 permission misses |
|
||||
| `tcp_total_cache_accesses` | Req | Number of vL1D cache accesses |
|
||||
| `tcp_tcp_latency` | Cycles | Accumulated wave access latency to vL1D over all wavefronts |
|
||||
| `tcp_tcc_read_req_latency` | Cycles | Accumulated vL1D-L2 request latency over all wavefronts for reads and atomics with return |
|
||||
| `tcp_tcc_write_req_latency` | Cycles | Accumulated vL1D-L2 request latency over all wavefronts for writes and atomics without return |
|
||||
| `tcp_tcc_read_req` | Req | Number of read requests to L2 cache |
|
||||
| `tcp_tcc_write_req` | Req | Number of write requests to L2 cache |
|
||||
| `tcp_tcc_atomic_with_ret_req` | Req | Number of atomic requests to L2 cache with return |
|
||||
| `tcp_tcc_atomic_without_ret_req` | Req | Number of atomic requests to L2 cache without return |
|
||||
| `tcp_tcc_nc_read_req` | Req | Number of NC read requests to L2 cache |
|
||||
| `tcp_tcc_uc_read_req` | Req | Number of UC read requests to L2 cache |
|
||||
| `tcp_tcc_cc_read_req` | Req | Number of CC read requests to L2 cache |
|
||||
| `tcp_tcc_rw_read_req` | Req | Number of RW read requests to L2 cache |
|
||||
| `tcp_tcc_nc_write_req` | Req | Number of NC write requests to L2 cache |
|
||||
| `tcp_tcc_uc_write_req` | Req | Number of UC write requests to L2 cache |
|
||||
| `tcp_tcc_cc_write_req` | Req | Number of CC write requests to L2 cache |
|
||||
| `tcp_tcc_rw_write_req` | Req | Number of RW write requests to L2 cache |
|
||||
| `tcp_tcc_nc_atomic_req` | Req | Number of NC atomic requests to L2 cache |
|
||||
| `tcp_tcc_uc_atomic_req` | Req | Number of UC atomic requests to L2 cache |
|
||||
| `tcp_tcc_cc_atomic_req` | Req | Number of CC atomic requests to L2 cache |
|
||||
| `tcp_tcc_rw_atomic_req` | Req | Number of RW atomic requests to L2 cache |
|
||||
|
||||
#### TCA
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :----------------| :------| ------------------------------------------: |
|
||||
| `tca_cycle` | Cycles | TCA cycles |
|
||||
| `tca_busy` | Cycles | Number of cycles TCA has a pending request |
|
||||
|
||||
### L2 cache access
|
||||
|
||||
#### L2 cache access counters
|
||||
|
||||
| Hardware Counter | Unit | Definition |
|
||||
| :--------------------------------| :------| -------------------------------------------------------------: |
|
||||
| `tcc_cycle` |Cycle | L2 cache free-running clocks |
|
||||
| `tcc_busy` |Cycle | L2 cache busy cycles |
|
||||
| `tcc_req` |Req | Number of L2 cache requests |
|
||||
| `tcc_streaming_req[∗]` |Req | Number of L2 cache streaming requests |
|
||||
| `tcc_NC_req` |Req | Number of NC requests |
|
||||
| `tcc_UC_req` |Req | Number of UC requests |
|
||||
| `tcc_CC_req` |Req | Number of CC requests |
|
||||
| `tcc_RW_req` |Req | Number of RW requests |
|
||||
| `tcc_probe` |Req | Number of L2 cache probe requests |
|
||||
| `tcc_probe_all[∗]` |Req | Number of external probe requests with EA_TCC_preq_all== 1 |
|
||||
| `tcc_read_req` |Req | Number of L2 cache read requests |
|
||||
| `tcc_write_req` |Req | Number of L2 cache write requests |
|
||||
| `tcc_atomic_req` |Req | Number of L2 cache atomic requests |
|
||||
| `tcc_hit` |Req | Number of L2 cache lookup-hits |
|
||||
| `tcc_miss` |Req | Number of L2 cache lookup-misses |
|
||||
| `tcc_writeback` |Req | Number of lines written back to main memory, including writebacks of dirty lines and uncached write/atomic requests |
|
||||
| `tcc_ea_wrreq` |Req | Total number of 32-byte and 64-byte write requests to EA |
|
||||
| `tcc_ea_wrreq_64B` |Req | Total number of 64-byte write requests to EA |
|
||||
| `tcc_ea_wr_uncached_32B` |Req | Number of 32-byte write/atomic going over the TC_EA_wrreq interface due to uncached traffic. Note that CC mtypes can produce uncached requests, and those are included in this. A 64-byte request is counted as 2. |
|
||||
| `tcc_ea_wrreq_stall` | Cycles | Number of cycles a write request was stalled |
|
||||
| `tcc_ea_wrreq_io_credit_stall[∗]` | Cycles | Number of cycles an EA write request runs out of IO credits |
|
||||
| `tcc_ea_wrreq_gmi_credit_stall[∗]` | Cycles | Number of cycles an EA write request runs out of GMI credits |
|
||||
| `tcc_ea_wrreq_dram_credit_stall` | Cycles | Number of cycles an EA write request runs out of DRAM credits |
|
||||
| `tcc_too_many_ea_wrreqs_stall[∗]` | Cycles | Number of cycles the L2 cache reaches maximum number of pending EA write requests |
|
||||
| `tcc_ea_wrreq_level` | Req | Accumulated number of L2 cache-EA write requests in flight |
|
||||
| `tcc_ea_atomic` | Req | Number of 32-byte and 64-byte atomic requests to EA |
|
||||
| `tcc_ea_atomic_level` | Req | Accumulated number of L2 cache-EA atomic requests in flight |
|
||||
| `tcc_ea_rdreq` | Req | Total number of 32-byte and 64-byte read requests to EA |
|
||||
| `tcc_ea_rdreq_32B` | Req | Total number of 32-byte read requests to EA |
|
||||
| `tcc_ea_rd_uncached_32B` | Req | Number of 32-byte L2 cache-EA read due to uncached traffic. A 64-byte request is counted as 2. |
|
||||
| `tcc_ea_rdreq_io_credit_stall[∗]` | Cycles | Number of cycles read request interface runs out of IO credits |
|
||||
| `tcc_ea_rdreq_gmi_credit_stall[∗]` | Cycles | Number of cycles read request interface runs out of GMI credits |
|
||||
| `tcc_ea_rdreq_dram_credit_stall` | Cycles | Number of cycles read request interface runs out of DRAM credits |
|
||||
| `tcc_ea_rdreq_level` | Req | Accumulated number of L2 cache-EA read requests in flight |
|
||||
| `tcc_ea_rdreq_dram` | Req | Number of 32-byte and 64-byte read requests to HBM |
|
||||
| `tcc_ea_wrreq_dram` | Req | Number of 32-byte and 64-byte write requests to HBM |
|
||||
| `tcc_tag_stall` | Cycles | Number of cycles the normal request pipeline in the tag was stalled for any reason |
|
||||
| `tcc_normal_writeback` | Req | Number of L2 cache normal writeback |
|
||||
| `tcc_all_tc_op_wb_writeback[∗]` | Req | Number of instruction-triggered writeback requests |
|
||||
| `tcc_normal_evict` | Req | Number of L2 cache normal evictions |
|
||||
| `tcc_all_tc_op_inv_evict[∗]` | Req | Number of instruction-triggered eviction requests |
|
||||
|
||||
## MI200 derived metrics list
|
||||
|
||||
### Derived metrics on MI200 GPUs
|
||||
|
||||
| Derived Metric | Description |
|
||||
| :----------------| -------------------------------------------------------------------------------------: |
|
||||
| `VFetchInsts` | The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory |
|
||||
| `VWriteInsts` | The average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory |
|
||||
| `FlatVMemInsts` | The average number of FLAT instructions that read from or write to the video memory executed per work item (affected by flow control). Includes FLAT instructions that read from or write to scratch |
|
||||
| `LDSInsts` | The average number of LDS read/write instructions executed per work item (affected by flow control). Excludes FLAT instructions that read from or write to LDS |
|
||||
| `FlatLDSInsts` | The average number of FLAT instructions that read or write to LDS executed per work item (affected by flow control) |
|
||||
| `VALUUtilization` | The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of 64. Value range: 0% (bad), 100% (ideal - no thread divergence) |
|
||||
| `VALUBusy` | The percentage of GPU time vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal) |
|
||||
| `SALUBusy` | The percentage of GPU time scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal) |
|
||||
| `MemWrites32B` | The total number of effective 32B write transactions to the memory |
|
||||
| `L2CacheHit` | The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal) |
|
||||
| `MemUnitStalled` | The percentage of GPU time the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad) |
|
||||
| `WriteUnitStalled` | The percentage of GPU time the write unit is stalled. Value range: 0% to 100% (bad) |
|
||||
| `LDSBankConflict` | The percentage of GPU time LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad) |
|
||||
|
||||
## MI200 acronyms
|
||||
|
||||
| Abbreviation | Meaning |
|
||||
| :------------| --------------------------------------------------------------------------------: |
|
||||
| `ALU` | Arithmetic logic unit |
|
||||
| `Arb` | Arbiter |
|
||||
| `BF16` | Brain floating point – 16 |
|
||||
| `CC` | Coherently cached |
|
||||
| `CP` | Command processor |
|
||||
| `CPC` | Command processor – compute |
|
||||
| `CPF` | Command processor – fetcher |
|
||||
| `CS` | Compute shader |
|
||||
| `CSC` | Compute shader controller |
|
||||
| `CSn` | Compute Shader, the n-th pipe |
|
||||
| `CU` | Compute unit |
|
||||
| `DW` | 32-bit data word, DWORD |
|
||||
| `EA` | Efficiency arbiter |
|
||||
| `F16` | Half-precision floating point |
|
||||
| `FLAT` | FLAT instructions allow read/write/atomic access to a generic memory address pointer, which can resolve to any of the following physical memories:<br>• Global Memory<br>• Scratch (“private”)<br>• LDS (“shared”)<br>• Invalid – MEM_VIOL TrapStatus |
|
||||
| `FMA` | Fused multiply-add |
|
||||
| `GDS` | Global data share |
|
||||
| `GRBM` | Graphics register bus manager |
|
||||
| `HBM` | High bandwidth memory |
|
||||
| `Instr` | Instructions |
|
||||
| `IOP` | Integer operation |
|
||||
| `L2` | Level-2 cache |
|
||||
| `LDS` | Local data share |
|
||||
| `ME1` | Micro-engine, running packet processing firmware on CPC |
|
||||
| `MFMA` | Matrix fused multiply-add |
|
||||
| `NC` | Noncoherently cached |
|
||||
| `RW` | Coherently cached with write |
|
||||
| `SALU` | Scalar ALU |
|
||||
| `SGPR` | Scalar GPR |
|
||||
| `SIMD` | Single instruction multiple data |
|
||||
| `sL1D` | Scalar Level-1 data cache |
|
||||
| `SMEM` | Scalar memory |
|
||||
| `SPI` | Shader processor input |
|
||||
| `SQ` | Sequencer |
|
||||
| `TA` | Texture addressing unit |
|
||||
| `TC` | Texture cache |
|
||||
| `TCA` | Texture cache arbiter |
|
||||
| `TCC` | Texture cache per channel, known as L2 cache |
|
||||
| `TCIU` | Texture cache interface unit, command processor’s interface to memory system |
|
||||
| `TCP` | Texture cache per pipe, known as vector L1 cache |
|
||||
| `TCR` | Texture cache router |
|
||||
| `TD` | Texture data unit |
|
||||
| `UC` | Uncached |
|
||||
| `UTCL1` | Unified translation cache – level 1 |
|
||||
| `UTCL2` | Unified translation cache – level 2 |
|
||||
| `VALU` | Vector ALU |
|
||||
| `VGPR` | Vector GPR |
|
||||
| `vL1D` | Vector level 1 data cache |
|
||||
| `VMEM` | Vector memory |
|
||||
@@ -1,234 +0,0 @@
|
||||
# GPU memory
|
||||
|
||||
For the HIP reference documentation, see:
|
||||
|
||||
* {doc}`hip:.doxygen/docBin/html/group___memory`
|
||||
* {doc}`hip:.doxygen/docBin/html/group___memory_m`
|
||||
|
||||
Host memory exists on the host (e.g. CPU) of the machine in random access memory (RAM).
|
||||
|
||||
Device memory exists on the device (e.g. GPU) of the machine in video random access memory (VRAM).
|
||||
Recent architectures use graphics double data rate (GDDR) synchronous dynamic random-access memory (SDRAM)such as GDDR6, or high-bandwidth memory (HBM) such as HBM2e.
|
||||
|
||||
## Memory allocation
|
||||
|
||||
Memory can be allocated in two ways: pageable memory, and pinned memory.
|
||||
The following API calls with result in these allocations:
|
||||
|
||||
| API | Data location | Allocation |
|
||||
|--------------------|---------------|------------|
|
||||
| System allocated | Host | Pageable |
|
||||
| `hipMallocManaged` | Host | Managed |
|
||||
| `hipHostMalloc` | Host | Pinned |
|
||||
| `hipMalloc` | Device | Pinned |
|
||||
|
||||
:::{tip}
|
||||
`hipMalloc` and `hipFree` are blocking calls, however, HIP recently added non-blocking versions `hipMallocAsync` and `hipFreeAsync` which take in a stream as an additional argument.
|
||||
:::
|
||||
|
||||
### Pageable memory
|
||||
|
||||
Pageable memory is usually gotten when calling `malloc` or `new` in a C++ application.
|
||||
It is unique in that it exists on "pages" (blocks of memory), which can be migrated to other memory storage.
|
||||
For example, migrating memory between CPU sockets on a motherboard, or a system that runs out of space in RAM and starts dumping pages of RAM into the swap partition of your hard drive.
|
||||
|
||||
### Pinned memory
|
||||
|
||||
Pinned memory (or page-locked memory, or non-pageable memory) is host memory that is mapped into the address space of all GPUs, meaning that the pointer can be used on both host and device.
|
||||
Accessing host-resident pinned memory in device kernels is generally not recommended for performance, as it can force the data to traverse the host-device interconnect (e.g. PCIe), which is much slower than the on-device bandwidth (>40x on MI200).
|
||||
|
||||
Pinned host memory can be allocated with one of two types of coherence support:
|
||||
|
||||
:::{note}
|
||||
In HIP, pinned memory allocations are coherent by default (`hipHostMallocDefault`).
|
||||
There are additional pinned memory flags (e.g. `hipHostMallocMapped` and `hipHostMallocPortable`).
|
||||
On MI200 these options do not impact performance.
|
||||
<!-- TODO: link to programming_manual#memory-allocation-flags -->
|
||||
For more information, see the section *memory allocation flags* in the HIP Programming Guide: {doc}`hip:user_guide/programming_manual`.
|
||||
:::
|
||||
|
||||
Much like how a process can be locked to a CPU core by setting affinity, a pinned memory allocator does this with the memory storage system.
|
||||
On multi-socket systems it is important to ensure that pinned memory is located on the same socket as the owning process, or else each cache line will be moved through the CPU-CPU interconnect, thereby increasing latency and potentially decreasing bandwidth.
|
||||
|
||||
In practice, pinned memory is used to improve transfer times between host and device.
|
||||
For transfer operations, such as `hipMemcpy` or `hipMemcpyAsync`, using pinned memory instead of pageable memory on host can lead to a ~3x improvement in bandwidth.
|
||||
|
||||
:::{tip}
|
||||
If the application needs to move data back and forth between device and host (separate allocations), use pinned memory on the host side.
|
||||
:::
|
||||
|
||||
### Managed memory
|
||||
|
||||
Managed memory refers to universally addressable, or unified memory available on the MI200 series of GPUs.
|
||||
Much like pinned memory, managed memory shares a pointer between host and device and (by default) supports fine-grained coherence, however, managed memory can also automatically migrate pages between host and device.
|
||||
The allocation will be managed by AMD GPU driver using the Linux HMM (Heterogeneous Memory Management) mechanism.
|
||||
|
||||
If heterogenous memory management (HMM) is not available, then `hipMallocManaged` will default back to using system memory and will act like pinned host memory.
|
||||
Other managed memory API calls will have undefined behavior.
|
||||
It is therefore recommended to check for managed memory capability with: `hipDeviceGetAttribute` and `hipDeviceAttributeManagedMemory`.
|
||||
|
||||
HIP supports additional calls that work with page migration:
|
||||
|
||||
* `hipMemAdvise`
|
||||
* `hipMemPrefetchAsync`
|
||||
|
||||
:::{tip}
|
||||
If the application needs to use data on both host and device regularly, does not want to deal with separate allocations, and is not worried about maxing out the VRAM on MI200 GPUs (64 GB per GCD), use managed memory.
|
||||
:::
|
||||
|
||||
:::{tip}
|
||||
If managed memory performance is poor, check to see if managed memory is supported on your system and if page migration (XNACK) is enabled.
|
||||
:::
|
||||
|
||||
## Access behavior
|
||||
|
||||
Memory allocations for GPUs behave as follow:
|
||||
|
||||
| API | Data location | Host access | Device access |
|
||||
|--------------------|---------------|--------------|----------------------|
|
||||
| System allocated | Host | Local access | Unhandled page fault |
|
||||
| `hipMallocManaged` | Host | Local access | Zero-copy |
|
||||
| `hipHostMalloc` | Host | Local access | Zero-copy* |
|
||||
| `hipMalloc` | Device | Zero-copy | Local access |
|
||||
|
||||
Zero-copy accesses happen over the Infinity Fabric interconnect or PCI-E lanes on discrete GPUs.
|
||||
|
||||
:::{note}
|
||||
While `hipHostMalloc` allocated memory is accessible by a device, the host pointer must be converted to a device pointer with `hipHostGetDevicePointer`.
|
||||
|
||||
Memory allocated through standard system allocators such as `malloc`, can be accessed a device by registering the memory via `hipHostRegister`.
|
||||
The device pointer to be used in kernels can be retrieved with `hipHostGetDevicePointer`.
|
||||
Registered memory is treated like `hipHostMalloc` and will have similar performance.
|
||||
|
||||
On devices that support and have [](#xnack) enabled, such as the MI250X, `hipHostRegister` is not required as memory accesses are handled via automatic page migration.
|
||||
:::
|
||||
|
||||
### XNACK
|
||||
|
||||
Normally, host and device memory are separate and data has to be transferred manually via `hipMemcpy`.
|
||||
|
||||
On a subset of GPUs, such as the MI200, there is an option to automatically migrate pages of memory between host and device.
|
||||
This is important for managed memory, where the locality of the data is important for performance.
|
||||
Depending on the system, page migration may be disabled by default in which case managed memory will act like pinned host memory and suffer degraded performance.
|
||||
|
||||
*XNACK* describes the GPUs ability to retry memory accesses that failed due a page fault (which normally would lead to a memory access error), and instead retrieve the missing page.
|
||||
|
||||
This also affects memory allocated by the system as indicated by the following table:
|
||||
|
||||
| API | Data location | Host after device access | Device after host access |
|
||||
|--------------------|---------------|--------------------------|--------------------------|
|
||||
| System allocated | Host | Migrate page to host | Migrate page to device |
|
||||
| `hipMallocManaged` | Host | Migrate page to host | Migrate page to device |
|
||||
| `hipHostMalloc` | Host | Local access | Zero-copy |
|
||||
| `hipMalloc` | Device | Zero-copy | Local access |
|
||||
|
||||
To check if page migration is available on a platform, use `rocminfo`:
|
||||
|
||||
```sh
|
||||
$ rocminfo | grep xnack
|
||||
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
|
||||
```
|
||||
|
||||
Here, `xnack-` means that XNACK is available but is disabled by default.
|
||||
Turning on XNACK by setting the environment variable `HSA_XNACK=1` and gives the expected result, `xnack+`:
|
||||
|
||||
```sh
|
||||
$ HSA_XNACK=1 rocminfo | grep xnack
|
||||
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack+
|
||||
```
|
||||
|
||||
`hipcc`by default will generate code that runs correctly with both XNACK enabled or disabled.
|
||||
Setting the `--offload-arch=`-option with `xnack+` or `xnack-` forces code to be only run with XNACK enabled or disabled respectively.
|
||||
|
||||
```sh
|
||||
# Compiled kernels will run regardless if XNACK is enabled or is disabled.
|
||||
hipcc --offload-arch=gfx90a
|
||||
|
||||
# Compiled kernels will only be run if XNACK is enabled with XNACK=1.
|
||||
hipcc --offload-arch=gfx90a:xnack+
|
||||
|
||||
# Compiled kernels will only be run if XNACK is disabled with XNACK=0.
|
||||
hipcc --offload-arch=gfx90a:xnack-
|
||||
```
|
||||
|
||||
:::{tip}
|
||||
If you want to make use of page migration, use managed memory. While pageable memory will migrate correctly, it is not a portable solution and can have performance issues if the accessed data isn't page aligned.
|
||||
:::
|
||||
|
||||
### Coherence
|
||||
|
||||
* *Coarse-grained coherence* means that memory is only considered up to date at kernel boundaries, which can be enforced through `hipDeviceSynchronize`, `hipStreamSynchronize`, or any blocking operation that acts on the null stream (e.g. `hipMemcpy`).
|
||||
For example, cacheable memory is a type of coarse-grained memory where an up-to-date copy of the data can be stored elsewhere (e.g. in an L2 cache).
|
||||
* *Fine-grained coherence* means the coherence is supported while a CPU/GPU kernel is running.
|
||||
This can be useful if both host and device are operating on the same dataspace using system-scope atomic operations (e.g. updating an error code or flag to a buffer).
|
||||
Fine-grained memory implies that up-to-date data may be made visible to others regardless of kernel boundaries as discussed above.
|
||||
|
||||
| API | Flag | Coherence |
|
||||
|-------------------------|------------------------------|----------------|
|
||||
| `hipHostMalloc` | `hipHostMallocDefault` | Fine-grained |
|
||||
| `hipHostMalloc` | `hipHostMallocNonCoherent` | Coarse-grained |
|
||||
|
||||
| API | Flag | Coherence |
|
||||
|-------------------------|------------------------------|----------------|
|
||||
| `hipExtMallocWithFlags` | `hipHostMallocDefault` | Fine-grained |
|
||||
| `hipExtMallocWithFlags` | `hipDeviceMallocFinegrained` | Coarse-grained |
|
||||
|
||||
| API | `hipMemAdvise` argument | Coherence |
|
||||
|-------------------------|------------------------------|----------------|
|
||||
| `hipMallocManaged` | | Fine-grained |
|
||||
| `hipMallocManaged` | `hipMemAdviseSetCoarseGrain` | Coarse-grained |
|
||||
| `malloc` | | Fine-grained |
|
||||
| `malloc` | `hipMemAdviseSetCoarseGrain` | Coarse-grained |
|
||||
|
||||
:::{tip}
|
||||
Try to design your algorithms to avoid host-device memory coherence (e.g. system scope atomics). While it can be a useful feature in very specific cases, it is not supported on all systems, and can negatively impact performance by introducing the host-device interconnect bottleneck.
|
||||
:::
|
||||
|
||||
The availability of fine- and coarse-grained memory pools can be checked with `rocminfo`:
|
||||
|
||||
```sh
|
||||
$ rocminfo
|
||||
...
|
||||
*******
|
||||
Agent 1
|
||||
*******
|
||||
Name: AMD EPYC 7742 64-Core Processor
|
||||
...
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: FINE GRAINED
|
||||
...
|
||||
Pool 3
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
...
|
||||
*******
|
||||
Agent 9
|
||||
*******
|
||||
Name: gfx90a
|
||||
...
|
||||
Pool Info:
|
||||
Pool 1
|
||||
Segment: GLOBAL; FLAGS: COARSE GRAINED
|
||||
...
|
||||
```
|
||||
|
||||
## System direct memory access
|
||||
|
||||
In most cases, the default behavior for HIP in transferring data from a pinned host allocation to device will run at the limit of the interconnect.
|
||||
However, there are certain cases where the interconnect is not the bottleneck.
|
||||
|
||||
The primary way to transfer data onto and off of a GPU, such as the MI200, is to use the onboard System Direct Memory Access engine, which is used to feed blocks of memory to the off-device interconnect (either GPU-CPU or GPU-GPU).
|
||||
Each GCD has a separate SDMA engine for host-to-device and device-to-host memory transfers.
|
||||
Importantly, SDMA engines are separate from the computing infrastructure, meaning that memory transfers to and from a device will not impact kernel compute performance, though they do impact memory bandwidth to a limited extent.
|
||||
The SDMA engines are mainly tuned for PCIe-4.0 x16, which means they are designed to operate at bandwidths up to 32 GB/s.
|
||||
|
||||
:::{note}
|
||||
An important feature of the MI250X platform is the Infinity Fabric™ interconnect between host and device.
|
||||
The Infinity Fabric interconnect supports improved performance over standard PCIe-4.0 (usually ~50% more bandwidth); however, since the SDMA engine does not run at this speed, it will not max out the bandwidth of the faster interconnect.
|
||||
:::
|
||||
|
||||
The bandwidth limitation can be countered by bypassing the SDMA engine and replacing it with a type of copy kernel known as a "blit" kernel.
|
||||
Blit kernels will use the compute units on the GPU, thereby consuming compute resources, which may not always be beneficial.
|
||||
The easiest way to enable blit kernels is to set an environment variable `HSA_ENABLE_SDMA=0`, which will disable the SDMA engine.
|
||||
On systems where the GPU uses a PCIe interconnect instead of an Infinity Fabric interconnect, blit kernels will not impact bandwidth, but will still consume compute resources.
|
||||
The use of SDMA vs blit kernels also applies to MPI data transfers and GPU-GPU transfers.
|
||||
@@ -1,241 +0,0 @@
|
||||
# Using the LLVM ASan on a GPU (beta release)
|
||||
|
||||
The LLVM AddressSanitizer (ASan) provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
|
||||
|
||||
Until now, the LLVM ASan process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications exactly like pure CPU applications. However, this simplicity has not been achieved yet.
|
||||
|
||||
This document provides documentation on using ROCm ASan.
|
||||
For information about LLVM ASan, see the [LLVM documentation](https://clang.llvm.org/docs/AddressSanitizer.html).
|
||||
|
||||
**Note**: The beta release of LLVM ASan for ROCm is currently tested and validated on Ubuntu 20.04.
|
||||
|
||||
## Compiling for ASan
|
||||
|
||||
The ASan process begins by compiling the application of interest with the ASan instrumentation.
|
||||
|
||||
Recommendations for doing this are:
|
||||
|
||||
* Compile as many application and dependent library sources as possible using an AMD-built clang-based compiler such as `amdclang++`.
|
||||
* Add the following options to the existing compiler and linker options:
|
||||
* `-fsanitize=address` - enables instrumentation
|
||||
* `-shared-libsan` - use shared version of runtime
|
||||
* `-g` - add debug info for improved reporting
|
||||
* Explicitly use `xnack+` in the offload architecture option. For example, `--offload-arch=gfx90a:xnack+`
|
||||
Other architectures are allowed, but their device code will not be instrumented and a warning will be emitted.
|
||||
|
||||
It is not an error to compile some files without ASan instrumentation, but doing so reduces the ability of the process to detect addressing errors. However, if the main program "`a.out`" does not directly depend on the ASan runtime (`libclang_rt.asan-x86_64.so`) after the build completes (check by running `ldd` (List Dynamic Dependencies) or `readelf`), the application will immediately report an error at runtime as described in the next section.
|
||||
|
||||
### About compilation time
|
||||
|
||||
When `-fsanitize=address` is used, the LLVM compiler adds instrumentation code around every memory operation. This added code must be handled by all of the downstream components of the compiler toolchain and results in increased overall compilation time. This increase is especially evident in the AMDGPU device compiler and has in a few instances raised the compile time to an unacceptable level.
|
||||
|
||||
There are a few options if the compile time becomes unacceptable:
|
||||
|
||||
* Avoid instrumentation of the files which have the worst compile times. This will reduce the effectiveness of the ASan process.
|
||||
* Add the option `-fsanitize-recover=address` to the compiles with the worst compile times. This option simplifies the added instrumentation resulting in faster compilation. See below for more information.
|
||||
* Disable instrumentation on a per-function basis by adding `__attribute__`((no_sanitize("address"))) to functions found to be responsible for the large compile time. Again, this will reduce the effectiveness of the process.
|
||||
|
||||
## Installing ROCm GPU ASan packages
|
||||
|
||||
For a complete ROCm GPU Sanitizer installation, including packages, instrumented HSA and HIP runtimes, tools, and math libraries, use the following instruction,
|
||||
|
||||
```bash
|
||||
sudo apt-get install rocm-ml-sdk-asan
|
||||
|
||||
```
|
||||
|
||||
## Using AMD-supplied ASan instrumented libraries
|
||||
|
||||
ROCm releases have optional packages that contain additional ASan instrumented builds of the ROCm libraries (usually found in `/opt/rocm-<version>/lib`). The instrumented libraries have identical names to the regular uninstrumented libraries, and are located in `/opt/rocm-<version>/lib/asan`.
|
||||
These additional libraries are built using the `amdclang++` and `hipcc` compilers, while some uninstrumented libraries are built with g++. The preexisting build options are used but, as described above, additional options are used: `-fsanitize=address`, `-shared-libsan` and `-g`.
|
||||
|
||||
These additional libraries avoid additional developer effort to locate repositories, identify the correct branch, check out the correct tags, and other efforts needed to build the libraries from the source. And they extend the ability of the process to detect addressing errors into the ROCm libraries themselves.
|
||||
|
||||
When adjusting an application build to add instrumentation, linking against these instrumented libraries is unnecessary. For example, any `-L` `/opt/rocm-<version>/lib` compiler options need not be changed. However, the instrumented libraries should be used when the application is run. It is particularly important that the instrumented language runtimes, like `libamdhip64.so` and `librocm-core.so`, are used; otherwise, device invalid access detections may not be reported.
|
||||
|
||||
## Running ASan instrumented applications
|
||||
|
||||
### Preparing to run an instrumented application
|
||||
|
||||
Here are a few recommendations to consider before running an ASan instrumented heterogeneous application.
|
||||
|
||||
* Ensure the Linux kernel running on the system has Heterogeneous Memory Management (HMM) support. A kernel version of 5.6 or higher should be sufficient.
|
||||
* Ensure XNACK is enabled
|
||||
* For `gfx90a` (MI-2X0) or `gfx940` (MI-3X0) use environment `HSA_XNACK = 1`.
|
||||
* For `gfx906` (MI-50) or `gfx908` (MI-100) use environment `HSA_XNACK = 1` but also ensure the amdgpu kernel module is loaded with module argument `noretry=0`.
|
||||
This requirement is due to the fact that the XNACK setting for these GPUs is system-wide.
|
||||
|
||||
* Ensure that the application will use the instrumented libraries when it runs. The output from the shell command `ldd <application name>` can be used to see which libraries will be used.
|
||||
If the instrumented libraries are not listed by `ldd`, the environment variable `LD_LIBRARY_PATH` may need to be adjusted, or in some cases an `RPATH` compiled into the application may need to be changed and the application recompiled.
|
||||
|
||||
* Ensure that the application depends on the ASan runtime. This can be checked by running the command `readelf -d <application name> | grep NEEDED` and verifying that shared library: `libclang_rt.asan-x86_64.so` appears in the output.
|
||||
If it does not appear, when executed the application will quickly output an ASan error that looks like:
|
||||
|
||||
```bash
|
||||
==3210==ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
|
||||
```
|
||||
|
||||
* Ensure that the application `llvm-symbolizer` can be executed, and that it is located in `/opt/rocm-<version>/llvm/bin`. This executable is not strictly required, but if found is used to translate ("symbolize") a host-side instruction address into a more useful function name, file name, and line number (assuming the application has been built to include debug information).
|
||||
|
||||
There is an environment variable, `ASAN_OPTIONS`, that can be used to adjust the runtime behavior of the ASAN runtime itself. There are more than a hundred "flags" that can be adjusted (see an old list at [flags](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)) but the default settings are correct and should be used in most cases. It must be noted that these options only affect the host ASAN runtime. The device runtime only currently supports the default settings for the few relevant options.
|
||||
|
||||
There are two `ASAN_OPTION` flags of particular note.
|
||||
|
||||
* `halt_on_error=0/1 default 1`.
|
||||
|
||||
This tells the ASAN runtime to halt the application immediately after detecting and reporting an addressing error. The default makes sense because the application has entered the realm of undefined behavior. If the developer wishes to have the application continue anyway, this option can be set to zero. However, the application and libraries should then be compiled with the additional option `-fsanitize-recover=address`. Note that the ROCm optional ASan instrumented libraries are not compiled with this option and if an error is detected within one of them, but halt_on_error is set to 0, more undefined behavior will occur.
|
||||
|
||||
* `detect_leaks=0/1 default 1`.
|
||||
This option directs the ASan runtime to enable the [Leak Sanitizer](https://clang.llvm.org/docs/LeakSanitizer.html) (LSAN). Unfortunately, for heterogeneous applications, this default will result in significant output from the leak sanitizer when the application exits due to allocations made by the language runtime which are not considered to be to be leaks. This output can be avoided by adding `detect_leaks=0` to the `ASAN_OPTIONS`, or alternatively by producing an LSAN suppression file (syntax described [here](https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer)) and activating it with environment variable `LSAN_OPTIONS=suppressions=/path/to/suppression/file`. When using a suppression file, a suppression report is printed by default. The suppression report can be disabled by using the `LSAN_OPTIONS` flag `print_suppressions=0`.
|
||||
|
||||
## Runtime overhead
|
||||
|
||||
Running an ASan instrumented application incurs
|
||||
overheads which may result in unacceptably long runtimes
|
||||
or failure to run at all.
|
||||
|
||||
### Higher execution time
|
||||
|
||||
ASan detection works by checking each address at runtime
|
||||
before the address is actually accessed by a load, store, or atomic
|
||||
instruction.
|
||||
This checking involves an additional load to "shadow" memory which
|
||||
records whether the address is "poisoned" or not, and additional logic
|
||||
that decides whether to produce an detection report or not.
|
||||
|
||||
This extra runtime work can cause the application to slow down by
|
||||
a factor of three or more, depending on how many memory accesses are
|
||||
executed.
|
||||
For heterogeneous applications, the shadow memory must be accessible by all devices
|
||||
and this can mean that shadow accesses from some devices may be more costly
|
||||
than non-shadow accesses.
|
||||
|
||||
### Higher memory use
|
||||
|
||||
The address checking described above relies on the compiler to surround
|
||||
each program variable with a red zone and on ASan
|
||||
runtime to surround each runtime memory allocation with a red zone and
|
||||
fill the shadow corresponding to each red zone with poison.
|
||||
The added memory for the red zones is additional overhead on top
|
||||
of the 13% overhead for the shadow memory itself.
|
||||
|
||||
Applications which consume most one or more available memory pools when
|
||||
run normally are likely to encounter allocation failures when run with
|
||||
instrumentation.
|
||||
|
||||
## Runtime reporting
|
||||
|
||||
It is not the intention of this document to provide a detailed explanation of all of the types of reports that can be output by the ASan runtime. Instead, the focus is on the differences between the standard reports for CPU issues, and reports for GPU issues.
|
||||
|
||||
An invalid address detection report for the CPU always starts with
|
||||
|
||||
```bash
|
||||
==<PID>==ERROR: AddressSanitizer: <problem type> on address <memory address> at pc <pc> bp <bp> sp <sp> <access> of size <N> at <memory address> thread T0
|
||||
```
|
||||
|
||||
and continues with a stack trace for the access, a stack trace for the allocation and deallocation, if relevant, and a dump of the shadow near the <memory address>.
|
||||
|
||||
In contrast, an invalid address detection report for the GPU always starts with
|
||||
|
||||
```bash
|
||||
==<PID>==ERROR: AddressSanitizer: <problem type> on amdgpu device <device> at pc <pc> <access> of size <n> in workgroup id (<X>,<Y>,<Z>)
|
||||
```
|
||||
|
||||
Above, `<device>` is the integer device ID, and `(<X>, <Y>, <Z>)` is the ID of the workgroup or block where the invalid address was detected.
|
||||
|
||||
While the CPU report include a call stack for the thread attempting the invalid access, the GPU is currently to a call stack of size one, i.e. the (symbolized) of the invalid access, e.g.
|
||||
|
||||
```bash
|
||||
#0 <pc> in <fuction signature> at /path/to/file.hip:<line>:<column>
|
||||
```
|
||||
|
||||
This short call stack is followed by a GPU unique section that looks like
|
||||
|
||||
```bash
|
||||
Thread ids and accessed addresses:
|
||||
<lid0> <maddr 0> : <lid1> <maddr1> : ...
|
||||
```
|
||||
|
||||
where each `<lid j> <maddr j>` indicates the lane ID and the invalid memory address held by lane `j` of the wavefront attempting the invalid access.
|
||||
|
||||
Additionally, reports for invalid GPU accesses to memory allocated by GPU code via `malloc` or new starting with, for example,
|
||||
|
||||
```bash
|
||||
==1234==ERROR: AddressSanitizer: heap-buffer-overflow on amdgpu device 0 at pc 0x7fa9f5c92dcc
|
||||
```
|
||||
|
||||
or
|
||||
|
||||
```bash
|
||||
==5678==ERROR: AddressSanitizer: heap-use-after-free on amdgpu device 3 at pc 0x7f4c10062d74
|
||||
```
|
||||
|
||||
currently may include one or two surprising CPU side tracebacks mentioning :`hostcall`". This is due to how `malloc` and `free` are implemented for GPU code and these call stacks can be ignored.
|
||||
|
||||
### Running with `rocgdb`
|
||||
|
||||
`rocgdb` can be used to further investigate ASan detected errors, with some preparation.
|
||||
|
||||
Currently, the ASan runtime complains when starting `rocgdb` without preparation.
|
||||
|
||||
```bash
|
||||
$ rocgdb my_app
|
||||
==1122==ASan` runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.
|
||||
```
|
||||
|
||||
This is solved by setting environment variable `LD_PRELOAD` to the path to the ASan runtime, whose path can be obtained using the command
|
||||
|
||||
```bash
|
||||
amdclang++ -print-file-name=libclang_rt.asan-x86_64.so
|
||||
```
|
||||
|
||||
It is also recommended to set the environment variable `HIP_ENABLE_DEFERRED_LOADING=0` before debugging HIP applications.
|
||||
|
||||
After starting `rocgdb` breakpoints can be set on the ASan runtime error reporting entry points of interest. For example, if an ASan error report includes
|
||||
|
||||
```bash
|
||||
WRITE of size 4 in workgroup id (10,0,0)
|
||||
```
|
||||
|
||||
the `rocgdb` command needed to stop the program before the report is printed is
|
||||
|
||||
```bash
|
||||
(gdb) break __asan_report_store4
|
||||
```
|
||||
|
||||
Similarly, the appropriate command for a report including
|
||||
|
||||
```bash
|
||||
READ of size <N> in workgroup ID (1,2,3)
|
||||
```
|
||||
|
||||
is
|
||||
|
||||
```bash
|
||||
(gdb) break __asan_report_load<N>
|
||||
```
|
||||
|
||||
It is possible to set breakpoints on all ASan report functions using these commands:
|
||||
|
||||
```bash
|
||||
$ rocgdb <path to application>
|
||||
(gdb) start <commmand line arguments>
|
||||
(gdb) rbreak ^__asan_report
|
||||
(gdb) c
|
||||
```
|
||||
|
||||
### Using ASan with a short HIP application
|
||||
|
||||
Refer to the following example to use ASan with a short HIP application,
|
||||
|
||||
https://github.com/Rmalavally/rocm-examples/blob/Rmalavally-patch-1/LLVM_ASAN/Using-Address-Sanitizer-with-a-Short-HIP-Application.md
|
||||
|
||||
### Known issues with using GPU sanitizer
|
||||
|
||||
* Red zones must have limited size and it is possible for an invalid access to completely miss a red zone and not be detected.
|
||||
|
||||
* Lack of detection or false reports can be caused by the runtime not properly maintaining red zone shadows.
|
||||
|
||||
* Lack of detection on the GPU might also be due to the implementation not instrumenting accesses to all GPU specific address spaces. For example, in the current implementation accesses to "private" or "stack" variables on the GPU are not instrumented, and accesses to HIP shared variables (also known as "local data store" or "LDS") are also not instrumented.
|
||||
|
||||
* It can also be the case that a memory fault is hit for an invalid address even with the instrumentation. This is usually caused by the invalid address being so wild that its shadow address is outside of any memory region, and the fault actually occurs on the access to the shadow address. It is also possible to hit a memory fault for the `NULL` pointer. While address 0 does have a shadow location, it is not poisoned by the runtime.
|
||||
109
docs/conf.py
@@ -5,97 +5,66 @@
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html
|
||||
|
||||
import shutil
|
||||
import jinja2
|
||||
import os
|
||||
|
||||
from rocm_docs import ROCmDocs
|
||||
|
||||
# Environement to process Jinja templates.
|
||||
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
|
||||
|
||||
# Jinja templates to render out.
|
||||
templates = [
|
||||
|
||||
]
|
||||
|
||||
# Render templates and output files without the last extension.
|
||||
# For example: 'install.md.jinja' becomes 'install.md'.
|
||||
for template in templates:
|
||||
rendered = jinja_env.get_template(template).render()
|
||||
with open(os.path.splitext(template)[0], 'w') as file:
|
||||
file.write(rendered)
|
||||
|
||||
shutil.copy2('../CONTRIBUTING.md','./contribute/index.md')
|
||||
shutil.copy2('../RELEASE.md','./about/release-notes.md')
|
||||
shutil.copy2('../CONTRIBUTING.md','./contributing.md')
|
||||
shutil.copy2('../RELEASE.md','./release.md')
|
||||
# Keep capitalization due to similar linking on GitHub's markdown preview.
|
||||
shutil.copy2('../CHANGELOG.md','./about/CHANGELOG.md')
|
||||
|
||||
latex_engine = "xelatex"
|
||||
latex_elements = {
|
||||
"fontpkg": r"""
|
||||
\usepackage{tgtermes}
|
||||
\usepackage{tgheros}
|
||||
\renewcommand\ttdefault{txtt}
|
||||
"""
|
||||
}
|
||||
shutil.copy2('../CHANGELOG.md','./CHANGELOG.md')
|
||||
|
||||
# configurations for PDF output by Read the Docs
|
||||
project = "ROCm Documentation"
|
||||
author = "Advanced Micro Devices, Inc."
|
||||
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
|
||||
version = "5.7.1"
|
||||
release = "5.7.1"
|
||||
version = "5.3.3"
|
||||
release = "5.3.3"
|
||||
|
||||
setting_all_article_info = True
|
||||
all_article_info_os = ["linux", "windows"]
|
||||
all_article_info_os = ["linux"]
|
||||
all_article_info_author = ""
|
||||
|
||||
# pages with specific settings
|
||||
article_pages = [
|
||||
{
|
||||
"file":"release",
|
||||
"os":["linux", "windows"],
|
||||
"date":"2023-07-27"
|
||||
},
|
||||
{"file":"deploy/linux/index", "os":["linux"]},
|
||||
{"file":"deploy/linux/install_overview", "os":["linux"]},
|
||||
{"file":"deploy/linux/prerequisites", "os":["linux"]},
|
||||
{"file":"deploy/linux/quick_start", "os":["linux"]},
|
||||
{"file":"deploy/linux/install", "os":["linux"]},
|
||||
{"file":"deploy/linux/upgrade", "os":["linux"]},
|
||||
{"file":"deploy/linux/uninstall", "os":["linux"]},
|
||||
{"file":"deploy/linux/package_manager_integration", "os":["linux"]},
|
||||
{"file":"deploy/docker", "os":["linux"]},
|
||||
|
||||
{"file":"release/gpu_os_support", "os":["linux"]},
|
||||
{"file":"release/docker_support_matrix", "os":["linux"]},
|
||||
|
||||
{"file":"reference/gpu_libraries/communication", "os":["linux"]},
|
||||
{"file":"reference/ai_tools", "os":["linux"]},
|
||||
{"file":"reference/management_tools", "os":["linux"]},
|
||||
{"file":"reference/validation_tools", "os":["linux"]},
|
||||
{"file":"reference/framework_compatibility/framework_compatibility", "os":["linux"]},
|
||||
{"file":"reference/computer_vision", "os":["linux"]},
|
||||
|
||||
{"file":"how_to/deep_learning_rocm", "os":["linux"]},
|
||||
{"file":"how_to/gpu_aware_mpi", "os":["linux"]},
|
||||
{"file":"how_to/magma_install/magma_install", "os":["linux"]},
|
||||
{"file":"how_to/pytorch_install/pytorch_install", "os":["linux"]},
|
||||
{"file":"how_to/system_debugging", "os":["linux"]},
|
||||
{"file":"how_to/tensorflow_install/tensorflow_install", "os":["linux"]},
|
||||
|
||||
{"file":"install/windows/install-quick", "os":["windows"]},
|
||||
{"file":"install/linux/install-quick", "os":["linux"]},
|
||||
|
||||
{"file":"install/linux/install", "os":["linux"]},
|
||||
{"file":"install/linux/install-options", "os":["linux"]},
|
||||
{"file":"install/linux/prerequisites", "os":["linux"]},
|
||||
|
||||
{"file":"install/docker", "os":["linux"]},
|
||||
{"file":"install/magma-install", "os":["linux"]},
|
||||
{"file":"install/pytorch-install", "os":["linux"]},
|
||||
{"file":"install/tensorflow-install", "os":["linux"]},
|
||||
|
||||
{"file":"install/windows/install", "os":["windows"]},
|
||||
{"file":"install/windows/prerequisites", "os":["windows"]},
|
||||
{"file":"install/windows/cli/index", "os":["windows"]},
|
||||
{"file":"install/windows/gui/index", "os":["windows"]},
|
||||
|
||||
{"file":"about/compatibility/linux-support", "os":["linux"]},
|
||||
{"file":"about/compatibility/windows-support", "os":["windows"]},
|
||||
|
||||
{"file":"about/compatibility/docker-image-support-matrix", "os":["linux"]},
|
||||
{"file":"about/compatibility/user-kernel-space-compat-matrix", "os":["linux"]},
|
||||
|
||||
{"file":"reference/library-index", "os":["linux"]},
|
||||
|
||||
{"file":"how-to/deep-learning-rocm", "os":["linux"]},
|
||||
{"file":"how-to/gpu-enabled-mpi", "os":["linux"]},
|
||||
{"file":"how-to/system-debugging", "os":["linux"]},
|
||||
{"file":"how-to/tuning-guides", "os":["linux", "windows"]},
|
||||
|
||||
{"file":"rocm-a-z", "os":["linux", "windows"]},
|
||||
{"file":"examples/machine_learning", "os":["linux"]},
|
||||
{"file":"examples/inception_casestudy/inception_casestudy", "os":["linux"]},
|
||||
|
||||
{"file":"understand/file_reorg", "os":["linux"]},
|
||||
|
||||
{"file":"understand/isv_deployment_win", "os":["windows"]},
|
||||
]
|
||||
|
||||
exclude_patterns = ['temp']
|
||||
|
||||
external_toc_path = "./sphinx/_toc.yml"
|
||||
|
||||
docs_core = ROCmDocs("ROCm Documentation")
|
||||
docs_core = ROCmDocs("ROCm 5.3.3 Documentation Home")
|
||||
docs_core.setup()
|
||||
|
||||
external_projects_current_project = "rocm"
|
||||
|
||||
@@ -1,148 +0,0 @@
|
||||
# Building documentation
|
||||
|
||||
You can build our documentation via GitHub (in a pull request) or locally (using the command line or
|
||||
Visual Studio (VS) Code.
|
||||
|
||||
## GitHub
|
||||
|
||||
If you open a pull request on the `develop` branch of a ROCm repository and scroll to the bottom of
|
||||
the page, there is a summary panel. Next to the line
|
||||
`docs/readthedocs.com:advanced-micro-devices-demo`, there is a `Details` link. If you click this, it takes
|
||||
you to the Read the Docs build for your pull request.
|
||||
|
||||

|
||||
|
||||
If you don't see this line, click `Show all checks` to get an itemized view.
|
||||
|
||||
## Command line
|
||||
|
||||
You can build our documentation via the command line using Python. We use Python 3.8; other
|
||||
versions may not support the build.
|
||||
|
||||
Use the Python Virtual Environment (`venv`) and run the following commands from the project root:
|
||||
|
||||
```sh
|
||||
python3 -mvenv .venv
|
||||
|
||||
# Windows
|
||||
.venv/Scripts/python -m pip install -r docs/sphinx/requirements.txt
|
||||
.venv/Scripts/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
|
||||
|
||||
# Linux
|
||||
.venv/bin/python -m pip install -r docs/sphinx/requirements.txt
|
||||
.venv/bin/python -m sphinx -T -E -b html -d _build/doctrees -D language=en docs _build/html
|
||||
```
|
||||
|
||||
Navigate to `_build/html/index.html` and open this file in a web browser.
|
||||
|
||||
## Visual Studio Code
|
||||
|
||||
With the help of a few extensions, you can create a productive environment to author and test
|
||||
documentation locally using Visual Studio (VS) Code. Follow these steps to configure VS Code:
|
||||
|
||||
1. Install the required extensions:
|
||||
|
||||
* Python: `(ms-python.python)`
|
||||
* Live Server: `(ritwickdey.LiveServer)`
|
||||
|
||||
2. Add the following entries to `.vscode/settings.json`.
|
||||
|
||||
```json
|
||||
{
|
||||
"liveServer.settings.root": "/.vscode/build/html",
|
||||
"liveServer.settings.wait": 1000,
|
||||
"python.terminal.activateEnvInCurrentTerminal": true
|
||||
}
|
||||
```
|
||||
|
||||
* `liveServer.settings.root`: Sets the root of the output website for live previews. Must be changed
|
||||
alongside the `tasks.json` command.
|
||||
* `liveServer.settings.wait`: Tells the live server to wait with the update in order to give Sphinx time to
|
||||
regenerate the site contents and not refresh before the build is complete.
|
||||
* `python.terminal.activateEnvInCurrentTerminal`: Activates the automatic virtual environment, so you
|
||||
can build the site from the integrated terminal.
|
||||
|
||||
3. Add the following tasks to `.vscode/tasks.json`.
|
||||
|
||||
```json
|
||||
{
|
||||
"version": "2.0.0",
|
||||
"tasks": [
|
||||
{
|
||||
"label": "Build Docs",
|
||||
"type": "process",
|
||||
"windows": {
|
||||
"command": "${workspaceFolder}/.venv/Scripts/python.exe"
|
||||
},
|
||||
"command": "${workspaceFolder}/.venv/bin/python3",
|
||||
"args": [
|
||||
"-m",
|
||||
"sphinx",
|
||||
"-j",
|
||||
"auto",
|
||||
"-T",
|
||||
"-b",
|
||||
"html",
|
||||
"-d",
|
||||
"${workspaceFolder}/.vscode/build/doctrees",
|
||||
"-D",
|
||||
"language=en",
|
||||
"${workspaceFolder}/docs",
|
||||
"${workspaceFolder}/.vscode/build/html"
|
||||
],
|
||||
"problemMatcher": [
|
||||
{
|
||||
"owner": "sphinx",
|
||||
"fileLocation": "absolute",
|
||||
"pattern": {
|
||||
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):(\\d+):\\s+(WARNING|ERROR):\\s+(.*)$",
|
||||
"file": 1,
|
||||
"line": 2,
|
||||
"severity": 3,
|
||||
"message": 4
|
||||
}
|
||||
},
|
||||
{
|
||||
"owner": "sphinx",
|
||||
"fileLocation": "absolute",
|
||||
"pattern": {
|
||||
"regexp": "^(?:.*\\.{3}\\s+)?(\\/[^:]*|[a-zA-Z]:\\\\[^:]*):{1,2}\\s+(WARNING|ERROR):\\s+(.*)$",
|
||||
"file": 1,
|
||||
"severity": 2,
|
||||
"message": 3
|
||||
}
|
||||
}
|
||||
],
|
||||
"group": {
|
||||
"kind": "build",
|
||||
"isDefault": true
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
> (Implementation detail: two problem matchers were needed to be defined,
|
||||
> because VS Code doesn't tolerate some problem information being potentially
|
||||
> absent. While a single regex could match all types of errors, if a capture
|
||||
> group remains empty (the line number doesn't show up in all warning/error
|
||||
> messages) but the `pattern` references said empty capture group, VS Code
|
||||
> discards the message completely.)
|
||||
|
||||
4. Configure the Python virtual environment (`venv`).
|
||||
|
||||
From the Command Palette, run `Python: Create Environment`. Select `venv` environment and
|
||||
`docs/sphinx/requirements.txt`.
|
||||
|
||||
5. Build the docs.
|
||||
|
||||
Launch the default build task using one of the following options:
|
||||
|
||||
* A hotkey (the default is `Ctrl+Shift+B`)
|
||||
* Issuing the `Tasks: Run Build Task` from the Command Palette
|
||||
|
||||
6. Open the live preview.
|
||||
|
||||
Navigate to the site output within VS Code: right-click on `.vscode/build/html/index.html` and
|
||||
select `Open with Live Server`. The contents should update on every rebuild without having to
|
||||
refresh the browser.
|
||||
@@ -1,27 +0,0 @@
|
||||
# How to provide feedback for ROCm documentation
|
||||
|
||||
There are four standard ways to provide feedback for this repository.
|
||||
|
||||
## Pull request
|
||||
|
||||
All contributions to ROCm documentation should arrive via the
|
||||
[GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)
|
||||
targeting the develop branch of the repository. If you are unable to contribute
|
||||
via the GitHub Flow, feel free to email us at [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
|
||||
|
||||
## GitHub discussions
|
||||
|
||||
To ask questions or view answers to frequently asked questions, refer to
|
||||
[GitHub Discussions](https://github.com/RadeonOpenCompute/ROCm/discussions).
|
||||
On GitHub Discussions, in addition to asking and answering questions,
|
||||
members can share updates, have open-ended conversations,
|
||||
and follow along on via public announcements.
|
||||
|
||||
## GitHub issue
|
||||
|
||||
Issues on existing or absent docs can be filed as
|
||||
[GitHub Issues](https://github.com/RadeonOpenCompute/ROCm/issues).
|
||||
|
||||
## Email
|
||||
|
||||
Send other feedback or questions to [rocm-feedback@amd.com](mailto:rocm-feedback@amd.com?subject=Documentation%20Feedback).
|
||||
@@ -1,71 +0,0 @@
|
||||
# ROCm documentation toolchain
|
||||
|
||||
Our documentation relies on several open source toolchains and sites.
|
||||
|
||||
## `rocm-docs-core`
|
||||
|
||||
[rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) is an AMD-maintained
|
||||
project that applies customization for our documentation. This
|
||||
project is the tool most ROCm repositories use as part of the documentation
|
||||
build. It is also available as a [pip package on PyPI](https://pypi.org/project/rocm-docs-core/).
|
||||
|
||||
See the user and developer guides for rocm-docs-core at {doc}`rocm-docs-core documentation<rocm-docs-core:index>`.
|
||||
|
||||
## Sphinx
|
||||
|
||||
[Sphinx](https://www.sphinx-doc.org/en/master/) is a documentation generator
|
||||
originally used for Python. It is now widely used in the open source community.
|
||||
Originally, Sphinx supported reStructuredText (RST) based documentation, but
|
||||
Markdown support is now available.
|
||||
ROCm documentation plans to default to Markdown for new projects.
|
||||
Existing projects using RST are under no obligation to convert to Markdown. New
|
||||
projects that believe Markdown is not suitable should contact the documentation
|
||||
team prior to selecting RST.
|
||||
|
||||
## Read the Docs
|
||||
|
||||
[Read the Docs](https://docs.readthedocs.io/en/stable/) is the service that builds
|
||||
and hosts the HTML documentation generated using Sphinx to our end users.
|
||||
|
||||
## Doxygen
|
||||
|
||||
[Doxygen](https://www.doxygen.nl/) is a documentation generator that extracts
|
||||
information from inline code.
|
||||
ROCm projects typically use Doxygen for public API documentation unless the
|
||||
upstream project uses a different tool.
|
||||
|
||||
### Breathe
|
||||
|
||||
[Breathe](https://www.breathe-doc.org/) is a Sphinx plugin to integrate Doxygen
|
||||
content.
|
||||
|
||||
### MyST
|
||||
|
||||
[Markedly Structured Text (MyST)](https://myst-tools.org/docs/spec) is an extended
|
||||
flavor of Markdown ([CommonMark](https://commonmark.org/)) influenced by reStructuredText (RST) and Sphinx.
|
||||
It is integrated into ROCm documentation by the Sphinx extension [`myst-parser`](https://myst-parser.readthedocs.io/en/latest/).
|
||||
A cheat sheet that showcases how to use the MyST syntax is available over at
|
||||
the [Jupyter reference](https://jupyterbook.org/en/stable/reference/cheatsheet.html).
|
||||
|
||||
### Sphinx External ToC
|
||||
|
||||
[Sphinx External ToC](https://sphinx-external-toc.readthedocs.io/en/latest/intro.html)
|
||||
is a Sphinx extension used for ROCm documentation navigation. This tool generates a navigation menu on the left
|
||||
based on a YAML file that specifies the table of contents.
|
||||
It was selected due to its flexibility that allows scripts to operate on the
|
||||
YAML file. Please transition to this file for the project's navigation. You can
|
||||
see the `_toc.yml.in` file in this repository in the `docs/sphinx` folder for an
|
||||
example.
|
||||
|
||||
### Sphinx-book-theme
|
||||
|
||||
[Sphinx-book-theme](https://sphinx-book-theme.readthedocs.io/en/latest/) is a Sphinx theme
|
||||
that defines the base appearance for ROCm documentation.
|
||||
ROCm documentation applies some customization,
|
||||
such as a custom header and footer on top of the Sphinx Book Theme.
|
||||
|
||||
### Sphinx design
|
||||
|
||||
[Sphinx design](https://sphinx-design.readthedocs.io/en/latest/index.html) is a Sphinx extension that adds design
|
||||
functionality.
|
||||
ROCm documentation uses Sphinx Design for grids, cards, and synchronized tabs.
|
||||
|
Before Width: | Height: | Size: 10 KiB |
|
Before Width: | Height: | Size: 939 KiB After Width: | Height: | Size: 939 KiB |
|
Before Width: | Height: | Size: 537 KiB After Width: | Height: | Size: 537 KiB |
|
Before Width: | Height: | Size: 292 KiB After Width: | Height: | Size: 292 KiB |
|
Before Width: | Height: | Size: 1.3 MiB After Width: | Height: | Size: 1.3 MiB |
BIN
docs/data/deploy/quick_start_windows/AMD-Display-Driver.png
Normal file
|
After Width: | Height: | Size: 163 KiB |
BIN
docs/data/deploy/quick_start_windows/AMD-Logo.png
Normal file
|
After Width: | Height: | Size: 2.1 KiB |
BIN
docs/data/deploy/quick_start_windows/BitCode-Profiler.png
Normal file
|
After Width: | Height: | Size: 34 KiB |
BIN
docs/data/deploy/quick_start_windows/DeSelectAll.png
Normal file
|
After Width: | Height: | Size: 183 KiB |
BIN
docs/data/deploy/quick_start_windows/HIP-Libraries.png
Normal file
|
After Width: | Height: | Size: 40 KiB |
BIN
docs/data/deploy/quick_start_windows/HIP-Ray-Tracing.png
Normal file
|
After Width: | Height: | Size: 40 KiB |
BIN
docs/data/deploy/quick_start_windows/HIP-Runtime-Compiler.png
Normal file
|
After Width: | Height: | Size: 36 KiB |
BIN
docs/data/deploy/quick_start_windows/HIP-SDK-Core.png
Normal file
|
After Width: | Height: | Size: 38 KiB |
BIN
docs/data/deploy/quick_start_windows/Installation-Complete.png
Normal file
|
After Width: | Height: | Size: 407 KiB |
BIN
docs/data/deploy/quick_start_windows/Installation.png
Normal file
|
After Width: | Height: | Size: 465 KiB |
BIN
docs/data/deploy/quick_start_windows/Installer-Window.png
Normal file
|
After Width: | Height: | Size: 207 KiB |
BIN
docs/data/deploy/quick_start_windows/Loading-Window.png
Normal file
|
After Width: | Height: | Size: 461 KiB |
BIN
docs/data/deploy/quick_start_windows/LoadingWindow.png
Normal file
|
After Width: | Height: | Size: 461 KiB |
|
Before Width: | Height: | Size: 3.5 KiB After Width: | Height: | Size: 3.5 KiB |
BIN
docs/data/deploy/quick_start_windows/Uninstallation.png
Normal file
|
After Width: | Height: | Size: 412 KiB |
BIN
docs/data/deploy/quick_start_windows/Windows-Security.png
Normal file
|
After Width: | Height: | Size: 68 KiB |
1
docs/data/deploy/quick_start_windows/image planning
Normal file
@@ -0,0 +1 @@
|
||||
|
||||
|
Before Width: | Height: | Size: 10 KiB After Width: | Height: | Size: 10 KiB |
|
Before Width: | Height: | Size: 13 KiB After Width: | Height: | Size: 13 KiB |
|
Before Width: | Height: | Size: 13 KiB After Width: | Height: | Size: 13 KiB |
|
Before Width: | Height: | Size: 88 KiB After Width: | Height: | Size: 88 KiB |
|
Before Width: | Height: | Size: 32 KiB After Width: | Height: | Size: 32 KiB |
|
Before Width: | Height: | Size: 99 KiB After Width: | Height: | Size: 99 KiB |
|
Before Width: | Height: | Size: 130 KiB After Width: | Height: | Size: 130 KiB |
|
Before Width: | Height: | Size: 21 KiB After Width: | Height: | Size: 21 KiB |
|
Before Width: | Height: | Size: 8.8 KiB After Width: | Height: | Size: 8.8 KiB |
|
Before Width: | Height: | Size: 14 KiB After Width: | Height: | Size: 14 KiB |
|
Before Width: | Height: | Size: 25 KiB After Width: | Height: | Size: 25 KiB |
|
Before Width: | Height: | Size: 144 KiB After Width: | Height: | Size: 144 KiB |
|
Before Width: | Height: | Size: 17 KiB After Width: | Height: | Size: 17 KiB |
|
Before Width: | Height: | Size: 47 KiB After Width: | Height: | Size: 47 KiB |
|
Before Width: | Height: | Size: 41 KiB After Width: | Height: | Size: 41 KiB |
|
Before Width: | Height: | Size: 14 KiB After Width: | Height: | Size: 14 KiB |
|
Before Width: | Height: | Size: 19 KiB After Width: | Height: | Size: 19 KiB |
|
Before Width: | Height: | Size: 57 KiB After Width: | Height: | Size: 57 KiB |
|
Before Width: | Height: | Size: 36 KiB After Width: | Height: | Size: 36 KiB |
|
Before Width: | Height: | Size: 102 KiB After Width: | Height: | Size: 102 KiB |
|
Before Width: | Height: | Size: 114 KiB After Width: | Height: | Size: 114 KiB |
|
Before Width: | Height: | Size: 3.6 KiB |
|
Before Width: | Height: | Size: 3.5 KiB |
|
Before Width: | Height: | Size: 114 KiB |
|
Before Width: | Height: | Size: 110 KiB |
|
Before Width: | Height: | Size: 26 KiB |
|
Before Width: | Height: | Size: 26 KiB |
|
Before Width: | Height: | Size: 228 KiB |
|
Before Width: | Height: | Size: 796 KiB |
|
Before Width: | Height: | Size: 310 KiB |
|
Before Width: | Height: | Size: 789 KiB |
|
Before Width: | Height: | Size: 801 KiB |
|
Before Width: | Height: | Size: 102 KiB |
|
Before Width: | Height: | Size: 102 KiB |
|
Before Width: | Height: | Size: 103 KiB After Width: | Height: | Size: 103 KiB |
|
Before Width: | Height: | Size: 59 KiB After Width: | Height: | Size: 59 KiB |
|
Before Width: | Height: | Size: 41 KiB After Width: | Height: | Size: 41 KiB |
|
Before Width: | Height: | Size: 39 KiB After Width: | Height: | Size: 39 KiB |
|
Before Width: | Height: | Size: 47 KiB After Width: | Height: | Size: 47 KiB |
|
Before Width: | Height: | Size: 33 KiB After Width: | Height: | Size: 33 KiB |
|
Before Width: | Height: | Size: 323 KiB |
|
Before Width: | Height: | Size: 58 KiB After Width: | Height: | Size: 58 KiB |
|
Before Width: | Height: | Size: 46 KiB After Width: | Height: | Size: 46 KiB |
|
Before Width: | Height: | Size: 64 KiB After Width: | Height: | Size: 64 KiB |
|
Before Width: | Height: | Size: 28 KiB After Width: | Height: | Size: 28 KiB |