Mirror of https://github.com/ROCm/ROCm.git, synced 2026-01-09 22:58:17 -05:00

Compare commits: ci_example...docs/5.7.0 (17 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 30f6f2508a | |
| | 3241bbbf5e | |
| | f266801752 | |
| | 920b185f14 | |
| | 60fcfccf9b | |
| | 1aa1360ee8 | |
| | 6f3078d5cc | |
| | 90ce130831 | |
| | 5691e8e881 | |
| | 1146917571 | |
| | 7460f4f4b9 | |
| | 9bb9bf20bf | |
| | 4ca1db8f13 | |
| | 215d161eaa | |
| | 7d0083c13a | |
| | ab7c9fbf47 | |
| | 10be92bb79 | |
@@ -3,16 +3,19 @@

 version: 2

+sphinx:
+  configuration: docs/conf.py
+
+formats: [htmlzip, pdf, epub]
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.10"
+  apt_packages:
+    - "doxygen"
+    - "graphviz" # For dot graphs in doxygen

 python:
   install:
     - requirements: docs/sphinx/requirements.txt

-build:
-  os: ubuntu-20.04
-  tools:
-    python: "3.8"
-sphinx:
-  configuration: docs/conf.py
-
-formats: []

684
.wordlist.txt
@@ -1,122 +1,686 @@
|
||||
# building
|
||||
matchers
|
||||
# file_reorg
|
||||
FHS
|
||||
incrementing
|
||||
Filesystem
|
||||
filesystem
|
||||
rocm
|
||||
# gpu_aware_mpi
|
||||
DMA
|
||||
GDR
|
||||
HCA
|
||||
MPI
|
||||
MVAPICH
|
||||
Mellanox's
|
||||
NIC
|
||||
OFED
|
||||
OSU
|
||||
OpenFabrics
|
||||
PeerDirect
|
||||
RDMA
|
||||
UCX
|
||||
ib_core
|
||||
# isv_deployment_win
|
||||
AAC
|
||||
ABI
|
||||
# linear algebra
|
||||
LAPACK
|
||||
MMA
|
||||
backends
|
||||
cuSOLVER
|
||||
cuSPARSE
|
||||
# mi200_performance_counters
|
||||
ACE
|
||||
ACEs
|
||||
AccVGPR
|
||||
AccVGPRs
|
||||
ALU
|
||||
AMD
|
||||
AMDGPU
|
||||
AMDGPUs
|
||||
AMDMIGraphX
|
||||
AMI
|
||||
AOCC
|
||||
AOMP
|
||||
APIC
|
||||
APIs
|
||||
APU
|
||||
ASIC
|
||||
ASICs
|
||||
ASan
|
||||
ASm
|
||||
ATI
|
||||
AddressSanitizer
|
||||
AlexNet
|
||||
Arb
|
||||
BLAS
|
||||
BMC
|
||||
BitCode
|
||||
Blit
|
||||
Bluefield
|
||||
CCD
|
||||
CDNA
|
||||
CIFAR
|
||||
CLI
|
||||
CLion
|
||||
CMake
|
||||
CMakeLists
|
||||
CMakePackage
|
||||
CP
|
||||
CPC
|
||||
CPF
|
||||
CPP
|
||||
CPU
|
||||
CPUs
|
||||
CSC
|
||||
CSE
|
||||
CSV
|
||||
CSn
|
||||
CTests
|
||||
CU
|
||||
CUDA
|
||||
CUs
|
||||
CXX
|
||||
Cavium
|
||||
CentOS
|
||||
ChatGPT
|
||||
CoRR
|
||||
Codespaces
|
||||
Commitizen
|
||||
CommonMark
|
||||
Concretized
|
||||
Conda
|
||||
ConnectX
|
||||
DGEMM
|
||||
DKMS
|
||||
DL
|
||||
DMA
|
||||
DNN
|
||||
DNNL
|
||||
DPM
|
||||
DRI
|
||||
DW
|
||||
DWORD
|
||||
Dask
|
||||
DataFrame
|
||||
DataLoader
|
||||
DataParallel
|
||||
DeepSpeed
|
||||
Dependabot
|
||||
DevCap
|
||||
Dockerfile
|
||||
Doxygen
|
||||
ELMo
|
||||
ENDPGM
|
||||
EPYC
|
||||
ESXi
|
||||
FFT
|
||||
FFTs
|
||||
FFmpeg
|
||||
FHS
|
||||
FMA
|
||||
FP
|
||||
Filesystem
|
||||
Flang
|
||||
Fortran
|
||||
Fuyu
|
||||
GALB
|
||||
GCD
|
||||
GCDs
|
||||
GCN
|
||||
GDB
|
||||
GDDR
|
||||
GDR
|
||||
GDS
|
||||
GEMM
|
||||
GEMMs
|
||||
GFortran
|
||||
GiB
|
||||
GIM
|
||||
GL
|
||||
GLXT
|
||||
GMI
|
||||
GPG
|
||||
GPR
|
||||
GPT
|
||||
GPU
|
||||
GPU's
|
||||
GPUs
|
||||
GRBM
|
||||
GenAI
|
||||
GenZ
|
||||
GitHub
|
||||
Gitpod
|
||||
HBM
|
||||
HCA
|
||||
HIPCC
|
||||
HIPExtension
|
||||
HIPIFY
|
||||
HPC
|
||||
HPCG
|
||||
HPE
|
||||
HPL
|
||||
HSA
|
||||
HWE
|
||||
Haswell
|
||||
Higgs
|
||||
Hyperparameters
|
||||
ICV
|
||||
IDE
|
||||
IDEs
|
||||
IMDb
|
||||
IOMMU
|
||||
IOP
|
||||
IOPM
|
||||
IOV
|
||||
IRQ
|
||||
ISA
|
||||
ISV
|
||||
ISVs
|
||||
ImageNet
|
||||
InfiniBand
|
||||
Inlines
|
||||
IntelliSense
|
||||
Intersphinx
|
||||
Intra
|
||||
Ioffe
|
||||
JSON
|
||||
Jupyter
|
||||
KFD
|
||||
KiB
|
||||
KVM
|
||||
Keras
|
||||
Khronos
|
||||
LAPACK
|
||||
LCLK
|
||||
LDS
|
||||
LLM
|
||||
LLMs
|
||||
LLVM
|
||||
LM
|
||||
LSAN
|
||||
LTS
|
||||
LoRA
|
||||
MEM
|
||||
MERCHANTABILITY
|
||||
MFMA
|
||||
MiB
|
||||
MIGraphX
|
||||
MIOpen
|
||||
MIOpenGEMM
|
||||
MIVisionX
|
||||
MLM
|
||||
MMA
|
||||
MMIO
|
||||
MMIOH
|
||||
MNIST
|
||||
MPI
|
||||
MSVC
|
||||
MVAPICH
|
||||
MVFFR
|
||||
Makefile
|
||||
Makefiles
|
||||
Matplotlib
|
||||
Megatron
|
||||
Mellanox
|
||||
Mellanox's
|
||||
Meta's
|
||||
MirroredStrategy
|
||||
Multicore
|
||||
Multithreaded
|
||||
MyEnvironment
|
||||
MyST
|
||||
NBIO
|
||||
NBIOs
|
||||
NIC
|
||||
NICs
|
||||
NLI
|
||||
NLP
|
||||
NPS
|
||||
NSP
|
||||
NUMA
|
||||
NVCC
|
||||
NVIDIA
|
||||
NVPTX
|
||||
NaN
|
||||
Nano
|
||||
Navi
|
||||
Noncoherently
|
||||
NousResearch's
|
||||
NumPy
|
||||
OAM
|
||||
OAMs
|
||||
OCP
|
||||
OEM
|
||||
OFED
|
||||
OMP
|
||||
OMPI
|
||||
OMPT
|
||||
OMPX
|
||||
ONNX
|
||||
OSS
|
||||
OSU
|
||||
OpenCL
|
||||
OpenCV
|
||||
OpenFabrics
|
||||
OpenGL
|
||||
OpenMP
|
||||
OpenSSL
|
||||
OpenVX
|
||||
PCI
|
||||
PCIe
|
||||
PEFT
|
||||
PIL
|
||||
PILImage
|
||||
PRNG
|
||||
PRs
|
||||
PaLM
|
||||
Pageable
|
||||
PeerDirect
|
||||
Perfetto
|
||||
PipelineParallel
|
||||
PnP
|
||||
PowerShell
|
||||
PyPi
|
||||
PyTorch
|
||||
Qcycles
|
||||
RAII
|
||||
RCCL
|
||||
RDC
|
||||
RDMA
|
||||
RDNA
|
||||
RHEL
|
||||
ROC
|
||||
ROCProfiler
|
||||
ROCTracer
|
||||
ROCclr
|
||||
ROCdbgapi
|
||||
ROCgdb
|
||||
ROCk
|
||||
ROCm
|
||||
ROCmCC
|
||||
ROCmSoftwarePlatform
|
||||
ROCmValidationSuite
|
||||
ROCr
|
||||
RST
|
||||
RW
|
||||
Radeon
|
||||
RelWithDebInfo
|
||||
Req
|
||||
Rickle
|
||||
RoCE
|
||||
Ryzen
|
||||
SALU
|
||||
SBIOS
|
||||
SCA
|
||||
SDK
|
||||
SDMA
|
||||
SDRAM
|
||||
SENDMSG
|
||||
SGPR
|
||||
SGPRs
|
||||
SHA
|
||||
SIGQUIT
|
||||
SIMD
|
||||
SIMDs
|
||||
SKU
|
||||
SKUs
|
||||
SLES
|
||||
SMEM
|
||||
SMI
|
||||
SMT
|
||||
SPI
|
||||
SQs
|
||||
SRAM
|
||||
SRAMECC
|
||||
SVD
|
||||
SWE
|
||||
SerDes
|
||||
Shlens
|
||||
Skylake
|
||||
Softmax
|
||||
Spack
|
||||
Supermicro
|
||||
Szegedy
|
||||
TCA
|
||||
TCC
|
||||
TCI
|
||||
TCIU
|
||||
TCP
|
||||
TCR
|
||||
TF
|
||||
TFLOPS
|
||||
TPU
|
||||
TPUs
|
||||
TensorBoard
|
||||
TensorFlow
|
||||
TensorParallel
|
||||
ToC
|
||||
TorchAudio
|
||||
TorchMIGraphX
|
||||
TorchScript
|
||||
TorchServe
|
||||
TorchVision
|
||||
TransferBench
|
||||
TrapStatus
|
||||
UAC
|
||||
UC
|
||||
UCC
|
||||
UCX
|
||||
UIF
|
||||
USM
|
||||
UTCL
|
||||
UTIL
|
||||
Uncached
|
||||
Unhandled
|
||||
VALU
|
||||
VBIOS
|
||||
VGPR
|
||||
VGPRs
|
||||
VM
|
||||
VMEM
|
||||
VMWare
|
||||
VRAM
|
||||
VSIX
|
||||
VSkipped
|
||||
Vanhoucke
|
||||
Vulkan
|
||||
WGP
|
||||
WGPs
|
||||
WX
|
||||
WikiText
|
||||
Wojna
|
||||
Workgroups
|
||||
Writebacks
|
||||
XCD
|
||||
XCDs
|
||||
XGBoost
|
||||
XGBoost's
|
||||
XGMI
|
||||
XT
|
||||
XTX
|
||||
Xeon
|
||||
Xilinx
|
||||
Xnack
|
||||
Xteam
|
||||
YAML
|
||||
YML
|
||||
YModel
|
||||
ZeRO
|
||||
ZenDNN
|
||||
accuracies
|
||||
activations
|
||||
addr
|
||||
alloc
|
||||
allocator
|
||||
allocators
|
||||
amdgpu
|
||||
api
|
||||
atmi
|
||||
atomics
|
||||
autogenerated
|
||||
avx
|
||||
awk
|
||||
backend
|
||||
backends
|
||||
benchmarking
|
||||
bfloat
|
||||
bilinear
|
||||
bitsandbytes
|
||||
blit
|
||||
boson
|
||||
bosons
|
||||
buildable
|
||||
bursty
|
||||
bzip
|
||||
cacheable
|
||||
cd
|
||||
centos
|
||||
centric
|
||||
changelog
|
||||
chiplet
|
||||
cmake
|
||||
cmd
|
||||
coalescable
|
||||
codename
|
||||
collater
|
||||
comgr
|
||||
completers
|
||||
composable
|
||||
concretization
|
||||
config
|
||||
conformant
|
||||
convolutional
|
||||
convolves
|
||||
cpp
|
||||
csn
|
||||
cuBLAS
|
||||
cuFFT
|
||||
cuLIB
|
||||
cuRAND
|
||||
cuSOLVER
|
||||
cuSPARSE
|
||||
dataset
|
||||
datasets
|
||||
dataspace
|
||||
datatype
|
||||
datatypes
|
||||
dbgapi
|
||||
de
|
||||
deallocation
|
||||
denoise
|
||||
denoised
|
||||
denoises
|
||||
denormalize
|
||||
deserializers
|
||||
detections
|
||||
dev
|
||||
devicelibs
|
||||
devsel
|
||||
dimensionality
|
||||
disambiguates
|
||||
distro
|
||||
el
|
||||
embeddings
|
||||
enablement
|
||||
endpgm
|
||||
encodings
|
||||
env
|
||||
epilog
|
||||
etcetera
|
||||
ethernet
|
||||
exascale
|
||||
executables
|
||||
ffmpeg
|
||||
filesystem
|
||||
fortran
|
||||
galb
|
||||
gcc
|
||||
gdb
|
||||
gfortran
|
||||
gfx
|
||||
githooks
|
||||
github
|
||||
gnupg
|
||||
grayscale
|
||||
gzip
|
||||
heterogenous
|
||||
hipBLAS
|
||||
hipBLASLt
|
||||
hipCUB
|
||||
hipFFT
|
||||
hipLIB
|
||||
hipRAND
|
||||
hipSOLVER
|
||||
hipSPARSE
|
||||
hipSPARSELt
|
||||
hipTensor
|
||||
hipamd
|
||||
hipblas
|
||||
hipcub
|
||||
hipfft
|
||||
hipfort
|
||||
hipify
|
||||
hipsolver
|
||||
hipsparse
|
||||
hpp
|
||||
hsa
|
||||
hsakmt
|
||||
hyperparameter
|
||||
ib_core
|
||||
inband
|
||||
incrementing
|
||||
inferencing
|
||||
inflight
|
||||
init
|
||||
initializer
|
||||
inlining
|
||||
installable
|
||||
interprocedural
|
||||
intra
|
||||
invariants
|
||||
invocating
|
||||
ipo
|
||||
kdb
|
||||
latencies
|
||||
libfabric
|
||||
libjpeg
|
||||
libs
|
||||
linearized
|
||||
linter
|
||||
linux
|
||||
llvm
|
||||
localscratch
|
||||
logits
|
||||
lossy
|
||||
macOS
|
||||
matchers
|
||||
microarchitecture
|
||||
migraphx
|
||||
miopen
|
||||
miopengemm
|
||||
mivisionx
|
||||
mkdir
|
||||
mlirmiopen
|
||||
mtypes
|
||||
mvffr
|
||||
namespace
|
||||
namespaces
|
||||
numref
|
||||
ocl
|
||||
opencl
|
||||
opencv
|
||||
openmp
|
||||
openssl
|
||||
optimizers
|
||||
os
|
||||
pageable
|
||||
parallelization
|
||||
parameterization
|
||||
passthrough
|
||||
perfcounter
|
||||
preq
|
||||
performant
|
||||
perl
|
||||
pragma
|
||||
pre
|
||||
prebuilt
|
||||
precompiled
|
||||
prefetch
|
||||
prefetchable
|
||||
preprocess
|
||||
preprocessed
|
||||
preprocessing
|
||||
prequantized
|
||||
prerequisites
|
||||
profiler
|
||||
protobuf
|
||||
pseudorandom
|
||||
py
|
||||
quasirandom
|
||||
queueing
|
||||
rccl
|
||||
rdc
|
||||
reStructuredText
|
||||
reformats
|
||||
repos
|
||||
representativeness
|
||||
req
|
||||
resampling
|
||||
rescaling
|
||||
reusability
|
||||
roadmap
|
||||
roc
|
||||
rocAL
|
||||
rocALUTION
|
||||
rocBLAS
|
||||
rocFFT
|
||||
rocLIB
|
||||
rocMLIR
|
||||
rocPRIM
|
||||
rocRAND
|
||||
rocSOLVER
|
||||
rocSPARSE
|
||||
rocThrust
|
||||
rocWMMA
|
||||
rocalution
|
||||
rocblas
|
||||
rocclr
|
||||
rocfft
|
||||
rocm
|
||||
rocminfo
|
||||
rocprim
|
||||
rocprof
|
||||
rocprofiler
|
||||
rocr
|
||||
rocrand
|
||||
rocsolver
|
||||
rocsparse
|
||||
rocthrust
|
||||
roctracer
|
||||
runtime
|
||||
runtimes
|
||||
sL
|
||||
scalability
|
||||
scalable
|
||||
sendmsg
|
||||
tagram
|
||||
tg
|
||||
serializers
|
||||
shader
|
||||
sharding
|
||||
sigmoid
|
||||
sm
|
||||
smi
|
||||
softmax
|
||||
spack
|
||||
src
|
||||
stochastically
|
||||
strided
|
||||
subdirectory
|
||||
subexpression
|
||||
subfolder
|
||||
subfolders
|
||||
supercomputing
|
||||
tensorfloat
|
||||
th
|
||||
tokenization
|
||||
tokenize
|
||||
tokenized
|
||||
tokenizer
|
||||
tokenizes
|
||||
toolchain
|
||||
toolchains
|
||||
toolset
|
||||
toolsets
|
||||
torchvision
|
||||
tqdm
|
||||
tracebacks
|
||||
txt
|
||||
uarch
|
||||
uncached
|
||||
uncorrectable
|
||||
uninstallation
|
||||
unsqueeze
|
||||
unstacking
|
||||
unswitching
|
||||
untrusted
|
||||
untuned
|
||||
upvote
|
||||
USM
|
||||
UTCL
|
||||
UTIL
|
||||
utils
|
||||
vL
|
||||
variational
|
||||
vdi
|
||||
vectorizable
|
||||
vectorization
|
||||
vectorize
|
||||
vectorized
|
||||
vectorizer
|
||||
vectorizes
|
||||
vjxb
|
||||
walkthrough
|
||||
walkthroughs
|
||||
wavefront
|
||||
wavefronts
|
||||
whitespaces
|
||||
workgroup
|
||||
workgroups
|
||||
writeback
|
||||
writebacks
|
||||
wrreq
|
||||
# openmp
|
||||
ICV
|
||||
Multithreaded
|
||||
# tuning_guides
|
||||
BMC
|
||||
DGEMM
|
||||
HPCG
|
||||
HPL
|
||||
IOPM
|
||||
# windows
|
||||
SKU
|
||||
SKUs
|
||||
PowerShell
|
||||
UAC
|
||||
# pytorch_install
|
||||
kdb
|
||||
precompiled
|
||||
# gpu_os_support
|
||||
HWE
|
||||
el
|
||||
# using_gpu_sanitizer
|
||||
LSAN
|
||||
deallocation
|
||||
detections
|
||||
tracebacks
|
||||
workgroup
|
||||
wzo
|
||||
xargs
|
||||
xz
|
||||
yaml
|
||||
ysvmadyb
|
||||
zypper
|
||||
@@ -5,7 +5,6 @@ The following table is a list of ROCm components with links to their respective
|
||||
terms. These components may include third party components subject to
|
||||
additional licenses. Please review individual repositories for more information.
|
||||
The table shows ROCm components, license name, and link to the license terms.
|
||||
The table is ordered to follow ROCm's manifest file.
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
| Component | License |
|
||||
|
||||
@@ -46,15 +46,6 @@ sudo apt install ./amdgpu-install_5.7.50700-1_all.deb
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.7/rhel/8.6/amdgpu-install-5.7.50700-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
@@ -167,29 +167,6 @@ section.
|
||||
```
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.7
|
||||
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
@@ -57,28 +57,6 @@ sudo apt update
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.7
|
||||
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
@@ -116,12 +116,16 @@ sudo crb enable

 Add the perl languages repository.

+```{note}
+Mar 25, 2024: We currently need to install the Perl module from SLES 15 SP5 as a workaround. The module was removed for SLES 15 SP4.
+```
+
 ::::{tab-set}
 :::{tab-item} SLES 15.4
 :sync: SLES-15.4

 ```shell
-zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
+zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
 ```

 :::

@@ -29,11 +29,11 @@ wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
 ```shell
 # Kernel driver repository for focal
 sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
-deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu focal main
+deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7/ubuntu focal main
 EOF
 # ROCm repository for focal
 sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
-deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian focal main
+deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7 focal main
 EOF
 ```

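Review note: the hunk above replaces the floating `latest` and `debian` repository paths with the pinned `5.7` release. As a minimal sketch of the same idea, the pinned entries could be generated from a version and an Ubuntu codename so that both repositories stay on one release. The function names `amdgpu_apt_line` and `rocm_apt_line` are hypothetical and are not part of any ROCm tooling; the URL layout and keyring path are copied from the entries in the hunk.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumption, not ROCm tooling): print pinned apt source entries.
# The URL layout and keyring path mirror the entries shown in the hunk above.

keyring=/etc/apt/keyrings/rocm.gpg

# amdgpu_apt_line VERSION CODENAME: print the kernel driver repository entry.
amdgpu_apt_line() {
    local version="$1" codename="$2"
    printf 'deb [arch=amd64 signed-by=%s] https://repo.radeon.com/amdgpu/%s/ubuntu %s main\n' \
        "$keyring" "$version" "$codename"
}

# rocm_apt_line VERSION CODENAME: print the ROCm repository entry.
rocm_apt_line() {
    local version="$1" codename="$2"
    printf 'deb [arch=amd64 signed-by=%s] https://repo.radeon.com/rocm/apt/%s %s main\n' \
        "$keyring" "$version" "$codename"
}

# Print the entries for review; writing them under /etc/apt/sources.list.d/
# (for example with `sudo tee`) is left to the caller.
amdgpu_apt_line 5.7 focal
rocm_apt_line 5.7 focal
```

Keeping the version in one argument avoids the situation the diff corrects, where a versioned documentation branch points at whatever release `latest` happens to resolve to.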
@@ -44,11 +44,11 @@ EOF
 ```shell
 # Kernel driver repository for jammy
 sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
-deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu jammy main
+deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7/ubuntu jammy main
 EOF
 # ROCm repository for jammy
 sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
-deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
+deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7 jammy main
 EOF
 # Prefer packages from the rocm repository over system packages
 echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -73,33 +73,6 @@ sudo apt update
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
|
||||
```shell
|
||||
# Add the amdgpu module repository for RHEL 8.6
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.6/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
# Add the rocm repository for RHEL 8
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
|
||||
@@ -108,7 +81,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.7/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/8.7/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -117,7 +90,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.7/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -135,7 +108,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.8/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/8.8/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -144,7 +117,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.7/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -162,7 +135,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.1/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/9.1/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -171,7 +144,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.7/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -189,7 +162,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.2/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/9.2/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -198,7 +171,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.7/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -234,7 +207,7 @@ sudo yum clean all
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.7/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -261,7 +234,7 @@ EOF
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.5/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.7/sle/15.5/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
|
||||
@@ -24,7 +24,7 @@ MIGraphX is a graph compiler focused on accelerating the Machine Learning infere

 After doing all these transformations, MIGraphX emits code for the AMD GPU by calling to MIOpen or rocBLAS or creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.

-MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
+MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using the MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:

 - Number of arguments

@@ -187,7 +187,7 @@ Follow these steps:
 }
 ```

-2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
+2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:

 ```cmake
 cmake_minimum_required(VERSION 3.5)
@@ -327,7 +327,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
 ./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
 ```

-Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
+Alternatively, you can use the MIGraphX C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.

 ```{figure} ../../data/understand/deep_learning/image.018.png
 :name: image018

@@ -299,7 +299,7 @@ USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
 ### Test the PyTorch Installation

 You can use PyTorch unit tests to validate a PyTorch installation. If using a
-prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
+prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installing an official
 wheels package, these tests are already run on those configurations.
 Alternatively, you can manually run the unit tests to validate the PyTorch
 installation fully.

@@ -21,7 +21,7 @@ The software support matrices for ROCm container releases is listed.

 #### `CentOS7+ rocm5.6_internal_testing +169530b`

-* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
+* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
 * [Python 3.8](https://www.python.org/downloads/release/python-380/)
 * [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
 * [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -31,7 +31,7 @@ The software support matrices for ROCm container releases is listed.

 #### `1.13 +bfeb431`

-* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
+* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
 * [Python 3.8](https://www.python.org/downloads/release/python-380/)
 * [Torch 1.13.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13)
 * [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -44,7 +44,7 @@ The software support matrices for ROCm container releases is listed.

 #### `1.12 +05d5d04`

-* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
+* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
 * [Python 3.8](https://www.python.org/downloads/release/python-380/)
 * [Torch 1.12.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12)
 * [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -59,7 +59,7 @@ The software support matrices for ROCm container releases is listed.

 #### `tensorflow_develop-upstream-QA-rocm56 +c88a9f4`

-* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
+* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
 * [Python 3.9](https://www.python.org/downloads/release/python-390/)
 * `tensorflow-rocm` 2.13.0
 * [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
@@ -69,7 +69,7 @@ The software support matrices for ROCm container releases is listed.

 #### `r2.11-rocm-enhanced +5be4141`

-* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
+* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
 * [Python 3.9](https://www.python.org/downloads/release/python-390/)
 * [`tensorflow-rocm` 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
 * [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
@@ -79,7 +79,7 @@ The software support matrices for ROCm container releases is listed.

 #### `r2.10-rocm-enhanced +72789a3`

-* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
+* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
 * [Python 3.9](https://www.python.org/downloads/release/python-390/)
 * [`tensorflow-rocm` 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
 * [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)

@@ -5,7 +5,6 @@ The following table is a list of ROCm components with links to their respective
|
||||
terms. These components may include third party components subject to
|
||||
additional licenses. Please review individual repositories for more information.
|
||||
The table shows ROCm components, the name of license and link to the license terms.
|
||||
The table is ordered to follow ROCm's manifest file.
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
| Component | License |
|
||||
|
||||
@@ -2,6 +2,8 @@
|
||||
|
||||
| Version | Release Date |
|
||||
| ------- | ------------ |
|
||||
| [5.7.0](https://rocm.docs.amd.com/en/docs-5.7.0/) | Sep 15, 2023 |
|
||||
| [5.6.1](https://rocm.docs.amd.com/en/docs-5.6.1/) | Aug 29, 2023 |
|
||||
| [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/) | Jun 28, 2023 |
|
||||
| [5.5.1](https://rocm.docs.amd.com/en/docs-5.5.1/) | May 24, 2023 |
|
||||
| [5.5.0](https://rocm.docs.amd.com/en/docs-5.5.0/) | May 1, 2023 |
|
||||
|
||||
@@ -1 +1,2 @@
-rocm-docs-core>=0.24.0
+rocm-docs-core==1.8.0
+sphinx-reredirects

@@ -1,108 +1,106 @@
|
||||
#
|
||||
# This file is autogenerated by pip-compile with Python 3.8
|
||||
# This file is autogenerated by pip-compile with Python 3.10
|
||||
# by the following command:
|
||||
#
|
||||
# pip-compile requirements.in
|
||||
#
|
||||
accessible-pygments==0.0.3
|
||||
accessible-pygments==0.0.5
|
||||
# via pydata-sphinx-theme
|
||||
alabaster==0.7.13
|
||||
alabaster==1.0.0
|
||||
# via sphinx
|
||||
babel==2.11.0
|
||||
babel==2.16.0
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
beautifulsoup4==4.11.2
|
||||
beautifulsoup4==4.12.3
|
||||
# via pydata-sphinx-theme
|
||||
breathe==4.34.0
|
||||
breathe==4.35.0
|
||||
# via rocm-docs-core
|
||||
certifi==2023.7.22
|
||||
certifi==2024.8.30
|
||||
# via requests
|
||||
cffi==1.15.1
|
||||
cffi==1.17.1
|
||||
# via
|
||||
# cryptography
|
||||
# pynacl
|
||||
charset-normalizer==2.1.1
|
||||
charset-normalizer==3.3.2
|
||||
# via requests
|
||||
click==8.1.3
|
||||
click==8.1.7
|
||||
# via sphinx-external-toc
|
||||
cryptography==41.0.3
|
||||
cryptography==43.0.1
|
||||
# via pyjwt
|
||||
deprecated==1.2.13
|
||||
deprecated==1.2.14
|
||||
# via pygithub
|
||||
docutils==0.19
|
||||
docutils==0.21.2
|
||||
# via
|
||||
# breathe
|
||||
# myst-parser
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
fastjsonschema==2.16.3
|
||||
fastjsonschema==2.20.0
|
||||
# via rocm-docs-core
|
||||
gitdb==4.0.10
|
||||
gitdb==4.0.11
|
||||
# via gitpython
|
||||
gitpython==3.1.30
|
||||
gitpython==3.1.43
|
||||
# via rocm-docs-core
|
||||
idna==3.4
|
||||
idna==3.10
|
||||
# via requests
|
||||
imagesize==1.4.1
|
||||
# via sphinx
|
||||
jinja2==3.1.2
|
||||
jinja2==3.1.4
|
||||
# via
|
||||
# myst-parser
|
||||
# sphinx
|
||||
markdown-it-py==2.2.0
|
||||
markdown-it-py==3.0.0
|
||||
# via
|
||||
# mdit-py-plugins
|
||||
# myst-parser
|
||||
markupsafe==2.1.2
|
||||
markupsafe==2.1.5
|
||||
# via jinja2
|
||||
mdit-py-plugins==0.3.4
|
||||
mdit-py-plugins==0.4.2
|
||||
# via myst-parser
|
||||
mdurl==0.1.2
|
||||
# via markdown-it-py
|
||||
myst-parser==1.0.0
|
||||
myst-parser==4.0.0
|
||||
# via rocm-docs-core
|
||||
packaging==23.0
|
||||
packaging==24.1
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
pycparser==2.21
|
||||
pycparser==2.22
|
||||
# via cffi
|
||||
pydata-sphinx-theme==0.13.3
|
||||
pydata-sphinx-theme==0.15.4
|
||||
# via
|
||||
# rocm-docs-core
|
||||
# sphinx-book-theme
|
||||
pygithub==1.58.1
|
||||
pygithub==2.4.0
|
||||
# via rocm-docs-core
|
||||
pygments==2.15.0
|
||||
pygments==2.18.0
|
||||
# via
|
||||
# accessible-pygments
|
||||
# pydata-sphinx-theme
|
||||
# sphinx
|
||||
pyjwt[crypto]==2.6.0
|
||||
pyjwt[crypto]==2.9.0
|
||||
# via pygithub
|
||||
pynacl==1.5.0
|
||||
# via pygithub
|
||||
pytz==2022.7.1
|
||||
# via babel
|
||||
pyyaml==6.0
|
||||
pyyaml==6.0.2
|
||||
# via
|
||||
# myst-parser
|
||||
# rocm-docs-core
|
||||
# sphinx-external-toc
|
||||
requests==2.31.0
|
||||
requests==2.32.3
|
||||
# via
|
||||
# pygithub
|
||||
# sphinx
|
||||
rocm-docs-core>=0.24.0
|
||||
rocm-docs-core==1.8.0
|
||||
# via -r requirements.in
|
||||
smmap==5.0.0
|
||||
smmap==5.0.1
|
||||
# via gitdb
|
||||
snowballstemmer==2.2.0
|
||||
# via sphinx
|
||||
soupsieve==2.4
|
||||
soupsieve==2.6
|
||||
# via beautifulsoup4
|
||||
sphinx==5.3.0
|
||||
sphinx==8.0.2
|
||||
# via
|
||||
# breathe
|
||||
# myst-parser
|
||||
@@ -113,31 +111,40 @@ sphinx==5.3.0
|
||||
# sphinx-design
|
||||
# sphinx-external-toc
|
||||
# sphinx-notfound-page
|
||||
sphinx-book-theme==1.0.1
|
||||
# sphinx-reredirects
|
||||
sphinx-book-theme==1.1.3
|
||||
# via rocm-docs-core
|
||||
sphinx-copybutton==0.5.1
|
||||
sphinx-copybutton==0.5.2
|
||||
# via rocm-docs-core
|
||||
sphinx-design==0.4.1
|
||||
sphinx-design==0.6.1
|
||||
# via rocm-docs-core
|
||||
sphinx-external-toc==0.3.1
|
||||
sphinx-external-toc==1.0.1
|
||||
# via rocm-docs-core
|
||||
sphinx-notfound-page==0.8.3
|
||||
sphinx-notfound-page==1.0.4
|
||||
# via rocm-docs-core
|
||||
sphinxcontrib-applehelp==1.0.4
|
||||
sphinx-reredirects==0.1.5
|
||||
# via -r requirements.in
|
||||
sphinxcontrib-applehelp==2.0.0
|
||||
# via sphinx
|
||||
sphinxcontrib-devhelp==1.0.2
|
||||
sphinxcontrib-devhelp==2.0.0
|
||||
# via sphinx
|
||||
sphinxcontrib-htmlhelp==2.0.1
|
||||
sphinxcontrib-htmlhelp==2.1.0
|
||||
# via sphinx
|
||||
sphinxcontrib-jsmath==1.0.1
|
||||
# via sphinx
|
||||
sphinxcontrib-qthelp==1.0.3
|
||||
sphinxcontrib-qthelp==2.0.0
|
||||
# via sphinx
|
||||
sphinxcontrib-serializinghtml==1.1.5
|
||||
sphinxcontrib-serializinghtml==2.0.0
|
||||
# via sphinx
|
||||
typing-extensions==4.5.0
|
||||
# via pydata-sphinx-theme
|
||||
urllib3==1.26.13
|
||||
# via requests
|
||||
wrapt==1.14.1
|
||||
tomli==2.0.1
|
||||
# via sphinx
|
||||
typing-extensions==4.12.2
|
||||
# via
|
||||
# pydata-sphinx-theme
|
||||
# pygithub
|
||||
urllib3==2.2.3
|
||||
# via
|
||||
# pygithub
|
||||
# requests
|
||||
wrapt==1.16.0
|
||||
# via deprecated
|
||||
|
||||
@@ -13,7 +13,7 @@ The full list of HSA system architecture platform requirements are here: `HSA Sy
|
||||
|
||||
The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
|
||||
|
||||
The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicsOps are initiated by the
|
||||
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The atomic operations are initiated by the
|
||||
I/O device, which supports 32-bit, 64-bit and 128-bit operands whose target addresses have to be naturally aligned to the operation size.
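The semantics of the three atomics named above can be sketched with a toy memory model. This is only an illustration: the `ToyMemory` class and its method names are hypothetical, it runs on a plain byte array rather than on PCIe hardware, and it accepts all three operand sizes for every operation as a simplification.

```python
"""Toy model of FetchADD, SWAP and CAS on a byte-addressed memory."""

VALID_SIZES = {4, 8, 16}  # 32-bit, 64-bit and 128-bit operands, in bytes


class ToyMemory:
    """Hypothetical stand-in for a completer's memory; not a ROCm API."""

    def __init__(self, size):
        self.data = bytearray(size)

    def _check(self, addr, size):
        # The target address must be naturally aligned to the operand size.
        if size not in VALID_SIZES:
            raise ValueError(f"unsupported operand size: {size}")
        if addr % size != 0:
            raise ValueError(f"address {addr:#x} is not aligned to {size} bytes")

    def _load(self, addr, size):
        return int.from_bytes(self.data[addr:addr + size], "little")

    def _store(self, addr, size, value):
        mask = (1 << (8 * size)) - 1  # wrap around at the operand width
        self.data[addr:addr + size] = (value & mask).to_bytes(size, "little")

    def fetch_add(self, addr, operand, size=8):
        """Read the target, add the operand, write back, return the old value."""
        self._check(addr, size)
        old = self._load(addr, size)
        self._store(addr, size, old + operand)
        return old

    def swap(self, addr, operand, size=8):
        """Read the target, write the swap value, return the old value."""
        self._check(addr, size)
        old = self._load(addr, size)
        self._store(addr, size, operand)
        return old

    def compare_and_swap(self, addr, compare, swap, size=8):
        """Write the swap value only if the target equals the compare value."""
        self._check(addr, size)
        old = self._load(addr, size)
        if old == compare:
            self._store(addr, size, swap)
        return old


mem = ToyMemory(64)
first = mem.fetch_add(8, 1)   # 64-bit add, as in a queue index update
second = mem.fetch_add(8, 1)
print(first, second)
```

Each operation returns the value that was in memory before the update, which is what the completion carries back to the requester.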
|
||||
|
||||
For ROCm the Platform atomics are used in ROCm in the following ways:
|
||||
@@ -22,11 +22,11 @@ For ROCm the Platform atomics are used in ROCm in the following ways:
|
||||
* Update HSA queue’s write_dispatch_id: 64 bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
|
||||
* Update HSA Signals – 64bit atomic ops are used for CPU & GPU synchronization.
|
||||
|
||||
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
|
||||
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have atomic operations routing enabled, or the atomic operations will fail even though the PCIe endpoint and PCIe I/O devices are capable of atomic operations.
|
||||
|
||||
To do AtomicOp routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
|
||||
To support atomic operations routing between two or more Root Ports, each associated Root Port must indicate that capability via the atomic operations routing supported bit in the Device Capabilities 2 register.
|
||||
|
||||
If your system has a PCIe Express Switch it needs to support AtomicsOp routing. Again AtomicOp requests are permitted only if a component’s ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. AtomicOp Routing Support=1 Routing is supported, AtomicOp Routing Support=0 routing is not supported.
|
||||
If your system has a PCI Express switch, it needs to support atomic operations routing. Atomic operations requests are permitted only if a component’s ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support atomic operations completion or routing to a component that does. An atomic operations routing support value of 1 means routing is supported; a value of 0 means routing is not supported.
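One way to inspect these bits on Linux is to read the verbose output of `lspci`. The sketch below parses such text; the field names `AtomicOpsCap`, `AtomicOpsCtl`, `Routing` and `ReqEn` are assumed from typical `lspci -vvv` output and should be checked against the output on your own system. The `SAMPLE` text and device names are made up for illustration.

```python
import re

# Assumed field labels from `lspci -vvv` output (verify on your system).
ATOMIC_CAP = re.compile(r"AtomicOpsCap:\s*(?P<flags>.*)")
ATOMIC_CTL = re.compile(r"AtomicOpsCtl:\s*(?P<flags>.*)")

# Hypothetical sample text; real output contains many more fields.
SAMPLE = """\
00:01.0 PCI bridge: Example root port
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                 AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                 AtomicOpsCtl: ReqEn- EgressBlck-

03:00.0 Display controller: Example GPU
                 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
                 AtomicOpsCtl: ReqEn+
"""


def parse_flags(text):
    """Turn 'Routing+ 32bit-' into {'Routing': True, '32bit': False}."""
    flags = {}
    for token in text.split():
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


def atomic_ops_report(lspci_text):
    """Collect the atomic operation capability and control flags per device."""
    report = {}
    device = None
    for line in lspci_text.splitlines():
        # A device header starts in column 0; detail lines are indented.
        if line and not line[0].isspace():
            device = line.split()[0]
            report[device] = {"cap": {}, "ctl": {}}
            continue
        if device is None:
            continue
        cap = ATOMIC_CAP.search(line)
        if cap:
            report[device]["cap"] = parse_flags(cap.group("flags"))
        ctl = ATOMIC_CTL.search(line)
        if ctl:
            report[device]["ctl"] = parse_flags(ctl.group("flags"))
    return report


for dev, info in atomic_ops_report(SAMPLE).items():
    print(dev, "routing:", info["cap"].get("Routing"),
          "requester enabled:", info["ctl"].get("ReqEn"))
```

To use it on a real machine, the text could come from `subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout`, typically run with root privileges so that the capability registers are visible.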
|
||||
|
||||
An atomic operation is a non-posted transaction supporting 32-bit and 64-bit address formats, so there must be a completion response containing the result of the operation. Errors associated with the operation (an uncorrectable error accessing the target location or carrying out the atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor to Completer Abort (CA) or Unsupported Request (UR).
|
||||
|
||||
@@ -69,7 +69,7 @@ BAR Memory Overview
|
||||
*******************
|
||||
On a Xeon E5 based system, above 4GB PCIe addressing can be turned on in the BIOS. If it is, you need to set the MMIO base address (MMIOH Base) and range (MMIO High Size) in the BIOS.
|
||||
|
||||
In SuperMicro system in the system bios you need to see the following
|
||||
On a Supermicro system, you need to see the following in the system BIOS:
|
||||
|
||||
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
|
||||
|
||||
@@ -77,7 +77,7 @@ In SuperMicro system in the system bios you need to see the following
|
||||
|
||||
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
|
||||
|
||||
When we support Large Bar Capability there is a Large Bar Vbios which also disable the IO bar.
|
||||
When Large BAR capability is supported, there is a Large BAR VBIOS that also disables the I/O BAR.
|
||||
|
||||
GFX9 and Vega10 have a physical address of up to 44 bits and a 48-bit virtual address.
|
||||
|
||||
@@ -116,30 +116,5 @@ Legend:
|
||||
|
||||
5 : Expansion ROM – This is required for the AMD Driver SW to access the GPU’s video-bios. This is currently fixed at 128KB.
|
||||
|
||||
Excepts form Overview of Changes to PCI Express 3.0
|
||||
===================================================
|
||||
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
|
||||
********************************************************
|
||||
Atomic Operations – Goal:
|
||||
*************************
|
||||
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
|
||||
|
||||
* Fetch and Add – uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
|
||||
|
||||
* Unconditional Swap – uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
|
||||
|
||||
* Compare and Swap – uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
|
||||
|
||||
* AtomicOpCompletion – new completion to give the result so far atomic request and indicate that the atomicity of the transaction has been maintained.
|
||||
|
||||
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
|
||||
|
||||
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
|
||||
|
||||
ID-based Ordering – Goal:
|
||||
*************************
|
||||
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
|
||||
|
||||
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
|
||||
|
||||
To read more on PCIe Gen 3 new options https://www.mindshare.com/files/resources/PCIe%203-0.pdf
|
||||
For more information, you can review
|
||||
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.
|
||||
|
||||
@@ -4,7 +4,7 @@ Using CMake
|
||||
|
||||
Most components in ROCm support CMake. Projects depending on header-only or
|
||||
library components typically require CMake 3.5 or higher whereas those wanting
|
||||
to make use of CMake's HIP language support will require CMake 3.21 or higher.
|
||||
to make use of the CMake HIP language support will require CMake 3.21 or higher.
|
||||
|
||||
Finding Dependencies
|
||||
====================
|
||||
@@ -16,7 +16,7 @@ Finding Dependencies
|
||||
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
|
||||
`Using Dependencies Guide
|
||||
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
|
||||
to get an overview of CMake's related facilities.
|
||||
to get an overview of CMake related facilities.
|
||||
|
||||
In short, CMake supports finding dependencies in two ways:
|
||||
|
||||
@@ -28,7 +28,7 @@ In short, CMake supports finding dependencies in two ways:
|
||||
regards needed to consume it.
|
||||
|
||||
ROCm predominantly relies on Config mode, one notable exception being the Module
|
||||
driving the compilation of HIP programs on Nvidia runtimes. As such, when
|
||||
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
|
||||
dependencies are not found in standard system locations, one either has to
|
||||
instruct CMake to search for package config files in additional folders using
|
||||
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of filesystem
|
||||
@@ -55,8 +55,8 @@ to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_)
|
||||
Using HIP in CMake
|
||||
==================
|
||||
|
||||
ROCm componenents providing a C/C++ interface support being consumed using any
|
||||
C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
|
||||
ROCm components providing a C/C++ interface support being consumed using any
|
||||
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
|
||||
language features, allowing users to program using the HIP single-source
|
||||
programming model. When a program (or translation-unit) uses the HIP API without
|
||||
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
|
||||
@@ -172,7 +172,7 @@ all the flags necessary for device compilation.
|
||||
.. note::
|
||||
Compiling for the GPU device requires at least C++11.
|
||||
|
||||
This project can then be configured with for eg.
|
||||
This project can then be configured with the following CMake commands:
|
||||
|
||||
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
|
||||
|
||||
@@ -186,7 +186,7 @@ When using the CXX language support to compile HIP device code, selecting the
|
||||
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
|
||||
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
|
||||
default, this is set to some subset of the currently supported architectures of
|
||||
AMD ROCm. It can be set to eg. ``-D GPU_TARGETS="gfx1032;gfx1035"``.
|
||||
AMD ROCm. It can be set with the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
|
||||
|
||||
ROCm CMake Packages
|
||||
-------------------
|
||||
@@ -251,9 +251,9 @@ options.
|
||||
|
||||
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
|
||||
up with their own way to register command-line fragments of different purpose in
|
||||
a setup'n'forget fashion for quick assembly using graphical front-ends. This is
|
||||
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
|
||||
all nice, but configurations aren't portable, nor can they be reused in
|
||||
Continuous Intergration (CI) pipelines. CMake has condensed existing practice
|
||||
Continuous Integration (CI) pipelines. CMake has condensed existing practice
|
||||
into a portable JSON format that works in all IDEs and can be invoked from any
|
||||
command-line. This is
|
||||
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_
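As a rough illustration of what such a preset file can hold, the sketch below generates a minimal `CMakePresets.json` from Python. The preset name `rocm-example`, the build directory and the compiler path are hypothetical placeholders, and the `GPU_TARGETS` value simply reuses the example architectures from the text above.

```python
import json


def make_presets(gpu_targets, cxx_compiler):
    """Build a minimal CMakePresets.json document as a Python dict."""
    return {
        "version": 3,  # preset schema version 3 needs CMake 3.21 or higher
        "cmakeMinimumRequired": {"major": 3, "minor": 21, "patch": 0},
        "configurePresets": [
            {
                "name": "rocm-example",  # hypothetical preset name
                "displayName": "ROCm example configuration",
                "binaryDir": "${sourceDir}/build/rocm-example",
                "cacheVariables": {
                    "CMAKE_CXX_COMPILER": cxx_compiler,
                    # CMake lists are semicolon-separated strings.
                    "GPU_TARGETS": ";".join(gpu_targets),
                },
            }
        ],
    }


# The compiler path is an assumption; adjust it to your installation.
presets = make_presets(["gfx1032", "gfx1035"], "/opt/rocm/bin/amdclang++")
print(json.dumps(presets, indent=2))
```

Writing the printed JSON to `CMakePresets.json` next to the top-level `CMakeLists.txt` would let the same configuration be selected with `cmake --preset rocm-example` from a terminal, an IDE or a CI job.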
|
||||
|
||||
@@ -10,6 +10,6 @@ disambiguates compiler naming used throughout the documentation.
|
||||
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
|
||||
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. For details, see <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
|
||||
| HIP-Clang | Informal term for the `amdclang++` compiler |
|
||||
| HIPify | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
|
||||
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
|
||||
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
|
||||
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |
|
||||
|
||||
@@ -81,7 +81,7 @@ The command processor counters are further classified into fetcher and compute.
|
||||
| `spi_ra_lds_cu_full_csn` | CUs | Sum of CU where LDS cannot take csn wave when not fits |
|
||||
| `spi_ra_bar_cu_full_csn[∗]` | CUs | Sum of CU where BARRIER cannot take csn wave when not fits |
|
||||
| `spi_ra_bulky_cu_full_csn[∗]` | CUs | Sum of CU where BULKY cannot take csn wave when not fits |
|
||||
| `spi_ra_tglim_cu_full_csn[∗]` | Cycles | Cycles where csn wants to req but all CUs are at tg_limit |
|
||||
| `spi_ra_tglim_cu_full_csn[∗]` | Cycles | Cycles where csn wants to req but all CUs are at `tg_limit` |
|
||||
| `spi_ra_wvlim_cu_full_csn[∗]` | Cycles | Number of clocks csn is stalled due to WAVE LIMIT |
|
||||
| `spi_vwc_csc_wr` | Cycles | Number of clocks to write CSC waves to VGPRs (need to multiply this value by 4) |
|
||||
| `spi_swc_csc_wr` | Cycles | Number of clocks to write CSC waves to SGPRs (need to multiply this value by 4) |
|
||||
@@ -288,9 +288,9 @@ The vector L1 cache subsystem counters are further classified into texture addre
|
||||
| `tcp_gate_en2` | Cycles | Number of cycles vL1D core clocks are turned on |
|
||||
| `tcp_td_tcp_stall_cycles` | Cycles | Number of cycles TD stalls vL1D |
|
||||
| `tcp_tcr_tcp_stall_cycles` | Cycles | Number of cycles TCR stalls vL1D |
|
||||
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a Read |
|
||||
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a Write |
|
||||
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on an Atomic |
|
||||
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Read |
|
||||
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Write |
|
||||
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on an Atomic |
|
||||
| `tcp_pending_stall_cycles` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache |
|
||||
| `tcp_ta_tcp_state_read` | Req | Number of wavefront instruction requests to vL1D |
|
||||
| `tcp_volatile[∗]` | Req | Number of L1 volatile pixels/buffers from TA |
|
||||
@@ -347,7 +347,7 @@ The vector L1 cache subsystem counters are further classified into texture addre
|
||||
| `tcc_CC_req` |Req | Number of CC requests |
|
||||
| `tcc_RW_req` |Req | Number of RW requests |
|
||||
| `tcc_probe` |Req | Number of L2 Cache probe requests |
|
||||
| `tcc_probe_all[∗]` |Req | Number of external probe requests with EA_TCC_preq_all== 1 |
|
||||
| `tcc_probe_all[∗]` |Req | Number of external probe requests with `EA_TCC_preq_all== 1` |
|
||||
| `tcc_read_req` |Req | Number of L2 Cache Read requests |
|
||||
| `tcc_write_req` |Req | Number of L2 Cache Write requests |
|
||||
| `tcc_atomic_req` |Req | Number of L2 Cache Atomic requests |
|
||||
|
||||