Compare commits

...

13 Commits

Author SHA1 Message Date
Sam Wu
7aada9a2ae Update documentation requirements 2024-09-16 10:13:02 -08:00
Sam Wu
1d0b86431c Update documentation requirements 2024-06-06 16:58:51 -06:00
Sam Wu
2c7bccfad0 Disable epub builds 2024-05-02 09:14:55 -06:00
Sam Wu
3ec5d174d8 Fix RTD config 2024-05-02 08:55:58 -06:00
Sam Wu
5fe10b4fa5 Update documentation requirements 2024-05-01 16:59:40 -06:00
Sam Wu
b0931d5236 Update documentation requirements 2024-05-01 16:54:12 -06:00
Young Hui - AMD
2d25b75117 update SLES 15.4 prerequisites into 5.7.1 (#2979)
* update SLES 15.4 prerequisites

* match SLES prerequisite URLs

* update wordlist

* fix spelling
2024-04-01 17:49:36 -04:00
Sam Wu
6027186e46 docs(versions.md): Add 5.6.1 to versions list (#2825) 2024-01-22 15:17:20 -07:00
Mátyás Aradi
580e218d71 Fix installation links in ROCm 5.7.1 documentation (#2822)
* Fix ROCm repository link to keep version 5.7.1 instead of latest

* Fix RHEL versions
2024-01-18 09:30:05 -07:00
Sam Wu
f64d81bde2 docs(release-history.md): Add 5.7.1 to release history (#2665) 2023-11-29 15:45:40 -07:00
Saad Rahim (AMD)
f53b99990c 7900 XT support (#2661) 2023-11-21 09:44:49 -07:00
Sam Wu
eb6a4912ab Fix conflict when merging roc-5.7.x into docs/5.7.1 2023-10-25 13:16:39 -06:00
Sam Wu
db38fbd3a0 Merge roc-5.7.x updates into docs/5.7.1 branch (#2576)
* Update GPU Support on Linux (#2572)

Update docs with information in the AMD blog post announcing support for some RDNA3 Radeon GPUs on Linux.

Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>

* Making GPU and OS support page titles consistent between Win and Linux (#2575)

* Update LLVM ASan documentation (#2529)

---------

Co-authored-by: Houssem MENHOUR <husmen@users.noreply.github.com>
Co-authored-by: Saad Rahim (AMD) <44449863+saadrahim@users.noreply.github.com>
2023-10-25 13:08:30 -06:00
19 changed files with 791 additions and 261 deletions

View File

@@ -3,16 +3,19 @@
version: 2
sphinx:
configuration: docs/conf.py
formats: [htmlzip, pdf]
build:
os: ubuntu-22.04
tools:
python: "3.10"
apt_packages:
- "doxygen"
- "graphviz" # For dot graphs in doxygen
python:
install:
- requirements: docs/sphinx/requirements.txt
build:
os: ubuntu-20.04
tools:
python: "3.8"
sphinx:
configuration: docs/conf.py
formats: []
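
The updated Read the Docs settings in this hunk move the build to Ubuntu 22.04 with Python 3.10 and limit the extra output formats to htmlzip and PDF. For a quick local check of the same Sphinx setup, a minimal sketch, assuming Python 3.10 is available and using the paths referenced in the config above (the build output directory is illustrative):

```shell
# Create an isolated environment matching the Read the Docs Python version
python3.10 -m venv .venv && source .venv/bin/activate

# Install the pinned documentation requirements from this repository
pip install -r docs/sphinx/requirements.txt

# Build the HTML docs with the same Sphinx configuration (docs/conf.py)
sphinx-build -b html docs docs/_build/html
```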

View File

@@ -1,122 +1,686 @@
# building
matchers
# file_reorg
FHS
incrementing
Filesystem
filesystem
rocm
# gpu_aware_mpi
DMA
GDR
HCA
MPI
MVAPICH
Mellanox's
NIC
OFED
OSU
OpenFabrics
PeerDirect
RDMA
UCX
ib_core
# isv_deployment_win
AAC
ABI
# linear algebra
LAPACK
MMA
backends
cuSOLVER
cuSPARSE
# mi200_performance_counters
ACE
ACEs
AccVGPR
AccVGPRs
ALU
AMD
AMDGPU
AMDGPUs
AMDMIGraphX
AMI
AOCC
AOMP
APIC
APIs
APU
ASIC
ASICs
ASan
ASm
ATI
AddressSanitizer
AlexNet
Arb
BLAS
BMC
BitCode
Blit
Bluefield
CCD
CDNA
CIFAR
CLI
CLion
CMake
CMakeLists
CMakePackage
CP
CPC
CPF
CPP
CPU
CPUs
CSC
CSE
CSV
CSn
CTests
CU
CUDA
CUs
CXX
Cavium
CentOS
ChatGPT
CoRR
Codespaces
Commitizen
CommonMark
Concretized
Conda
ConnectX
DGEMM
DKMS
DL
DMA
DNN
DNNL
DPM
DRI
DW
DWORD
Dask
DataFrame
DataLoader
DataParallel
DeepSpeed
Dependabot
DevCap
Dockerfile
Doxygen
ELMo
ENDPGM
EPYC
ESXi
FFT
FFTs
FFmpeg
FHS
FMA
FP
Filesystem
Flang
Fortran
Fuyu
GALB
GCD
GCDs
GCN
GDB
GDDR
GDR
GDS
GEMM
GEMMs
GFortran
GiB
GIM
GL
GLXT
GMI
GPG
GPR
GPT
GPU
GPU's
GPUs
GRBM
GenAI
GenZ
GitHub
Gitpod
HBM
HCA
HIPCC
HIPExtension
HIPIFY
HPC
HPCG
HPE
HPL
HSA
HWE
Haswell
Higgs
Hyperparameters
ICV
IDE
IDEs
IMDb
IOMMU
IOP
IOPM
IOV
IRQ
ISA
ISV
ISVs
ImageNet
InfiniBand
Inlines
IntelliSense
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KiB
KVM
Keras
Khronos
LAPACK
LCLK
LDS
LLM
LLMs
LLVM
LM
LSAN
LTS
LoRA
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
MIVisionX
MLM
MMA
MMIO
MMIOH
MNIST
MPI
MSVC
MVAPICH
MVFFR
Makefile
Makefiles
Matplotlib
Megatron
Mellanox
Mellanox's
Meta's
MirroredStrategy
Multicore
Multithreaded
MyEnvironment
MyST
NBIO
NBIOs
NIC
NICs
NLI
NLP
NPS
NSP
NUMA
NVCC
NVIDIA
NVPTX
NaN
Nano
Navi
Noncoherently
NousResearch's
NumPy
OAM
OAMs
OCP
OEM
OFED
OMP
OMPI
OMPT
OMPX
ONNX
OSS
OSU
OpenCL
OpenCV
OpenFabrics
OpenGL
OpenMP
OpenSSL
OpenVX
PCI
PCIe
PEFT
PIL
PILImage
PRNG
PRs
PaLM
Pageable
PeerDirect
Perfetto
PipelineParallel
PnP
PowerShell
PyPi
PyTorch
Qcycles
RAII
RCCL
RDC
RDMA
RDNA
RHEL
ROC
ROCProfiler
ROCTracer
ROCclr
ROCdbgapi
ROCgdb
ROCk
ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCr
RST
RW
Radeon
RelWithDebInfo
Req
Rickle
RoCE
Ryzen
SALU
SBIOS
SCA
SDK
SDMA
SDRAM
SENDMSG
SGPR
SGPRs
SHA
SIGQUIT
SIMD
SIMDs
SKU
SKUs
SLES
SMEM
SMI
SMT
SPI
SQs
SRAM
SRAMECC
SVD
SWE
SerDes
Shlens
Skylake
Softmax
Spack
Supermicro
Szegedy
TCA
TCC
TCI
TCIU
TCP
TCR
TF
TFLOPS
TPU
TPUs
TensorBoard
TensorFlow
TensorParallel
ToC
TorchAudio
TorchMIGraphX
TorchScript
TorchServe
TorchVision
TransferBench
TrapStatus
UAC
UC
UCC
UCX
UIF
USM
UTCL
UTIL
Uncached
Unhandled
VALU
VBIOS
VGPR
VGPRs
VM
VMEM
VMWare
VRAM
VSIX
VSkipped
Vanhoucke
Vulkan
WGP
WGPs
WX
WikiText
Wojna
Workgroups
Writebacks
XCD
XCDs
XGBoost
XGBoost's
XGMI
XT
XTX
Xeon
Xilinx
Xnack
Xteam
YAML
YML
YModel
ZeRO
ZenDNN
accuracies
activations
addr
alloc
allocator
allocators
amdgpu
api
atmi
atomics
autogenerated
avx
awk
backend
backends
benchmarking
bfloat
bilinear
bitsandbytes
blit
boson
bosons
buildable
bursty
bzip
cacheable
cd
centos
centric
changelog
chiplet
cmake
cmd
coalescable
codename
collater
comgr
completers
composable
concretization
config
conformant
convolutional
convolves
cpp
csn
cuBLAS
cuFFT
cuLIB
cuRAND
cuSOLVER
cuSPARSE
dataset
datasets
dataspace
datatype
datatypes
dbgapi
de
deallocation
denoise
denoised
denoises
denormalize
deserializers
detections
dev
devicelibs
devsel
dimensionality
disambiguates
distro
el
embeddings
enablement
endpgm
encodings
env
epilog
etcetera
ethernet
exascale
executables
ffmpeg
filesystem
fortran
galb
gcc
gdb
gfortran
gfx
githooks
github
gnupg
grayscale
gzip
heterogenous
hipBLAS
hipBLASLt
hipCUB
hipFFT
hipLIB
hipRAND
hipSOLVER
hipSPARSE
hipSPARSELt
hipTensor
hipamd
hipblas
hipcub
hipfft
hipfort
hipify
hipsolver
hipsparse
hpp
hsa
hsakmt
hyperparameter
ib_core
inband
incrementing
inferencing
inflight
init
initializer
inlining
installable
interprocedural
intra
invariants
invocating
ipo
kdb
latencies
libfabric
libjpeg
libs
linearized
linter
linux
llvm
localscratch
logits
lossy
macOS
matchers
microarchitecture
migraphx
miopen
miopengemm
mivisionx
mkdir
mlirmiopen
mtypes
mvffr
namespace
namespaces
numref
ocl
opencl
opencv
openmp
openssl
optimizers
os
pageable
parallelization
parameterization
passthrough
perfcounter
preq
performant
perl
pragma
pre
prebuilt
precompiled
prefetch
prefetchable
preprocess
preprocessed
preprocessing
prequantized
prerequisites
profiler
protobuf
pseudorandom
py
quasirandom
queueing
rccl
rdc
reStructuredText
reformats
repos
representativeness
req
resampling
rescaling
reusability
roadmap
roc
rocAL
rocALUTION
rocBLAS
rocFFT
rocLIB
rocMLIR
rocPRIM
rocRAND
rocSOLVER
rocSPARSE
rocThrust
rocWMMA
rocalution
rocblas
rocclr
rocfft
rocm
rocminfo
rocprim
rocprof
rocprofiler
rocr
rocrand
rocsolver
rocsparse
rocthrust
roctracer
runtime
runtimes
sL
scalability
scalable
sendmsg
tagram
tg
serializers
shader
sharding
sigmoid
sm
smi
softmax
spack
src
stochastically
strided
subdirectory
subexpression
subfolder
subfolders
supercomputing
tensorfloat
th
tokenization
tokenize
tokenized
tokenizer
tokenizes
toolchain
toolchains
toolset
toolsets
torchvision
tqdm
tracebacks
txt
uarch
uncached
uncorrectable
uninstallation
unsqueeze
unstacking
unswitching
untrusted
untuned
upvote
USM
UTCL
UTIL
utils
vL
variational
vdi
vectorizable
vectorization
vectorize
vectorized
vectorizer
vectorizes
vjxb
walkthrough
walkthroughs
wavefront
wavefronts
whitespaces
workgroup
workgroups
writeback
writebacks
wrreq
# openmp
ICV
Multithreaded
# tuning_guides
BMC
DGEMM
HPCG
HPL
IOPM
# windows
SKU
SKUs
PowerShell
UAC
# pytorch_install
kdb
precompiled
# gpu_os_support
HWE
el
# using_gpu_sanitizer
LSAN
deallocation
detections
tracebacks
workgroup
wzo
xargs
xz
yaml
ysvmadyb
zypper

View File

@@ -5,7 +5,6 @@ The following table is a list of ROCm components with links to their respective
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, license name, and link to the license terms.
The table is ordered to follow ROCm's manifest file.
<!-- spellcheck-disable -->
| Component | License |

View File

@@ -30,18 +30,6 @@
"name": "Red Hat Enterprise Linux",
"shortname" : "RHEL",
"version": [
{
"number": "8.8",
"release": "rhel8"
},
{
"number": "8.7",
"release": "rhel8"
},
{
"number": "8.6",
"release": "rhel8"
},
{
"number": "9.2",
"release": "rhel9"
@@ -50,6 +38,14 @@
"number": "9.1",
"release": "rhel9"
},
{
"number": "8.8",
"release": "rhel8"
},
{
"number": "8.7",
"release": "rhel8"
},
]
},
{
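
The reordered version entries above drive the distribution picker on the install pages. When following those pages on a live system, it can help to confirm which release the host actually reports before choosing a tab; a minimal check that works the same way on RHEL, SLES, and Ubuntu, assuming a standard `/etc/os-release`:

```shell
# Print the distribution name and version the install picker should match
. /etc/os-release
echo "$NAME $VERSION_ID"
```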

View File

@@ -57,9 +57,9 @@ sudo apt update
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
```shell
# version
@@ -69,51 +69,7 @@ version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.8/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.2/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -145,9 +101,9 @@ sudo yum clean all
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
# version
@@ -157,7 +113,29 @@ version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.2/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -173,26 +151,6 @@ sudo yum clean all
:sync: SLES
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
# version
version=5.7
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
@@ -212,6 +170,26 @@ EOF
sudo zypper ref
```
:::
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
# version
version=5.7
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
::::
:::::
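
After writing either repository definition above, it is worth confirming that the package manager actually sees the `amdgpu` repository before installing anything. A minimal sketch, assuming the repo file was created as shown (output formatting varies by distribution):

```shell
# RHEL: list enabled repositories and look for the amdgpu entry
sudo yum repolist | grep -i amdgpu

# SLES: list configured repositories and look for the amdgpu entry
sudo zypper lr | grep -i amdgpu
```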

View File

@@ -116,12 +116,16 @@ sudo crb enable
Add the perl languages repository.
```{note}
Mar 25, 2024: We currently need to install the Perl module from SLES 15 SP5 as a workaround. The module was removed for SLES 15 SP4.
```
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
```
:::

View File

@@ -41,7 +41,7 @@ deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/am
EOF
# ROCm repository for {{ version.release }}
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian {{ version.release }} main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/{{ family.amdgpu_version }} {{ version.release }} main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
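
Once the ROCm sources list and the pin file are in place, a quick way to confirm that packages from `repo.radeon.com` will win over system packages is to refresh the index and inspect the pin priorities. A minimal sketch (the `grep` pattern is only illustrative):

```shell
# Refresh the package index so the new ROCm repository is picked up
sudo apt update

# Show repository priorities; repo.radeon.com entries should list priority 600
apt-cache policy | grep -B1 -A1 'repo.radeon.com'
```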

View File

@@ -24,7 +24,7 @@ MIGraphX is a graph compiler focused on accelerating the Machine Learning infere
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling MIOpen or rocBLAS or by creating HIP kernels for a particular operator. MIGraphX can also target CPUs using the DNNL or ZenDNN libraries.
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using the MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
- Number of arguments
@@ -187,7 +187,7 @@ Follow these steps:
}
```
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
```cmake
cmake_minimum_required(VERSION 3.5)
@@ -327,7 +327,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
```
Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
Alternatively, you can use the MIGraphX C++ or Python API to generate a `.mxr` file. Refer to {numref}`image018` for an example.
```{figure} ../../data/understand/deep_learning/image.018.png
:name: image018

View File

@@ -290,7 +290,7 @@ USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
### Test the PyTorch Installation
You can use PyTorch unit tests to validate a PyTorch installation. If using a
prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installing an official
wheels package, these tests are already run on those configurations.
Alternatively, you can manually run the unit tests to validate the PyTorch
installation fully.
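
Before running the full unit-test suite, a lightweight sanity check can confirm that the installed wheel or image sees the GPU at all. A minimal sketch (on ROCm builds of PyTorch, `torch.cuda` is the HIP-backed device interface; both calls are standard PyTorch APIs):

```shell
# Report the installed PyTorch version and whether a GPU device is visible
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Dump the full environment report PyTorch uses for bug triage
python3 -m torch.utils.collect_env
```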

View File

@@ -3,7 +3,7 @@ Docker image support matrix
******************************************************************
AMD validates and publishes `PyTorch <https://hub.docker.com/r/rocm/pytorch>`_ and `TensorFlow <https://hub.docker.com/r/rocm/tensorflow>`_
containers on dockerhub. The following tags, and associated inventories, are validated with ROCm 5.7.
containers on Docker Hub. The following tags, and associated inventories, are validated with ROCm 5.7.
.. tab-set::
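
This page points at the `rocm/pytorch` and `rocm/tensorflow` repositories on Docker Hub. For readers who want to try one of the validated tags, the usual ROCm container invocation passes the KFD and DRI device nodes through to the container; a minimal sketch, with `rocm/pytorch:latest` used purely as a placeholder for a validated tag:

```shell
# Pull a published image (substitute a validated tag from the matrix)
docker pull rocm/pytorch:latest

# Run it with access to the GPU device nodes and the video group
docker run -it --device=/dev/kfd --device=/dev/dri \
    --group-add video --security-opt seccomp=unconfined \
    rocm/pytorch:latest
```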

View File

@@ -110,7 +110,8 @@ for those using select RDNA™ 3 GPU with graphical applications and ROCm.
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Support|
|:----:|:---------------:|:--------------------------------------------------------------------:|:-------:|
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | ✅ (Ubuntu 22.04 only)|
| AMD Radeon™ VII | GCN5.1 | gfx906 | ✅ |
::::
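
Entries in this table key off the LLVM target (for example, `gfx1100` for the Radeon RX 7900 XT and XTX). A quick way to see which target the installed ROCm stack detects for the local GPU, assuming `rocminfo` is on the path (it normally ships under `/opt/rocm/bin`):

```shell
# List the detected agents and filter for the GPU architecture name
rocminfo | grep -i gfx
```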

View File

@@ -5,7 +5,6 @@ The following table is a list of ROCm components with links to their respective
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, the name of license and link to the license terms.
The table is ordered to follow ROCm's manifest file.
<!-- spellcheck-disable -->
| Component | License |

View File

@@ -2,6 +2,9 @@
| Version | Release Date |
| ------- | ------------ |
| [5.7.1](https://rocm.docs.amd.com/en/docs-5.7.1/) | Oct 13, 2023 |
| [5.7.0](https://rocm.docs.amd.com/en/docs-5.7.0/) | Sep 15, 2023 |
| [5.6.1](https://rocm.docs.amd.com/en/docs-5.6.1/) | Aug 29, 2023 |
| [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/) | Jun 28, 2023 |
| [5.5.1](https://rocm.docs.amd.com/en/docs-5.5.1/) | May 24, 2023 |
| [5.5.0](https://rocm.docs.amd.com/en/docs-5.5.0/) | May 1, 2023 |

View File

@@ -1 +1,2 @@
rocm-docs-core==0.26.0
rocm-docs-core==1.8.0
sphinx-reredirects

View File

@@ -4,105 +4,103 @@
#
# pip-compile requirements.in
#
accessible-pygments==0.0.3
accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==0.7.13
alabaster==1.0.0
# via sphinx
babel==2.11.0
babel==2.16.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.11.2
beautifulsoup4==4.12.3
# via pydata-sphinx-theme
breathe==4.34.0
breathe==4.35.0
# via rocm-docs-core
certifi==2023.7.22
certifi==2024.8.30
# via requests
cffi==1.15.1
cffi==1.17.1
# via
# cryptography
# pynacl
charset-normalizer==2.1.1
charset-normalizer==3.3.2
# via requests
click==8.1.3
click==8.1.7
# via sphinx-external-toc
cryptography==41.0.3
cryptography==43.0.1
# via pyjwt
deprecated==1.2.13
deprecated==1.2.14
# via pygithub
docutils==0.19
docutils==0.21.2
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# sphinx
fastjsonschema==2.16.3
fastjsonschema==2.20.0
# via rocm-docs-core
gitdb==4.0.10
gitdb==4.0.11
# via gitpython
gitpython==3.1.30
gitpython==3.1.43
# via rocm-docs-core
idna==3.4
idna==3.10
# via requests
imagesize==1.4.1
# via sphinx
jinja2==3.1.2
jinja2==3.1.4
# via
# myst-parser
# sphinx
markdown-it-py==2.2.0
markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.2
markupsafe==2.1.5
# via jinja2
mdit-py-plugins==0.3.4
mdit-py-plugins==0.4.2
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-parser==1.0.0
myst-parser==4.0.0
# via rocm-docs-core
packaging==23.0
packaging==24.1
# via
# pydata-sphinx-theme
# sphinx
pycparser==2.21
pycparser==2.22
# via cffi
pydata-sphinx-theme==0.13.3
pydata-sphinx-theme==0.15.4
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==1.58.1
pygithub==2.4.0
# via rocm-docs-core
pygments==2.15.0
pygments==2.18.0
# via
# accessible-pygments
# pydata-sphinx-theme
# sphinx
pyjwt[crypto]==2.6.0
pyjwt[crypto]==2.9.0
# via pygithub
pynacl==1.5.0
# via pygithub
pytz==2022.7.1
# via babel
pyyaml==6.0
pyyaml==6.0.2
# via
# myst-parser
# rocm-docs-core
# sphinx-external-toc
requests==2.31.0
requests==2.32.3
# via
# pygithub
# sphinx
rocm-docs-core==0.26.0
rocm-docs-core==1.8.0
# via -r requirements.in
smmap==5.0.0
smmap==5.0.1
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.4
soupsieve==2.6
# via beautifulsoup4
sphinx==5.3.0
sphinx==8.0.2
# via
# breathe
# myst-parser
@@ -113,31 +111,40 @@ sphinx==5.3.0
# sphinx-design
# sphinx-external-toc
# sphinx-notfound-page
sphinx-book-theme==1.0.1
# sphinx-reredirects
sphinx-book-theme==1.1.3
# via rocm-docs-core
sphinx-copybutton==0.5.1
sphinx-copybutton==0.5.2
# via rocm-docs-core
sphinx-design==0.4.1
sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==0.3.1
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-notfound-page==0.8.3
sphinx-notfound-page==1.0.4
# via rocm-docs-core
sphinxcontrib-applehelp==1.0.4
sphinx-reredirects==0.1.5
# via -r requirements.in
sphinxcontrib-applehelp==2.0.0
# via sphinx
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-devhelp==2.0.0
# via sphinx
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-htmlhelp==2.1.0
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-qthelp==2.0.0
# via sphinx
sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
typing-extensions==4.5.0
# via pydata-sphinx-theme
urllib3==1.26.13
# via requests
wrapt==1.14.1
tomli==2.0.1
# via sphinx
typing-extensions==4.12.2
# via
# pydata-sphinx-theme
# pygithub
urllib3==2.2.3
# via
# pygithub
# requests
wrapt==1.16.0
# via deprecated
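
As the header of this file notes, the pinned list is generated from `requirements.in` with `pip-compile` rather than edited by hand. A minimal sketch of regenerating it after bumping `rocm-docs-core` or adding `sphinx-reredirects`, assuming `pip-tools` is installed and that the files live under `docs/sphinx/` as referenced by the Read the Docs config:

```shell
# Install the pinning tool
pip install pip-tools

# Re-resolve requirements.in into the fully pinned requirements.txt
cd docs/sphinx && pip-compile requirements.in
```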

View File

@@ -13,7 +13,7 @@ The full list of HSA system architecture platform requirements are here: `HSA Sy
The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicsOps are initiated by the
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The atomic operations are initiated by the
I/O device, which supports 32-bit, 64-bit, and 128-bit operands; the target address has to be naturally aligned to the operation size.
Platform atomics are used in ROCm in the following ways:
@@ -22,11 +22,11 @@ For ROCm the Platform atomics are used in ROCm in the following ways:
* Update HSA queues write_dispatch_id: 64 bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
* Update HSA Signals 64bit atomic ops are used for CPU & GPU synchronization.
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through, and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have atomic operations routing enabled or the atomic operations will fail, even though the PCIe endpoint and PCIe I/O devices have the capability for atomic operations.
To do AtomicOp routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
To enable atomic operations routing between two or more Root Ports, each associated Root Port must indicate that capability via the atomic operations routing supported bit in the Device Capabilities 2 register.
If your system has a PCIe Express Switch it needs to support AtomicsOp routing. Again AtomicOp requests are permitted only if a components ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. AtomicOp Routing Support=1 Routing is supported, AtomicOp Routing Support=0 routing is not supported.
If your system has a PCIe switch, it needs to support atomic operations routing. Atomic operations requests are permitted only if a component's ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support atomic operations completion and/or routing to a component that does. Atomic operations routing support = 1 means routing is supported; atomic operations routing support = 0 means routing is not supported.
An atomic operation is a non-posted transaction supporting 32-bit and 64-bit address formats; there must be a completion response containing the result of the operation. Errors associated with the operation (an uncorrectable error accessing the target location or carrying out the atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor to Completer Abort (CA) or Unsupported Request (UR).
@@ -69,7 +69,7 @@ BAR Memory Overview
*******************
On a Xeon E5-based system, above-4GB PCIe addressing can be enabled in the BIOS; if so, you need to set the MMIO base address (MMIOH Base) and range (MMIO High Size) in the BIOS.
In SuperMicro system in the system bios you need to see the following
On a Supermicro system, you need to set the following in the system BIOS:
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
@@ -77,7 +77,7 @@ In SuperMicro system in the system bios you need to see the following
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
When we support Large Bar Capability there is a Large Bar Vbios which also disable the IO bar.
When Large BAR capability is supported, there is a Large BAR VBIOS which also disables the IO BAR.
GFX9 and Vega10 have a physical address of up to 44 bits and a 48-bit virtual address.
@@ -116,30 +116,5 @@ Legend:
5 : Expansion ROM This is required for the AMD Driver SW to access the GPUs video-bios. This is currently fixed at 128KB.
Excerpts from Overview of Changes to PCI Express 3.0
===================================================
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
********************************************************
Atomic Operations Goal:
*************************
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
* Fetch and Add uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
* Unconditional Swap uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
* Compare and Swap uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
* AtomicOpCompletion new completion to give the result so far atomic request and indicate that the atomicity of the transaction has been maintained.
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
ID-based Ordering Goal:
*************************
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
To read more on PCIe Gen 3 new options https://www.mindshare.com/files/resources/PCIe%203-0.pdf
For more information, you can review
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.
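
On a running Linux system, the DEVCAP2/DEVCTL2 state this page describes can be read back with `lspci` from pciutils, which prints the AtomicOps capability and requester-enable bits for each device and bridge. A minimal sketch (requires root; the grep patterns match the field names printed by recent pciutils releases and are illustrative):

```shell
# Dump extended capabilities for all devices and show the AtomicOps fields
sudo lspci -vvv | grep -E 'AtomicOpsCap|AtomicOpsCtl'
```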

View File

@@ -4,7 +4,7 @@ Using CMake
Most components in ROCm support CMake. Projects depending on header-only or
library components typically require CMake 3.5 or higher whereas those wanting
to make use of CMake's HIP language support will require CMake 3.21 or higher.
to make use of the CMake HIP language support will require CMake 3.21 or higher.
Finding Dependencies
====================
@@ -16,7 +16,7 @@ Finding Dependencies
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
`Using Dependencies Guide
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
to get an overview of CMake's related facilities.
to get an overview of CMake related facilities.
In short, CMake supports finding dependencies in two ways:
@@ -28,7 +28,7 @@ In short, CMake supports finding dependencies in two ways:
regards needed to consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on Nvidia runtimes. As such, when
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
dependencies are not found in standard system locations, one either has to
instruct CMake to search for package config files in additional folders using
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of filesystem
@@ -55,8 +55,8 @@ to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_)
Using HIP in CMake
==================
ROCm componenents providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
ROCm components providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
language features, allowing users to program using the HIP single-source
programming model. When a program (or translation-unit) uses the HIP API without
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
@@ -172,7 +172,7 @@ all the flags necessary for device compilation.
.. note::
Compiling for the GPU device requires at least C++11.
This project can then be configured with for eg.
This project can then be configured with the following CMake commands:
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
@@ -186,7 +186,7 @@ When using the CXX language support to compile HIP device code, selecting the
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
default, this is set to some subset of the currently supported architectures of
AMD ROCm. It can be set to eg. ``-D GPU_TARGETS="gfx1032;gfx1035"``.
AMD ROCm. It can be set to the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
ROCm CMake Packages
-------------------
@@ -251,9 +251,9 @@ options.
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
up with their own way to register command-line fragments of different purpose in
a setup'n'forget fashion for quick assembly using graphical front-ends. This is
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
all nice, but configurations aren't portable, nor can they be reused in
Continuous Intergration (CI) pipelines. CMake has condensed existing practice
Continuous Integration (CI) pipelines. CMake has condensed existing practice
into a portable JSON format that works in all IDEs and can be invoked from any
command-line. This is
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_
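
Pulling the pieces of this page together, a typical out-of-source configure of a HIP-using project on Linux combines the compiler selection and `GPU_TARGETS` settings discussed above. A minimal sketch, assuming ROCm is installed under the default `/opt/rocm` prefix; the project path and architecture list are placeholders:

```shell
# Configure a HIP project with the ROCm Clang and explicit GPU targets
cmake -S . -B build \
    -D CMAKE_PREFIX_PATH=/opt/rocm \
    -D CMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ \
    -D GPU_TARGETS="gfx1032;gfx1035" \
    -D CMAKE_BUILD_TYPE=RelWithDebInfo

# Build the configured project
cmake --build build
```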

View File

@@ -10,6 +10,6 @@ disambiguates compiler naming used throughout the documentation.
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. See for details, <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler |
| HIPify | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |
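
To make the division of labour between HIPIFY and `hipcc` in this table concrete, a minimal porting sketch: translate a CUDA source file and hand the result to the HIP compiler driver. File names are placeholders, and `hipify-perl` writes the translated source to standard output:

```shell
# Translate a CUDA source file to HIP (output goes to stdout)
hipify-perl vector_add.cu > vector_add.hip.cpp

# Compile the translated source with the HIP compiler driver
hipcc vector_add.hip.cpp -o vector_add
```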

View File

@@ -81,7 +81,7 @@ The command processor counters are further classified into fetcher and compute.
| `spi_ra_lds_cu_full_csn` | CUs | Sum of CU where LDS cannot take csn wave when not fits |
| `spi_ra_bar_cu_full_csn[]` | CUs | Sum of CU where BARRIER cannot take csn wave when not fits |
| `spi_ra_bulky_cu_full_csn[]` | CUs | Sum of CU where BULKY cannot take csn wave when not fits |
| `spi_ra_tglim_cu_full_csn[]` | Cycles | Cycles where csn wants to req but all CUs are at tg_limit |
| `spi_ra_tglim_cu_full_csn[]` | Cycles | Cycles where csn wants to req but all CUs are at `tg_limit` |
| `spi_ra_wvlim_cu_full_csn[]` | Cycles | Number of clocks csn is stalled due to WAVE LIMIT |
| `spi_vwc_csc_wr` | Cycles | Number of clocks to write CSC waves to VGPRs (need to multiply this value by 4) |
| `spi_swc_csc_wr` | Cycles | Number of clocks to write CSC waves to SGPRs (need to multiply this value by 4) |
@@ -288,9 +288,9 @@ The vector L1 cache subsystem counters are further classified into texture addre
| `tcp_gate_en2` | Cycles | Number of cycles vL1D core clocks are turned on |
| `tcp_td_tcp_stall_cycles` | Cycles | Number of cycles TD stalls vL1D |
| `tcp_tcr_tcp_stall_cycles` | Cycles | Number of cycles TCR stalls vL1D |
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a Read |
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a Write |
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on an Atomic |
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Read |
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Write |
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on an Atomic |
| `tcp_pending_stall_cycles` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache |
| `tcp_ta_tcp_state_read` | Req | Number of wavefront instruction requests to vL1D |
| `tcp_volatile[]` | Req | Number of L1 volatile pixels/buffers from TA |
@@ -347,7 +347,7 @@ The vector L1 cache subsystem counters are further classified into texture addre
| `tcc_CC_req` |Req | Number of CC requests |
| `tcc_RW_req` |Req | Number of RW requests |
| `tcc_probe` |Req | Number of L2 Cache probe requests |
| `tcc_probe_all[]` |Req | Number of external probe requests with EA_TCC_preq_all== 1 |
| `tcc_probe_all[]` |Req | Number of external probe requests with `EA_TCC_preq_all== 1` |
| `tcc_read_req` |Req | Number of L2 Cache Read requests |
| `tcc_write_req` |Req | Number of L2 Cache Write requests |
| `tcc_atomic_req` |Req | Number of L2 Cache Atomic requests |
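
Counters like these are typically collected with `rocprof` by listing performance-counter names in an input file. A minimal sketch of that workflow, with illustrative counter names and an application placeholder; the exact counter spellings accepted on a given GPU can be listed with `rocprof --list-basic`:

```shell
# Declare the counters to sample (names here are illustrative)
cat > counters.txt <<'EOF'
pmc : GRBM_GUI_ACTIVE SQ_WAVES
EOF

# Run the application under rocprof and write the results to a CSV file
rocprof -i counters.txt -o results.csv ./my_app
```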