Compare commits

8 Commits

Author SHA1 Message Date
Sam Wu
365b31728d Update doc reqs for 5.7.1 (#2558)
* Update doc reqs

rocm-docs-core==0.26.0

* Update release notes
2023-10-13 17:12:49 -06:00
Sam Wu
b6c71018a6 Disable epub format in rtd yaml config (#2557)
Because rubric is not supported

ValueError: <container: <rubric...><container...>> is not in list
2023-10-13 16:51:16 -06:00
Sam Wu
54177e8b96 Update rtd conf.py for 5.7.1 (#2556) 2023-10-13 16:41:19 -06:00
Saad Rahim (AMD)
74f4f86c92 5.7.1 Release Notes (#2550)
* 5.7.1 Release Notes

* Run script for 5.7.1 release notes

* Update CHANGELOG header

---------

Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-10-13 16:11:48 -06:00
Nara
74d8f95afb ROCm 5.7.1 Linux install and compatibility updates (#2547) 2023-10-13 15:16:14 -06:00
Saad Rahim (AMD)
50ad3847e5 Docker Image Support table updates (#2545) 2023-10-12 14:00:30 -06:00
Lisa
444efec642 Docker support updates (#2541) 2023-10-11 11:35:10 -06:00
Lisa
7d22b96c5d remove image (#2505) 2023-10-06 15:39:53 -06:00
38 changed files with 891 additions and 1966 deletions

.gitignore

@@ -16,3 +16,8 @@ _readthedocs/
docs/contributing.md
docs/release.md
docs/CHANGELOG.md
# auto-generated files
docs/deploy/linux/installer/install.md
docs/deploy/linux/os-native/install.md
docs/deploy/linux/quick_start.md

@@ -3,19 +3,16 @@
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.10"
  apt_packages:
    - "doxygen"
    - "graphviz" # For dot graphs in doxygen
sphinx:
  configuration: docs/conf.py
formats: [htmlzip, pdf]
python:
  install:
    - requirements: docs/sphinx/requirements.txt
sphinx:
  configuration: docs/conf.py
formats: []
build:
  os: ubuntu-20.04
  tools:
    python: "3.8"

@@ -1,686 +1,122 @@
AAC
# building
matchers
# file_reorg
FHS
incrementing
Filesystem
filesystem
rocm
# gpu_aware_mpi
DMA
GDR
HCA
MPI
MVAPICH
Mellanox's
NIC
OFED
OSU
OpenFabrics
PeerDirect
RDMA
UCX
ib_core
# isv_deployment_win
ABI
ACE
ACEs
AccVGPR
AccVGPRs
# linear algebra
LAPACK
MMA
backends
cuSOLVER
cuSPARSE
# mi200_performance_counters
ALU
AMD
AMDGPU
AMDGPUs
AMDMIGraphX
AMI
AOCC
AOMP
APIC
APIs
APU
ASIC
ASICs
ASan
ASm
ATI
AddressSanitizer
AlexNet
Arb
BLAS
BMC
BitCode
Blit
Bluefield
CCD
CDNA
CIFAR
CLI
CLion
CMake
CMakeLists
CMakePackage
CP
CPC
CPF
CPP
CPU
CPUs
CSC
CSE
CSV
CSn
CTests
CU
CUDA
CUs
CXX
Cavium
CentOS
ChatGPT
CoRR
Codespaces
Commitizen
CommonMark
Concretized
Conda
ConnectX
DGEMM
DKMS
DL
DMA
DNN
DNNL
DPM
DRI
DW
DWORD
Dask
DataFrame
DataLoader
DataParallel
DeepSpeed
Dependabot
DevCap
Dockerfile
Doxygen
ELMo
ENDPGM
EPYC
ESXi
FFT
FFTs
FFmpeg
FHS
FMA
FP
Filesystem
Flang
Fortran
Fuyu
GALB
GCD
GCDs
GCN
GDB
GDDR
GDR
GDS
GEMM
GEMMs
GFortran
GiB
GIM
GL
GLXT
GMI
GPG
GPR
GPT
GPU
GPU's
GPUs
GRBM
GenAI
GenZ
GitHub
Gitpod
HBM
HCA
HIPCC
HIPExtension
HIPIFY
HPC
HPCG
HPE
HPL
HSA
HWE
Haswell
Higgs
Hyperparameters
ICV
IDE
IDEs
IMDb
IOMMU
IOP
IOPM
IOV
IRQ
ISA
ISV
ISVs
ImageNet
InfiniBand
Inlines
IntelliSense
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KiB
KVM
Keras
Khronos
LAPACK
LCLK
LDS
LLM
LLMs
LLVM
LM
LSAN
LTS
LoRA
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
MIVisionX
MLM
MMA
MMIO
MMIOH
MNIST
MPI
MSVC
MVAPICH
MVFFR
Makefile
Makefiles
Matplotlib
Megatron
Mellanox
Mellanox's
Meta's
MirroredStrategy
Multicore
Multithreaded
MyEnvironment
MyST
NBIO
NBIOs
NIC
NICs
NLI
NLP
NPS
NSP
NUMA
NVCC
NVIDIA
NVPTX
NaN
Nano
Navi
Noncoherently
NousResearch's
NumPy
OAM
OAMs
OCP
OEM
OFED
OMP
OMPI
OMPT
OMPX
ONNX
OSS
OSU
OpenCL
OpenCV
OpenFabrics
OpenGL
OpenMP
OpenSSL
OpenVX
PCI
PCIe
PEFT
PIL
PILImage
PRNG
PRs
PaLM
Pageable
PeerDirect
Perfetto
PipelineParallel
PnP
PowerShell
PyPi
PyTorch
Qcycles
RAII
RCCL
RDC
RDMA
RDNA
RHEL
ROC
ROCProfiler
ROCTracer
ROCclr
ROCdbgapi
ROCgdb
ROCk
ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCr
RST
RW
Radeon
RelWithDebInfo
Req
Rickle
RoCE
Ryzen
SALU
SBIOS
SCA
SDK
SDMA
SDRAM
SENDMSG
SGPR
SGPRs
SHA
SIGQUIT
SIMD
SIMDs
SKU
SKUs
SLES
SMEM
SMI
SMT
SPI
SQs
SRAM
SRAMECC
SVD
SWE
SerDes
Shlens
Skylake
Softmax
Spack
Supermicro
Szegedy
TCA
TCC
TCI
TCIU
TCP
TCR
TF
TFLOPS
TPU
TPUs
TensorBoard
TensorFlow
TensorParallel
ToC
TorchAudio
TorchMIGraphX
TorchScript
TorchServe
TorchVision
TransferBench
TrapStatus
UAC
UC
UCC
UCX
UIF
USM
UTCL
UTIL
Uncached
Unhandled
VALU
VBIOS
VGPR
VGPRs
VM
VMEM
VMWare
VRAM
VSIX
VSkipped
Vanhoucke
Vulkan
WGP
WGPs
WX
WikiText
Wojna
Workgroups
Writebacks
XCD
XCDs
XGBoost
XGBoost's
XGMI
XT
XTX
Xeon
Xilinx
Xnack
Xteam
YAML
YML
YModel
ZeRO
ZenDNN
accuracies
activations
addr
alloc
allocator
allocators
amdgpu
api
atmi
atomics
autogenerated
avx
awk
backend
backends
benchmarking
bfloat
bilinear
bitsandbytes
blit
boson
bosons
buildable
bursty
bzip
cacheable
cd
centos
centric
changelog
chiplet
cmake
cmd
coalescable
codename
collater
comgr
completers
composable
concretization
config
conformant
convolutional
convolves
cpp
csn
cuBLAS
cuFFT
cuLIB
cuRAND
cuSOLVER
cuSPARSE
dataset
datasets
dataspace
datatype
datatypes
dbgapi
de
deallocation
denoise
denoised
denoises
denormalize
deserializers
detections
dev
devicelibs
devsel
dimensionality
disambiguates
distro
el
embeddings
enablement
endpgm
encodings
env
epilog
etcetera
ethernet
exascale
executables
ffmpeg
filesystem
fortran
galb
gcc
gdb
gfortran
gfx
githooks
github
gnupg
grayscale
gzip
heterogenous
hipBLAS
hipBLASLt
hipCUB
hipFFT
hipLIB
hipRAND
hipSOLVER
hipSPARSE
hipSPARSELt
hipTensor
hipamd
hipblas
hipcub
hipfft
hipfort
hipify
hipsolver
hipsparse
hpp
hsa
hsakmt
hyperparameter
ib_core
inband
incrementing
inferencing
inflight
init
initializer
inlining
installable
interprocedural
intra
invariants
invocating
ipo
kdb
latencies
libfabric
libjpeg
libs
linearized
linter
linux
llvm
localscratch
logits
lossy
macOS
matchers
microarchitecture
migraphx
miopen
miopengemm
mivisionx
mkdir
mlirmiopen
mtypes
mvffr
namespace
namespaces
numref
ocl
opencl
opencv
openmp
openssl
optimizers
os
pageable
parallelization
parameterization
passthrough
perfcounter
performant
perl
pragma
pre
prebuilt
precompiled
prefetch
prefetchable
preprocess
preprocessed
preprocessing
prequantized
prerequisites
profiler
protobuf
pseudorandom
py
quasirandom
queueing
rccl
rdc
reStructuredText
reformats
repos
representativeness
preq
req
resampling
rescaling
reusability
roadmap
roc
rocAL
rocALUTION
rocBLAS
rocFFT
rocLIB
rocMLIR
rocPRIM
rocRAND
rocSOLVER
rocSPARSE
rocThrust
rocWMMA
rocalution
rocblas
rocclr
rocfft
rocm
rocminfo
rocprim
rocprof
rocprofiler
rocr
rocrand
rocsolver
rocsparse
rocthrust
roctracer
runtime
runtimes
sL
scalability
scalable
sendmsg
serializers
shader
sharding
sigmoid
sm
smi
softmax
spack
src
stochastically
strided
subdirectory
subexpression
subfolder
subfolders
supercomputing
tensorfloat
th
tokenization
tokenize
tokenized
tokenizer
tokenizes
toolchain
toolchains
toolset
toolsets
torchvision
tqdm
tracebacks
txt
uarch
tagram
tg
uncached
uncorrectable
uninstallation
unsqueeze
unstacking
unswitching
untrusted
untuned
upvote
USM
UTCL
UTIL
utils
vL
variational
vdi
vectorizable
vectorization
vectorize
vectorized
vectorizer
vectorizes
vjxb
walkthrough
walkthroughs
wavefront
wavefronts
whitespaces
workgroup
workgroups
writeback
writebacks
wrreq
wzo
xargs
xz
yaml
ysvmadyb
zypper
# openmp
ICV
Multithreaded
# tuning_guides
BMC
DGEMM
HPCG
HPL
IOPM
# windows
SKU
SKUs
PowerShell
UAC
# pytorch_install
kdb
precompiled
# gpu_os_support
HWE
el
# using_gpu_sanitizer
LSAN
deallocation
detections
tracebacks
workgroup

@@ -1,4 +1,4 @@
# Release Notes
# Changelog
<!-- Do not edit this file! This file is autogenerated with -->
<!-- tools/autotag/tag_script.py -->
@@ -11,7 +11,70 @@
<!-- spellcheck-disable -->
The release notes for the ROCm platform.
The changelog for the ROCm platform.
-------------------
## ROCm 5.7.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
### ROCm Libraries
#### rocBLAS
A new tool, rocblas-gemm-tune, and a new environment variable, ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, are added to rocBLAS in the ROCm 5.7.1 release.
*rocblas-gemm-tune* finds the best-performing GEMM kernel for each GEMM problem in a given set. Its command-line interface mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, enable profile logging by setting the environment variable ROCBLAS_LAYER=4.
For more information on rocBLAS logging, see Logging in rocBLAS in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
In the tool's output (note that the selected GEMM idx may differ), the far-right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be used directly in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code that uses kernels directly via their indices.
If the output is stored in a file, the results can be used to override the default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH=<path>, where <path> points to the stored file.
For more details, refer to the [rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune).
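The two environment variables described above could be set for a child process along the lines of the following minimal Python sketch. It assumes that the profile-logging value is `ROCBLAS_LAYER=4`, as described in the rocBLAS logging documentation. The helper names, the application path `./my_gemm_app`, and the file name `tuned_gemms.csv` are hypothetical and are not part of rocBLAS.

```python
import os


def profile_logging_env(base_env=None):
    """Return a copy of the environment with rocBLAS profile logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["ROCBLAS_LAYER"] = "4"  # assumption: the value 4 selects profile logging
    return env


def kernel_override_env(results_file, base_env=None):
    """Return a copy of the environment that points rocBLAS at stored tuning results."""
    env = dict(os.environ if base_env is None else base_env)
    env["ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH"] = os.fspath(results_file)
    return env


# Hypothetical usage (not executed here; it needs an application linked against rocBLAS):
#   import subprocess
#   subprocess.run(["./my_gemm_app"], env=kernel_override_env("tuned_gemms.csv"), check=True)
```

Building a separate environment dictionary, rather than mutating `os.environ`, keeps the override scoped to the one process that should use the tuned kernels.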
#### HIP 5.7.1 (for ROCm 5.7.1)
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
### Fixed defects
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.
### Library Changes in ROCm 5.7.1
| Library | Version |
|---------|---------|
| hipBLAS | [1.1.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.7.1) |
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.7.1) |
| hipFFT | [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.7.1) |
| hipSOLVER | 1.8.1 ⇒ [1.8.2](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.7.1) |
| hipSPARSE | [2.3.8](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.7.1) |
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.7.1) |
| rocALUTION | [2.1.11](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.7.1) |
| rocBLAS | [3.1.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.7.1) |
| rocFFT | [1.0.24](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.7.1) |
| rocm-cmake | [0.10.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.7.1) |
| rocPRIM | [2.13.1](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.7.1) |
| rocRAND | [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.7.1) |
| rocSOLVER | [3.23.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.7.1) |
| rocSPARSE | [2.5.4](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.7.1) |
| rocThrust | [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.7.1) |
| rocWMMA | [1.2.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.7.1) |
| Tensile | [4.38.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.7.1) |
#### hipSOLVER 1.8.2
hipSOLVER 1.8.2 for ROCm 5.7.1
##### Fixed
- Fixed conflicts between the hipsolver-dev and -asan packages by excluding
hipsolver_module.f90 from the latter
-------------------

@@ -15,469 +15,63 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.7.0
## ROCm 5.7.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### Release Highlights for ROCm v5.7
### What's New in This Release
ROCm 5.7.0 includes many new features. These include: a new library (hipTensor), and optimizations for rocRAND and MIVisionX. Address sanitizer for host and device code (GPU) is now available as a beta. Note that ROCm 5.7.0 is EOS for MI50. 5.7 versions of ROCm are the last major release in the ROCm 5 series. This release is Linux-only.
### ROCm Libraries
Important: The next major ROCm release (ROCm 6.0) will not be backward compatible with the ROCm 5 series. Changes will include: splitting LLVM packages into more manageable sizes, changes to the HIP runtime API, splitting rocRAND and hipRAND into separate packages, and reorganizing our file structure.
#### rocBLAS
A new tool, rocblas-gemm-tune, and a new environment variable, ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, are added to rocBLAS in the ROCm 5.7.1 release.
#### AMD Instinct™ MI50 End of Support Notice
*rocblas-gemm-tune* finds the best-performing GEMM kernel for each GEMM problem in a given set. Its command-line interface mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, enable profile logging by setting the environment variable ROCBLAS_LAYER=4.
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) will enter maintenance mode starting Q3 2023.
For more information on rocBLAS logging, see Logging in rocBLAS, in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 will be the final release for gfx906 GPUs to be in a fully supported state.
In the tool's output (note that the selected GEMM idx may differ), the far-right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be used directly in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code that uses kernels directly via their indices.
- ROCm 6.0 release will show MI50s as "under maintenance" mode for [Linux](./about/release/linux_support) and [Windows](./about/release/windows_support)
If the output is stored in a file, the results can be used to override the default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH=<path>, where <path> points to the stored file.
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond this major release (ROCm 5.7).
For more details, refer to the [rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune).
- Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (EOM (End of Maintenance) will be aligned with the closest ROCm release).
#### HIP 5.7.1 (for ROCm 5.7.1)
- Bug fixes during the maintenance will be made to the next ROCm point release.
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
- Bug fixes will not be backported to older ROCm releases for gfx906.
### Fixed defects
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
#### Feature Updates
##### Non-hostcall HIP Printf
**Current behavior**
The current version of HIP printf relies on hostcalls, which, in turn, rely on PCIe atomics. However, PCIe atomics are unavailable in some environments, and, as a result, HIP printf does not work in those environments. Users may see the following error from the runtime (with AMD_LOG_LEVEL 1 and above):
```
Pcie atomics not enabled, hostcall not supported
```
**Workaround**
The ROCm 5.7 release introduces an alternative to the current hostcall-based implementation that leverages an older OpenCL-based printf scheme, which does not rely on hostcalls/PCIe atomics.
Note: This option is less robust than hostcall-based implementation and is intended to be a workaround when hostcalls do not work.
The printf variant is now controlled via a new compiler option, -mprintf-kind=<value>. This option is supported only for HIP programs and takes the following values:
- “hostcall”: The currently available implementation, which relies on hostcalls and therefore requires the system to support PCIe atomics. It is the default scheme.
- “buffered”: This implementation leverages the older printf scheme used by OpenCL; it relies on a memory buffer where printf arguments are stored during kernel execution, and the runtime handles the actual printing once the kernel finishes execution.
**NOTE**: With the new workaround:
- The printf buffer is fixed size and non-circular. After the buffer is filled, calls to printf will not result in additional output.
- The printf call returns either 0 (on success) or -1 (on failure, due to full buffer), unlike the hostcall scheme that returns the number of characters printed.
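The buffered behavior described above (a fixed-size, non-circular buffer, with printing deferred until the kernel finishes, and a 0 / -1 return value) can be illustrated with a toy Python model. This is only a sketch of the semantics stated in these notes, not the HIP runtime's implementation; the class name, the byte-size estimate, and the 32-byte capacity are assumptions made for illustration.

```python
class BufferedPrintfModel:
    """Toy model of a fixed-size, non-circular printf buffer."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.full = False
        self.records = []  # (format string, arguments) stored during the "kernel"

    def printf(self, fmt, *args):
        """Store the call; return 0 on success, -1 once the buffer is full."""
        # The size estimate is arbitrary: format bytes plus 8 bytes per argument.
        needed = len(fmt.encode()) + 8 * len(args)
        if self.full or self.used_bytes + needed > self.capacity_bytes:
            self.full = True  # non-circular: nothing is overwritten or retried
            return -1
        self.records.append((fmt, args))
        self.used_bytes += needed
        return 0

    def flush(self):
        """Format the stored calls after the "kernel" finishes."""
        return "".join(fmt % args for fmt, args in self.records)


buf = BufferedPrintfModel(capacity_bytes=32)
codes = [buf.printf("a=%d\n", i) for i in range(3)]  # third call no longer fits
output = buf.flush()
```

In this model the third call returns -1 and contributes no output, which mirrors the two caveats listed above. Code that relies on printf returning the number of characters printed would need adjusting under the buffered scheme.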
##### Beta Release of LLVM Address Sanitizer (ASAN) with the GPU
The ROCm v5.7 release introduces the beta release of LLVM Address Sanitizer (ASAN) with the GPU. The LLVM Address Sanitizer provides a process that allows developers to detect runtime addressing errors in applications and libraries. The detection is achieved using a combination of compiler-added instrumentation and runtime techniques, including function interception and replacement.
Until now, the LLVM Address Sanitizer process was only available for traditional purely CPU applications. However, ROCm has extended this mechanism to additionally allow the detection of some addressing errors on the GPU in heterogeneous applications. Ideally, developers should treat heterogeneous HIP and OpenMP applications like pure CPU applications. However, this simplicity has not been achieved yet.
Refer to the documentation on LLVM Address Sanitizer with the GPU at [LLVM Address Sanitizer User Guide](understand/using_gpu_sanitizer.md).
**Note**: The beta release of LLVM Address Sanitizer for ROCm is currently tested and validated on Ubuntu 20.04.
#### Fixed Defects
The following defects are fixed in ROCm v5.7:
- Test hangs observed in HMM RCCL
- NoGpuTst test of Catch2 fails with Docker
- Failures observed with non-HMM HIP directed catch2 tests with XNACK+
- Multiple test failures and test hangs observed in hip-directed catch2 tests with xnack+
#### HIP 5.7.0
##### Optimizations
##### Added
- Added `meta_group_size`/`rank` for getting the number of tiles and rank of a tile in the partition
- Added new APIs supporting Windows only, under development on Linux
  - `hipMallocMipmappedArray` for allocating a mipmapped array on the device
  - `hipFreeMipmappedArray` for freeing a mipmapped array on the device
  - `hipGetMipmappedArrayLevel` for getting a mipmap level of a HIP mipmapped array
  - `hipMipmappedArrayCreate` for creating a mipmapped array
  - `hipMipmappedArrayDestroy` for destroying a mipmapped array
  - `hipMipmappedArrayGetLevel` for getting a mipmapped array on a mipmapped level
##### Changed
##### Fixed
##### Known Issues
- HIP memory type enum values currently don't support equivalent value to `cudaMemoryTypeUnregistered`, due to HIP functionality backward compatibility.
- HIP API `hipPointerGetAttributes` could return invalid value in case the input memory pointer was not allocated through any HIP API on device or host.
##### Upcoming changes for HIP in ROCm 6.0 release
- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
  - maxTexture1D
  - maxTexture2D
  - maxTexture1DLayered
  - maxTexture2DLayered
  - sharedMemPerMultiprocessor
  - deviceOverlap
  - asyncEngineCount
  - surfaceAlignment
  - unifiedAddressing
  - computePreemptionSupported
  - hostRegisterSupported
  - uuid
- Removal of deprecated code -hip-hcc codes from hip code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction to avoid truncation of "size_t" to "unsigned int" inside hipMemcpy3D()
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
- Correct hipGetLastError to return the last error instead of last API call's return code
- Update hipExternalSemaphoreHandleDesc to add "unsigned int reserved[16]"
- Correct handling of flag values in hipIpcOpenMemHandle for hipIpcMemLazyEnablePeerAccess
- Remove hiparray* and make it opaque with hipArray_t
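The HIPMEMCPY_3D item in the list above refers to truncation of "size_t" to "unsigned int". The arithmetic behind that kind of truncation can be sketched in Python as follows, assuming a 64-bit size_t and a 32-bit unsigned int (typical on 64-bit Linux, but an assumption here). The helper name is hypothetical.

```python
U32_MASK = 0xFFFFFFFF  # assumption: unsigned int is 32 bits wide


def truncate_to_unsigned_int(size):
    """Model a C-style narrowing conversion from 64-bit size_t to unsigned int."""
    return size & U32_MASK


GIB = 2**30
requested = 5 * GIB  # a 5 GiB extent fits in a 64-bit size_t
truncated = truncate_to_unsigned_int(requested)

# The upper 32 bits are silently dropped, so 5 GiB becomes 1 GiB.
print(requested, truncated, truncated // GIB)
```

Any value of 4 GiB or more loses its upper bits in such a conversion, which is why field types that hold sizes need to be wide enough for the full size_t range.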
### Library Changes in ROCm 5.7.0
### Library Changes in ROCm 5.7.1
| Library | Version |
|---------|---------|
| hipBLAS | [1.1.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.7.0) |
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.7.0) |
| hipFFT | [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.7.0) |
| hipSOLVER | ⇒ [1.8.1](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.7.0) |
| hipSPARSE | [2.3.8](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.7.0) |
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.7.0) |
| rccl | ⇒ [2.17.1-1](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.7.0) |
| rocALUTION | ⇒ [2.1.11](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.7.0) |
| rocBLAS | ⇒ [3.1.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.7.0) |
| rocFFT | ⇒ [1.0.24](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.7.0) |
| rocm-cmake | ⇒ [0.10.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.7.0) |
| rocPRIM | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.7.0) |
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.7.0) |
| rocSOLVER | ⇒ [3.23.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.7.0) |
| rocSPARSE | ⇒ [2.5.4](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.7.0) |
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.7.0) |
| rocWMMA | ⇒ [1.2.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.7.0) |
| Tensile | ⇒ [4.38.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.7.0) |
| hipBLAS | [1.1.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.7.1) |
| hipCUB | [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.7.1) |
| hipFFT | [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.7.1) |
| hipSOLVER | 1.8.1 ⇒ [1.8.2](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.7.1) |
| hipSPARSE | [2.3.8](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.7.1) |
| MIOpen | [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.7.1) |
| rocALUTION | [2.1.11](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.7.1) |
| rocBLAS | [3.1.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.7.1) |
| rocFFT | [1.0.24](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.7.1) |
| rocm-cmake | [0.10.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.7.1) |
| rocPRIM | [2.13.1](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.7.1) |
| rocRAND | [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.7.1) |
| rocSOLVER | [3.23.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.7.1) |
| rocSPARSE | [2.5.4](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.7.1) |
| rocThrust | [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.7.1) |
| rocWMMA | [1.2.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.7.1) |
| Tensile | [4.38.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.7.1) |
#### hipBLAS 1.1.0
#### hipSOLVER 1.8.2
hipBLAS 1.1.0 for ROCm 5.7.0
##### Changed
- updated documentation requirements
##### Dependencies
- dependency rocSOLVER now depends on rocSPARSE
#### hipCUB 2.13.1
hipCUB 2.13.1 for ROCm 5.7.0
##### Changed
- CUB backend references CUB and Thrust version 2.0.1.
- Fixed `DeviceSegmentedReduce::ArgMin` and `DeviceSegmentedReduce::ArgMax` by returning the segment-relative index instead of the absolute one.
- Fixed `DeviceSegmentedReduce::ArgMin` for inputs where the segment minimum is smaller than the value returned for empty segments. An equivalent fix is applied to `DeviceSegmentedReduce::ArgMax`.
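The difference between a segment-relative index and an absolute one, which the ArgMin/ArgMax fix above turns on, can be shown with a plain Python sketch. This is not the hipCUB API; the function name and the data are made up, and empty segments are reported as `None` here rather than with the value the library uses.

```python
def segmented_argmin(values, offsets):
    """Return (relative_index, absolute_index, minimum) per segment.

    Segment i covers values[offsets[i]:offsets[i + 1]].
    """
    results = []
    for begin, end in zip(offsets[:-1], offsets[1:]):
        segment = values[begin:end]
        if not segment:
            results.append(None)  # placeholder; not the library's behavior
            continue
        # Index of the first minimum, counted from the start of the segment.
        rel = min(range(len(segment)), key=segment.__getitem__)
        results.append((rel, begin + rel, segment[rel]))
    return results


values = [5, 3, 8, 7, 1, 9]
offsets = [0, 3, 5, 6]  # three segments: [5, 3, 8], [7, 1], [9]
result = segmented_argmin(values, offsets)
```

For the second segment the minimum `1` sits at relative index 1 but absolute index 4. Per the fix above, the segment-relative value is the one returned.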
##### Known Issues
- `debug_synchronous` no longer works on CUDA platform. `CUB_DEBUG_SYNC` should be used to enable those checks.
- `DeviceReduce::Sum` does not compile on CUDA platform for mixed extended-floating-point/floating-point InputT and OutputT types.
- `DeviceHistogram::HistogramEven` fails on CUDA platform for `[LevelT, SampleIteratorT] = [int, int]`.
- `DeviceHistogram::MultiHistogramEven` fails on CUDA platform for `[LevelT, SampleIteratorT] = [int, int/unsigned short/float/double]` and `[LevelT, SampleIteratorT] = [float, double]`.
#### hipFFT 1.0.12
hipFFT 1.0.12 for ROCm 5.7.0
##### Added
- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.
##### Changed
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
#### hipSOLVER 1.8.1
hipSOLVER 1.8.1 for ROCm 5.7.0
##### Changed
- Changed hipsolver-test sparse input data search paths to be relative to the test executable
#### hipSPARSE 2.3.8
hipSPARSE 2.3.8 for ROCm 5.7.0
##### Improved
- Fix compilation failures when using cusparse 12.1.0 backend
- Fix compilation failures when using cusparse 12.0.0 backend
- Fix compilation failures when using cusparse 10.1 (non-update versions) as backend
- Minor improvements
#### MIOpen 2.19.0
MIOpen 2.19.0 for ROCm 5.7.0
##### Added
- ROCm 5.5 support for gfx1101 (Navi32)
##### Changed
- Tuning results for MLIR on ROCm 5.5
- Bumping MLIR commit to 5.5.0 release tag
hipSOLVER 1.8.2 for ROCm 5.7.1
##### Fixed
- Fix 3d convolution Host API bug
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.
#### RCCL 2.17.1-1
RCCL 2.17.1-1 for ROCm 5.7.0
##### Changed
- Compatibility with NCCL 2.17.1-1
- Performance tuning for some collective operations
##### Added
- Minor improvements to MSCCL codepath
- NCCL_NCHANNELS_PER_PEER support
- Improved compilation performance
- Support for gfx94x
##### Fixed
- Potential race-condition during ncclSocketClose()
#### rocALUTION 2.1.11
rocALUTION 2.1.11 for ROCm 5.7.0
##### Added
- Added support for gfx940, gfx941 and gfx942
##### Improved
- Fixed OpenMP runtime issue with Windows toolchain
#### rocBLAS 3.1.0
rocBLAS 3.1.0 for ROCm 5.7.0
##### Added
- yaml lock step argument scanning for rocblas-bench and rocblas-test clients. See Programmers Guide for details.
- rocblas-gemm-tune is used to find the best performing GEMM kernel for each of a given set of GEMM problems.
##### Fixed
- make offset calculations for rocBLAS functions 64 bit safe. Fixes for very large leading dimensions or increments potentially causing overflow:
  - Level 1: axpy, copy, rot, rotm, scal, swap, asum, dot, iamax, iamin, nrm2
  - Level 2: gemv, symv, hemv, trmv, ger, syr, her, syr2, her2, trsv
  - Level 3: gemm, symm, hemm, trmm, syrk, herk, syr2k, her2k, syrkx, herkx, trsm, trtri, dgmm, geam
  - General: set_vector, get_vector, set_matrix, get_matrix
  - Related fixes: internal scalar loads with > 32bit offsets
- fix in-place functionality for all trtri sizes
##### Changed
- dot when using rocblas_pointer_mode_host is now synchronous to match legacy BLAS as it stores results in host memory
- enhanced reporting of installation issues caused by runtime libraries (Tensile)
- standardized internal rocblas C++ interface across most functions
##### Deprecated
- Removal of __STDC_WANT_IEC_60559_TYPES_EXT__ define in future release
##### Dependencies
- optional use of AOCL BLIS 4.0 on Linux for clients
- optional build tool only dependency on python psutil
#### rocFFT 1.0.24
rocFFT 1.0.24 for ROCm 5.7.0
##### Optimizations
- Improved performance of complex forward/inverse 1D FFTs (2049 <= length <= 131071) that use Bluestein's algorithm.
##### Added
- Implemented a solution map version converter and finished the first conversion, from ver.0 to ver.1. Version 1 removes some incorrect kernels (sbrc/sbcr using half_lds).
##### Changed
- Moved rocfft_rtc_helper executable to lib/rocFFT directory on Linux.
- Moved library kernel cache to lib/rocFFT directory.
#### rocm-cmake 0.10.0
rocm-cmake 0.10.0 for ROCm 5.7.0
##### Added
- Added ROCMTest module
- ROCMCreatePackage: Added support for ASAN packages
#### rocPRIM 2.13.1
rocPRIM 2.13.1 for ROCm 5.7.0
##### Changed
- Deprecated configuration `radix_sort_config` for device-level radix sort as it no longer matches the algorithm's parameters. New configuration `radix_sort_config_v2` is preferred instead.
- Removed erroneous implementation of device-level `inclusive_scan` and `exclusive_scan`. The prior default implementation using lookback-scan now is the only available implementation.
- The benchmark metric indicating the bytes processed for `exclusive_scan_by_key` and `inclusive_scan_by_key` has been changed to incorporate the key type. Furthermore, the benchmark log has been changed such that these algorithms are reported as `scan` and `scan_by_key` instead of `scan_exclusive` and `scan_inclusive`.
- Deprecated configurations `scan_config` and `scan_by_key_config` for device-level scans, as they no longer match the algorithm's parameters. New configurations `scan_config_v2` and `scan_by_key_config_v2` are preferred instead.
##### Fixed
- Fixed build issue caused by missing header in `thread/thread_search.hpp`.
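For readers unfamiliar with the two scan flavors named in the Changed list above, the following sequential Python sketch shows what an inclusive and an exclusive scan compute for addition. It uses `itertools.accumulate` from the standard library (the `initial` argument needs Python 3.8 or later) and says nothing about how rocPRIM implements its device-level lookback-scan.

```python
from itertools import accumulate


def inclusive_scan(values):
    """Element i of the result is the sum of values[0..i]."""
    return list(accumulate(values))


def exclusive_scan(values, init=0):
    """Element i of the result is init plus the sum of values[0..i-1]."""
    return list(accumulate(values, initial=init))[:-1]


data = [1, 2, 3, 4]
inclusive = inclusive_scan(data)  # [1, 3, 6, 10]
exclusive = exclusive_scan(data)  # [0, 1, 3, 6]
```

The exclusive result is the inclusive result shifted right by one position, with the initial value in front.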
#### rocRAND 2.10.17
rocRAND 2.10.17 for ROCm 5.7.0
##### Added
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
- experimental HIP-CPU feature
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
##### Changed
- Python 2.7 is no longer officially supported.
#### rocSOLVER 3.23.0
rocSOLVER 3.23.0 for ROCm 5.7.0
##### Added
- LU factorization without pivoting for block tridiagonal matrices:
  - GEBLTTRF_NPVT now supports interleaved\_batched format
- Linear system solver without pivoting for block tridiagonal matrices:
  - GEBLTTRS_NPVT now supports interleaved\_batched format
##### Fixed
- Fixed stack overflow in sparse tests on Windows
##### Changed
- Changed rocsolver-test sparse input data search paths to be relative to the test executable
- Changed build scripts to default to compressed debug symbols in Debug builds
#### rocSPARSE 2.5.4
rocSPARSE 2.5.4 for ROCm 5.7.0
##### Added
- Added more mixed precisions for SpMV, (matrix: float, vectors: double, calculation: double) and (matrix: rocsparse_float_complex, vectors: rocsparse_double_complex, calculation: rocsparse_double_complex)
- Added support for gfx940, gfx941 and gfx942
##### Improved
- Fixed a bug in csrsm and bsrsm
##### Known Issues
In csritlu0, the algorithm rocsparse_itilu0_alg_sync_split_fusion has some accuracy issues to investigate with XNACK enabled. The fallback is rocsparse_itilu0_alg_sync_split.
#### rocThrust 2.18.0
rocThrust 2.18.0 for ROCm 5.7.0
##### Fixed
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
- Fixed issue where `transform_iterator` would not compile with `__device__`-only operators.
##### Changed
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
- Removed references to and workarounds for deprecated hcc
#### rocWMMA 1.2.0
rocWMMA 1.2.0 for ROCm 5.7.0
##### Changed
- Fixed a bug with synchronization
- Updated rocWMMA cmake versioning
#### Tensile 4.38.0
Tensile 4.38.0 for ROCm 5.7.0
##### Added
- Added support for FP16 Alt Round Near Zero Mode (this feature allows the generation of alternate kernels with intermediate rounding instead of truncation)
- Added user-driven solution selection feature
##### Optimizations
- Enabled LocalSplitU with MFMA for I8 data type
- Optimized K mask code in mfmaIter
- Enabled TailLoop code in NoLoadLoop to prefetch global/local read
- Enabled DirectToVgpr in TailLoop for NN, TN, and TT matrix orientations
- Optimized DirectToLds test cases to reduce the test duration
##### Changed
- Removed DGEMM NT custom kernels and related test cases
- Changed noTailLoop logic to apply noTailLoop only for NT
- Changed the range of AssertFree0ElementMultiple and Free1
- Unified aStr, bStr generation code in mfmaIter
##### Fixed
- Fixed LocalSplitU mismatch issue for SGEMM
- Fixed BufferStore=0 and Ldc != Ldd case
- Fixed mismatch issue with TailLoop + MatrixInstB > 1
- Fixed conflicts between the hipsolver-dev and -asan packages by excluding
hipsolver_module.f90 from the latter


@@ -12,7 +12,7 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
fetch="https://github.com/GPUOpen-Tools/" />
<remote name="KhronosGroup"
fetch="https://github.com/KhronosGroup/" />
<default revision="refs/tags/rocm-5.7.0"
<default revision="refs/tags/rocm-5.7.1"
remote="roc-github"
sync-c="true"
sync-j="4" />


@@ -1,3 +0,0 @@
# 404 - Page Not Found
Return [home](./index) or use the sidebar navigation to get back on track.


@@ -5,6 +5,7 @@ The following table is a list of ROCm components with links to their respective
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, license name, and link to the license terms.
The table is ordered to follow ROCm's manifest file.
<!-- spellcheck-disable -->
| Component | License |


@@ -5,9 +5,27 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html
import shutil
import jinja2
import os
from rocm_docs import ROCmDocs
# Environment to process Jinja templates.
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
# Jinja templates to render out.
templates = [
"./deploy/linux/quick_start.md.jinja",
"./deploy/linux/installer/install.md.jinja",
"./deploy/linux/os-native/install.md.jinja"
]
# Render templates and output files without the last extension.
# For example: 'install.md.jinja' becomes 'install.md'.
for template in templates:
rendered = jinja_env.get_template(template).render()
with open(os.path.splitext(template)[0], 'w') as file:
file.write(rendered)
shutil.copy2('../CONTRIBUTING.md','./contributing.md')
shutil.copy2('../RELEASE.md','./release.md')
@@ -27,8 +45,8 @@ latex_elements = {
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = "5.7.0"
release = "5.7.0"
version = "5.7.1"
release = "5.7.1"
setting_all_article_info = True
all_article_info_os = ["linux", "windows"]
@@ -92,7 +110,7 @@ article_pages = [
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs("ROCm 5.7.0 Documentation Home")
docs_core = ROCmDocs("ROCm 5.7.1 Documentation Home")
docs_core.setup()
external_projects_current_project = "rocm"
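
The rendering loop added to `conf.py` in this hunk writes each rendered template to a path obtained by dropping only the last extension. A minimal standalone sketch of that path logic, using only the standard library (the helper name `output_path` is hypothetical and is not part of `conf.py`):

```python
import os

# Template paths as listed in the conf.py hunk.
templates = [
    "./deploy/linux/quick_start.md.jinja",
    "./deploy/linux/installer/install.md.jinja",
    "./deploy/linux/os-native/install.md.jinja",
]


def output_path(template: str) -> str:
    """Drop only the final extension: 'install.md.jinja' -> 'install.md'."""
    return os.path.splitext(template)[0]


outputs = [output_path(t) for t in templates]
for template, out in zip(templates, outputs):
    print(f"{template} -> {out}")
```

The resulting `.md` paths correspond to the files that the `.gitignore` change in this comparison lists as auto-generated.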

Binary file not shown.



@@ -1,3 +1,5 @@
{%- import "deploy/linux/linux.template.jinja" as linux %}
<!-- markdownlint-disable no-duplicate-header blanks-around-headings no-multiple-blanks -->
# Installation with install script
Prior to beginning, please ensure you have the [prerequisites](../prerequisites)
@@ -15,98 +17,48 @@ it via {ref}`hip_visible_devices`.
To download and install the `amdgpu-install` script on the system, use the
following commands based on your distribution.
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
{% call(family) linux.for_family_in(linux.supported_family) %}
{%- call(os) linux.for_os_in(linux.supported_os) %}
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
{%- if os.tag == "ubuntu" %}
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.7/ubuntu/focal/amdgpu-install_5.7.50700-1_all.deb
sudo apt install ./amdgpu-install_5.7.50700-1_all.deb
wget https://repo.radeon.com/amdgpu-install/{{ family.amdgpu_version }}/ubuntu/{{ version.release }}/amdgpu-install_{{ family.amdgpu_install_version }}_all.deb
sudo apt install ./amdgpu-install_{{ family.amdgpu_install_version }}_all.deb
```
{%- endcall -%}
{%- elif os.tag == "rhel" %}
{%- call(version) linux.for_version_in(os) %}
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.7/ubuntu/jammy/amdgpu-install_5.7.50700-1_all.deb
sudo apt install ./amdgpu-install_5.7.50700-1_all.deb
sudo yum install https://repo.radeon.com/amdgpu-install/{{ family.amdgpu_version }}/rhel/{{ version.number }}/amdgpu-install-{{ family.amdgpu_install_version }}.{{ version.release | trim("rh") }}.noarch.rpm
```
{%- endcall -%}
{%- elif os.tag == "sle" %}
{%- call(version) linux.for_version_in(os) %}
:::
::::
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.7/rhel/8.7/amdgpu-install-5.7.50700-1.el8.noarch.rpm
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/{{ family.amdgpu_version }}/sle/{{ version.number }}/amdgpu-install-{{ family.amdgpu_install_version }}.noarch.rpm
```
{%- endcall -%}
{%- endif %}
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.7/rhel/8.8/amdgpu-install-5.7.50700-1.el8.noarch.rpm
```
:::
:::{tab-item} RHEL 9.1
:sync: RHEL-9.1
:sync: RHEL-9
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.7/rhel/9.1/amdgpu-install-5.7.50700-1.el9.noarch.rpm
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.7/rhel/9.2/amdgpu-install-5.7.50700-1.el9.noarch.rpm
```
:::
::::
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.7/sle/15.4/amdgpu-install-5.7.50700-1.noarch.rpm
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.7/sle/15.5/amdgpu-install-5.7.50700-1.noarch.rpm
```
:::
::::
:::::
::::::
{%- endcall -%}
{%- endcall %}
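
The templated download URLs in this hunk can be sanity-checked outside of Sphinx. The following is a rough Python sketch that mirrors the three OS branches. The function name `installer_url` is hypothetical, the sample values are the ones defined in `linux.template.jinja` in this comparison, and `str.strip("rh")` is assumed to match Jinja's `trim("rh")` filter for these inputs. The sketch only builds strings; it does not check that the URLs resolve.

```python
BASE = "https://repo.radeon.com/amdgpu-install"


def installer_url(os_tag, amdgpu_version, install_version, number, release=None):
    """Build the amdgpu-install package URL the way the template branches do."""
    if os_tag == "ubuntu":
        # Ubuntu uses the release codename (for example "jammy") in the path.
        return (f"{BASE}/{amdgpu_version}/ubuntu/{release}/"
                f"amdgpu-install_{install_version}_all.deb")
    if os_tag == "rhel":
        # Equivalent of `version.release | trim("rh")`: "rhel8" -> "el8".
        suffix = release.strip("rh")
        return (f"{BASE}/{amdgpu_version}/rhel/{number}/"
                f"amdgpu-install-{install_version}.{suffix}.noarch.rpm")
    if os_tag == "sle":
        return (f"{BASE}/{amdgpu_version}/sle/{number}/"
                f"amdgpu-install-{install_version}.noarch.rpm")
    raise ValueError(f"unsupported OS tag: {os_tag}")


print(installer_url("rhel", "5.7.1", "5.7.50701-1", "8.8", "rhel8"))
```

Note that `strip("rh")` removes any leading or trailing `r` and `h` characters rather than a literal `rh` prefix, which is sufficient for `rhel8` and `rhel9`.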
## Use cases
@@ -191,9 +143,9 @@ the installer script will install packages in the single-version layout.
For the multi-version ROCm installation you must use the installer script from
the latest release of ROCm that you wish to install.
**Example:** If you want to install ROCm releases 5.5.3, 5.6.1 and 5.7
**Example:** If you want to install ROCm releases 5.5.3, 5.6.1 and {{ linux.supported_family[0].rocm_version }}
simultaneously, you are required to download the installer from the latest ROCm
release 5.7.
release {{ linux.supported_family[0].rocm_version }}.
### Add Required Repositories
@@ -203,50 +155,37 @@ automatically adds the required repositories for the latest release.
Run the following commands based on your distribution to add the repositories:
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
{% call(family) linux.for_family_in(linux.supported_family) %}
{%- call(os) linux.for_os_in(linux.supported_os) %}
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
{%- if os.tag == "ubuntu" %}
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
for ver in 5.5.3 5.6.1 5.7; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
for ver in 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver {{ version.release }} main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
{%- endcall -%}
{%- elif os.tag == "rhel" %}
{%- call(version) linux.for_version_in(os) %}
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
for ver in 5.5.3 5.6.1 5.7; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
:::
::::
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8
:sync: RHEL-8
```shell
for ver in 5.5.3 5.6.1 5.7; do
for ver in 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/rhel8/$ver/main
baseurl=https://repo.radeon.com/rocm/{{ version.release }}/$ver/main
enabled=1
priority=50
gpgcheck=1
@@ -255,34 +194,10 @@ EOF
done
sudo yum clean all
```
:::
:::{tab-item} RHEL 9
:sync: RHEL-9
{%- endcall -%}
{%- elif os.tag == "sle" %}
```shell
for ver in 5.5.3 5.6.1 5.7; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/rhel9/$ver/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
done
sudo yum clean all
```
:::
::::
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```shell
for ver in 5.5.3 5.6.1 5.7; do
for ver in 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
@@ -293,26 +208,27 @@ EOF
done
sudo zypper ref
```
{%- endif %}
:::::
::::::
{%- endcall -%}
{%- endcall %}
### Install packages
Use the installer script as given below:
```none
```shell
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-1>
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-2>
sudo amdgpu-install --usecase=rocm --rocmrelease=<release-number-3>
```
Following are examples of ROCm multi-version installation. The kernel-mode
driver, associated with the ROCm release 5.7, will be installed as its latest
driver, associated with the ROCm release {{ linux.supported_family[0].rocm_version }}, will be installed as its latest
release in the list.
```none
sudo amdgpu-install --usecase=rocm --rocmrelease=5.7
```shell
sudo amdgpu-install --usecase=rocm --rocmrelease={{ linux.supported_family[0].rocm_version }}
sudo amdgpu-install --usecase=rocm --rocmrelease=5.6.1
sudo amdgpu-install --usecase=rocm --rocmrelease=5.5.3
```
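
The example lists the releases from newest to oldest. As a sketch, the same command list can be generated from an unordered set of release numbers. The helper names are hypothetical, plain dotted numeric release strings are assumed, and the source does not state whether `amdgpu-install` requires this order.

```python
def version_key(release: str) -> tuple:
    """Turn '5.6.1' into (5, 6, 1) so releases compare numerically."""
    return tuple(int(part) for part in release.split("."))


def install_commands(releases):
    """Return one amdgpu-install command per release, newest first."""
    ordered = sorted(releases, key=version_key, reverse=True)
    return [
        f"sudo amdgpu-install --usecase=rocm --rocmrelease={release}"
        for release in ordered
    ]


commands = install_commands(["5.5.3", "5.7.1", "5.6.1"])
for command in commands:
    print(command)
```

Comparing integer tuples instead of raw strings avoids mis-ordering releases such as `5.10` and `5.9`.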


@@ -0,0 +1,120 @@
{%- set supported_family = ([
{
"tag": "instinct",
"name": "Select OS",
"amdgpu_version": "5.7.1",
"amdgpu_install_version": "5.7.50701-1",
"rocm_version": "5.7.1",
"rocm_install_version": "5.7.50701-1",
}
]) -%}
{%- set supported_os = ([
{
"tag": "ubuntu",
"name": "Ubuntu",
"shortname" : "Ubuntu",
"version": [
{
"number": "22.04",
"release": "jammy"
},
{
"number": "20.04",
"release": "focal"
}
]
},
{
"tag": "rhel",
"name": "Red Hat Enterprise Linux",
"shortname" : "RHEL",
"version": [
{
"number": "8.8",
"release": "rhel8"
},
{
"number": "8.7",
"release": "rhel8"
},
{
"number": "8.6",
"release": "rhel8"
},
{
"number": "9.2",
"release": "rhel9"
},
{
"number": "9.1",
"release": "rhel9"
},
]
},
{
"tag": "sle",
"name": "SUSE Linux Enterprise Server",
"shortname" : "SLES",
"version": [
{
"number": "15.5"
},
{
"number": "15.4"
},
]
}
]) -%}
{%- macro for_family_in(supported_family) %}
::::::::{tab-set}
{%- for family in supported_family %}
:::::::{tab-item} {{ family.name }}
:sync: {{ family.tag }}
{{ caller(family) }}
:::::::
{%- endfor %}
::::::::
{%- endmacro -%}
{%- macro for_os_in(supported_os) %}
::::::{tab-set}
{%- for os in supported_os %}
:::::{tab-item} {{ os.name }}
:sync: {{ os.tag }}
{{ caller(os) }}
:::::
{%- endfor %}
::::::
{%- endmacro -%}
{%- macro for_version_in(os) %}
::::{tab-set}
{%- for version in os.version %}
:::{tab-item} {{ os.shortname }} {{ version.number }}
:sync: {{ os.tag }}-{{ version.number }}
{{ caller(version) }}
:::
{%- endfor %}
::::
{%- endmacro -%}
{%- macro install(os, argument) %}
```shell
{%- if os.tag == "ubuntu" %}
sudo apt install {{ argument }}
{%- elif os.tag == "rhel" %}
sudo yum install {{ argument }}
{%- elif os.tag == "sle" %}
sudo zypper install {{ argument }}
{%- endif %}
```
{%- endmacro -%}
{%- macro header_anchor(family, os) -%}
({{ caller() | lower | replace('#', '') | trim | replace(' ', '-')}}-{{ family.tag }}-{{ os.tag }})= {{ caller() }}
{%- endmacro -%}
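
The `header_anchor` macro chains four filters to turn a heading into a label. A plain-Python approximation of that chain may make the result easier to predict; the function below is a hypothetical stand-in, assumes a single-line heading, and is not how Jinja evaluates the macro internally.

```python
def header_anchor(heading: str, family_tag: str, os_tag: str) -> str:
    """Approximate `lower | replace('#', '') | trim | replace(' ', '-')`."""
    slug = heading.lower().replace("#", "").strip().replace(" ", "-")
    # The macro emits the label target followed by the original heading text.
    return f"({slug}-{family_tag}-{os_tag})= {heading}"


print(header_anchor("## Install Drivers", "instinct", "ubuntu"))
```

Appending the family and OS tags keeps labels distinct when the same heading text is rendered once per tab.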


@@ -1,3 +1,5 @@
{%- import "deploy/linux/linux.template.jinja" as linux %}
<!-- markdownlint-disable no-duplicate-header blanks-around-headings no-multiple-blanks -->
# Installation (Linux)
```{warning}
@@ -19,10 +21,9 @@ installed version by using the multi-version ROCm packages.
## Step by Step Instructions
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
{%- call(os) linux.for_os_in(linux.supported_os) %}
{%- if os.tag == "ubuntu" %}
::::{rubric} 1. Download and convert the package signing key
::::
@@ -53,39 +54,22 @@ section.
```
To add the AMDGPU repository, follow these steps:
{% call(version) linux.for_version_in(os) %}
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# version
ver=5.7
ver={{ linux.supported_family[0].amdgpu_version }}
# amdgpu repository for focal
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu focal main" \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu {{ version.release }} main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
# version
ver=5.7
# amdgpu repository for jammy
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu jammy main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
```
:::
::::
{%- endcall %}
Install the kernel mode driver and reboot the system using the following
commands:
@@ -100,38 +84,23 @@ sudo reboot
To add the ROCm repository, use the following steps:
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# ROCm repositories for focal
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 5.7; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
# ROCm repositories for {{ version.release }}
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver {{ version.release }} main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
# ROCm repositories for jammy
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 5.7; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
```
:::
::::
{%- endcall %}
::::{rubric} 4. Install packages
::::
@@ -151,13 +120,9 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo apt install rocm-hip-sdk5.7 rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
sudo apt install rocm-hip-sdk{{ linux.supported_family[0].rocm_version }} rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
```
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
{%- elif os.tag == "rhel" %}
::::{rubric} 1. Add the AMDGPU Stack Repository and Install the Kernel-mode Driver
::::
@@ -165,20 +130,20 @@ For a comprehensive list of meta-packages, refer to
If you have a version of the kernel-mode driver installed, you may skip this
section.
```
{% call(version) linux.for_version_in(os) %}
::::{tab-set}
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# version
ver=5.7
ver={{ linux.supported_family[0].amdgpu_version }}
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.7/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/{{ version.number }}/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -186,76 +151,7 @@ gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
:sync: RHEL-8
```shell
# version
ver=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 9.1
:sync: RHEL-9.1
:sync: RHEL-9
```shell
# version
ver=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/9.1/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
:sync: RHEL-9
```shell
# version
ver=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/9.2/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
::::
{%- endcall %}
Install the kernel mode driver and reboot the system using the following
commands:
@@ -274,7 +170,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
:sync: RHEL-8
```shell
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 5.7; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -285,6 +181,7 @@ gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
done
sudo yum clean all
```
@@ -293,7 +190,7 @@ sudo yum clean all
:sync: RHEL-9
```shell
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 5.7; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -328,13 +225,9 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo yum install rocm-hip-sdk5.7 rocm-hip-sdk5.6.1
sudo yum install rocm-hip-sdk{{ linux.supported_family[0].rocm_version }} rocm-hip-sdk5.6.1
```
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
{%- elif os.tag == "sle" %}
::::{rubric} 1. Add the AMDGPU Repository and Install the Kernel-mode Driver
::::
@@ -342,48 +235,27 @@ For a comprehensive list of meta-packages, refer to
If you have a version of the kernel-mode driver installed, you may skip this
section.
```
{% call(version) linux.for_version_in(os) %}
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```{important}
Instructions for {{ os.name }} {{ version.number }}
```
```shell
# version
ver=5.7
ver={{ linux.supported_family[0].amdgpu_version }}
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/{{ version.number }}/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
# version
ver=5.7
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo zypper ref
```
:::
::::
{%- endcall %}
Install the kernel mode driver and reboot the system using the following
commands:
@@ -399,7 +271,7 @@ sudo reboot
To add the ROCm repository, use the following steps:
```shell
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 5.7; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1 {{ linux.supported_family[0].rocm_version }}; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -431,14 +303,12 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.7 rocm-hip-sdk5.6.1
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk{{ linux.supported_family[0].rocm_version }} rocm-hip-sdk5.6.1
```
:::::
::::::
{%- endif %}
{%- endcall %}
(post-install-actions-linux)=
## Post-install Actions and Verification Process
The post-install actions listed here are optional and depend on your use case,
@@ -468,7 +338,7 @@ but are generally useful. Verification of the install is advised.
2. Add binary paths to the `PATH` environment variable.
```shell
export PATH=$PATH:/opt/rocm-5.7/bin:/opt/rocm-5.7/opencl/bin
export PATH=$PATH:/opt/rocm-{{ linux.supported_family[0].rocm_version }}/bin:/opt/rocm-{{ linux.supported_family[0].rocm_version }}/opencl/bin
```
@@ -512,31 +382,18 @@ by both commands, the installation is considered successful:
To ensure the packages are installed successfully, use the following commands:
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
{%- call(os) linux.for_os_in(linux.supported_os) %}
{%- if os.tag == "ubuntu" %}
```shell
sudo apt list --installed
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
{%- elif os.tag == "rhel" %}
```shell
sudo yum list installed
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
{%- elif os.tag == "sle" %}
```shell
sudo zypper search --installed-only
```
:::
::::
{%- endif %}
{%- endcall %}
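
The three branches in this hunk amount to a lookup from the OS tag in the template data to a package-listing command. A small sketch of that mapping, with a hypothetical dictionary and function name:

```python
# OS tags as used in linux.template.jinja, mapped to the commands in this hunk.
LIST_INSTALLED = {
    "ubuntu": "sudo apt list --installed",
    "rhel": "sudo yum list installed",
    "sle": "sudo zypper search --installed-only",
}


def list_installed_command(os_tag: str) -> str:
    """Return the package-listing command for a supported OS tag."""
    try:
        return LIST_INSTALLED[os_tag]
    except KeyError:
        raise ValueError(f"unsupported OS tag: {os_tag}") from None


print(list_installed_command("rhel"))
```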


@@ -57,6 +57,28 @@ sudo apt update
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
# version
version=5.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8


@@ -116,16 +116,12 @@ sudo crb enable
Add the perl languages repository.
```{note}
Mar 25, 2024: We currently need to install the Perl module from SLES 15 SP5 as a workaround. The module was removed for SLES 15 SP4.
```
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
```
:::


@@ -1,342 +0,0 @@
# Quick Start (Linux)
## Add Repositories
::::::{tab-set}
:::::{tab-item} Ubuntu
:sync: ubuntu
::::{rubric} 1. Download and convert the package signing key
::::
```shell
# Make the directory if it doesn't exist yet.
# This location is recommended by the distribution maintainers.
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
# Download the key, convert the signing-key to a full
# keyring required by apt and store in the keyring directory
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
```
::::{rubric} 2. Add the repositories
::::
::::{tab-set}
:::{tab-item} Ubuntu 20.04
:sync: ubuntu-20.04
```shell
# Kernel driver repository for focal
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7/ubuntu focal main
EOF
# ROCm repository for focal
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7 focal main
EOF
```
:::
:::{tab-item} Ubuntu 22.04
:sync: ubuntu-22.04
```shell
# Kernel driver repository for jammy
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.7/ubuntu jammy main
EOF
# ROCm repository for jammy
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.7 jammy main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
```
:::
::::
::::{rubric} 3. Update the list of packages
::::
```shell
sudo apt update
```
:::::
:::::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
::::{rubric} 1. Add the repositories
::::
::::{tab-set}
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
```shell
# Add the amdgpu module repository for RHEL 8.7
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/8.7/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 8
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/5.7/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 8.8
:sync: RHEL-8.8
```shell
# Add the amdgpu module repository for RHEL 8.8
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/8.8/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 8
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/5.7/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 9.1
:sync: RHEL-9.1
```shell
# Add the amdgpu module repository for RHEL 9.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/9.1/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 9
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel9/5.7/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 9.2
:sync: RHEL-9.2
```shell
# Add the amdgpu module repository for RHEL 9.2
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.7/rhel/9.2/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 9
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel9/5.7/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
::::
::::{rubric} 2. Clean cached files from enabled repositories
::::
```shell
sudo yum clean all
```
:::::
:::::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
::::{rubric} 1. Add the repositories
::::
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
# Add the amdgpu module repository for SLES 15.4
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.7/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for SLES
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/zypper
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} SLES 15.5
:sync: SLES-15.5
```shell
# Add the amdgpu module repository for SLES 15.5
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.7/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for SLES
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/zypper
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
::::
::::{rubric} 2. Update the new repository
::::
```shell
sudo zypper ref
```
:::::
::::::
## Install Drivers
Install the `amdgpu-dkms` kernel module, aka driver, on your system.
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
```shell
sudo apt install amdgpu-dkms
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
```shell
sudo yum install amdgpu-dkms
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```shell
sudo zypper install amdgpu-dkms
```
:::
::::
## Install ROCm Runtimes
Install the `rocm-hip-libraries` meta-package. This contains dependencies for most
common ROCm applications.
::::{tab-set}
:::{tab-item} Ubuntu
:sync: ubuntu
```console shell
sudo apt install rocm-hip-libraries
```
:::
:::{tab-item} Red Hat Enterprise Linux
:sync: RHEL
```console shell
sudo yum install rocm-hip-libraries
```
:::
:::{tab-item} SUSE Linux Enterprise Server
:sync: SLES
```console shell
sudo zypper install rocm-hip-libraries
```
:::
::::
## Reboot the system
Loading the new driver requires a reboot of the system.
```shell
sudo reboot
```


@@ -0,0 +1,161 @@
{%- import "deploy/linux/linux.template.jinja" as linux %}
<!-- markdownlint-disable no-duplicate-header blanks-around-headings no-multiple-blanks -->
# Quick Start (Linux)
## Add Repositories
{% call(family) linux.for_family_in(linux.supported_family) %}
{%- if family.tag == "instinct" %}
{%- call(os) linux.for_os_in(linux.supported_os) %}
{%- if os.tag == "ubuntu" %}
::::{rubric} 1. Download and convert the package signing key
::::
```shell
# Make the directory if it doesn't exist yet.
# This location is recommended by the distribution maintainers.
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
# Download the key, convert the signing-key to a full
# keyring required by apt and store in the keyring directory
wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
```
::::{rubric} 2. Add the repositories
::::
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
# Kernel driver repository for {{ version.release }}
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/{{ family.amdgpu_version }}/ubuntu {{ version.release }} main
EOF
# ROCm repository for {{ version.release }}
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian {{ version.release }} main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
```
{%- endcall %}
::::{rubric} 3. Update the list of packages
::::
```shell
sudo apt update
```
{%- elif os.tag == "rhel" %}
::::{rubric} 1. Add the repositories
::::
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
# Add the amdgpu module repository for RHEL {{ version.number }}
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/{{ family.amdgpu_version }}/rhel/{{ version.number }}/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for {{ version.release | upper }}
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/{{ version.release }}/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
{%- endcall %}
::::{rubric} 2. Clean cached files from enabled repositories
::::
```shell
sudo yum clean all
```
{%- elif os.tag == "sle" %}
::::{rubric} 1. Add the repositories
::::
{%- call(version) linux.for_version_in(os) %}
```{important}
Instructions for {{ family.name }}, {{ os.name }} {{ version.number }}
```
```shell
# Add the amdgpu module repository for SLES {{ version.number }}
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/{{ family.amdgpu_version }}/sle/{{ version.number }}/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for SLES
sudo tee /etc/zypp/repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/zypper
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
{%- endcall %}
::::{rubric} 2. Update the new repository
::::
```shell
sudo zypper ref
```
{%- endif %}
{%- endcall -%}
{%- endif %}
{%- endcall %}
## Install drivers
Install the `amdgpu-dkms` kernel module (the driver) on your system.
{%- call(os) linux.for_os_in(linux.supported_os) %}
{{ linux.install(os, "amdgpu-dkms")}}
{%- endcall %}
## Install ROCm runtimes
Install the `rocm-hip-libraries` meta-package. This contains dependencies for most
common ROCm applications.
{%- call(os) linux.for_os_in(linux.supported_os) %}
{{ linux.install(os, "rocm-hip-libraries")}}
{%- endcall %}
## Reboot the system
Loading the new driver requires a reboot of the system.
```shell
sudo reboot
```
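After the reboot, it can be useful to confirm that the kernel driver actually loaded. The following is a minimal sketch rather than part of the documented procedure: `has_amdgpu_module` is a hypothetical helper name, and it assumes the usual `lsmod` layout in which the module name is the first column.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ROCm): reads lsmod-style text on stdin
# and succeeds only if a module named exactly "amdgpu" appears in column one.
has_amdgpu_module() {
    awk '$1 == "amdgpu" { found = 1 } END { exit(found ? 0 : 1) }'
}

# Usage on a live system:
#   lsmod | has_amdgpu_module && echo "amdgpu loaded" || echo "amdgpu missing"
```

Matching on the first column avoids false positives from modules that merely list `amdgpu` in their "Used by" column.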

@@ -24,7 +24,7 @@ MIGraphX is a graph compiler focused on accelerating the Machine Learning infere
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling MIOpen or rocBLAS or by creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.
MIGraphX provides easy-to-use APIs in C++ and Python to import machine learning models in ONNX or TensorFlow. Users can compile, save, load, and run these models using the MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
MIGraphX provides easy-to-use APIs in C++ and Python to import machine learning models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into an internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
- Number of arguments
@@ -187,7 +187,7 @@ Follow these steps:
}
```
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
```cmake
cmake_minimum_required(VERSION 3.5)
@@ -327,7 +327,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
```
Alternatively, you can use the MIGraphX C++ or Python API to generate a `.mxr` file. Refer to {numref}`image018` for an example.
Alternatively, you can use MIGraphX's C++ or Python API to generate a `.mxr` file. Refer to {numref}`image018` for an example.
```{figure} ../../data/understand/deep_learning/image.018.png
:name: image018

@@ -3,6 +3,14 @@
:::::{grid} 1 1 2 2
:gutter: 1
:::{grid-item-card} ROCm using Radeon
:link: {doc}`ROCm using Radeon <radeon:index>`
:link-type: url
ROCm and PyTorch installation processes to pair with the Radeon RX 7900 XTX GPU or the Radeon PRO W7900 GPU,
and get started on a fully-functional environment for AI and ML development.
:::
:::{grid-item-card} Tuning Guides
:link: tuning_guides/index
:link-type: doc

@@ -59,16 +59,7 @@ Follow these steps:
PyTorch supports the ROCm platform by providing tested wheels packages. To
access this feature, refer to
[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
and choose the "ROCm" compute platform. {numref}`Installation-Matrix-from-Pytorch` is a matrix from <https://pytorch.org/> that illustrates the installation compatibility between ROCm and the PyTorch build.
```{figure} ../../data/how_to/magma_install/image.006.png
:name: Installation-Matrix-from-Pytorch
---
align: center
---
Installation Matrix from PyTorch
```
[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/). For the correct wheels command, you must select 'Linux', 'Python', 'pip', and 'ROCm' in the matrix.
To install PyTorch using the wheels package, follow these installation steps:
@@ -299,7 +290,7 @@ USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
### Test the PyTorch Installation
You can use PyTorch unit tests to validate a PyTorch installation. If using a
prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installing an official
prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
wheels package, these tests are already run on those configurations.
Alternatively, you can manually run the unit tests to validate the PyTorch
installation fully.

@@ -17,6 +17,7 @@ tools, and APIs that enable GPU programming from low-level kernel to end-user ap
- {doc}`/deploy/linux/index`
- {doc}`/deploy/docker`
- {doc}`Deploy ROCm using Radeon <radeon:index>`
:::
::::

@@ -1 +0,0 @@
# Docker

@@ -57,8 +57,8 @@ contemporary CUDA / NVIDIA HPC SDK alternatives.
| 5.3.x | 1.16 | 22.7 |
| 5.4.x | 1.16 | 22.9 |
| 5.5.x | 1.17 | 22.9 |
| 5.6 | 1.17.2 | 22.9 |
| 5.7 | 1.17.2 | 22.9 |
| 5.6.x | 1.17.2 | 22.9 |
| 5.7.x | 1.17.2 | 22.9 |
For the latest documentation of these libraries, refer to the
[associated documentation](../reference/gpu_libraries/c%2B%2B_primitives.md).

@@ -1,88 +0,0 @@
# Docker Image Support Matrix
The software support matrices for ROCm container releases are listed below.
## ROCm 5.6
### PyTorch
#### `Ubuntu+ rocm5.6_internal_testing +169530b`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.15.1](https://github.com/pytorch/vision/tree/v0.15.1)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
* [UCX 1.10.0](https://github.com/openucx/ucx/tree/v1.10.0)
* [OMPI 4.0.3](https://github.com/open-mpi/ompi/tree/v4.0.3)
* [OFED 5.4.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
#### `CentOS7+ rocm5.6_internal_testing +169530b`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.15.1](https://github.com/pytorch/vision/tree/v0.15.1)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
#### `1.13 +bfeb431`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 1.13.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.14.0](https://github.com/pytorch/vision/tree/v0.14.0)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
* [UCX 1.10.0](https://github.com/openucx/ucx/tree/v1.10.0)
* [OMPI 4.0.3](https://github.com/open-mpi/ompi/tree/v4.0.3)
* [OFED 5.4.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
#### `1.12 +05d5d04`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 1.12.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
* [Torchvision 0.13.1](https://github.com/pytorch/vision/tree/v0.13.1)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
* [MAGMA](https://bitbucket.org/icl/magma/src/master/)
* [UCX 1.10.0](https://github.com/openucx/ucx/tree/v1.10.0)
* [OMPI 4.0.3](https://github.com/open-mpi/ompi/tree/v4.0.3)
* [OFED 5.4.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
### TensorFlow
#### `tensorflow_develop-upstream-QA-rocm56 +c88a9f4`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* `tensorflow-rocm` 2.13.0
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
* [Tensorboard 2.12.0](https://github.com/tensorflow/tensorboard/tree/2.12.0)
#### `r2.11-rocm-enhanced +5be4141`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* [`tensorflow-rocm` 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
* [Tensorboard 2.11.2](https://github.com/tensorflow/tensorboard/tree/2.11.2)
#### `r2.10-rocm-enhanced +72789a3`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* [`tensorflow-rocm` 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
* [OMPI 4.0.7](https://github.com/open-mpi/ompi/tree/v4.0.7)
* [Horovod 0.27.0](https://github.com/horovod/horovod/tree/v0.27.0)
* [Tensorboard 2.10.1](https://github.com/tensorflow/tensorboard/tree/2.10.1)

@@ -0,0 +1,112 @@
******************************************************************
Docker image support matrix
******************************************************************
AMD validates and publishes `PyTorch <https://hub.docker.com/r/rocm/pytorch>`_ and `TensorFlow <https://hub.docker.com/r/rocm/tensorflow>`_
containers on Docker Hub. The following tags, and associated inventories, are validated with ROCm 5.7.
.. tab-set::
.. tab-item:: PyTorch
.. tab-set::
.. tab-item:: Ubuntu 20.04
Tag: `rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_staging <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1/images/sha256-4dd86046e5f777f53ae40a75ecfc76a5e819f01f3b2d40eacbb2db95c2f971d4>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 2.1.0 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.16.0 <https://github.com/pytorch/vision/tree/release/0.16>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
Tag: `Ubuntu rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.12.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_1.12.1/images/sha256-e67db9373c045a7b6defd43cc3d067e7d49fd5d380f3f8582d2fb219c1756e1f>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 1.12.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.13.1 <https://github.com/pytorch/vision/tree/v0.13.1>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
Tag: `Ubuntu rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_1.13.1/images/sha256-ed99d159026093d2aaf5c48c1e4b0911508773430377051372733f75c340a4c1>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 1.13.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.14.0 <https://github.com/pytorch/vision/tree/v0.14.0>`_
* `Tensorboard 2.12.0 <https://github.com/tensorflow/tensorboard/tree/2.12.0>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
Tag: `Ubuntu rocm/pytorch:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1 <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1/images/sha256-4dd86046e5f777f53ae40a75ecfc76a5e819f01f3b2d40eacbb2db95c2f971d4>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 2.0.1 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/2.0>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.15.2 <https://github.com/pytorch/vision/tree/release/0.15>`_
* `Tensorboard 2.14.0 <https://github.com/tensorflow/tensorboard/tree/2.14>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
* `UCX 1.10.0 <https://github.com/openucx/ucx/tree/v1.10.0>`_
* `OMPI 4.0.3 <https://github.com/open-mpi/ompi/tree/v4.0.3>`_
* `OFED 5.4.3 <https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz>`_
.. tab-item:: CentOS 7
Tag: `rocm/pytorch:rocm5.7_centos7_py3.9_pytorch_staging <https://hub.docker.com/layers/rocm/pytorch/rocm5.7_centos7_py3.9_pytorch_staging/images/sha256-92240cdf0b4aa7afa76fc78be995caa19ee9c54b5c9f1683bdcac28cedb58d2b>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/yum/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `Torch 2.1.0 <https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.7_internal_testing>`_
* `Apex 0.1 <https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1>`_
* `Torchvision 0.16.0 <https://github.com/pytorch/vision/tree/release/0.16>`_
* `MAGMA <https://bitbucket.org/icl/magma/src/master/>`_
.. tab-item:: TensorFlow
.. tab-set::
.. tab-item:: Ubuntu 20.04
Tag: `rocm5.7-tf2.12-dev <https://hub.docker.com/layers/rocm/tensorflow/rocm5.7-tf2.12-dev/images/sha256-e0ac4d49122702e5167175acaeb98a79b9500f585d5e74df18facf6b52ce3e59>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `tensorflow-rocm 2.12.1 <https://pypi.org/project/tensorflow-rocm/2.12.1.570/>`_
* `Tensorboard 2.12.3 <https://github.com/tensorflow/tensorboard/tree/2.12>`_
Tag: `rocm5.7-tf2.13-dev <https://hub.docker.com/layers/rocm/tensorflow/rocm5.7-tf2.13-dev/images/sha256-6f995539eebc062aac2b53db40e2b545192d8b032d0deada8c24c6651a7ac332>`_
* Inventory:
* `ROCm 5.7 <https://repo.radeon.com/rocm/apt/5.7/>`_
* `Python 3.9 <https://www.python.org/downloads/release/python-3918/>`_
* `tensorflow-rocm 2.13.0 <https://pypi.org/project/tensorflow-rocm/2.13.0.570/>`_
* `Tensorboard 2.13.0 <https://github.com/tensorflow/tensorboard/tree/2.13>`_
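
The PyTorch tags listed in this matrix share a visible naming pattern (``rocm<ROCm version>_<OS>_py<Python version>_pytorch_<PyTorch version>``). As a minimal sketch, assuming that pattern holds for the tag you want, the helper below assembles an image reference. ``rocm_pytorch_image`` is a hypothetical name used only for illustration; it is not part of any ROCm tooling.

```shell
#!/bin/sh
# Hypothetical helper: build a rocm/pytorch image reference from its parts.
# Arguments: ROCm version, OS tag, Python version, PyTorch version.
rocm_pytorch_image() {
    printf 'rocm/pytorch:rocm%s_%s_py%s_pytorch_%s\n' "$1" "$2" "$3" "$4"
}

# Example: prints the Ubuntu 20.04 / PyTorch 2.0.1 tag from the matrix.
rocm_pytorch_image 5.7 ubuntu20.04 3.9 2.0.1
```

The result can be passed to ``docker pull``, for example ``docker pull "$(rocm_pytorch_image 5.7 ubuntu20.04 3.9 2.0.1)"``. Staging tags use ``staging`` in place of a PyTorch version number, so confirm the exact tag name on Docker Hub before relying on this pattern.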

@@ -5,6 +5,7 @@ The following table is a list of ROCm components with links to their respective
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, the name of each license, and a link to the license terms.
The table is ordered to follow ROCm's manifest file.
<!-- spellcheck-disable -->
| Component | License |

@@ -21,3 +21,4 @@ the compatibility combinations that are currently supported.
| 5.6.0 | 5.4.3, 5.5.1 |
| 5.6.1 | 5.7.0 |
| 5.7.0 | 5.5.0, 5.6.1 |
| 5.7.1 | 5.5.0, 5.6.1 |

@@ -2,8 +2,6 @@
| Version | Release Date |
| ------- | ------------ |
| [5.7.0](https://rocm.docs.amd.com/en/docs-5.7.0/) | Sep 15, 2023 |
| [5.6.1](https://rocm.docs.amd.com/en/docs-5.6.1/) | Aug 29, 2023 |
| [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/) | Jun 28, 2023 |
| [5.5.1](https://rocm.docs.amd.com/en/docs-5.5.1/) | May 24, 2023 |
| [5.5.0](https://rocm.docs.amd.com/en/docs-5.5.0/) | May 1, 2023 |

@@ -19,6 +19,14 @@ ROCm supports programming models, such as OpenMP and OpenCL, and includes all ne
compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks,
such as PyTorch and TensorFlow.
## ROCm on Radeon
Starting with ROCm™ 5.7 on Linux®, researchers and developers working with Machine Learning (ML) models and algorithms can tap into the parallel computing power of the AMD desktop GPUs based on the RDNA™ 3 architecture.
A client solution built on powerful high-end AMD GPUs provides a local, private, and often cost-effective workflow for developing with ROCm and training ML models (PyTorch) for users who previously relied solely on cloud-based solutions.
For information about how to install ROCm on AMD desktop GPUs based on the RDNA™ 3 architecture, see {doc}`Use ROCm on Radeon <radeon:index>`. For more information about supported AMD Radeon™ desktop GPUs, see {doc}`Radeon Compatibility Matrices <radeon:compatibility>`.
## ROCm on Windows
Starting with ROCm 5.5, the HIP SDK brings a subset of ROCm to developers on Windows.

@@ -81,7 +81,7 @@ subtrees:
subtrees:
- entries:
- file: release/user_kernel_space_compat_matrix
- file: release/docker_image_support_matrix
- file: release/docker_image_support_matrix.rst
- file: release/3rd_party_support_matrix
- file: release/licensing
@@ -102,7 +102,7 @@ subtrees:
- entries:
- file: reference/gpu_libraries/linear_algebra
subtrees:
- entries:
- entries:
- title: rocBLAS
url: ${project:rocblas}
- title: hipBLAS

@@ -1,2 +1 @@
rocm-docs-core==1.8.0
sphinx-reredirects
rocm-docs-core==0.26.0

@@ -4,103 +4,105 @@
#
# pip-compile requirements.in
#
accessible-pygments==0.0.5
accessible-pygments==0.0.3
# via pydata-sphinx-theme
alabaster==1.0.0
alabaster==0.7.13
# via sphinx
babel==2.16.0
babel==2.11.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.12.3
beautifulsoup4==4.11.2
# via pydata-sphinx-theme
breathe==4.35.0
breathe==4.34.0
# via rocm-docs-core
certifi==2024.8.30
certifi==2023.7.22
# via requests
cffi==1.17.1
cffi==1.15.1
# via
# cryptography
# pynacl
charset-normalizer==3.3.2
charset-normalizer==2.1.1
# via requests
click==8.1.7
click==8.1.3
# via sphinx-external-toc
cryptography==43.0.1
cryptography==41.0.3
# via pyjwt
deprecated==1.2.14
deprecated==1.2.13
# via pygithub
docutils==0.21.2
docutils==0.19
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# sphinx
fastjsonschema==2.20.0
fastjsonschema==2.16.3
# via rocm-docs-core
gitdb==4.0.11
gitdb==4.0.10
# via gitpython
gitpython==3.1.43
gitpython==3.1.30
# via rocm-docs-core
idna==3.10
idna==3.4
# via requests
imagesize==1.4.1
# via sphinx
jinja2==3.1.4
jinja2==3.1.2
# via
# myst-parser
# sphinx
markdown-it-py==3.0.0
markdown-it-py==2.2.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.5
markupsafe==2.1.2
# via jinja2
mdit-py-plugins==0.4.2
mdit-py-plugins==0.3.4
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-parser==4.0.0
myst-parser==1.0.0
# via rocm-docs-core
packaging==24.1
packaging==23.0
# via
# pydata-sphinx-theme
# sphinx
pycparser==2.22
pycparser==2.21
# via cffi
pydata-sphinx-theme==0.15.4
pydata-sphinx-theme==0.13.3
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==2.4.0
pygithub==1.58.1
# via rocm-docs-core
pygments==2.18.0
pygments==2.15.0
# via
# accessible-pygments
# pydata-sphinx-theme
# sphinx
pyjwt[crypto]==2.9.0
pyjwt[crypto]==2.6.0
# via pygithub
pynacl==1.5.0
# via pygithub
pyyaml==6.0.2
pytz==2022.7.1
# via babel
pyyaml==6.0
# via
# myst-parser
# rocm-docs-core
# sphinx-external-toc
requests==2.32.3
requests==2.31.0
# via
# pygithub
# sphinx
rocm-docs-core==1.8.0
rocm-docs-core==0.26.0
# via -r requirements.in
smmap==5.0.1
smmap==5.0.0
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.6
soupsieve==2.4
# via beautifulsoup4
sphinx==8.0.2
sphinx==5.3.0
# via
# breathe
# myst-parser
@@ -111,40 +113,31 @@ sphinx==8.0.2
# sphinx-design
# sphinx-external-toc
# sphinx-notfound-page
# sphinx-reredirects
sphinx-book-theme==1.1.3
sphinx-book-theme==1.0.1
# via rocm-docs-core
sphinx-copybutton==0.5.2
sphinx-copybutton==0.5.1
# via rocm-docs-core
sphinx-design==0.6.1
sphinx-design==0.4.1
# via rocm-docs-core
sphinx-external-toc==1.0.1
sphinx-external-toc==0.3.1
# via rocm-docs-core
sphinx-notfound-page==1.0.4
sphinx-notfound-page==0.8.3
# via rocm-docs-core
sphinx-reredirects==0.1.5
# via -r requirements.in
sphinxcontrib-applehelp==2.0.0
sphinxcontrib-applehelp==1.0.4
# via sphinx
sphinxcontrib-devhelp==2.0.0
sphinxcontrib-devhelp==1.0.2
# via sphinx
sphinxcontrib-htmlhelp==2.1.0
sphinxcontrib-htmlhelp==2.0.1
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==2.0.0
sphinxcontrib-qthelp==1.0.3
# via sphinx
sphinxcontrib-serializinghtml==2.0.0
sphinxcontrib-serializinghtml==1.1.5
# via sphinx
tomli==2.0.1
# via sphinx
typing-extensions==4.12.2
# via
# pydata-sphinx-theme
# pygithub
urllib3==2.2.3
# via
# pygithub
# requests
wrapt==1.16.0
typing-extensions==4.5.0
# via pydata-sphinx-theme
urllib3==1.26.13
# via requests
wrapt==1.14.1
# via deprecated

@@ -13,7 +13,7 @@ The full list of HSA system architecture platform requirements are here: `HSA Sy
The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The atomic operations are initiated by the
The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicOps are initiated by the
I/O device, which supports 32-bit, 64-bit, and 128-bit operands; the target address must be naturally aligned to the operation size.
ROCm uses platform atomics in the following ways:
@@ -22,11 +22,11 @@ For ROCm the Platform atomics are used in ROCm in the following ways:
* Update HSA queues write_dispatch_id: 64 bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
* Update HSA signals: 64-bit atomic operations are used for CPU and GPU synchronization.
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through, and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have atomic operations routing enabled, or the atomic operations will fail even though the PCIe endpoint and PCIe I/O devices are capable of atomic operations.
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through, and completed by PCIe components. Routing and completion do not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled, or the atomic operations will fail even though the PCIe endpoint and PCIe I/O devices are capable of atomic operations.
To route atomic operations between two or more Root Ports, each associated Root Port must indicate that capability via the atomic operations routing supported bit in the Device Capabilities 2 register.
To route AtomicOps between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
If your system has a PCIe switch, it needs to support atomic operations routing. Atomic operations requests are permitted only if a component's ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support atomic operations completion and/or routing to a component that does. If atomic operations routing support=1, routing is supported; if atomic operations routing support=0, routing is not supported.
If your system has a PCIe switch, it needs to support AtomicOp routing. Again, AtomicOp requests are permitted only if a component's ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component that does. If AtomicOp Routing Support=1, routing is supported; if AtomicOp Routing Support=0, routing is not supported.
An atomic operation is a non-posted transaction supporting 32-bit and 64-bit address formats; there must be a Completion response containing the result of the operation. Errors associated with the operation (an uncorrectable error accessing the target location or carrying out the atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor to Completer Abort (CA) or Unsupported Request (UR).
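
The capability and control bits described above can usually be inspected from Linux with ``lspci -vvv``. The sketch below assumes a pciutils release that decodes these fields under the labels ``AtomicOpsCap:`` and ``AtomicOpsCtl:``; the helper names are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers: each reads `lspci -vvv` text for one device on stdin.
# In a basic regular expression the trailing "+" is a literal plus sign,
# which is how lspci marks a bit that is set.

# Succeeds if the device is allowed to issue AtomicOp requests (DevCtl2).
atomicop_requester_enabled() {
    grep -q 'AtomicOpsCtl:.*ReqEn+'
}

# Succeeds if the port advertises AtomicOp routing support (DevCap2).
atomicop_routing_supported() {
    grep -q 'AtomicOpsCap:.*Routing+'
}

# Usage on a live system (the device address is a placeholder):
#   sudo lspci -vvv -s <bus:dev.fn> | atomicop_requester_enabled && echo "ReqEn set"
```

Reading the full capability list generally requires root privileges, which is why the usage line runs ``lspci`` through ``sudo``.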
@@ -69,7 +69,7 @@ BAR Memory Overview
*******************
On a Xeon E5 based system, above-4GB PCIe addressing can be turned on in the BIOS; if so, you need to set the MMIO base address (MMIOH Base) and range (MMIO High Size) in the BIOS.
On a Supermicro system, check for the following settings in the system BIOS:
On a SuperMicro system, check for the following settings in the system BIOS:
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
@@ -77,7 +77,7 @@ In Supermicro system in the system bios you need to see the following
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
When we support Large Bar Capability, there is a Large Bar VBIOS which also disables the I/O BAR.
When we support Large Bar Capability, there is a Large Bar Vbios which also disables the I/O BAR.
GFX9 and Vega10 have a physical address of up to 44 bits and a 48-bit virtual address.
@@ -116,5 +116,30 @@ Legend:
5 : Expansion ROM. This is required for the AMD driver software to access the GPU's video BIOS. It is currently fixed at 128KB.
For more information, you can review
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.
Excerpts from Overview of Changes to PCI Express 3.0
====================================================
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
********************************************************
Atomic Operations Goal:
*************************
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
* Fetch and Add uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
* Unconditional Swap uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
* Compare and Swap uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
* AtomicOpCompletion: a new completion that gives the result of an atomic request and indicates that the atomicity of the transaction has been maintained.
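
The three request types listed above can be modelled in a few lines of shell. This is an illustrative sketch only and is not part of the quoted overview: the variable and function names are hypothetical, ``OLD`` is assumed to hold the original contents of the target location as returned in the completion, and nothing here is actually atomic.

```shell
#!/bin/sh
# Illustrative model only: MEM stands in for the target location and OLD for
# the value returned to the requester (assumed to be the original contents).
# Real AtomicOps are performed by the completer hardware, not by software.
MEM=0
OLD=0

# Fetch and Add: read the target, add the operand, write the sum back.
fetch_add() { OLD=$MEM; MEM=$((MEM + $1)); }

# Unconditional Swap: read the target, then write the swap value to it.
swap() { OLD=$MEM; MEM=$1; }

# Compare and Swap: write the swap value ($2) only if the target equals the
# compare value ($1).
compare_swap() {
    OLD=$MEM
    if [ "$MEM" -eq "$1" ]; then MEM=$2; fi
}

# Natural alignment: the address ($1) must be a multiple of the operand
# size in bytes ($2).
is_naturally_aligned() { [ $(( $1 % $2 )) -eq 0 ]; }
```

For example, with an 8-byte operand, ``is_naturally_aligned 0x1008 8`` succeeds while ``is_naturally_aligned 0x1004 8`` fails, which corresponds to the malformed-TLP case mentioned above.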
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
ID-based Ordering Goal:
*************************
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
To read more about the new options in PCIe Gen 3, see https://www.mindshare.com/files/resources/PCIe%203-0.pdf


@@ -4,7 +4,7 @@ Using CMake
Most components in ROCm support CMake. Projects depending on header-only or
library components typically require CMake 3.5 or higher whereas those wanting
to make use of the CMake HIP language support will require CMake 3.21 or higher.
to make use of CMake's HIP language support will require CMake 3.21 or higher.
Finding Dependencies
====================
@@ -16,7 +16,7 @@ Finding Dependencies
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
`Using Dependencies Guide
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
to get an overview of CMake related facilities.
to get an overview of CMake's related facilities.
In short, CMake supports finding dependencies in two ways:
@@ -28,7 +28,7 @@ In short, CMake supports finding dependencies in two ways:
regards needed to consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
driving the compilation of HIP programs on Nvidia runtimes. As such, when
dependencies are not found in standard system locations, one either has to
instruct CMake to search for package config files in additional folders using
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of filesystem
@@ -55,8 +55,8 @@ to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_)
Using HIP in CMake
==================
ROCm components providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
ROCm componenents providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
language features, allowing users to program using the HIP single-source
programming model. When a program (or translation-unit) uses the HIP API without
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
@@ -172,7 +172,7 @@ all the flags necessary for device compilation.
.. note::
Compiling for the GPU device requires at least C++11.
This project can then be configured with the following CMake commands:
This project can then be configured with for eg.
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
@@ -186,7 +186,7 @@ When using the CXX language support to compile HIP device code, selecting the
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
default, this is set to some subset of the currently supported architectures of
AMD ROCm. It can be set to the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
AMD ROCm. It can be set to eg. ``-D GPU_TARGETS="gfx1032;gfx1035"``.
ROCm CMake Packages
-------------------
@@ -251,9 +251,9 @@ options.
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
up with their own way to register command-line fragments of different purpose in
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
a setup'n'forget fashion for quick assembly using graphical front-ends. This is
all nice, but configurations aren't portable, nor can they be reused in
Continuous Integration (CI) pipelines. CMake has condensed existing practice
Continuous Intergration (CI) pipelines. CMake has condensed existing practice
into a portable JSON format that works in all IDEs and can be invoked from any
command-line. This is
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_


@@ -10,6 +10,6 @@ disambiguates compiler naming used throughout the documentation.
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. See for details, <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| HIPify | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |


@@ -81,7 +81,7 @@ The command processor counters are further classified into fetcher and compute.
| `spi_ra_lds_cu_full_csn` | CUs | Sum of CU where LDS cannot take csn wave when not fits |
| `spi_ra_bar_cu_full_csn[]` | CUs | Sum of CU where BARRIER cannot take csn wave when not fits |
| `spi_ra_bulky_cu_full_csn[]` | CUs | Sum of CU where BULKY cannot take csn wave when not fits |
| `spi_ra_tglim_cu_full_csn[]` | Cycles | Cycles where csn wants to req but all CUs are at `tg_limit` |
| `spi_ra_tglim_cu_full_csn[]` | Cycles | Cycles where csn wants to req but all CUs are at tg_limit |
| `spi_ra_wvlim_cu_full_csn[]` | Cycles | Number of clocks csn is stalled due to WAVE LIMIT |
| `spi_vwc_csc_wr` | Cycles | Number of clocks to write CSC waves to VGPRs (need to multiply this value by 4) |
| `spi_swc_csc_wr` | Cycles | Number of clocks to write CSC waves to SGPRs (need to multiply this value by 4) |
@@ -288,9 +288,9 @@ The vector L1 cache subsystem counters are further classified into texture addre
| `tcp_gate_en2` | Cycles | Number of cycles vL1D core clocks are turned on |
| `tcp_td_tcp_stall_cycles` | Cycles | Number of cycles TD stalls vL1D |
| `tcp_tcr_tcp_stall_cycles` | Cycles | Number of cycles TCR stalls vL1D |
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Read |
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on a Write |
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tag RAM conflict stalls on an Atomic |
| `tcp_read_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a Read |
| `tcp_write_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on a Write |
| `tcp_atomic_tagconflict_stall_cycles` | Cycles | Number of cycles tagram conflict stalls on an Atomic |
| `tcp_pending_stall_cycles` | Cycles | Number of cycles vL1D cache is stalled due to data pending from L2 Cache |
| `tcp_ta_tcp_state_read` | Req | Number of wavefront instruction requests to vL1D |
| `tcp_volatile[]` | Req | Number of L1 volatile pixels/buffers from TA |
@@ -347,7 +347,7 @@ The vector L1 cache subsystem counters are further classified into texture addre
| `tcc_CC_req` |Req | Number of CC requests |
| `tcc_RW_req` |Req | Number of RW requests |
| `tcc_probe` |Req | Number of L2 Cache probe requests |
| `tcc_probe_all[]` |Req | Number of external probe requests with `EA_TCC_preq_all== 1` |
| `tcc_probe_all[]` |Req | Number of external probe requests with EA_TCC_preq_all== 1 |
| `tcc_read_req` |Req | Number of L2 Cache Read requests |
| `tcc_write_req` |Req | Number of L2 Cache Write requests |
| `tcc_atomic_req` |Req | Number of L2 Cache Atomic requests |


@@ -17,16 +17,16 @@
- Run this for 5.6.0 (change for whatever version you require)
- `GITHUB_ACCESS_TOKEN=my_token_here`
To generate the changelog from 5.0.0 up to and including 5.6.0:
To generate the changelog from 5.0.0 up to and including 5.7.1:
```sh
python3 tag_script.py -t $GITHUB_ACCESS_TOKEN --no-release --no-pulls --do-previous --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.6 5.6.0
python3 tag_script.py -t $GITHUB_ACCESS_TOKEN --no-release --no-pulls --do-previous --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.7 5.7.1
```
To generate the changelog only for 5.6.0:
To generate the changelog only for 5.7.1:
```sh
python3 tag_script.py -t $GITHUB_ACCESS_TOKEN --no-release --no-pulls --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.6 5.6.0
python3 tag_script.py -t $GITHUB_ACCESS_TOKEN --no-release --no-pulls --compile_file ../../CHANGELOG.md --branch release/rocm-rel-5.7 5.7.1
```
### Notes


@@ -0,0 +1,36 @@
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
#### Installing all GPU AddressSanitizer packages with a single command
ROCm 5.7.1 simplifies the installation steps for the optional AddressSanitizer (ASan) packages. This release provides the meta package *rocm-ml-sdk-asan* for ease of ASan installation. The following command installs all ASan packages at once, rather than installing each package separately:
`sudo apt-get install rocm-ml-sdk-asan`
For more detailed information about using the GPU AddressSanitizer, refer to the [user guide](https://rocm.docs.amd.com/en/docs-5.7.1/understand/using_gpu_sanitizer.html).
### ROCm Libraries
#### rocBLAS
New functionality, rocblas-gemm-tune, and a new environment variable, ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH, have been added to rocBLAS in the ROCm 5.7.1 release.
*rocblas-gemm-tune* is used to find the best-performing GEMM kernel for each GEMM problem set. It has a command line interface that mimics the --yaml input used by rocblas-bench. To generate the expected --yaml input, use profile logging by setting the environment variable ROCBLAS_LAYER=4.
For more information on rocBLAS logging, see Logging in rocBLAS, in the [API Reference Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/docs-5.7.1/API_Reference_Guide.html#logging-in-rocblas).
An example input file:
Expected output (note selected GEMM idx may differ):
Where the far right values (solution_index) are the indices of the best-performing kernels for those GEMMs in the rocBLAS kernel library. These indices can be directly used in future GEMM calls. See rocBLAS/samples/example_user_driven_tuning.cpp for sample code of directly using kernels via their indices.
If the output is stored in a file, the results can be used to override the default kernel selection with the kernels found, by setting the environment variable ROCBLAS_TENSILE_GEMM_OVERRIDE_PATH to the path of the stored file.
For more details, refer to the [rocBLAS Programmer's Guide](https://rocm.docs.amd.com/projects/rocBLAS/en/latest/Programmers_Guide.html#rocblas-gemm-tune).
#### HIP 5.7.1 (for ROCm 5.7.1)
ROCm 5.7.1 is a point release with several bug fixes in the HIP runtime.
### Fixed defects
The *hipPointerGetAttributes* API returns the correct HIP memory type as *hipMemoryTypeManaged* for managed memory.