Compare commits

..

22 Commits

Author SHA1 Message Date
Sam Wu
0e0d89c45b Update documentation requirements 2024-09-16 10:12:57 -08:00
Sam Wu
1990a9340b Update documentation requirements 2024-06-06 16:58:47 -06:00
Sam Wu
6e2cdef227 Fix RTD config 2024-05-02 08:55:43 -06:00
Sam Wu
e8faa13217 Update documentation requirements 2024-05-01 16:59:36 -06:00
Sam Wu
f9db7ee88e Update documentation requirements 2024-05-01 16:54:08 -06:00
Young Hui - AMD
c2a1ea1248 update SLES 15.4 prerequisites for 5.6.1 (#2981)
* update SLES 15.4 prerequisites

* update wordlist from develop

* fixing spelling mistakes

* more spelling fixes
2024-04-01 17:50:08 -04:00
Mátyás Aradi
5bcb8404e3 Update Linux and ROCm versions (#2869) 2024-02-05 09:40:21 -07:00
Mátyás Aradi
21f23d9089 Remove RHEL 8.6, because it is not supported (#2829) 2024-01-19 09:15:18 -07:00
Sam Wu
76f43d7457 Roc 5.6.x to docs/5.6.1 (#2489)
* update linux installer instructions
el8 > el9

* update to 5.6.1 linux install instructions
2023-09-27 17:18:02 -06:00
Máté Ferenc Nagy-Egri
ddbe4cd38f Update Linux install instructions for 5.6.1 2023-08-30 07:08:50 -06:00
Sam Wu
7e097ce72a Update conf.py 2023-08-29 17:04:47 -06:00
Saad Rahim
f3d3929f11 Updating version number to 5.6.1 2023-08-29 16:56:11 -06:00
Nara
084ed7f4cb docs: fix missing '--append' flag in install instructions (#2411) 2023-08-29 16:53:28 -06:00
Saad Rahim (AMD)
7482a8b261 Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#2419) (#2420)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-29 16:08:48 -06:00
Saad Rahim (AMD)
bf8f0ccc65 Updating the manifest file (#2417) 2023-08-29 15:07:13 -06:00
Sam Wu
ed8251872f 5.6.1 Release notes (#2416)
* 5.6.1 rel notes

* update rtd config
2023-08-29 15:04:53 -06:00
Nara
c3e8e15e51 doc: Update version in install guide to 5.6 (#2387) 2023-08-18 13:57:45 -06:00
Sam Wu
df0ee5a0ae add version to html title 2023-08-04 17:18:41 -06:00
Saad Rahim
6fb7b9f3b5 GPU support clarification (#2350) 2023-07-27 17:42:24 -06:00
Saad Rahim
7f8eede7d1 linting fix 2023-07-27 16:30:18 -06:00
Saad Rahim
0741268fd5 Updating GPU support list 2023-07-27 16:30:18 -06:00
Sam Wu
4ab3787abe Merge pull request #2345 from RadeonOpenCompute/docs/5.5.1
Docs/5.5.1 Sync into 5.6
2023-07-27 13:32:02 -06:00
25 changed files with 985 additions and 888 deletions

View File

@@ -3,12 +3,19 @@
version: 2
build:
os: ubuntu-22.04
tools:
python: "3.10"
apt_packages:
- "doxygen"
- "graphviz" # For dot graphs in doxygen
python:
install:
- requirements: docs/sphinx/requirements.txt
sphinx:
configuration: docs/conf.py
formats: [htmlzip, pdf, epub]
python:
version: "3.8"
install:
- requirements: docs/sphinx/requirements.txt
formats: []

View File

@@ -1,49 +1,686 @@
# file_reorg
FHS
Filesystem
filesystem
incrementing
rocm
# gpu_aware_mpi
DMA
GDR
HCA
MPI
MVAPICH
Mellanox's
NIC
OFED
OSU
OpenFabrics
PeerDirect
RDMA
UCX
ib_core
# isv_deployment_win
AAC
ABI
# linear algebra
LAPACK
MMA
backends
cuSOLVER
cuSPARSE
# openmp
ICV
Multithreaded
# tuning_guides
ACE
ACEs
AccVGPR
AccVGPRs
ALU
AMD
AMDGPU
AMDGPUs
AMDMIGraphX
AMI
AOCC
AOMP
APIC
APIs
APU
ASIC
ASICs
ASan
ASm
ATI
AddressSanitizer
AlexNet
Arb
BLAS
BMC
BitCode
Blit
Bluefield
CCD
CDNA
CIFAR
CLI
CLion
CMake
CMakeLists
CMakePackage
CP
CPC
CPF
CPP
CPU
CPUs
CSC
CSE
CSV
CSn
CTests
CU
CUDA
CUs
CXX
Cavium
CentOS
ChatGPT
CoRR
Codespaces
Commitizen
CommonMark
Concretized
Conda
ConnectX
DGEMM
DKMS
DL
DMA
DNN
DNNL
DPM
DRI
DW
DWORD
Dask
DataFrame
DataLoader
DataParallel
DeepSpeed
Dependabot
DevCap
Dockerfile
Doxygen
ELMo
ENDPGM
EPYC
ESXi
FFT
FFTs
FFmpeg
FHS
FMA
FP
Filesystem
Flang
Fortran
Fuyu
GALB
GCD
GCDs
GCN
GDB
GDDR
GDR
GDS
GEMM
GEMMs
GFortran
GiB
GIM
GL
GLXT
GMI
GPG
GPR
GPT
GPU
GPU's
GPUs
GRBM
GenAI
GenZ
GitHub
Gitpod
HBM
HCA
HIPCC
HIPExtension
HIPIFY
HPC
HPCG
HPE
HPL
HSA
HWE
Haswell
Higgs
Hyperparameters
ICV
IDE
IDEs
IMDb
IOMMU
IOP
IOPM
# windows
IOV
IRQ
ISA
ISV
ISVs
ImageNet
InfiniBand
Inlines
IntelliSense
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KiB
KVM
Keras
Khronos
LAPACK
LCLK
LDS
LLM
LLMs
LLVM
LM
LSAN
LTS
LoRA
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
MIVisionX
MLM
MMA
MMIO
MMIOH
MNIST
MPI
MSVC
MVAPICH
MVFFR
Makefile
Makefiles
Matplotlib
Megatron
Mellanox
Mellanox's
Meta's
MirroredStrategy
Multicore
Multithreaded
MyEnvironment
MyST
NBIO
NBIOs
NIC
NICs
NLI
NLP
NPS
NSP
NUMA
NVCC
NVIDIA
NVPTX
NaN
Nano
Navi
Noncoherently
NousResearch's
NumPy
OAM
OAMs
OCP
OEM
OFED
OMP
OMPI
OMPT
OMPX
ONNX
OSS
OSU
OpenCL
OpenCV
OpenFabrics
OpenGL
OpenMP
OpenSSL
OpenVX
PCI
PCIe
PEFT
PIL
PILImage
PRNG
PRs
PaLM
Pageable
PeerDirect
Perfetto
PipelineParallel
PnP
PowerShell
PyPi
PyTorch
Qcycles
RAII
RCCL
RDC
RDMA
RDNA
RHEL
ROC
ROCProfiler
ROCTracer
ROCclr
ROCdbgapi
ROCgdb
ROCk
ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCr
RST
RW
Radeon
RelWithDebInfo
Req
Rickle
RoCE
Ryzen
SALU
SBIOS
SCA
SDK
SDMA
SDRAM
SENDMSG
SGPR
SGPRs
SHA
SIGQUIT
SIMD
SIMDs
SKU
SKUs
PowerShell
SLES
SMEM
SMI
SMT
SPI
SQs
SRAM
SRAMECC
SVD
SWE
SerDes
Shlens
Skylake
Softmax
Spack
Supermicro
Szegedy
TCA
TCC
TCI
TCIU
TCP
TCR
TF
TFLOPS
TPU
TPUs
TensorBoard
TensorFlow
TensorParallel
ToC
TorchAudio
TorchMIGraphX
TorchScript
TorchServe
TorchVision
TransferBench
TrapStatus
UAC
# pytorch_install
kdb
precompiled
# gpu_os_support
HWE
UC
UCC
UCX
UIF
USM
UTCL
UTIL
Uncached
Unhandled
VALU
VBIOS
VGPR
VGPRs
VM
VMEM
VMWare
VRAM
VSIX
VSkipped
Vanhoucke
Vulkan
WGP
WGPs
WX
WikiText
Wojna
Workgroups
Writebacks
XCD
XCDs
XGBoost
XGBoost's
XGMI
XT
XTX
Xeon
Xilinx
Xnack
Xteam
YAML
YML
YModel
ZeRO
ZenDNN
accuracies
activations
addr
alloc
allocator
allocators
amdgpu
api
atmi
atomics
autogenerated
avx
awk
backend
backends
benchmarking
bfloat
bilinear
bitsandbytes
blit
boson
bosons
buildable
bursty
bzip
cacheable
cd
centos
centric
changelog
chiplet
cmake
cmd
coalescable
codename
collater
comgr
completers
composable
concretization
config
conformant
convolutional
convolves
cpp
csn
cuBLAS
cuFFT
cuLIB
cuRAND
cuSOLVER
cuSPARSE
dataset
datasets
dataspace
datatype
datatypes
dbgapi
de
deallocation
denoise
denoised
denoises
denormalize
deserializers
detections
dev
devicelibs
devsel
dimensionality
disambiguates
distro
el
embeddings
enablement
endpgm
encodings
env
epilog
etcetera
ethernet
exascale
executables
ffmpeg
filesystem
fortran
galb
gcc
gdb
gfortran
gfx
githooks
github
gnupg
grayscale
gzip
heterogenous
hipBLAS
hipBLASLt
hipCUB
hipFFT
hipLIB
hipRAND
hipSOLVER
hipSPARSE
hipSPARSELt
hipTensor
hipamd
hipblas
hipcub
hipfft
hipfort
hipify
hipsolver
hipsparse
hpp
hsa
hsakmt
hyperparameter
ib_core
inband
incrementing
inferencing
inflight
init
initializer
inlining
installable
interprocedural
intra
invariants
invocating
ipo
kdb
latencies
libfabric
libjpeg
libs
linearized
linter
linux
llvm
localscratch
logits
lossy
macOS
matchers
microarchitecture
migraphx
miopen
miopengemm
mivisionx
mkdir
mlirmiopen
mtypes
mvffr
namespace
namespaces
numref
ocl
opencl
opencv
openmp
openssl
optimizers
os
pageable
parallelization
parameterization
passthrough
perfcounter
performant
perl
pragma
pre
prebuilt
precompiled
prefetch
prefetchable
preprocess
preprocessed
preprocessing
prequantized
prerequisites
profiler
protobuf
pseudorandom
py
quasirandom
queueing
rccl
rdc
reStructuredText
reformats
repos
representativeness
req
resampling
rescaling
reusability
roadmap
roc
rocAL
rocALUTION
rocBLAS
rocFFT
rocLIB
rocMLIR
rocPRIM
rocRAND
rocSOLVER
rocSPARSE
rocThrust
rocWMMA
rocalution
rocblas
rocclr
rocfft
rocm
rocminfo
rocprim
rocprof
rocprofiler
rocr
rocrand
rocsolver
rocsparse
rocthrust
roctracer
runtime
runtimes
sL
scalability
scalable
sendmsg
serializers
shader
sharding
sigmoid
sm
smi
softmax
spack
src
stochastically
strided
subdirectory
subexpression
subfolder
subfolders
supercomputing
tensorfloat
th
tokenization
tokenize
tokenized
tokenizer
tokenizes
toolchain
toolchains
toolset
toolsets
torchvision
tqdm
tracebacks
txt
uarch
uncached
uncorrectable
uninstallation
unsqueeze
unstacking
unswitching
untrusted
untuned
upvote
USM
UTCL
UTIL
utils
vL
variational
vdi
vectorizable
vectorization
vectorize
vectorized
vectorizer
vectorizes
vjxb
walkthrough
walkthroughs
wavefront
wavefronts
whitespaces
workgroup
workgroups
writeback
writebacks
wrreq
wzo
xargs
xz
yaml
ysvmadyb
zypper

View File

@@ -15,6 +15,25 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.6.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
## HIP 5.6.1 (for ROCm 5.6.1)
### Fixed Defects
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
- Using *hipGraphAddMemFreeNode* no longer results in a crash
-------------------
## ROCm 5.6.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->

View File

@@ -15,568 +15,19 @@ The release notes for the ROCm platform.
-------------------
## ROCm 5.6.0
## ROCm 5.6.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable header-increment -->
#### Release Highlights
ROCm 5.6 consists of several AI software ecosystem improvements to our fast-growing user base.A few examples include:
### What's New in This Release
- New documentation portal at https://rocm.docs.amd.com
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
- Improved ROCm deployment and development tools, including CPU-GPU (rocGDB) debugger, profiler, and docker containers
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
#### OS and GPU Support Changes
## HIP 5.6.1 (for ROCm 5.6.1)
- SLES15 SP5 support was added this release. SLES15 SP3 support was dropped.
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will be entering the maintenance mode starting Q3 2023. This will be aligned with ROCm 5.7 GA release date.
- No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
- Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs till Q2 2024 (End of Maintenance [EOM])(will be aligned with the closest ROCm release)
- Bug fixes during the maintenance will be made to the next ROCm point release
- Bug fixes will not be back ported to older ROCm releases for this SKU
- Distro / Operating system updates will continue as per the ROCm release cadence for gfx906 GPUs till EOM.
### Fixed Defects
#### AMDSMI CLI 23.0.0.4
##### Added
- AMDSMI CLI tool enabled for Linux Bare Metal & Guest
- Package: amd-smi-lib
##### Known Issues
- not all Error Correction Code (ECC) fields are currently supported
- RHEL 8 & SLES 15 have extra install steps
#### Kernel Modules (DKMS)
##### Fixes
- Stability fix for multi GPU system reproducilble via ROCm_Bandwidth_Test as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).
#### HIP 5.6 (For ROCm 5.6)
##### Optimizations
- Consolidation of hipamd, rocclr and OpenCL projects in clr
- Optimized lock for graph global capture mode
##### Added
- Added hipRTC support for amd_hip_fp16
- Added hipStreamGetDevice implementation to get the device associated with the stream
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
- hipArrayGetInfo for getting information about the specified array
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
- hipArray3DGetDescriptor to get 3D array descriptor
##### Changed
- hipMallocAsync to return success for zero size allocation to match hipMalloc
- Separation of hipcc perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
- Removed hipBusBandwidth and hipCommander samples from hip-tests
##### Fixed
- Fixed regression in hipMemCpyParam3D when offset is applied
##### Known Issues
- Limited testing on xnack+ configuration
- Multiple HIP tests failures (gpuvm fault or hangs)
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release
##### Upcoming changes in future release
- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
- maxTexture1D
- maxTexture2D
- maxTexture1DLayered
- maxTexture2DLayered
- sharedMemPerMultiprocessor
- deviceOverlap
- asyncEngineCount
- surfaceAlignment
- unifiedAddressing
- computePreemptionSupported
- uuid
- Removal of deprecated code
- hip-hcc codes from hip code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'
#### ROCgdb-13 (For ROCm 5.6.0)
##### Optimized
- Improved performances when handling the end of a process with a large number of threads.
Known Issues
- On certain configurations, ROCgdb can show the following warning message:
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
This does not affect ROCgdb's functionalities.
#### ROCprofiler (For ROCm 5.6.0)
In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
ROCm 5.5 are split into separate files. The `rocmtools` files that were
deprecated in ROCm 5.5 have been removed.
| ROCm 5.6 | rocprofilerv1 | rocprofilerv2 |
|-----------------|-------------------------------------|----------------------------------------|
| **Tool script** | `bin/rocprof` | `bin/rocprofv2` |
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
| **API library** | `lib/librocprofiler.so.1` | `lib/librocprofiler.so.2` |
The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
following command:
```sh
$ rocprof …
```
To write a custom tool based on the `rocprofilerV1` API do the following:
```C
main.c:
#include <rocprofiler/rocprofiler.h> // Use the rocprofilerV1 API
int main() {
// Use the rocprofilerV1 API
return 0;
}
```
This can be built in the following manner:
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```
The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.
The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
following command:
```sh
$ rocprofv2 …
```
To write a custom tool based on the `rocprofilerV2` API do the following:
```C
main.c:
#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerV2 API
int main() {
// Use the rocprofilerV2 API
return 0;
}
```
This can be built in the following manner:
```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
```
The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.
##### Optimized
- Improved Test Suite
##### Added
- 'end_time' need to be disabled in roctx_trace.txt
##### Fixed
- rocprof in ROcm/5.4.0 gpu selector broken.
- rocprof in ROCm/5.4.1 fails to generate kernel info.
- rocprof clobbers LD_PRELOAD.
### Library Changes in ROCM 5.6.0
| Library | Version |
|---------|---------|
| hipBLAS | ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
| hipCUB | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
| hipFFT | ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
| hipSPARSE | ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |
#### hipBLAS 1.0.0
hipBLAS 1.0.0 for ROCm 5.6.0
##### Changed
- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing
##### Removed
- removed support for deprecated hipblasInt8Datatype_t enum
- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions
##### Deprecated
- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and
out-of-place functionality
#### hipCUB 2.13.1
hipCUB 2.13.1 for ROCm 5.6.0
##### Added
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
##### Changed
- CUB backend references CUB and Thrust version 1.17.2.
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
##### Known Issues
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
#### hipFFT 1.0.12
hipFFT 1.0.12 for ROCm 5.6.0
##### Added
- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.
##### Changed
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
#### hipSOLVER 1.8.0
hipSOLVER 1.8.0 for ROCm 5.6.0
##### Added
- Added compatibility API with hipsolverRf prefix
#### hipSPARSE 2.3.6
hipSPARSE 2.3.6 for ROCm 5.6.0
##### Added
- Added SpGEMM algorithms
##### Changed
- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE
#### MIOpen 2.19.0
MIOpen 2.19.0 for ROCm 5.6.0
##### Added
- ROCm 5.5 support for gfx1101 (Navi32)
##### Changed
- Tuning results for MLIR on ROCm 5.5
- Bumping MLIR commit to 5.5.0 release tag
##### Fixed
- Fix 3d convolution Host API bug
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.
#### rccl 2.15.5
RCCL 2.15.5 for ROCm 5.6.0
##### Changed
- Compatibility with NCCL 2.15.5
- Unit test executable renamed to rccl-UnitTests
##### Added
- HW-topology aware binary tree implementation
- Experimental support for MSCCL
- New unit tests for hipGraph support
- NPKit integration
##### Fixed
- rocm-smi ID conversion
- Support for HIP_VISIBLE_DEVICES for unit tests
- Support for p2p transfers to non (HIP) visible devices
##### Removed
- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench
#### rocALUTION 2.1.9
rocALUTION 2.1.9 for ROCm 5.6.0
##### Improved
- Fixed synchronization issues in level 1 routines
#### rocBLAS 3.0.0
rocBLAS 3.0.0 for ROCm 5.6.0
##### Optimizations
- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n &lt;= 32 and batch_count &gt;= 256.
- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.
##### Added
- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.
##### Deprecated
- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release
##### Removed
- is_complex helper was deprecated and now removed. Use rocblas_is_complex instead.
- The enum truncate_t and the value truncate was deprecated and now removed from. It was replaced by rocblas_truncate_t and rocblas_truncate, respectively.
- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.
##### Dependencies
- build only dependency on python joblib added as used by Tensile build
- fix for cmake install on some OS when performed by install.sh -d --cmake_install
##### Fixed
- make trsm offset calculations 64 bit safe
##### Changed
- refactor rotg test code
#### rocFFT 1.0.23
rocFFT 1.0.23 for ROCm 5.6.0
##### Added
- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.
##### Changed
- Replaced std::complex with hipComplex data types for data generator.
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
##### Fixed
- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.
#### rocm-cmake 0.9.0
rocm-cmake 0.9.0 for ROCm 5.6.0
##### Added
- Added the option ROCM_HEADER_WRAPPER_WERROR
- Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
- Configure-time CMake option sets the default for the C macro.
#### rocPRIM 2.13.0
rocPRIM 2.13.0 for ROCm 5.6.0
##### Added
- New block level `radix_rank` primitive.
- New block level `radix_rank_match` primitive.
- Added a stable block sorting implementation. This be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.
##### Changed
- Improved the performance of `block_radix_sort` and `device_radix_sort`.
- Improved the performance of `device_merge_sort`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).
##### Known Issues
- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits&lt;unsigned int&gt;::max()`.
#### rocRAND 2.10.17
rocRAND 2.10.17 for ROCm 5.6.0
##### Added
- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
- experimental HIP-CPU feature
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, &#34;Parallel random numbers: as easy as 1, 2, 3&#34;.
##### Changed
- Python 2.7 is no longer officially supported.
#### rocSOLVER 3.22.0
rocSOLVER 3.22.0 for ROCm 5.6.0
##### Added
- LU refactorization for sparse matrices
- CSRRF_ANALYSIS
- CSRRF_SUMLU
- CSRRF_SPLITLU
- CSRRF_REFACTLU
- Linear system solver for sparse matrices
- CSRRF_SOLVE
- Added type `rocsolver_rfinfo` for use with sparse matrix routines
##### Optimized
- Improved the performance of BDSQR and GESVD when singular vectors are requested
##### Fixed
- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`
#### rocSPARSE 2.5.2
rocSPARSE 2.5.2 for ROCm 5.6.0
##### Improved
- Fixed a memory leak in csritsv
- Fixed a bug in csrsm and bsrsm
#### rocThrust 2.18.0
rocThrust 2.18.0 for ROCm 5.6.0
##### Fixed
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
##### Changed
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
#### rocWMMA 1.1.0
rocWMMA 1.1.0 for ROCm 5.6.0
##### Added
- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
- Added performance gemm samples for half, single and double precision
- Added rocWMMA cmake versioning
- Added vectorized support in coordinate transforms
- Included ROCm smi for runtime clock rate detection
- Added fragment transforms for transpose and change data layout
##### Changed
- Default to GPU rocBLAS validation against rocWMMA
- Re-enabled int8 gemm tests on gfx9
- Upgraded to C++17
- Restructured unit test folder for consistency
- Consolidated rocWMMA samples common code
#### Tensile 4.37.0
Tensile 4.37.0 for ROCm 5.6.0
##### Added
- Added user driven tuning API
- Added decision tree fallback feature
- Added SingleBuffer + AtomicAdd option for GlobalSplitU
- DirectToVgpr support for fp16 and Int8 with TN orientation
- Added new test cases for various functions
- Added SingleBuffer algorithm for ZGEMM/CGEMM
- Added joblib for parallel map calls
- Added support for MFMA + LocalSplitU + DirectToVgprA+B
- Added asmcap check for MIArchVgpr
- Added support for MFMA + LocalSplitU
- Added frequency, power, and temperature data to the output
##### Optimizations
- Improved the performance of GlobalSplitU with SingleBuffer algorithm
- Reduced the running time of the extended and pre_checkin tests
- Optimized the Tailloop section of the assembly kernel
- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
- Improved the performance of the second kernel of MultipleBuffer algorithm
##### Changed
- Updated custom kernels with 64-bit offsets
- Adapted 64-bit offset arguments for assembly kernels
- Improved temporary register re-use to reduce max sgpr usage
- Removed some restrictions on VectorWidth and DirectToVgpr
- Updated the dependency requirements for Tensile
- Changed the range of AssertSummationElementMultiple
- Modified the error messages for more clarity
- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
- Removed dummy vgpr for vectorStaticRemainder
- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
- Removed qReg parameter from vectorStaticRemainder
##### Fixed
- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
- 64-bit offset parameters for post kernels
- Fixed gfx908 CI test failures
- Fixed offset calculation to prevent overflow for large offsets
- Fixed issues when BufferLoad and BufferStore are equal to zero
- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
- Fixed the memory access error related to StaggerU + large stride
- Fixed ZGEMM 4x4 MatrixInst mismatch
- Fixed DGEMM 4x4 MatrixInst mismatch
- Fixed ASEM + GSU + NoTailLoop opt mismatch
- Fixed AssertSummationElementMultiple + GlobalSplitU issues
- Fixed ASEM + GSU + TailLoop inner unroll
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
- Using *hipGraphAddMemFreeNode* no longer results in a crash

View File

@@ -12,7 +12,7 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
fetch="https://github.com/GPUOpen-Tools/" />
<remote name="KhronosGroup"
fetch="https://github.com/KhronosGroup/" />
<default revision="refs/tags/rocm-5.6.0"
<default revision="refs/tags/rocm-5.6.1"
remote="roc-github"
sync-c="true"
sync-j="4" />

View File

@@ -5,9 +5,9 @@ Documentation is built using open source toolchains. Contributions to our
documentation is encouraged and welcome. As a contributor, please familiarize
yourself with our documentation toolchain.
## ReadTheDocs
## Read The Docs
[ReadTheDocs](https://docs.readthedocs.io/en/stable/) is our front end for the
[Read the Docs](https://docs.readthedocs.io/en/stable/) is our front end for the
our documentation. By front end, this is the tool that serves our HTML based
documentation to our end users.

View File

@@ -20,8 +20,8 @@ latex_engine = "xelatex"
project = "ROCm Documentation"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
version = "5.6.0"
release = "5.6.0"
version = "5.6.1"
release = "5.6.1"
setting_all_article_info = True
@@ -86,7 +86,7 @@ article_pages = [
external_toc_path = "./sphinx/_toc.yml"
docs_core = ROCmDocs("ROCm Documentation Home")
docs_core = ROCmDocs("ROCm 5.6.1 Documentation Home")
docs_core.setup()
external_projects_current_project = "rocm"

View File

@@ -18,8 +18,8 @@ following commands based on your distribution.
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.6/ubuntu/focal/amdgpu-install_5.6.50600-1_all.deb
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
wget https://repo.radeon.com/amdgpu-install/5.6.1/ubuntu/focal/amdgpu-install_5.6.50601-1_all.deb
sudo apt install ./amdgpu-install_5.6.50601-1_all.deb
```
:::
@@ -28,8 +28,8 @@ sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.6/ubuntu/jammy/amdgpu-install_5.6.50600-1_all.deb
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
wget https://repo.radeon.com/amdgpu-install/5.6.1/ubuntu/jammy/amdgpu-install_5.6.50601-1_all.deb
sudo apt install ./amdgpu-install_5.6.50601-1_all.deb
```
:::
@@ -39,21 +39,12 @@ sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.6/amdgpu-install-5.6.50600-1.el8.noarch.rpm
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.7/amdgpu-install-5.6.50600-1.el8.noarch.rpm
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/8.7/amdgpu-install-5.6.50601-1.el8.noarch.rpm
```
:::
@@ -62,7 +53,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/8.7/amdgpu-inst
:sync: RHEL-8
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.8/amdgpu-install-5.5.50501-1.el8.noarch.rpm
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/8.8/amdgpu-install-5.6.50601-1.el8.noarch.rpm
```
:::
@@ -71,7 +62,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.8/amdgpu-in
:sync: RHEL-9
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/9.1/amdgpu-install-5.6.50600-1.el8.noarch.rpm
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/9.1/amdgpu-install-5.6.50601-1.el9.noarch.rpm
```
:::
@@ -80,7 +71,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.6/rhel/9.1/amdgpu-inst
:sync: RHEL-9
```shell
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.2/amdgpu-install-5.5.50501-1.el8.noarch.rpm
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/9.2/amdgpu-install-5.6.50601-1.el9.noarch.rpm
```
:::
@@ -94,7 +85,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.2/amdgpu-in
:sync: SLES-15.4
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/sle/15.4/amdgpu-install-5.6.50600-1.noarch.rpm
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6.1/sle/15.4/amdgpu-install-5.6.50601-1.noarch.rpm
```
:::
@@ -102,7 +93,7 @@ sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/s
:sync: SLES-15.5
```shell
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6/sle/15.5/amdgpu-install-5.6.50600-1.noarch.rpm
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6.1/sle/15.5/amdgpu-install-5.6.50601-1.noarch.rpm
```
:::
@@ -202,8 +193,8 @@ Run the following commands based on your distribution to add the repositories:
:sync: ubuntu-20.04
```shell
for ver in 5.3.3 5.4.3; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
for ver in 5.4.3 5.5.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
@@ -214,8 +205,8 @@ sudo apt update
:sync: ubuntu-22.04
```shell
for ver in 5.3.3 5.4.3; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
for ver in 5.4.3 5.5.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
done
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
sudo apt update
@@ -232,7 +223,7 @@ sudo apt update
:sync: RHEL-8
```shell
for ver in 5.3.3 5.4.3; do
for ver in 5.4.3 5.5.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -251,7 +242,7 @@ sudo yum clean all
:sync: RHEL-9
```shell
for ver in 5.3.3 5.4.3; do
for ver in 5.4.3 5.5.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -272,7 +263,7 @@ sudo yum clean all
:sync: SLES
```shell
for ver in 5.3.3 5.4.3; do
for ver in 5.4.3 5.5.1; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
@@ -302,8 +293,8 @@ driver, associated with the ROCm release v5.4.3, will be installed as its latest
release in the list.
```none
sudo amdgpu-install --usecase=rocm --rocmrelease=5.3.3
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.3
sudo amdgpu-install --usecase=rocm --rocmrelease=5.5.1
```
## Additional options

View File

@@ -52,8 +52,11 @@ To add the AMDGPU repository, follow these steps:
:sync: ubuntu-20.04
```shell
# version
ver=5.6.1
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu focal main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -63,8 +66,11 @@ sudo apt update
:sync: ubuntu-22.04
```shell
# version
ver=5.6.1
# amdgpu repository for jammy
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu jammy main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -91,7 +97,7 @@ To add the ROCm repository, use the following steps:
```shell
# ROCm repositories for focal
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
@@ -106,7 +112,7 @@ sudo apt update
```shell
# ROCm repositories for jammy
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
| sudo tee --append /etc/apt/sources.list.d/rocm.list
done
@@ -136,7 +142,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo apt install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
sudo apt install rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
```
:::::
@@ -152,34 +158,18 @@ section.
```
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
# version
ver=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -195,10 +185,13 @@ sudo yum clean all
:sync: RHEL-8
```shell
# version
ver=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -214,10 +207,13 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
ver=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/9.1/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -233,10 +229,13 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
ver=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/9.2/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -266,7 +265,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
:sync: RHEL-8
```shell
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -285,7 +284,7 @@ sudo yum clean all
:sync: RHEL-9
```shell
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -320,7 +319,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo yum install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
sudo yum install rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
```
:::::
@@ -340,10 +339,13 @@ section.
:sync: SLES-15.4
```shell
# version
ver=5.6.1
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -356,10 +358,13 @@ sudo zypper ref
:sync: SLES-15.5
```shell
# version
ver=5.6.1
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.5/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -384,7 +389,7 @@ sudo reboot
To add the ROCm repository, use the following steps:
```shell
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-$ver]
name=ROCm$ver
@@ -416,7 +421,7 @@ For a comprehensive list of meta-packages, refer to
- Sample Multi-version installation
```shell
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
```
:::::
@@ -453,7 +458,7 @@ but are generally useful. Verification of the install is advised.
2. Add binary paths to the `PATH` environment variable.
```shell
export PATH=$PATH:/opt/rocm/bin:/opt/rocm-5.6/opencl/bin
export PATH=$PATH:/opt/rocm-5.6.1/bin:/opt/rocm-5.6.1/opencl/bin
```
```{attention}

View File

@@ -25,8 +25,11 @@ repository to the new release.
:sync: ubuntu-20.04
```shell
# version
version=5.6.1
# amdgpu repository for focal
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$version/ubuntu focal main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -36,8 +39,11 @@ sudo apt update
:sync: ubuntu-22.04
```shell
# version
version=5.6.1
# amdgpu repository for jammy
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$version/ubuntu jammy main" \
| sudo tee /etc/apt/sources.list.d/amdgpu.list
sudo apt update
```
@@ -49,33 +55,18 @@ sudo apt update
:sync: RHEL
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
:sync: RHEL-8
```shell
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
sudo yum clean all
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
:sync: RHEL-8
```shell
# version
version=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.7/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -90,10 +81,13 @@ sudo yum clean all
:sync: RHEL-8
```shell
# version
version=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -108,10 +102,13 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
version=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.1/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -126,10 +123,13 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
version=5.6.1
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.2/main/x86_64/
enabled=1
priority=50
gpgcheck=1
@@ -149,10 +149,13 @@ sudo yum clean all
:sync: SLES-15.4
```shell
# version
version=5.6.1
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/$version/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -230,7 +233,10 @@ repository to the new release.
:sync: ubuntu-20.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 focal main" \
# version
version=5.6.1
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$version focal main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -242,7 +248,10 @@ sudo apt update
:sync: ubuntu-22.04
```shell
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 jammy main" \
# version
version=5.6.1
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$version jammy main" \
| sudo tee /etc/apt/sources.list.d/rocm.list
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
| sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -260,10 +269,13 @@ sudo apt update
:sync: RHEL-8
```shell
# version
version=5.6.1
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.6]
name=ROCm5.6
baseurl=https://repo.radeon.com/rocm/rhel8/5.6/main
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/rhel8/$version/main
enabled=1
priority=50
gpgcheck=1
@@ -277,10 +289,13 @@ sudo yum clean all
:sync: RHEL-9
```shell
# version
version=5.6.1
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
[ROCm-5.6]
name=ROCm5.6
baseurl=https://repo.radeon.com/rocm/rhel9/5.6/main
[ROCm-$ver]
name=ROCm$ver
baseurl=https://repo.radeon.com/rocm/rhel9/$version/main
enabled=1
priority=50
gpgcheck=1
@@ -296,11 +311,14 @@ sudo yum clean all
:sync: SLES
```shell
# version
version=5.6.1
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
[ROCm-5.6]
name=ROCm5.6
[ROCm-$ver]
name=ROCm$ver
name=rocm
baseurl=https://repo.radeon.com/rocm/zyp/5.6/main
baseurl=https://repo.radeon.com/rocm/zyp/$version/main
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key

View File

@@ -116,12 +116,16 @@ sudo crb enable
Add the perl languages repository.
```{note}
Mar 25, 2024: We currently need to install the Perl module from SLES 15 SP5 as a workaround. The module was removed for SLES 15 SP4.
```
::::{tab-set}
:::{tab-item} SLES 15.4
:sync: SLES-15.4
```shell
zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
```
:::

View File

@@ -29,11 +29,11 @@ wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
```shell
# Kernel driver repository for focal
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu focal main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6.1/ubuntu focal main
EOF
# ROCm repository for focal
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian focal main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6.1 focal main
EOF
```
@@ -44,11 +44,11 @@ EOF
```shell
# Kernel driver repository for jammy
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/latest/ubuntu jammy main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6.1/ubuntu jammy main
EOF
# ROCm repository for jammy
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/debian jammy main
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6.1 jammy main
EOF
# Prefer packages from the rocm repository over system packages
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
@@ -73,33 +73,6 @@ sudo apt update
::::
::::{tab-set}
:::{tab-item} RHEL 8.6
:sync: RHEL-8.6
```shell
# Add the amdgpu module repository for RHEL 8.6
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.6/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
# Add the rocm repository for RHEL 8
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
```
:::
:::{tab-item} RHEL 8.7
:sync: RHEL-8.7
@@ -108,7 +81,7 @@ EOF
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.7/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/8.7/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -117,7 +90,7 @@ EOF
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
baseurl=https://repo.radeon.com/rocm/rhel8/5.6.1/main
enabled=1
priority=50
gpgcheck=1
@@ -135,7 +108,7 @@ EOF
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/8.8/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/8.8/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -144,7 +117,7 @@ EOF
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel8/latest/main
baseurl=https://repo.radeon.com/rocm/rhel8/5.6.1/main
enabled=1
priority=50
gpgcheck=1
@@ -162,7 +135,7 @@ EOF
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.1/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/9.1/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -171,7 +144,7 @@ EOF
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
baseurl=https://repo.radeon.com/rocm/rhel9/5.6.1/main
enabled=1
priority=50
gpgcheck=1
@@ -189,7 +162,7 @@ EOF
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/rhel/9.2/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/9.2/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -198,7 +171,7 @@ EOF
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
[rocm]
name=rocm
baseurl=https://repo.radeon.com/rocm/rhel9/latest/main
baseurl=https://repo.radeon.com/rocm/rhel9/5.6.1/main
enabled=1
priority=50
gpgcheck=1
@@ -234,7 +207,7 @@ sudo yum clean all
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.4/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/5.6.1/sle/15.4/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
@@ -261,7 +234,7 @@ EOF
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
[amdgpu]
name=amdgpu
baseurl=https://repo.radeon.com/amdgpu/latest/sle/15.5/main/x86_64
baseurl=https://repo.radeon.com/amdgpu/5.6.1/sle/15.5/main/x86_64
enabled=1
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key

View File

@@ -24,7 +24,7 @@ MIGraphX is a graph compiler focused on accelerating the Machine Learning infere
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling to MIOpen or rocBLAS or creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
- Number of arguments
@@ -187,7 +187,7 @@ Follow these steps:
}
```
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
```cmake
cmake_minimum_required(VERSION 3.5)
@@ -327,7 +327,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
```
Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
Alternatively, you can use the MIGraphX C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
```{figure} ../../data/understand/deep_learning/image.018.png
:name: image018

View File

@@ -299,7 +299,7 @@ USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
### Test the PyTorch Installation
You can use PyTorch unit tests to validate a PyTorch installation. If using a
prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installing an official
wheels package, these tests are already run on those configurations.
Alternatively, you can manually run the unit tests to validate the PyTorch
installation fully.

View File

@@ -8,7 +8,7 @@ The software support matrices for ROCm container releases is listed.
#### `Ubuntu+ rocm5.6_internal_testing +169530b`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -21,7 +21,7 @@ The software support matrices for ROCm container releases is listed.
#### `CentOS7+ rocm5.6_internal_testing +169530b`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -31,7 +31,7 @@ The software support matrices for ROCm container releases is listed.
#### `1.13 +bfeb431`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 1.13.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -44,7 +44,7 @@ The software support matrices for ROCm container releases is listed.
#### `1.12 +05d5d04`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
* [Torch 1.12.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12)
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
@@ -59,7 +59,7 @@ The software support matrices for ROCm container releases is listed.
#### `tensorflow_develop-upstream-QA-rocm56 +c88a9f4`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* `tensorflow-rocm` 2.13.0
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
@@ -69,7 +69,7 @@ The software support matrices for ROCm container releases is listed.
#### `r2.11-rocm-enhanced +5be4141`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* [`tensorflow-rocm` 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
@@ -79,7 +79,7 @@ The software support matrices for ROCm container releases is listed.
#### `r2.10-rocm-enhanced +72789a3`
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
* [`tensorflow-rocm` 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)

View File

@@ -60,9 +60,11 @@ ROCm supports virtualization for select GPUs only as shown below.
(supported_gpus)=
## Supported GPUs
## Linux Supported GPUs
The following table shows the list of GPUs supported on Linux distributions:
The table below shows supported GPUs for Instinct™, Radeon Pro™ and Radeon™
GPUs. Please click the tabs below to switch between GPU product lines. If a GPU
is not listed on this table, the GPU is not officially supported by AMD.
::::{tab-set}

View File

@@ -5,7 +5,6 @@ The following table is a list of ROCm components with links to their respective
terms. These components may include third party components subject to
additional licenses. Please review individual repositories for more information.
The table shows ROCm components, the name of license and link to the license terms.
The table is ordered to follow ROCm's manifest file.
<!-- spellcheck-disable -->
| Component | License |

View File

@@ -12,7 +12,11 @@ AMD ROCm™ Platform supports the following Windows SKU.
| Windows 11 | x86-64 | 22H2 (GA) |
| Windows Server 2022 | x86-64 | |
## GPU Support Table
## Windows Supported GPUs
The table below shows supported GPUs for Radeon Pro™ and Radeon™ GPUs. Please
click the tabs below to switch between GPU product lines. If a GPU is not listed
on this table, the GPU is not officially supported by AMD.
::::{tab-set}
@@ -21,12 +25,12 @@ AMD ROCm™ Platform supports the following Windows SKU.
| Name | Architecture |[LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
| AMD Radeon Pro W7900 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W7800 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon Pro W6600 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon Pro W5500 | RDNA1 | gfx1012 | ❌ | ❌ |
| AMD Radeon Pro VII | GCN5.1 | gfx906 | ❌ | ❌ |
| AMD Radeon Pro W7900 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W7800 | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon Pro W6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon Pro W6600 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon Pro W5500 | RDNA1 | gfx1012 | ❌ | ❌ |
| AMD Radeon Pro VII | GCN5.1 | gfx906 | ❌ | ❌ |
:::
@@ -36,13 +40,18 @@ AMD ROCm™ Platform supports the following Windows SKU.
| Name | Architecture | [LLVM Target](https://www.llvm.org/docs/AMDGPUUsage.html#processors) | Runtime | HIP SDK |
|:----:|:------------:|:--------------------------------------------------------------------:|:-------:|:----------------:|
| AMD Radeon™ RX 7900 XTX | RDNA3 | gfx1100 | ✅ | ✅ |
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | | |
| AMD Radeon™ RX 6950 XT | RDNA2 | gfx1030 | | |
| AMD Radeon™ RX 7900 XT | RDNA3 | gfx1100 | | |
| AMD Radeon™ RX 7600 | RDNA3 | gfx1100 | | |
| AMD Radeon™ RX 6950 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6900 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6800 XT | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6800 | RDNA2 | gfx1030 | ✅ | ✅ |
| AMD Radeon™ RX 6750 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6700 XT | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6700 | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6600 | RDNA2 | gfx1032 | | ❌ |
| AMD Radeon™ RX 6650 XT | RDNA2 | gfx1032 | | ❌ |
| AMD Radeon™ RX 6600 XT | RDNA2 | gfx1032 | ✅ | ❌ |
| AMD Radeon™ RX 6600 | RDNA2 | gfx1032 | ✅ | ❌ |
:::

View File

@@ -18,7 +18,7 @@ integrated into ML frameworks such as PyTorch and TensorFlow. ROCm can be
deployed in many ways, including through the use of containers such as Docker,
Spack, and your own build from source.
ROCms goal is to allow our users to maximize their GPU hardware investment.
The goal of ROCm is to allow our users to maximize their GPU hardware investment.
ROCm is designed to help develop, test and deploy GPU accelerated HPC, AI,
scientific computing, CAD, and other applications in a free, open-source,
integrated and secure software ecosystem.

View File

@@ -1 +1,2 @@
rocm-docs-core==0.18.3
rocm-docs-core==1.8.0
sphinx-reredirects

View File

@@ -1,118 +1,106 @@
#
# This file is autogenerated by pip-compile with Python 3.11
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --resolver=backtracking requirements.in
# pip-compile requirements.in
#
accessible-pygments==0.0.3
accessible-pygments==0.0.5
# via pydata-sphinx-theme
alabaster==0.7.13
alabaster==1.0.0
# via sphinx
babel==2.11.0
babel==2.16.0
# via
# pydata-sphinx-theme
# sphinx
beautifulsoup4==4.11.2
beautifulsoup4==4.12.3
# via pydata-sphinx-theme
breathe==4.34.0
breathe==4.35.0
# via rocm-docs-core
certifi==2022.12.7
certifi==2024.8.30
# via requests
cffi==1.15.1
cffi==1.17.1
# via
# cryptography
# pynacl
charset-normalizer==2.1.1
charset-normalizer==3.3.2
# via requests
click==8.1.3
click==8.1.7
# via sphinx-external-toc
colorama==0.4.6
# via
# click
# sphinx
cryptography==40.0.2
cryptography==43.0.1
# via pyjwt
deprecated==1.2.13
deprecated==1.2.14
# via pygithub
docutils==0.19
docutils==0.21.2
# via
# breathe
# myst-parser
# pydata-sphinx-theme
# sphinx
fastjsonschema==2.16.3
fastjsonschema==2.20.0
# via rocm-docs-core
gitdb==4.0.10
gitdb==4.0.11
# via gitpython
gitpython==3.1.30
gitpython==3.1.43
# via rocm-docs-core
idna==3.4
idna==3.10
# via requests
imagesize==1.4.1
# via sphinx
importlib-metadata==6.7.0
# via sphinx
importlib-resources==5.12.0
# via rocm-docs-core
jinja2==3.1.2
jinja2==3.1.4
# via
# myst-parser
# sphinx
linkify-it-py==1.0.3
# via myst-parser
markdown-it-py==2.2.0
markdown-it-py==3.0.0
# via
# mdit-py-plugins
# myst-parser
markupsafe==2.1.2
markupsafe==2.1.5
# via jinja2
mdit-py-plugins==0.3.4
mdit-py-plugins==0.4.2
# via myst-parser
mdurl==0.1.2
# via markdown-it-py
myst-parser[linkify]==1.0.0
myst-parser==4.0.0
# via rocm-docs-core
packaging==23.0
packaging==24.1
# via
# pydata-sphinx-theme
# sphinx
pycparser==2.21
pycparser==2.22
# via cffi
pydata-sphinx-theme==0.13.3
pydata-sphinx-theme==0.15.4
# via
# rocm-docs-core
# sphinx-book-theme
pygithub==1.58.1
pygithub==2.4.0
# via rocm-docs-core
pygments==2.14.0
pygments==2.18.0
# via
# accessible-pygments
# pydata-sphinx-theme
# sphinx
pyjwt[crypto]==2.6.0
pyjwt[crypto]==2.9.0
# via pygithub
pynacl==1.5.0
# via pygithub
pytz==2022.7.1
# via babel
pyyaml==6.0
pyyaml==6.0.2
# via
# myst-parser
# rocm-docs-core
# sphinx-external-toc
requests==2.28.1
requests==2.32.3
# via
# pygithub
# sphinx
rocm-docs-core==0.18.3
rocm-docs-core==1.8.0
# via -r requirements.in
smmap==5.0.0
smmap==5.0.1
# via gitdb
snowballstemmer==2.2.0
# via sphinx
soupsieve==2.4
soupsieve==2.6
# via beautifulsoup4
sphinx==5.3.0
sphinx==8.0.2
# via
# breathe
# myst-parser
@@ -123,37 +111,40 @@ sphinx==5.3.0
# sphinx-design
# sphinx-external-toc
# sphinx-notfound-page
sphinx-book-theme==1.0.1
# sphinx-reredirects
sphinx-book-theme==1.1.3
# via rocm-docs-core
sphinx-copybutton==0.5.1
sphinx-copybutton==0.5.2
# via rocm-docs-core
sphinx-design==0.4.1
sphinx-design==0.6.1
# via rocm-docs-core
sphinx-external-toc==0.3.1
sphinx-external-toc==1.0.1
# via rocm-docs-core
sphinx-notfound-page==0.8.3
sphinx-notfound-page==1.0.4
# via rocm-docs-core
sphinxcontrib-applehelp==1.0.4
sphinx-reredirects==0.1.5
# via -r requirements.in
sphinxcontrib-applehelp==2.0.0
# via sphinx
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-devhelp==2.0.0
# via sphinx
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-htmlhelp==2.1.0
# via sphinx
sphinxcontrib-jsmath==1.0.1
# via sphinx
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-qthelp==2.0.0
# via sphinx
sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-serializinghtml==2.0.0
# via sphinx
typing-extensions==4.5.0
# via pydata-sphinx-theme
uc-micro-py==1.0.1
# via linkify-it-py
urllib3==1.26.13
# via requests
wrapt==1.14.1
# via deprecated
zipp==3.15.0
tomli==2.0.1
# via sphinx
typing-extensions==4.12.2
# via
# importlib-metadata
# importlib-resources
# pydata-sphinx-theme
# pygithub
urllib3==2.2.3
# via
# pygithub
# requests
wrapt==1.16.0
# via deprecated

View File

@@ -13,20 +13,20 @@ The full list of HSA system architecture platform requirements are here: `HSA Sy
The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicsOps are initiated by the
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The atomic operations are initiated by the
I/O device which support 32-bit, 64-bit and 128-bit operand which target address have to be naturally aligned to operation sizes.
For ROCm the Platform atomics are used in ROCm in the following ways:
* Update HSA queues read_dispatch_id: 64 bit atomic add used by the command processor on the GPU agent to update the packet ID it processed.
* Update HSA queues read_dispatch_id: 64 bit atomic add used by the command processor on the GPU agent to update the packet ID it processed.
* Update HSA queues write_dispatch_id: 64 bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
* Update HSA Signals 64bit atomic ops are used for CPU & GPU synchronization.
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have atomic operations routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
To do AtomicOp routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
To do atomic operations routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the atomic operations routing supported bit in the Device Capabilities 2 register.
If your system has a PCIe Express Switch it needs to support AtomicsOp routing. Again AtomicOp requests are permitted only if a components ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. AtomicOp Routing Support=1 Routing is supported, AtomicOp Routing Support=0 routing is not supported.
If your system has a PCIe Express Switch it needs to support atomic operations routing. Atomic operations requests are permitted only if a components ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support atomic operations completion and/or routing to a component which does. Atomic operations routing support=1, routing is supported; Atomic operations routing support=0, routing is not supported.
Atomic Operation is a Non-Posted transaction supporting 32-bit and 64-bit address formats, there must be a response for Completion containing the result of the operation. Errors associated with the operation (uncorrectable error accessing the target location or carrying out the Atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor, they are set to to Completer Abort (CA) or Unsupported Request (UR).
@@ -71,7 +71,7 @@ BAR Memory Overview
*******************
On a Xeon E5 based system in the BIOS we can turn on above 4GB PCIe addressing, if so he need to set MMIO Base address ( MMIOH Base) and Range ( MMIO High Size) in the BIOS.
In SuperMicro system in the system bios you need to see the following
In Supermicro system in the system bios you need to see the following
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
@@ -79,7 +79,7 @@ In SuperMicro system in the system bios you need to see the following
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
When we support Large Bar Capability there is a Large Bar Vbios which also disable the IO bar.
When we support Large Bar Capability there is a Large Bar VBIOS which also disable the IO bar.
For GFX9 and Vega10 which have Physical Address up 44 bit and 48 bit Virtual address.
@@ -118,30 +118,5 @@ Legend:
5 : Expansion ROM This is required for the AMD Driver SW to access the GPUs video-bios. This is currently fixed at 128KB.
Excepts form Overview of Changes to PCI Express 3.0
===================================================
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
********************************************************
Atomic Operations Goal:
*************************
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
* Fetch and Add uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
* Unconditional Swap uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
* Compare and Swap uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
* AtomicOpCompletion new completion to give the result so far atomic request and indicate that the atomicity of the transaction has been maintained.
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
ID-based Ordering Goal:
*************************
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
To read more on PCIe Gen 3 new options https://www.mindshare.com/files/resources/PCIe%203-0.pdf
For more information, you can review
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.

View File

@@ -4,7 +4,7 @@ Using CMake
Most components in ROCm support CMake. Projects depending on header-only or
library components typically require CMake 3.5 or higher whereas those wanting
to make use of CMake's HIP language support will require CMake 3.21 or higher.
to make use of the CMake HIP language support will require CMake 3.21 or higher.
Finding Dependencies
====================
@@ -16,7 +16,7 @@ Finding Dependencies
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
`Using Dependencies Guide
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
to get an overview of CMake's related facilities.
to get an overview of CMake related facilities.
In short, CMake supports finding dependencies in two ways:
@@ -28,7 +28,7 @@ In short, CMake supports finding dependencies in two ways:
regards needed to consume it.
ROCm predominantly relies on Config mode, one notable exception being the Module
driving the compilation of HIP programs on Nvidia runtimes. As such, when
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
dependencies are not found in standard system locations, one either has to
instruct CMake to search for package config files in additional folders using
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of filesystem
@@ -55,8 +55,8 @@ to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_)
Using HIP in CMake
==================
ROCm componenents providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
ROCm components providing a C/C++ interface support being consumed using any
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
language features, allowing users to program using the HIP single-source
programming model. When a program (or translation-unit) uses the HIP API without
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
@@ -172,7 +172,7 @@ all the flags necessary for device compilation.
.. note::
Compiling for the GPU device requires at least C++11.
This project can then be configured with for eg.
This project can then be configured with the following CMake commands:
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
@@ -186,7 +186,7 @@ When using the CXX language support to compile HIP device code, selecting the
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
default, this is set to some subset of the currently supported architectures of
AMD ROCm. It can be set to eg. ``-D GPU_TARGETS="gfx1032;gfx1035"``.
AMD ROCm. It can be set to the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
ROCm CMake Packages
-------------------
@@ -251,9 +251,9 @@ options.
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
up with their own way to register command-line fragments of different purpose in
a setup'n'forget fashion for quick assembly using graphical front-ends. This is
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
all nice, but configurations aren't portable, nor can they be reused in
Continuous Intergration (CI) pipelines. CMake has condensed existing practice
Continuous Integration (CI) pipelines. CMake has condensed existing practice
into a portable JSON format that works in all IDEs and can be invoked from any
command-line. This is
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_

View File

@@ -10,6 +10,6 @@ disambiguates compiler naming used throughout the documentation.
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. See for details, <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
| HIP-Clang | Informal term for the `amdclang++` compiler |
| HIPify | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |

View File

@@ -0,0 +1,15 @@
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
### What's New in This Release
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
## HIP 5.6.1 (for ROCm 5.6.1)
### Fixed Defects
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
- Using *hipGraphAddMemFreeNode* no longer results in a crash