Mirror of https://github.com/ROCm/ROCm.git (synced 2026-01-10 15:18:11 -05:00)
Compare commits: docs/5.5.1...docs/5.6.1 (22 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 0e0d89c45b | | |
| | 1990a9340b | | |
| | 6e2cdef227 | | |
| | e8faa13217 | | |
| | f9db7ee88e | | |
| | c2a1ea1248 | | |
| | 5bcb8404e3 | | |
| | 21f23d9089 | | |
| | 76f43d7457 | | |
| | ddbe4cd38f | | |
| | 7e097ce72a | | |
| | f3d3929f11 | | |
| | 084ed7f4cb | | |
| | 7482a8b261 | | |
| | bf8f0ccc65 | | |
| | ed8251872f | | |
| | c3e8e15e51 | | |
| | df0ee5a0ae | | |
| | 6fb7b9f3b5 | | |
| | 7f8eede7d1 | | |
| | 0741268fd5 | | |
| | 4ab3787abe | | |
715
.wordlist.txt
@@ -1,49 +1,686 @@
# file_reorg
FHS
Filesystem
filesystem
incrementing
rocm
# gpu_aware_mpi
DMA
GDR
HCA
MPI
MVAPICH
Mellanox's
NIC
OFED
OSU
OpenFabrics
PeerDirect
RDMA
UCX
ib_core
# isv_deployment_win
AAC
ABI
# linear algebra
LAPACK
MMA
backends
cuSOLVER
cuSPARSE
# openmp
ICV
Multithreaded
# tuning_guides
ACE
ACEs
AccVGPR
AccVGPRs
ALU
AMD
AMDGPU
AMDGPUs
AMDMIGraphX
AMI
AOCC
AOMP
APIC
APIs
APU
ASIC
ASICs
ASan
ASm
ATI
AddressSanitizer
AlexNet
Arb
BLAS
BMC
BitCode
Blit
Bluefield
CCD
CDNA
CIFAR
CLI
CLion
CMake
CMakeLists
CMakePackage
CP
CPC
CPF
CPP
CPU
CPUs
CSC
CSE
CSV
CSn
CTests
CU
CUDA
CUs
CXX
Cavium
CentOS
ChatGPT
CoRR
Codespaces
Commitizen
CommonMark
Concretized
Conda
ConnectX
DGEMM
DKMS
DL
DMA
DNN
DNNL
DPM
DRI
DW
DWORD
Dask
DataFrame
DataLoader
DataParallel
DeepSpeed
Dependabot
DevCap
Dockerfile
Doxygen
ELMo
ENDPGM
EPYC
ESXi
FFT
FFTs
FFmpeg
FHS
FMA
FP
Filesystem
Flang
Fortran
Fuyu
GALB
GCD
GCDs
GCN
GDB
GDDR
GDR
GDS
GEMM
GEMMs
GFortran
GiB
GIM
GL
GLXT
GMI
GPG
GPR
GPT
GPU
GPU's
GPUs
GRBM
GenAI
GenZ
GitHub
Gitpod
HBM
HCA
HIPCC
HIPExtension
HIPIFY
HPC
HPCG
HPE
HPL
HSA
HWE
Haswell
Higgs
Hyperparameters
ICV
IDE
IDEs
IMDb
IOMMU
IOP
IOPM
# windows
IOV
IRQ
ISA
ISV
ISVs
ImageNet
InfiniBand
Inlines
IntelliSense
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KiB
KVM
Keras
Khronos
LAPACK
LCLK
LDS
LLM
LLMs
LLVM
LM
LSAN
LTS
LoRA
MEM
MERCHANTABILITY
MFMA
MiB
MIGraphX
MIOpen
MIOpenGEMM
MIVisionX
MLM
MMA
MMIO
MMIOH
MNIST
MPI
MSVC
MVAPICH
MVFFR
Makefile
Makefiles
Matplotlib
Megatron
Mellanox
Mellanox's
Meta's
MirroredStrategy
Multicore
Multithreaded
MyEnvironment
MyST
NBIO
NBIOs
NIC
NICs
NLI
NLP
NPS
NSP
NUMA
NVCC
NVIDIA
NVPTX
NaN
Nano
Navi
Noncoherently
NousResearch's
NumPy
OAM
OAMs
OCP
OEM
OFED
OMP
OMPI
OMPT
OMPX
ONNX
OSS
OSU
OpenCL
OpenCV
OpenFabrics
OpenGL
OpenMP
OpenSSL
OpenVX
PCI
PCIe
PEFT
PIL
PILImage
PRNG
PRs
PaLM
Pageable
PeerDirect
Perfetto
PipelineParallel
PnP
PowerShell
PyPi
PyTorch
Qcycles
RAII
RCCL
RDC
RDMA
RDNA
RHEL
ROC
ROCProfiler
ROCTracer
ROCclr
ROCdbgapi
ROCgdb
ROCk
ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCr
RST
RW
Radeon
RelWithDebInfo
Req
Rickle
RoCE
Ryzen
SALU
SBIOS
SCA
SDK
SDMA
SDRAM
SENDMSG
SGPR
SGPRs
SHA
SIGQUIT
SIMD
SIMDs
SKU
SKUs
PowerShell
SLES
SMEM
SMI
SMT
SPI
SQs
SRAM
SRAMECC
SVD
SWE
SerDes
Shlens
Skylake
Softmax
Spack
Supermicro
Szegedy
TCA
TCC
TCI
TCIU
TCP
TCR
TF
TFLOPS
TPU
TPUs
TensorBoard
TensorFlow
TensorParallel
ToC
TorchAudio
TorchMIGraphX
TorchScript
TorchServe
TorchVision
TransferBench
TrapStatus
UAC
# pytorch_install
kdb
precompiled
# gpu_os_support
HWE
UC
UCC
UCX
UIF
USM
UTCL
UTIL
Uncached
Unhandled
VALU
VBIOS
VGPR
VGPRs
VM
VMEM
VMWare
VRAM
VSIX
VSkipped
Vanhoucke
Vulkan
WGP
WGPs
WX
WikiText
Wojna
Workgroups
Writebacks
XCD
XCDs
XGBoost
XGBoost's
XGMI
XT
XTX
Xeon
Xilinx
Xnack
Xteam
YAML
YML
YModel
ZeRO
ZenDNN
accuracies
activations
addr
alloc
allocator
allocators
amdgpu
api
atmi
atomics
autogenerated
avx
awk
backend
backends
benchmarking
bfloat
bilinear
bitsandbytes
blit
boson
bosons
buildable
bursty
bzip
cacheable
cd
centos
centric
changelog
chiplet
cmake
cmd
coalescable
codename
collater
comgr
completers
composable
concretization
config
conformant
convolutional
convolves
cpp
csn
cuBLAS
cuFFT
cuLIB
cuRAND
cuSOLVER
cuSPARSE
dataset
datasets
dataspace
datatype
datatypes
dbgapi
de
deallocation
denoise
denoised
denoises
denormalize
deserializers
detections
dev
devicelibs
devsel
dimensionality
disambiguates
distro
el
embeddings
enablement
endpgm
encodings
env
epilog
etcetera
ethernet
exascale
executables
ffmpeg
filesystem
fortran
galb
gcc
gdb
gfortran
gfx
githooks
github
gnupg
grayscale
gzip
heterogenous
hipBLAS
hipBLASLt
hipCUB
hipFFT
hipLIB
hipRAND
hipSOLVER
hipSPARSE
hipSPARSELt
hipTensor
hipamd
hipblas
hipcub
hipfft
hipfort
hipify
hipsolver
hipsparse
hpp
hsa
hsakmt
hyperparameter
ib_core
inband
incrementing
inferencing
inflight
init
initializer
inlining
installable
interprocedural
intra
invariants
invocating
ipo
kdb
latencies
libfabric
libjpeg
libs
linearized
linter
linux
llvm
localscratch
logits
lossy
macOS
matchers
microarchitecture
migraphx
miopen
miopengemm
mivisionx
mkdir
mlirmiopen
mtypes
mvffr
namespace
namespaces
numref
ocl
opencl
opencv
openmp
openssl
optimizers
os
pageable
parallelization
parameterization
passthrough
perfcounter
performant
perl
pragma
pre
prebuilt
precompiled
prefetch
prefetchable
preprocess
preprocessed
preprocessing
prequantized
prerequisites
profiler
protobuf
pseudorandom
py
quasirandom
queueing
rccl
rdc
reStructuredText
reformats
repos
representativeness
req
resampling
rescaling
reusability
roadmap
roc
rocAL
rocALUTION
rocBLAS
rocFFT
rocLIB
rocMLIR
rocPRIM
rocRAND
rocSOLVER
rocSPARSE
rocThrust
rocWMMA
rocalution
rocblas
rocclr
rocfft
rocm
rocminfo
rocprim
rocprof
rocprofiler
rocr
rocrand
rocsolver
rocsparse
rocthrust
roctracer
runtime
runtimes
sL
scalability
scalable
sendmsg
serializers
shader
sharding
sigmoid
sm
smi
softmax
spack
src
stochastically
strided
subdirectory
subexpression
subfolder
subfolders
supercomputing
tensorfloat
th
tokenization
tokenize
tokenized
tokenizer
tokenizes
toolchain
toolchains
toolset
toolsets
torchvision
tqdm
tracebacks
txt
uarch
uncached
uncorrectable
uninstallation
unsqueeze
unstacking
unswitching
untrusted
untuned
upvote
USM
UTCL
UTIL
utils
vL
variational
vdi
vectorizable
vectorization
vectorize
vectorized
vectorizer
vectorizes
vjxb
walkthrough
walkthroughs
wavefront
wavefronts
whitespaces
workgroup
workgroups
writeback
writebacks
wrreq
wzo
xargs
xz
yaml
ysvmadyb
zypper
19
CHANGELOG.md
@@ -15,6 +15,25 @@ The release notes for the ROCm platform.

-------------------

## ROCm 5.6.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->

### What's New in This Release

ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.

## HIP 5.6.1 (for ROCm 5.6.1)

### Fixed Defects

- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
- Using *hipGraphAddMemFreeNode* no longer results in a crash

-------------------

## ROCm 5.6.0
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->

567
RELEASE.md
@@ -15,568 +15,19 @@ The release notes for the ROCm platform.

-------------------

## ROCm 5.6.0
## ROCm 5.6.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable no-duplicate-header -->
<!-- markdownlint-disable header-increment -->
#### Release Highlights

ROCm 5.6 brings several AI software ecosystem improvements to our fast-growing user base. A few examples include:
### What's New in This Release

- New documentation portal at https://rocm.docs.amd.com
- Ongoing software enhancements for LLMs, ensuring full compliance with the HuggingFace unit test suite
- OpenAI Triton, CuPy, HIP Graph support, and many other library performance enhancements
- Improved ROCm deployment and development tools, including the CPU-GPU debugger (ROCgdb), profiler, and Docker containers
- New pseudorandom generators are available in rocRAND. Added support for half-precision transforms in hipFFT/rocFFT. Added LU refactorization and linear system solver for sparse matrices in rocSOLVER.
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.

#### OS and GPU Support Changes
## HIP 5.6.1 (for ROCm 5.6.1)

- SLES15 SP5 support was added in this release. SLES15 SP3 support was dropped.
- AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively referred to as gfx906 GPUs) will enter maintenance mode starting Q3 2023. This will be aligned with the ROCm 5.7 GA release date.
  - No new features and performance optimizations will be supported for the gfx906 GPUs beyond ROCm 5.7
  - Bug fixes / critical security patches will continue to be supported for the gfx906 GPUs until Q2 2024 (End of Maintenance [EOM]), aligned with the closest ROCm release
    - Bug fixes during the maintenance period will be made in the next ROCm point release
    - Bug fixes will not be backported to older ROCm releases for this SKU
  - Distro / operating system updates will continue as per the ROCm release cadence for gfx906 GPUs until EOM.
### Fixed Defects

#### AMDSMI CLI 23.0.0.4

##### Added

- AMDSMI CLI tool enabled for Linux Bare Metal & Guest

- Package: amd-smi-lib

##### Known Issues

- Not all Error Correction Code (ECC) fields are currently supported

- RHEL 8 & SLES 15 have extra install steps

#### Kernel Modules (DKMS)

##### Fixes

- Stability fix for multi-GPU systems, reproducible via ROCm_Bandwidth_Test, as reported in [Issue 2198](https://github.com/RadeonOpenCompute/ROCm/issues/2198).

#### HIP 5.6 (For ROCm 5.6)

##### Optimizations

- Consolidation of hipamd, rocclr and OpenCL projects in clr
- Optimized lock for graph global capture mode

##### Added

- Added hipRTC support for amd_hip_fp16
- Added hipStreamGetDevice implementation to get the device associated with the stream
- Added HIP_AD_FORMAT_SIGNED_INT16 in hipArray formats
- hipArrayGetInfo for getting information about the specified array
- hipArrayGetDescriptor for getting 1D or 2D array descriptor
- hipArray3DGetDescriptor to get 3D array descriptor

##### Changed

- hipMallocAsync to return success for zero size allocation to match hipMalloc
- Separation of hipcc Perl binaries from HIP project to hipcc project. hip-devel package depends on newly added hipcc package
- Consolidation of hipamd, ROCclr, and OpenCL repositories into a single repository called clr. Instructions are updated to build HIP from sources in the HIP Installation guide
- Removed hipBusBandwidth and hipCommander samples from hip-tests

##### Fixed

- Fixed regression in hipMemCpyParam3D when offset is applied

##### Known Issues

- Limited testing on xnack+ configuration
  - Multiple HIP test failures (gpuvm fault or hangs)
- hipSetDevice and hipSetDeviceFlags APIs return hipErrorInvalidDevice instead of hipErrorNoDevice, on a system without GPU
- Known memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs. Issue will be fixed in a future ROCm release

##### Upcoming changes in future release

- Removal of gcnarch from hipDeviceProp_t structure
- Addition of new fields in hipDeviceProp_t structure
  - maxTexture1D
  - maxTexture2D
  - maxTexture1DLayered
  - maxTexture2DLayered
  - sharedMemPerMultiprocessor
  - deviceOverlap
  - asyncEngineCount
  - surfaceAlignment
  - unifiedAddressing
  - computePreemptionSupported
  - uuid
- Removal of deprecated code
  - hip-hcc codes from hip code tree
- Correct hipArray usage in HIP APIs such as hipMemcpyAtoH and hipMemcpyHtoA
- HIPMEMCPY_3D fields correction (unsigned int -> size_t)
- Renaming of 'memoryType' in hipPointerAttribute_t structure to 'type'

#### ROCgdb-13 (For ROCm 5.6.0)

##### Optimized

- Improved performance when handling the end of a process with a large number of threads.

##### Known Issues

- On certain configurations, ROCgdb can show the following warning message:

  `warning: Probes-based dynamic linker interface failed. Reverting to original interface.`

  This does not affect ROCgdb's functionality.

#### ROCprofiler (For ROCm 5.6.0)

In ROCm 5.6 the `rocprofilerv1` and `rocprofilerv2` include and library files of
ROCm 5.5 are split into separate files. The `rocmtools` files that were
deprecated in ROCm 5.5 have been removed.

| ROCm 5.6        | rocprofilerv1                       | rocprofilerv2                          |
|-----------------|-------------------------------------|----------------------------------------|
| **Tool script** | `bin/rocprof`                       | `bin/rocprofv2`                        |
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/v2/rocprofiler.h` |
| **API library** | `lib/librocprofiler.so.1`           | `lib/librocprofiler.so.2`              |

The ROCm Profiler Tool that uses `rocprofilerv1` can be invoked using the
following command:

```sh
$ rocprof …
```

To write a custom tool based on the `rocprofilerv1` API, do the following:

```C
// main.c
#include <rocprofiler/rocprofiler.h> // Use the rocprofilerv1 API
int main() {
    // Use the rocprofilerv1 API
    return 0;
}
```

This can be built in the following manner:

```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```

The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.1`.

The ROCm Profiler that uses the `rocprofilerv2` API can be invoked using the
following command:

```sh
$ rocprofv2 …
```

To write a custom tool based on the `rocprofilerv2` API, do the following:

```C
// main.c
#include <rocprofiler/v2/rocprofiler.h> // Use the rocprofilerv2 API
int main() {
    // Use the rocprofilerv2 API
    return 0;
}
```

This can be built in the following manner:

```sh
$ gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64-v2
```

The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.

##### Optimized

- Improved Test Suite

##### Added

- 'end_time' needs to be disabled in roctx_trace.txt

##### Fixed

- rocprof in ROCm/5.4.0: GPU selector broken.
- rocprof in ROCm/5.4.1 fails to generate kernel info.
- rocprof clobbers LD_PRELOAD.

### Library Changes in ROCm 5.6.0

| Library | Version |
|---------|---------|
| hipBLAS | ⇒ [1.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-5.6.0) |
| hipCUB | ⇒ [2.13.1](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-5.6.0) |
| hipFFT | ⇒ [1.0.12](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-5.6.0) |
| hipSOLVER | ⇒ [1.8.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-5.6.0) |
| hipSPARSE | ⇒ [2.3.6](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-5.6.0) |
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-5.6.0) |
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-5.6.0) |
| rocALUTION | ⇒ [2.1.9](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-5.6.0) |
| rocBLAS | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-5.6.0) |
| rocFFT | ⇒ [1.0.23](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-5.6.0) |
| rocm-cmake | ⇒ [0.9.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-5.6.0) |
| rocPRIM | ⇒ [2.13.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-5.6.0) |
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-5.6.0) |
| rocSOLVER | ⇒ [3.22.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-5.6.0) |
| rocSPARSE | ⇒ [2.5.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-5.6.0) |
| rocThrust | ⇒ [2.18.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-5.6.0) |
| rocWMMA | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-5.6.0) |
| Tensile | ⇒ [4.37.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-5.6.0) |

#### hipBLAS 1.0.0

hipBLAS 1.0.0 for ROCm 5.6.0

##### Changed

- added const qualifier to hipBLAS functions (swap, sbmv, spmv, symv, trsm) where missing

##### Removed

- removed support for deprecated hipblasInt8Datatype_t enum
- removed support for deprecated hipblasSetInt8Datatype and hipblasGetInt8Datatype functions

##### Deprecated

- in-place trmm is deprecated. It will be replaced by trmm which includes both in-place and
  out-of-place functionality

#### hipCUB 2.13.1

hipCUB 2.13.1 for ROCm 5.6.0

##### Added

- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.

##### Changed

- CUB backend references CUB and Thrust version 1.17.2.
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).

##### Known Issues

- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.

#### hipFFT 1.0.12

hipFFT 1.0.12 for ROCm 5.6.0

##### Added

- Implemented the hipfftXtMakePlanMany, hipfftXtGetSizeMany, hipfftXtExec APIs, to allow requesting half-precision transforms.

##### Changed

- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.

#### hipSOLVER 1.8.0

hipSOLVER 1.8.0 for ROCm 5.6.0

##### Added

- Added compatibility API with hipsolverRf prefix

#### hipSPARSE 2.3.6

hipSPARSE 2.3.6 for ROCm 5.6.0

##### Added

- Added SpGEMM algorithms

##### Changed

- For hipsparseXbsr2csr and hipsparseXcsr2bsr, blockDim == 0 now returns HIPSPARSE_STATUS_INVALID_SIZE

#### MIOpen 2.19.0

MIOpen 2.19.0 for ROCm 5.6.0

##### Added

- ROCm 5.5 support for gfx1101 (Navi32)

##### Changed

- Tuning results for MLIR on ROCm 5.5
- Bumping MLIR commit to 5.5.0 release tag

##### Fixed

- Fix 3d convolution Host API bug
- [HOTFIX][MI200][FP16] Disabled ConvHipImplicitGemmBwdXdlops when FP16_ALT is required.

#### rccl 2.15.5

RCCL 2.15.5 for ROCm 5.6.0

##### Changed

- Compatibility with NCCL 2.15.5
- Unit test executable renamed to rccl-UnitTests

##### Added

- HW-topology aware binary tree implementation
- Experimental support for MSCCL
- New unit tests for hipGraph support
- NPKit integration

##### Fixed

- rocm-smi ID conversion
- Support for HIP_VISIBLE_DEVICES for unit tests
- Support for p2p transfers to non (HIP) visible devices

##### Removed

- Removed TransferBench from tools. Exists in standalone repo: https://github.com/ROCmSoftwarePlatform/TransferBench

#### rocALUTION 2.1.9

rocALUTION 2.1.9 for ROCm 5.6.0

##### Improved

- Fixed synchronization issues in level 1 routines

#### rocBLAS 3.0.0

rocBLAS 3.0.0 for ROCm 5.6.0

##### Optimizations

- Improved performance of Level 2 rocBLAS GEMV on gfx90a GPU for non-transposed problems having small matrices and larger batch counts. Performance enhanced for problem sizes when m and n <= 32 and batch_count >= 256.
- Improved performance of rocBLAS syr2k for single, double, and double-complex precision, and her2k for double-complex precision. Slightly improved performance for general sizes on gfx90a.

##### Added

- Added bf16 inputs and f32 compute support to Level 1 rocBLAS Extension functions axpy_ex, scal_ex and nrm2_ex.

##### Deprecated

- trmm inplace is deprecated. It will be replaced by trmm that has both inplace and out-of-place functionality
- rocblas_query_int8_layout_flag() is deprecated and will be removed in a future release
- rocblas_gemm_flags_pack_int8x4 enum is deprecated and will be removed in a future release
- rocblas_set_device_memory_size() is deprecated and will be replaced by a future function rocblas_increase_device_memory_size()
- rocblas_is_user_managing_device_memory() is deprecated and will be removed in a future release

##### Removed

- is_complex helper was deprecated and is now removed. Use rocblas_is_complex instead.
- The enum truncate_t and the value truncate were deprecated and are now removed. They were replaced by rocblas_truncate_t and rocblas_truncate, respectively.
- rocblas_set_int8_type_for_hipblas was deprecated and is now removed.
- rocblas_get_int8_type_for_hipblas was deprecated and is now removed.

##### Dependencies

- Build-only dependency on Python joblib added, as it is used by the Tensile build
- fix for cmake install on some OS when performed by install.sh -d --cmake_install

##### Fixed

- make trsm offset calculations 64 bit safe
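
A minimal sketch of the kind of overflow that 64-bit-safe offset arithmetic avoids. The helper names and the column-major `col * lda + row` layout are assumptions made for illustration; none of this is taken from the rocBLAS sources:

```c
#include <stdint.h>

/* Hypothetical helper (not rocBLAS code): offset of element (row, col) in a
 * column-major matrix with leading dimension lda, using 32-bit arithmetic.
 * Unsigned math is used so the wrap-around is well defined in C. */
uint32_t offset_32bit(uint32_t row, uint32_t col, uint32_t lda)
{
    return col * lda + row; /* wraps modulo 2^32 for large matrices */
}

/* Same computation with every operand widened to 64 bits first. */
int64_t offset_64bit(int64_t row, int64_t col, int64_t lda)
{
    return col * lda + row;
}
```

With `col = 70000` and `lda = 70000`, the 64-bit version yields 4900000000, while the 32-bit version wraps to 605032704 and would address the wrong element.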

##### Changed

- refactor rotg test code

#### rocFFT 1.0.23

rocFFT 1.0.23 for ROCm 5.6.0

##### Added

- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
- Implemented a first version of offline-tuner to support tuning kernels for C2C/Z2Z problems.

##### Changed

- Replaced std::complex with hipComplex data types for data generator.
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.

##### Fixed

- Fixed over-allocation of LDS in some real-complex kernels, which was resulting in kernel launch failure.

#### rocm-cmake 0.9.0

rocm-cmake 0.9.0 for ROCm 5.6.0

##### Added

- Added the option ROCM_HEADER_WRAPPER_WERROR
  - Compile-time C macro in the wrapper headers causes errors to be emitted instead of warnings.
  - Configure-time CMake option sets the default for the C macro.

#### rocPRIM 2.13.0

rocPRIM 2.13.0 for ROCm 5.6.0

##### Added

- New block level `radix_rank` primitive.
- New block level `radix_rank_match` primitive.
- Added a stable block sorting implementation. This can be used with `block_sort` by using the `block_sort_algorithm::stable_merge_sort` algorithm.

##### Changed

- Improved the performance of `block_radix_sort` and `device_radix_sort`.
- Improved the performance of `device_merge_sort`.
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ).

##### Known Issues

- Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows.
- When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits<unsigned int>::max()`.
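
One possible way to stay under such a size limit is to scan the input in chunks and carry the running total from one chunk to the next. The sketch below shows the idea on the CPU in plain C. `scan_chunk`, `chunked_inclusive_scan`, and the `limit` parameter are hypothetical names, and this is not the rocPRIM API:

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive prefix sum of one chunk, starting from a carried-in total.
 * Returns the running total after the chunk. */
static uint64_t scan_chunk(const uint32_t *in, uint64_t *out, size_t n,
                           uint64_t carry)
{
    for (size_t i = 0; i < n; ++i) {
        carry += in[i];
        out[i] = carry;
    }
    return carry;
}

/* Scans n elements in pieces of at most `limit` elements each.
 * `limit` must be greater than zero. */
void chunked_inclusive_scan(const uint32_t *in, uint64_t *out, size_t n,
                            size_t limit)
{
    uint64_t carry = 0;
    for (size_t begin = 0; begin < n; begin += limit) {
        size_t remaining = n - begin;
        size_t len = remaining < limit ? remaining : limit;
        carry = scan_chunk(in + begin, out + begin, len, carry);
    }
}
```

The result is independent of the chunk size, so the same output is produced whether the input is scanned in one piece or in several.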

#### rocRAND 2.10.17

rocRAND 2.10.17 for ROCm 5.6.0

##### Added

- MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
- New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
- experimental HIP-CPU feature
- ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
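
For readers unfamiliar with MT19937, the following is a minimal single-threaded CPU sketch of the 32-bit Mersenne Twister in plain C. It uses the widely used 2002 reference seeding rather than anything specific to rocRAND. The type and function names are hypothetical, and rocRAND's own seeding, state layout, and parallel generation may differ:

```c
#include <stdint.h>

#define MT_N 624 /* state size in 32-bit words */
#define MT_M 397 /* middle offset used by the recurrence */

typedef struct {
    uint32_t state[MT_N];
    int index;
} mt19937_t;

/* Fill the state from a 32-bit seed (2002 reference initialization). */
void mt19937_seed(mt19937_t *g, uint32_t seed)
{
    g->state[0] = seed;
    for (int i = 1; i < MT_N; ++i) {
        uint32_t prev = g->state[i - 1];
        g->state[i] = 1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i;
    }
    g->index = MT_N; /* force a twist before the first output */
}

/* Regenerate all 624 state words in place. */
static void mt19937_twist(mt19937_t *g)
{
    for (int i = 0; i < MT_N; ++i) {
        uint32_t y = (g->state[i] & 0x80000000u) |
                     (g->state[(i + 1) % MT_N] & 0x7fffffffu);
        uint32_t next = g->state[(i + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u) {
            next ^= 0x9908b0dfu;
        }
        g->state[i] = next;
    }
    g->index = 0;
}

/* Return the next tempered 32-bit output. */
uint32_t mt19937_next(mt19937_t *g)
{
    if (g->index >= MT_N) {
        mt19937_twist(g);
    }
    uint32_t y = g->state[g->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}
```

Seeded with 5489, the default seed of the reference implementation, the first value returned is 3499211612.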

##### Changed

- Python 2.7 is no longer officially supported.

#### rocSOLVER 3.22.0

rocSOLVER 3.22.0 for ROCm 5.6.0

##### Added

- LU refactorization for sparse matrices
  - CSRRF_ANALYSIS
  - CSRRF_SUMLU
  - CSRRF_SPLITLU
  - CSRRF_REFACTLU
- Linear system solver for sparse matrices
  - CSRRF_SOLVE
- Added type `rocsolver_rfinfo` for use with sparse matrix routines

##### Optimized

- Improved the performance of BDSQR and GESVD when singular vectors are requested

##### Fixed

- BDSQR and GESVD should no longer hang when the input contains `NaN` or `Inf`
|
||||
|
||||
#### rocSPARSE 2.5.2
|
||||
|
||||
rocSPARSE 2.5.2 for ROCm 5.6.0
|
||||
|
||||
##### Improved
|
||||
|
||||
- Fixed a memory leak in csritsv
|
||||
- Fixed a bug in csrsm and bsrsm
|
||||
|
||||
#### rocThrust 2.18.0
|
||||
|
||||
rocThrust 2.18.0 for ROCm 5.6.0
|
||||
|
||||
##### Fixed
|
||||
|
||||
- `lower_bound`, `upper_bound`, and `binary_search` failed to compile for certain types.
|
||||
|
||||
##### Changed
|
||||
|
||||
- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core).
|
||||
|
||||
#### rocWMMA 1.1.0
|
||||
|
||||
rocWMMA 1.1.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added cross-lane operation backends (Blend, Permute, Swizzle and Dpp)
|
||||
- Added GPU kernels for rocWMMA unit test pre-process and post-process operations (fill, validation)
|
||||
- Added performance gemm samples for half, single and double precision
|
||||
- Added rocWMMA cmake versioning
|
||||
- Added vectorized support in coordinate transforms
|
||||
- Included ROCm SMI for runtime clock rate detection
|
||||
- Added fragment transforms for transpose and change data layout
|
||||
|
||||
##### Changed
|
||||
|
||||
- Default to GPU rocBLAS validation against rocWMMA
|
||||
- Re-enabled int8 gemm tests on gfx9
|
||||
- Upgraded to C++17
|
||||
- Restructured unit test folder for consistency
|
||||
- Consolidated rocWMMA samples common code
|
||||
|
||||
#### Tensile 4.37.0
|
||||
|
||||
Tensile 4.37.0 for ROCm 5.6.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added user driven tuning API
|
||||
- Added decision tree fallback feature
|
||||
- Added SingleBuffer + AtomicAdd option for GlobalSplitU
|
||||
- DirectToVgpr support for fp16 and Int8 with TN orientation
|
||||
- Added new test cases for various functions
|
||||
- Added SingleBuffer algorithm for ZGEMM/CGEMM
|
||||
- Added joblib for parallel map calls
|
||||
- Added support for MFMA + LocalSplitU + DirectToVgprA+B
|
||||
- Added asmcap check for MIArchVgpr
|
||||
- Added support for MFMA + LocalSplitU
|
||||
- Added frequency, power, and temperature data to the output
|
||||
|
||||
##### Optimizations
|
||||
|
||||
- Improved the performance of GlobalSplitU with SingleBuffer algorithm
|
||||
- Reduced the running time of the extended and pre_checkin tests
|
||||
- Optimized the Tailloop section of the assembly kernel
|
||||
- Optimized complex GEMM (fixed vgpr allocation, unified CGEMM and ZGEMM code in MulMIoutAlphaToArch)
|
||||
- Improved the performance of the second kernel of MultipleBuffer algorithm
|
||||
|
||||
##### Changed
|
||||
|
||||
- Updated custom kernels with 64-bit offsets
|
||||
- Adapted 64-bit offset arguments for assembly kernels
|
||||
- Improved temporary register re-use to reduce max sgpr usage
|
||||
- Removed some restrictions on VectorWidth and DirectToVgpr
|
||||
- Updated the dependency requirements for Tensile
|
||||
- Changed the range of AssertSummationElementMultiple
|
||||
- Modified the error messages for more clarity
|
||||
- Changed DivideAndReminder to vectorStaticRemainder in case quotient is not used
|
||||
- Removed dummy vgpr for vectorStaticRemainder
|
||||
- Removed tmpVgpr parameter from vectorStaticRemainder/Divide/DivideAndReminder
|
||||
- Removed qReg parameter from vectorStaticRemainder
|
||||
|
||||
##### Fixed
|
||||
|
||||
- Fixed tmp sgpr allocation to avoid over-writing values (alpha)
|
||||
- 64-bit offset parameters for post kernels
|
||||
- Fixed gfx908 CI test failures
|
||||
- Fixed offset calculation to prevent overflow for large offsets
|
||||
- Fixed issues when BufferLoad and BufferStore are equal to zero
|
||||
- Fixed StoreCInUnroll + DirectToVgpr + no useInitAccVgprOpt mismatch
|
||||
- Fixed DirectToVgpr + LocalSplitU + FractionalLoad mismatch
|
||||
- Fixed the memory access error related to StaggerU + large stride
|
||||
- Fixed ZGEMM 4x4 MatrixInst mismatch
|
||||
- Fixed DGEMM 4x4 MatrixInst mismatch
|
||||
- Fixed ASEM + GSU + NoTailLoop opt mismatch
|
||||
- Fixed AssertSummationElementMultiple + GlobalSplitU issues
|
||||
- Fixed ASEM + GSU + TailLoop inner unroll
|
||||
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
|
||||
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
|
||||
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
|
||||
- Using *hipGraphAddMemFreeNode* no longer results in a crash
|
||||
|
||||
@@ -12,7 +12,7 @@ fetch="https://github.com/GPUOpen-ProfessionalCompute-Libraries/" />
|
||||
fetch="https://github.com/GPUOpen-Tools/" />
|
||||
<remote name="KhronosGroup"
|
||||
fetch="https://github.com/KhronosGroup/" />
|
||||
<default revision="refs/tags/rocm-5.6.0"
|
||||
<default revision="refs/tags/rocm-5.6.1"
|
||||
remote="roc-github"
|
||||
sync-c="true"
|
||||
sync-j="4" />
|
||||
|
||||
@@ -5,9 +5,9 @@ Documentation is built using open source toolchains. Contributions to our
|
||||
documentation are encouraged and welcome. As a contributor, please familiarize
|
||||
yourself with our documentation toolchain.
|
||||
|
||||
## ReadTheDocs
|
||||
## Read The Docs
|
||||
|
||||
[ReadTheDocs](https://docs.readthedocs.io/en/stable/) is our front end for the
|
||||
[Read the Docs](https://docs.readthedocs.io/en/stable/) is our front end for the
|
||||
documentation. By front end, we mean the tool that serves our HTML-based
|
||||
documentation to our end users.
|
||||
|
||||
|
||||
@@ -20,8 +20,8 @@ latex_engine = "xelatex"
|
||||
project = "ROCm Documentation"
|
||||
author = "Advanced Micro Devices, Inc."
|
||||
copyright = "Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved."
|
||||
version = "5.6.0"
|
||||
release = "5.6.0"
|
||||
version = "5.6.1"
|
||||
release = "5.6.1"
|
||||
|
||||
|
||||
setting_all_article_info = True
|
||||
@@ -86,7 +86,7 @@ article_pages = [
|
||||
|
||||
external_toc_path = "./sphinx/_toc.yml"
|
||||
|
||||
docs_core = ROCmDocs("ROCm 5.5.1 Documentation Home")
|
||||
docs_core = ROCmDocs("ROCm 5.6.1 Documentation Home")
|
||||
docs_core.setup()
|
||||
|
||||
external_projects_current_project = "rocm"
|
||||
|
||||
@@ -18,8 +18,8 @@ following commands based on your distribution.
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.5.1/ubuntu/focal/amdgpu-install_5.5.50501-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
|
||||
wget https://repo.radeon.com/amdgpu-install/5.6.1/ubuntu/focal/amdgpu-install_5.6.50601-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.6.50601-1_all.deb
|
||||
```
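The installer commands above repeat the release number in both the download URL and the package file name, and the two must match. A minimal sketch of deriving both from one variable follows. It assumes the build number is the major version followed by the zero-padded minor and patch numbers (5.6.1 becomes 50601), a pattern inferred only from the file names shown in this section; the variable names are illustrative.

```shell
# Sketch: derive the amdgpu-install package name from one version string.
# Assumption: build number = major, then zero-padded minor and patch
# (5.6.1 -> 50601), inferred from the file names in this section.
ver=5.6.1
codename=focal   # use jammy for Ubuntu 22.04

# Split the version on dots into its three components.
IFS=. read -r major minor patch <<EOF
$ver
EOF

build=$(printf '%d%02d%02d' "$major" "$minor" "${patch:-0}")
pkg="amdgpu-install_${major}.${minor}.${build}-1_all.deb"
url="https://repo.radeon.com/amdgpu-install/${ver}/ubuntu/${codename}/${pkg}"

echo "$url"
# Download and install from the same variables so they cannot drift apart:
#   wget "$url" && sudo apt install "./${pkg}"
```

Check the printed URL against the repository listing before downloading, since the naming pattern is an assumption rather than a documented rule.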
|
||||
|
||||
:::
|
||||
@@ -28,8 +28,8 @@ sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
|
||||
|
||||
```shell
|
||||
sudo apt update
|
||||
wget https://repo.radeon.com/amdgpu-install/5.5.1/ubuntu/jammy/amdgpu-install_5.5.50501-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
|
||||
wget https://repo.radeon.com/amdgpu-install/5.6.1/ubuntu/jammy/amdgpu-install_5.6.50601-1_all.deb
|
||||
sudo apt install ./amdgpu-install_5.6.50601-1_all.deb
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -39,21 +39,12 @@ sudo apt install ./amdgpu-install_5.6.50600-1_all.deb
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.6/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.7/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/8.7/amdgpu-install-5.6.50601-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -62,7 +53,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.7/amdgpu-in
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.8/amdgpu-install-5.5.50501-1.el8.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/8.8/amdgpu-install-5.6.50601-1.el8.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -71,7 +62,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/8.8/amdgpu-in
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.1/amdgpu-install-5.5.50501-1.el9.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/9.1/amdgpu-install-5.6.50601-1.el9.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -80,7 +71,7 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.1/amdgpu-in
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.2/amdgpu-install-5.5.50501-1.el9.noarch.rpm
|
||||
sudo yum install https://repo.radeon.com/amdgpu-install/5.6.1/rhel/9.2/amdgpu-install-5.6.50601-1.el9.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -90,19 +81,19 @@ sudo yum install https://repo.radeon.com/amdgpu-install/5.5.1/rhel/9.2/amdgpu-in
|
||||
:sync: SLES
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15.3
|
||||
:sync: SLES-15.3
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.5.1/sle/15.3/amdgpu-install-5.5.50501-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.5.1/sle/15.4/amdgpu-install-5.5.50501-1.noarch.rpm
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6.1/sle/15.4/amdgpu-install-5.6.50601-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
sudo zypper --no-gpg-checks install https://repo.radeon.com/amdgpu-install/5.6.1/sle/15.5/amdgpu-install-5.6.50601-1.noarch.rpm
|
||||
```
|
||||
|
||||
:::
|
||||
@@ -202,8 +193,8 @@ Run the following commands based on your distribution to add the repositories:
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
for ver in 5.4.3 5.5.1; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
@@ -214,8 +205,8 @@ sudo apt update
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
for ver in 5.4.3 5.5.1; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" | sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
sudo apt update
|
||||
@@ -232,7 +223,7 @@ sudo apt update
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3; do
|
||||
for ver in 5.4.3 5.5.1; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -251,7 +242,7 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3; do
|
||||
for ver in 5.4.3 5.5.1; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -272,7 +263,7 @@ sudo yum clean all
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3; do
|
||||
for ver in 5.4.3 5.5.1; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/$ver/main
|
||||
@@ -302,8 +293,8 @@ driver, associated with the ROCm release v5.4.3, will be installed as its latest
|
||||
release in the list.
|
||||
|
||||
```none
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.3.3
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.4.3
|
||||
sudo amdgpu-install --usecase=rocm --rocmrelease=5.5.1
|
||||
```
|
||||
|
||||
## Additional options
|
||||
|
||||
@@ -52,8 +52,11 @@ To add the AMDGPU repository, follow these steps:
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
# amdgpu repository for focal
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu focal main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -63,8 +66,11 @@ sudo apt update
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
# amdgpu repository for jammy
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$ver/ubuntu jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -91,7 +97,7 @@ To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
# ROCm repositories for focal
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver focal main" \
|
||||
| sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
@@ -106,7 +112,7 @@ sudo apt update
|
||||
|
||||
```shell
|
||||
# ROCm repositories for jammy
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
|
||||
| sudo tee --append /etc/apt/sources.list.d/rocm.list
|
||||
done
|
||||
@@ -136,7 +142,7 @@ For a comprehensive list of meta-packages, refer to
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo apt install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
|
||||
sudo apt install rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
|
||||
```
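The repository loops in this section rely on `tee --append`: without it, each iteration overwrites the list file and only the last release remains. The sketch below shows the effect on a temporary file, so it needs no `sudo` and does not touch `/etc/apt`. The temporary file and the `entries` variable are illustrative, and `-a` is the short form of `--append`.

```shell
# Sketch: why the loop needs `tee --append` (short form: -a).
# Writes to a temporary file instead of /etc/apt/sources.list.d/rocm.list.
list=$(mktemp)

for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
    echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$ver jammy main" \
        | tee -a "$list" > /dev/null
done

# One line per release; plain `tee` would overwrite and keep only the last.
entries=$(grep -c '^deb ' "$list")
echo "entries: $entries"
cat "$list"
```

Because appending is cumulative, running the real loop twice duplicates every entry; remove or truncate the existing list file before re-running it.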
|
||||
|
||||
:::::
|
||||
@@ -152,34 +158,18 @@ section.
|
||||
```
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -195,10 +185,13 @@ sudo yum clean all
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/8.8/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -214,10 +207,13 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -233,10 +229,13 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/rhel/9.2/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -266,7 +265,7 @@ To add the ROCm repository, use the following steps, based on your distribution:
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -285,7 +284,7 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
|
||||
sudo tee --append /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -320,7 +319,7 @@ For a comprehensive list of meta-packages, refer to
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo yum install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
|
||||
sudo yum install rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
|
||||
```
|
||||
|
||||
:::::
|
||||
@@ -340,10 +339,13 @@ section.
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -356,10 +358,13 @@ sudo zypper ref
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
# version
|
||||
ver=5.6.1
|
||||
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.5/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/$ver/sle/15.5/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -384,7 +389,7 @@ sudo reboot
|
||||
To add the ROCm repository, use the following steps:
|
||||
|
||||
```shell
|
||||
for ver in 5.3.3 5.4.3 5.5.1 5.6; do
|
||||
for ver in 5.3.3 5.4.6 5.5.3 5.6.1; do
|
||||
sudo tee --append /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-$ver]
|
||||
name=ROCm$ver
|
||||
@@ -416,7 +421,7 @@ For a comprehensive list of meta-packages, refer to
|
||||
- Sample Multi-version installation
|
||||
|
||||
```shell
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.6 rocm-hip-sdk5.3.3
|
||||
sudo zypper --gpg-auto-import-keys install rocm-hip-sdk5.6.1 rocm-hip-sdk5.5.3
|
||||
```
|
||||
|
||||
:::::
|
||||
@@ -453,7 +458,7 @@ but are generally useful. Verification of the install is advised.
|
||||
2. Add binary paths to the `PATH` environment variable.
|
||||
|
||||
```shell
|
||||
export PATH=$PATH:/opt/rocm/bin:/opt/rocm-5.6/opencl/bin
|
||||
export PATH=$PATH:/opt/rocm-5.6.1/bin:/opt/rocm-5.6.1/opencl/bin
|
||||
```
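If the export is placed in a shell startup file, it can run more than once and add the same directories repeatedly. A minimal sketch of an idempotent variant follows, assuming ROCm 5.6.1 is installed under `/opt/rocm-5.6.1` as in the export above. `path_append` is a hypothetical helper written for this example, not part of ROCm.

```shell
# Sketch: append the ROCm binary directories to PATH only once.
# Assumption: ROCm 5.6.1 lives under /opt/rocm-5.6.1, as in the export above.
# path_append is a hypothetical helper, not something ROCm provides.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH unchanged
        *) PATH="$PATH:$1" ;;
    esac
}

rocm_root=/opt/rocm-5.6.1
path_append "$rocm_root/bin"
path_append "$rocm_root/opencl/bin"
path_append "$rocm_root/bin"         # second call is a no-op
export PATH

echo "$PATH"
```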
|
||||
|
||||
```{attention}
|
||||
|
||||
@@ -25,8 +25,11 @@ repository to the new release.
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
# amdgpu repository for focal
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu focal main' \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$version/ubuntu focal main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -36,8 +39,11 @@ sudo apt update
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
# amdgpu repository for jammy
|
||||
echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6/ubuntu jammy main' \
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/$version/ubuntu jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/amdgpu.list
|
||||
sudo apt update
|
||||
```
|
||||
@@ -49,33 +55,18 @@ sudo apt update
|
||||
:sync: RHEL
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.6/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
sudo yum clean all
|
||||
```
|
||||
|
||||
:::
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/8.7/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.7/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -90,10 +81,13 @@ sudo yum clean all
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/8.8/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -108,10 +102,13 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/rhel/9.1/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.1/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -126,10 +123,13 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64/
|
||||
baseurl=https://repo.radeon.com/amdgpu/$version/rhel/9.2/main/x86_64/
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -149,10 +149,13 @@ sudo yum clean all
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<EOF
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/$version/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -230,7 +233,10 @@ repository to the new release.
|
||||
:sync: ubuntu-20.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 focal main" \
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$version focal main" \
|
||||
| sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
@@ -242,7 +248,10 @@ sudo apt update
|
||||
:sync: ubuntu-22.04
|
||||
|
||||
```shell
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6 jammy main" \
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/$version jammy main" \
|
||||
| sudo tee /etc/apt/sources.list.d/rocm.list
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' \
|
||||
| sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
@@ -260,10 +269,13 @@ sudo apt update
|
||||
:sync: RHEL-8
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.6]
|
||||
name=ROCm5.6
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.6/main
|
||||
[ROCm-$version]
|
||||
name=ROCm$version
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/$version/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -277,10 +289,13 @@ sudo yum clean all
|
||||
:sync: RHEL-9
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.6]
|
||||
name=ROCm5.6
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.6/main
|
||||
[ROCm-$version]
|
||||
name=ROCm$version
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/$version/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -296,11 +311,14 @@ sudo yum clean all
|
||||
:sync: SLES
|
||||
|
||||
```shell
|
||||
# version
|
||||
version=5.6.1
|
||||
|
||||
sudo tee /etc/zypp/repos.d/rocm.repo <<EOF
|
||||
[ROCm-5.6]
|
||||
name=ROCm5.6
|
||||
[ROCm-$version]
|
||||
name=ROCm$version
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/5.6/main
|
||||
baseurl=https://repo.radeon.com/rocm/zyp/$version/main
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
|
||||
@@ -116,12 +116,16 @@ sudo crb enable
|
||||
|
||||
Add the Perl languages repository.
|
||||
|
||||
```{note}
|
||||
Mar 25, 2024: We currently need to install the Perl module from SLES 15 SP5 as a workaround. The module was removed for SLES 15 SP4.
|
||||
```
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
zypper addrepo https://download.opensuse.org/repositories/devel:languages:perl/SLE_15_SP4/devel:languages:perl.repo
|
||||
zypper addrepo https://download.opensuse.org/repositories/devel:/languages:/perl/15.5/devel:languages:perl.repo
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
@@ -29,11 +29,11 @@ wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | \
|
||||
```shell
|
||||
# Kernel driver repository for focal
|
||||
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.5.1/ubuntu focal main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6.1/ubuntu focal main
|
||||
EOF
|
||||
# ROCm repository for focal
|
||||
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.5.1 focal main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6.1 focal main
|
||||
EOF
|
||||
```
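The repository files in these instructions are written with here-documents, and the quoting of the delimiter matters. With `<<'EOF'` the body is written literally, so release numbers must be spelled out as above. With an unquoted `<<EOF` the shell expands variables first, so a name that is misspelled or unset silently becomes an empty string. The sketch below shows the difference on temporary files; the variable and file names are illustrative.

```shell
# Sketch: quoted vs. unquoted here-document delimiters.
# The variable name `version` and the temporary files are illustrative.
version=5.6.1
literal=$(mktemp)
expanded=$(mktemp)

# Quoted delimiter: the body is written verbatim, $version is NOT expanded.
tee "$literal" > /dev/null <<'EOF'
baseurl=https://repo.radeon.com/rocm/rhel9/$version/main
EOF

# Unquoted delimiter: the shell substitutes $version before tee sees it.
tee "$expanded" > /dev/null <<EOF
baseurl=https://repo.radeon.com/rocm/rhel9/$version/main
EOF

cat "$literal" "$expanded"
```

After writing a repository file with an unquoted delimiter, print it once to confirm that every placeholder was replaced with the intended release.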
|
||||
|
||||
@@ -44,11 +44,11 @@ EOF
|
||||
```shell
|
||||
# Kernel driver repository for jammy
|
||||
sudo tee /etc/apt/sources.list.d/amdgpu.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.5.1/ubuntu jammy main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/amdgpu/5.6.1/ubuntu jammy main
|
||||
EOF
|
||||
# ROCm repository for jammy
|
||||
sudo tee /etc/apt/sources.list.d/rocm.list <<'EOF'
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.5.1 jammy main
|
||||
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/5.6.1 jammy main
|
||||
EOF
|
||||
# Prefer packages from the rocm repository over system packages
|
||||
echo -e 'Package: *\nPin: release o=repo.radeon.com\nPin-Priority: 600' | sudo tee /etc/apt/preferences.d/rocm-pin-600
|
||||
@@ -73,33 +73,6 @@ sudo apt update
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} RHEL 8.6
|
||||
:sync: RHEL-8.6
|
||||
|
||||
```shell
|
||||
# Add the amdgpu module repository for RHEL 8.6
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.6/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
# Add the rocm repository for RHEL 8
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.5.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
EOF
|
||||
```
|
||||
|
||||
:::
|
||||
|
||||
:::{tab-item} RHEL 8.7
|
||||
:sync: RHEL-8.7
|
||||
|
||||
@@ -108,7 +81,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.7/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/8.7/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -117,7 +90,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.5.1/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.6.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -135,7 +108,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/8.8/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/8.8/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -144,7 +117,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.5.1/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel8/5.6.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -162,7 +135,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.1/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/9.1/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -171,7 +144,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.5.1/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.6.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -189,7 +162,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/rhel/9.2/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6.1/rhel/9.2/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -198,7 +171,7 @@ EOF
|
||||
sudo tee /etc/yum.repos.d/rocm.repo <<'EOF'
|
||||
[rocm]
|
||||
name=rocm
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.5.1/main
|
||||
baseurl=https://repo.radeon.com/rocm/rhel9/5.6.1/main
|
||||
enabled=1
|
||||
priority=50
|
||||
gpgcheck=1
|
||||
@@ -225,16 +198,16 @@ sudo yum clean all
|
||||
::::
|
||||
|
||||
::::{tab-set}
|
||||
:::{tab-item} SLES 15.3
|
||||
:sync: SLES-15.3
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
|
||||
```shell
|
||||
|
||||
# Add the amdgpu module repository for SLES 15.3
|
||||
# Add the amdgpu module repository for SLES 15.4
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/sle/15.3/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6.1/sle/15.4/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
@@ -252,16 +225,16 @@ EOF
|
||||
```
|
||||
|
||||
:::
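The repository stanzas in these hunks differ only in the release number and the distribution path segment. A minimal sketch of that pattern follows, assuming the layout seen in the `baseurl` lines above. The helper name `render_amdgpu_repo` and its parameters are hypothetical and are not part of any ROCm tooling.

```python
# Hypothetical helper: renders the amdgpu repo stanza shown in the diff.
# The URL layout is copied from the baseurl lines above; whether a given
# version/distribution combination is actually published is not checked here.
REPO_HOST = "https://repo.radeon.com"


def render_amdgpu_repo(rocm_version: str, distro_path: str) -> str:
    """Return the text of an amdgpu.repo file for one release and distribution.

    distro_path is the segment between the version and "main",
    for example "sle/15.4" or "rhel/9.2".
    """
    lines = [
        "[amdgpu]",
        "name=amdgpu",
        f"baseurl={REPO_HOST}/amdgpu/{rocm_version}/{distro_path}/main/x86_64",
        "enabled=1",
        "gpgcheck=1",
        f"gpgkey={REPO_HOST}/rocm/rocm.gpg.key",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_amdgpu_repo("5.6.1", "sle/15.4"), end="")
```

Generating the stanza from two parameters keeps a version bump such as 5.5.1 to 5.6.1 to a single argument change. The documented `sudo tee` heredoc remains the procedure this diff describes.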
|
||||
:::{tab-item} SLES 15.4
|
||||
:sync: SLES-15.4
|
||||
:::{tab-item} SLES 15.5
|
||||
:sync: SLES-15.5
|
||||
|
||||
```shell
|
||||
|
||||
# Add the amdgpu module repository for SLES 15.4
|
||||
# Add the amdgpu module repository for SLES 15.5
|
||||
sudo tee /etc/zypp/repos.d/amdgpu.repo <<'EOF'
|
||||
[amdgpu]
|
||||
name=amdgpu
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.5.1/sle/15.4/main/x86_64
|
||||
baseurl=https://repo.radeon.com/amdgpu/5.6.1/sle/15.5/main/x86_64
|
||||
enabled=1
|
||||
gpgcheck=1
|
||||
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
|
||||
|
||||
@@ -24,7 +24,7 @@ MIGraphX is a graph compiler focused on accelerating the Machine Learning infere
|
||||
|
||||
After doing all these transformations, MIGraphX emits code for the AMD GPU by calling to MIOpen or rocBLAS or creating HIP kernels for a particular operator. MIGraphX can also target CPUs using DNNL or ZenDNN libraries.
|
||||
|
||||
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX's C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
|
||||
MIGraphX provides easy-to-use APIs in C++ and Python to import machine models in ONNX or TensorFlow. Users can compile, save, load, and run these models using MIGraphX C++ and Python APIs. Internally, MIGraphX parses ONNX or TensorFlow models into internal graph representation where each operator in the model gets mapped to an operator within MIGraphX. Each of these operators defines various attributes such as:
|
||||
|
||||
- Number of arguments
|
||||
|
||||
@@ -187,7 +187,7 @@ Follow these steps:
|
||||
}
|
||||
```
|
||||
|
||||
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use MIGraphX's C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
|
||||
2. To compile this program, you can use CMake and you only need to link the `migraphx::c` library to use the MIGraphX C++ API. The following is the `CMakeLists.txt` file that can build the earlier example:
|
||||
|
||||
```cmake
|
||||
cmake_minimum_required(VERSION 3.5)
|
||||
@@ -327,7 +327,7 @@ To run generated `.mxr` files through `migraphx-driver`, use the following:
|
||||
./path/to/migraphx-driver run --migraphx resnet50.mxr --enable-offload-copy
|
||||
```
|
||||
|
||||
Alternatively, you can use MIGraphX's C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
|
||||
Alternatively, you can use the MIGraphX C++ or Python API to generate `.mxr` file. Refer to {numref}`image018` for an example.
|
||||
|
||||
```{figure} ../../data/understand/deep_learning/image.018.png
|
||||
:name: image018
|
||||
|
||||
@@ -299,7 +299,7 @@ USE_ROCM=1 MAX_JOBS=4 python3 setup.py install --user
|
||||
### Test the PyTorch Installation
|
||||
|
||||
You can use PyTorch unit tests to validate a PyTorch installation. If using a
|
||||
prebuilt PyTorch Docker image from AMD ROCm DockerHub or installing an official
|
||||
prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installing an official
|
||||
wheels package, these tests are already run on those configurations.
|
||||
Alternatively, you can manually run the unit tests to validate the PyTorch
|
||||
installation fully.
|
||||
|
||||
@@ -8,7 +8,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `Ubuntu+ rocm5.6_internal_testing +169530b`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
|
||||
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
|
||||
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
|
||||
@@ -21,7 +21,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `CentOS7+ rocm5.6_internal_testing +169530b`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
|
||||
* [Torch 2.0.0](https://github.com/ROCmSoftwarePlatform/pytorch/tree/rocm5.6_internal_testing)
|
||||
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
|
||||
@@ -31,7 +31,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `1.13 +bfeb431`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
|
||||
* [Torch 1.13.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.13)
|
||||
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
|
||||
@@ -44,7 +44,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `1.12 +05d5d04`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.8](https://www.python.org/downloads/release/python-380/)
|
||||
* [Torch 1.12.1](https://github.com/ROCmSoftwarePlatform/pytorch/tree/release/1.12)
|
||||
* [Apex 0.1](https://github.com/ROCmSoftwarePlatform/apex/tree/v0.1)
|
||||
@@ -59,7 +59,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `tensorflow_develop-upstream-QA-rocm56 +c88a9f4`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
||||
* `tensorflow-rocm` 2.13.0
|
||||
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
|
||||
@@ -69,7 +69,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `r2.11-rocm-enhanced +5be4141`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
||||
* [`tensorflow-rocm` 2.11.0](https://pypi.org/project/tensorflow-rocm/2.11.0.540/)
|
||||
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
|
||||
@@ -79,7 +79,7 @@ The software support matrices for ROCm container releases is listed.
|
||||
|
||||
#### `r2.10-rocm-enhanced +72789a3`
|
||||
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/latest/)
|
||||
* [ROCm5.6](https://repo.radeon.com/rocm/apt/5.6.1/)
|
||||
* [Python 3.9](https://www.python.org/downloads/release/python-390/)
|
||||
* [`tensorflow-rocm` 2.10.1](https://pypi.org/project/tensorflow-rocm/2.10.1.540/)
|
||||
* [OFED 5.3](https://content.mellanox.com/ofed/MLNX_OFED-5.3-1.0.5.0/MLNX_OFED_LINUX-5.3-1.0.5.0-ubuntu20.04-x86_64.tgz)
|
||||
|
||||
@@ -5,7 +5,6 @@ The following table is a list of ROCm components with links to their respective
|
||||
terms. These components may include third party components subject to
|
||||
additional licenses. Please review individual repositories for more information.
|
||||
The table shows ROCm components, the name of license and link to the license terms.
|
||||
The table is ordered to follow ROCm's manifest file.
|
||||
|
||||
<!-- spellcheck-disable -->
|
||||
| Component | License |
|
||||
|
||||
@@ -18,7 +18,7 @@ integrated into ML frameworks such as PyTorch and TensorFlow. ROCm can be
|
||||
deployed in many ways, including through the use of containers such as Docker,
|
||||
Spack, and your own build from source.
|
||||
|
||||
ROCm’s goal is to allow our users to maximize their GPU hardware investment.
|
||||
The goal of ROCm is to allow our users to maximize their GPU hardware investment.
|
||||
ROCm is designed to help develop, test and deploy GPU accelerated HPC, AI,
|
||||
scientific computing, CAD, and other applications in a free, open-source,
|
||||
integrated and secure software ecosystem.
|
||||
|
||||
@@ -13,20 +13,20 @@ The full list of HSA system architecture platform requirements are here: `HSA Sy
|
||||
|
||||
The ROCm Platform uses the new PCI Express 3.0 (PCIe 3.0) features for Atomic Read-Modify-Write Transactions which extends inter-processor synchronization mechanisms to IO to support the defined set of HSA capabilities needed for queuing and signaling memory operations.
|
||||
|
||||
The new PCIe AtomicOps operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The AtomicsOps are initiated by the
|
||||
The new PCIe atomic operations operate as completers for ``CAS`` (Compare and Swap), ``FetchADD``, ``SWAP`` atomics. The atomic operations are initiated by the
|
||||
I/O device which support 32-bit, 64-bit and 128-bit operand which target address have to be naturally aligned to operation sizes.
|
||||
|
||||
For ROCm the Platform atomics are used in ROCm in the following ways:
|
||||
|
||||
* Update HSA queue’s read_dispatch_id: 64 bit atomic add used by the command processor on the GPU agent to update the packet ID it processed.
|
||||
* Update HSA queue’s read_dispatch_id: 64 bit atomic add used by the command processor on the GPU agent to update the packet ID it processed.
|
||||
* Update HSA queue’s write_dispatch_id: 64 bit atomic add used by the CPU and GPU agent to support multi-writer queue insertions.
|
||||
* Update HSA Signals – 64bit atomic ops are used for CPU & GPU synchronization.
|
||||
|
||||
The PCIe 3.0 AtomicOp feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have AtomicOp routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
|
||||
The PCIe 3.0 atomic operations feature allows atomic transactions to be requested by, routed through and completed by PCIe components. Routing and completion does not require software support. Component support for each is detectable via the DEVCAP2 register. Upstream bridges need to have atomic operations routing enabled or the Atomic Operations will fail even though PCIe endpoint and PCIe I/O Devices has the capability to Atomics Operations.
|
||||
|
||||
To do AtomicOp routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the AtomicOp Routing Supported bit in the Device Capabilities 2 register.
|
||||
To do atomic operations routing capability between two or more Root Ports, each associated Root Port must indicate that capability via the atomic operations routing supported bit in the Device Capabilities 2 register.
|
||||
|
||||
If your system has a PCIe Express Switch it needs to support AtomicsOp routing. Again AtomicOp requests are permitted only if a component’s ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support AtomicOp completion and/or routing to a component which does. AtomicOp Routing Support=1 Routing is supported, AtomicOp Routing Support=0 routing is not supported.
|
||||
If your system has a PCIe Express Switch it needs to support atomic operations routing. Atomic operations requests are permitted only if a component’s ``DEVCTL2.ATOMICOP_REQUESTER_ENABLE`` field is set. These requests can only be serviced if the upstream components support atomic operations completion and/or routing to a component which does. Atomic operations routing support=1, routing is supported; Atomic operations routing support=0, routing is not supported.
|
||||
|
||||
Atomic Operation is a Non-Posted transaction supporting 32-bit and 64-bit address formats, there must be a response for Completion containing the result of the operation. Errors associated with the operation (uncorrectable error accessing the target location or carrying out the Atomic operation) are signaled to the requester by setting the Completion Status field in the completion descriptor, they are set to to Completer Abort (CA) or Unsupported Request (UR).
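The capability checks described in this hunk can be illustrated with a short sketch. It decodes the atomic-operation bits of a raw Device Capabilities 2 (DEVCAP2) value and checks the natural-alignment rule for operand addresses. The bit positions follow the PCI Express Base Specification. The function names and the sample register value are illustrative assumptions, not part of ROCm.

```python
# Atomic-operation bits in the PCIe Device Capabilities 2 register (DEVCAP2).
ATOMICOP_ROUTING = 1 << 6        # AtomicOp Routing Supported
ATOMICOP_32_COMPLETER = 1 << 7   # 32-bit AtomicOp Completer Supported
ATOMICOP_64_COMPLETER = 1 << 8   # 64-bit AtomicOp Completer Supported
CAS_128_COMPLETER = 1 << 9       # 128-bit CAS Completer Supported


def decode_devcap2_atomics(devcap2: int) -> dict:
    """Report which atomic-operation capabilities a DEVCAP2 value advertises."""
    return {
        "routing": bool(devcap2 & ATOMICOP_ROUTING),
        "completer_32": bool(devcap2 & ATOMICOP_32_COMPLETER),
        "completer_64": bool(devcap2 & ATOMICOP_64_COMPLETER),
        "completer_cas128": bool(devcap2 & CAS_128_COMPLETER),
    }


def is_naturally_aligned(address: int, operand_bytes: int) -> bool:
    """Check that a target address is aligned to the operand size.

    Atomic operands are 4, 8, or 16 bytes (32-bit, 64-bit, 128-bit).
    """
    if operand_bytes not in (4, 8, 16):
        raise ValueError("operand size must be 4, 8, or 16 bytes")
    return address % operand_bytes == 0


if __name__ == "__main__":
    sample = 0x1C0  # made-up value: routing plus 32-bit and 64-bit completers
    print(decode_devcap2_atomics(sample))
    print(is_naturally_aligned(0x1000, 8))
```

On a live system the raw register value would come from the PCIe configuration space of the device or bridge. How that value is read is outside this sketch.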
|
||||
|
||||
@@ -71,7 +71,7 @@ BAR Memory Overview
|
||||
*******************
|
||||
On a Xeon E5 based system in the BIOS we can turn on above 4GB PCIe addressing, if so he need to set MMIO Base address ( MMIOH Base) and Range ( MMIO High Size) in the BIOS.
|
||||
|
||||
In SuperMicro system in the system bios you need to see the following
|
||||
In Supermicro system in the system bios you need to see the following
|
||||
|
||||
* Advanced->PCIe/PCI/PnP configuration-> Above 4G Decoding = Enabled
|
||||
|
||||
@@ -79,7 +79,7 @@ In SuperMicro system in the system bios you need to see the following
|
||||
|
||||
* Advanced->PCIe/PCI/PnP Configuration->MMIO High Size = 256G
|
||||
|
||||
When we support Large Bar Capability there is a Large Bar Vbios which also disable the IO bar.
|
||||
When we support Large Bar Capability there is a Large Bar VBIOS which also disable the IO bar.
|
||||
|
||||
For GFX9 and Vega10 which have Physical Address up 44 bit and 48 bit Virtual address.
|
||||
|
||||
@@ -118,30 +118,5 @@ Legend:
|
||||
|
||||
5 : Expansion ROM – This is required for the AMD Driver SW to access the GPU’s video-bios. This is currently fixed at 128KB.
|
||||
|
||||
Excepts form Overview of Changes to PCI Express 3.0
|
||||
===================================================
|
||||
By Mike Jackson, Senior Staff Architect, MindShare, Inc.
|
||||
********************************************************
|
||||
Atomic Operations – Goal:
|
||||
*************************
|
||||
Support SMP-type operations across a PCIe network to allow for things like offloading tasks between CPU cores and accelerators like a GPU. The spec says this enables advanced synchronization mechanisms that are particularly useful with multiple producers or consumers that need to be synchronized in a non-blocking fashion. Three new atomic non-posted requests were added, plus the corresponding completion (the address must be naturally aligned with the operand size or the TLP is malformed):
|
||||
|
||||
* Fetch and Add – uses one operand as the “add” value. Reads the target location, adds the operand, and then writes the result back to the original location.
|
||||
|
||||
* Unconditional Swap – uses one operand as the “swap” value. Reads the target location and then writes the swap value to it.
|
||||
|
||||
* Compare and Swap – uses 2 operands: first data is compare value, second is swap value. Reads the target location, checks it against the compare value and, if equal, writes the swap value to the target location.
|
||||
|
||||
* AtomicOpCompletion – new completion to give the result so far atomic request and indicate that the atomicity of the transaction has been maintained.
|
||||
|
||||
Since AtomicOps are not locked they don't have the performance downsides of the PCI locked protocol. Compared to locked cycles, they provide “lower latency, higher scalability, advanced synchronization algorithms, and dramatically lower impact on other PCIe traffic.” The lock mechanism can still be used across a bridge to PCI or PCI-X to achieve the desired operation.
|
||||
|
||||
AtomicOps can go from device to device, device to host, or host to device. Each completer indicates whether it supports this capability and guarantees atomic access if it does. The ability to route AtomicOps is also indicated in the registers for a given port.
|
||||
|
||||
ID-based Ordering – Goal:
|
||||
*************************
|
||||
Improve performance by avoiding stalls caused by ordering rules. For example, posted writes are never normally allowed to pass each other in a queue, but if they are requested by different functions, we can have some confidence that the requests are not dependent on each other. The previously reserved Attribute bit [2] is now combined with the RO bit to indicate ID ordering with or without relaxed ordering.
|
||||
|
||||
This only has meaning for memory requests, and is reserved for Configuration or IO requests. Completers are not required to copy this bit into a completion, and only use the bit if their enable bit is set for this operation.
|
||||
|
||||
To read more on PCIe Gen 3 new options https://www.mindshare.com/files/resources/PCIe%203-0.pdf
|
||||
For more information, you can review
|
||||
`Overview of Changes to PCI Express 3.0 <https://www.mindshare.com/files/resources/PCIe%203-0.pdf>`_.
|
||||
@@ -4,7 +4,7 @@ Using CMake
|
||||
|
||||
Most components in ROCm support CMake. Projects depending on header-only or
|
||||
library components typically require CMake 3.5 or higher whereas those wanting
|
||||
to make use of CMake's HIP language support will require CMake 3.21 or higher.
|
||||
to make use of the CMake HIP language support will require CMake 3.21 or higher.
|
||||
|
||||
Finding Dependencies
|
||||
====================
|
||||
@@ -16,7 +16,7 @@ Finding Dependencies
|
||||
<https://cmake.org/cmake/help/latest/command/find_package.html>`_ and the
|
||||
`Using Dependencies Guide
|
||||
<https://cmake.org/cmake/help/latest/guide/using-dependencies/index.html>`_
|
||||
to get an overview of CMake's related facilities.
|
||||
to get an overview of CMake related facilities.
|
||||
|
||||
In short, CMake supports finding dependencies in two ways:
|
||||
|
||||
@@ -28,7 +28,7 @@ In short, CMake supports finding dependencies in two ways:
|
||||
regards needed to consume it.
|
||||
|
||||
ROCm predominantly relies on Config mode, one notable exception being the Module
|
||||
driving the compilation of HIP programs on Nvidia runtimes. As such, when
|
||||
driving the compilation of HIP programs on NVIDIA runtimes. As such, when
|
||||
dependencies are not found in standard system locations, one either has to
|
||||
instruct CMake to search for package config files in additional folders using
|
||||
the ``CMAKE_PREFIX_PATH`` variable (a semi-colon separated list of filesystem
|
||||
@@ -55,8 +55,8 @@ to the installation guides in these docs (`Linux <../deploy/linux/index.html>`_)
|
||||
Using HIP in CMake
|
||||
==================
|
||||
|
||||
ROCm componenents providing a C/C++ interface support being consumed using any
|
||||
C/C++ toolchain that CMake knows how to drive. ROCm also supports CMake's HIP
|
||||
ROCm components providing a C/C++ interface support being consumed using any
|
||||
C/C++ toolchain that CMake knows how to drive. ROCm also supports the CMake HIP
|
||||
language features, allowing users to program using the HIP single-source
|
||||
programming model. When a program (or translation-unit) uses the HIP API without
|
||||
compiling any GPU device code, HIP can be treated in CMake as a simple C/C++
|
||||
@@ -172,7 +172,7 @@ all the flags necessary for device compilation.
|
||||
.. note::
|
||||
Compiling for the GPU device requires at least C++11.
|
||||
|
||||
This project can then be configured with for eg.
|
||||
This project can then be configured with the following CMake commands:
|
||||
|
||||
- Windows: ``cmake -D CMAKE_CXX_COMPILER:PATH=${env:HIP_PATH}\bin\clang++.exe``
|
||||
|
||||
@@ -186,7 +186,7 @@ When using the CXX language support to compile HIP device code, selecting the
|
||||
target GPU architectures is done via setting the ``GPU_TARGETS`` variable.
|
||||
``CMAKE_HIP_ARCHITECTURES`` only exists when the HIP language is enabled. By
|
||||
default, this is set to some subset of the currently supported architectures of
|
||||
AMD ROCm. It can be set to eg. ``-D GPU_TARGETS="gfx1032;gfx1035"``.
|
||||
AMD ROCm. It can be set to the CMake option ``-D GPU_TARGETS="gfx1032;gfx1035"``.
|
||||
|
||||
ROCm CMake Packages
|
||||
-------------------
|
||||
@@ -251,9 +251,9 @@ options.
|
||||
|
||||
IDEs supporting CMake (Visual Studio, Visual Studio Code, CLion, etc.) all came
|
||||
up with their own way to register command-line fragments of different purpose in
|
||||
a setup'n'forget fashion for quick assembly using graphical front-ends. This is
|
||||
a setup-and-forget fashion for quick assembly using graphical front-ends. This is
|
||||
all nice, but configurations aren't portable, nor can they be reused in
|
||||
Continuous Intergration (CI) pipelines. CMake has condensed existing practice
|
||||
Continuous Integration (CI) pipelines. CMake has condensed existing practice
|
||||
into a portable JSON format that works in all IDEs and can be invoked from any
|
||||
command-line. This is
|
||||
`CMake Presets <https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html>`_
|
||||
|
||||
@@ -10,6 +10,6 @@ disambiguates compiler naming used throughout the documentation.
|
||||
| `amdclang++` | Clang/LLVM-based compiler that is part of `rocm-llvm` package. The source code is available at <a href="https://github.com/RadeonOpenCompute/llvm-project" target="_blank">https://github.com/RadeonOpenCompute/llvm-project</a>. |
|
||||
| AOCC | Closed-source clang-based compiler that includes additional CPU optimizations. Offered as part of ROCm via the `rocm-llvm-alt` package. See for details, <a href="https://developer.amd.com/amd-aocc/" target="_blank">https://developer.amd.com/amd-aocc/</a>. |
|
||||
| HIP-Clang | Informal term for the `amdclang++` compiler |
|
||||
| HIPify | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
|
||||
| HIPIFY | Tools including `hipify-clang` and `hipify-perl`, used to automatically translate CUDA source code into portable HIP C++. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPIFY" target="_blank">https://github.com/ROCm-Developer-Tools/HIPIFY</a> |
|
||||
| `hipcc` | HIP compiler driver. A utility that invokes `clang` or `nvcc` depending on the target and passes the appropriate include and library options for the target compiler and HIP infrastructure. The source code is available at <a href="https://github.com/ROCm-Developer-Tools/HIPCC" target="_blank">https://github.com/ROCm-Developer-Tools/HIPCC</a>. |
|
||||
| ROCmCC | Clang/LLVM-based compiler. ROCmCC in itself is not a binary but refers to the overall compiler. |
|
||||
|
||||
15
tools/autotag/templates/rocm_changes/5.6.1.md
Normal file
@@ -0,0 +1,15 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
|
||||
### What's New in This Release
|
||||
|
||||
ROCm 5.6.1 is a point release with several bug fixes in the HIP runtime.
|
||||
|
||||
## HIP 5.6.1 (for ROCm 5.6.1)
|
||||
|
||||
### Fixed Defects
|
||||
|
||||
- *hipMemcpy* device-to-device (intra device) is now asynchronous with respect to the host
|
||||
- Enabled xnack+ check in HIP catch2 tests hang when executing tests
|
||||
- Memory leak when code object files are loaded/unloaded via hipModuleLoad/hipModuleUnload APIs
|
||||
- Using *hipGraphAddMemFreeNode* no longer results in a crash
|
||||