mirror of https://github.com/ROCm/ROCm.git synced 2026-01-10 07:08:08 -05:00

Sam Wu 8311130829 New template for changelog (#100)

* Use components.xml instead of default.xml

* Rm unused var

* Use category instead of group

* Add group and category

* Change changelog template

* Conditional display

* Remove sort

* Add mappings

* Jinja does not track state

* Handle dupe logic in python

* Construct doc page and repo url

* Add repo url

* Add doc page

* Avoid using bare URL

* Add None key

* Test release notes

2024-07-03 09:17:21 -06:00

Release notes

This page contains the release notes for AMD ROCm™ Software.

ROCm 6.1.2

ROCm 6.1.2 includes enhancements to SMI tools and improvements to some libraries.

OS support

ROCm 6.1.2 has been tested against a pre-release version of Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE]).

AMD SMI

AMD SMI for ROCm 6.1.2

Additions

Added process isolation and clean shader APIs and CLI commands.
- amdsmi_get_gpu_process_isolation()
- amdsmi_set_gpu_process_isolation()
- amdsmi_set_gpu_clear_sram_data()
Added the MIN_POWER metric to output provided by amd-smi static --limit.

Optimizations

Updated the amd-smi monitor --pcie output to prevent delays with the monitor command.

Changes

Updated amdsmi_get_power_cap_info to return values in uW instead of W.
Updated Python library return types for amdsmi_get_gpu_memory_reserved_pages and amdsmi_get_gpu_bad_page_info.
Updated the output of amd-smi metric --ecc-blocks to show counters available from blocks.
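
Because amdsmi_get_power_cap_info now reports microwatts, code that displays or compares values in watts needs to divide by 1,000,000. The following is a minimal sketch of that conversion only; the helper name and the dictionary keys are hypothetical stand-ins and are not taken from the AMD SMI Python API.

```python
# Hypothetical helper: convert power-cap fields reported in microwatts (uW)
# to watts (W). The key names are illustrative, not real AMD SMI field names.
UW_PER_W = 1_000_000


def power_caps_to_watts(caps_uw):
    """Return a new dict with every value converted from uW to W."""
    return {name: value / UW_PER_W for name, value in caps_uw.items()}


if __name__ == "__main__":
    example = {"power_cap": 300_000_000, "min_power_cap": 0}
    print(power_caps_to_watts(example))
```

Keeping the conversion in one place makes it easier to audit scripts that were written against the earlier watt-based values.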

Fixes

amdsmi_get_gpu_board_info() no longer returns junk character strings.
amd-smi metric --power now correctly details power output for RDNA3, RDNA2, and MI1x devices.
Fixed the amdsmitstReadWrite.TestPowerCapReadWrite test for RDNA3, RDNA2, and MI100 devices.
Fixed an issue with the amdsmi_get_gpu_memory_reserved_pages and amdsmi_get_gpu_bad_page_info Python interface calls.

Removals

Removed the amdsmi_get_gpu_process_info API from the Python library. It was removed from the C library in an earlier release.

See the AMD SMI [detailed changelog](https://github.com/ROCm/amdsmi/blob/rocm-6.1.x/CHANGELOG.md) with code samples for more information.

HIPCC

HIPCC for ROCm 6.1.2

Changes

Upcoming: a future release will enable use of compiled binaries hipcc.bin and hipconfig.bin by default. No action is needed by users; you may continue calling high-level Perl scripts hipcc and hipconfig. hipcc.bin and hipconfig.bin will be invoked by the high-level Perl scripts. To revert to the previous behavior and invoke hipcc.pl and hipconfig.pl, set the HIP_USE_PERL_SCRIPTS environment variable to 1.
Upcoming: a subsequent release will remove high-level Perl scripts hipcc and hipconfig. This release will remove the HIP_USE_PERL_SCRIPTS environment variable. It will rename hipcc.bin and hipconfig.bin to hipcc and hipconfig respectively. No action is needed by the users. To revert to the previous behavior, invoke hipcc.pl and hipconfig.pl explicitly.
Upcoming: a subsequent release will remove hipcc.pl and hipconfig.pl.
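
For build scripts that need to opt back into the Perl implementation while HIP_USE_PERL_SCRIPTS is still honored, the variable can be set on the child process environment rather than globally. This is a sketch under that assumption; the helper name is hypothetical, and nothing here invokes hipcc.

```python
import os


def hipcc_environment(use_perl_scripts, base_env=None):
    """Return a copy of the environment to use when launching hipcc.

    When use_perl_scripts is True, HIP_USE_PERL_SCRIPTS is set to "1",
    which the notes above describe as the way to revert to
    hipcc.pl and hipconfig.pl. Otherwise the variable is removed.
    """
    env = dict(os.environ if base_env is None else base_env)
    if use_perl_scripts:
        env["HIP_USE_PERL_SCRIPTS"] = "1"
    else:
        env.pop("HIP_USE_PERL_SCRIPTS", None)
    return env


if __name__ == "__main__":
    # On a machine with ROCm installed, pass the result as env= to
    # subprocess.run(["hipcc", ...]). Here we only inspect the mapping.
    env = hipcc_environment(True, base_env={"PATH": "/usr/bin"})
    print(env["HIP_USE_PERL_SCRIPTS"])
```

Once the variable is removed in the later release described above, this switch would no longer have any effect, and hipcc.pl would have to be called explicitly.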

ROCm SMI

ROCm SMI for ROCm 6.1.2

Additions

Added the ring hang event to the amdsmi_evt_notification_type_t enum.

Fixes

Fixed an issue causing ROCm SMI to incorrectly report GPU utilization for RDNA3 GPUs. See the issue on GitHub.
Fixed the parsing of pp_od_clk_voltage in get_od_clk_volt_info to work better with MI-series hardware.

Library changes in ROCm 6.1.2

Category	Group	Name	Version	Repository
Libraries	Machine Learning and Computer Vision	composable_kernel	0.2.0	ROCm/composable_kernel
		AMDMIGraphX	2.9	ROCm/AMDMIGraphX
		MIOpen	3.1.0	ROCm/MIOpen
		MIVisionX	2.5.0	ROCm/MIVisionX
		rpp	1.5.0	ROCm/rpp
	Communication	rccl	2.18.6	ROCm/rccl
		hipBLAS	2.1.0	ROCm/hipBLAS
		hipBLASLt	0.7.0	ROCm/hipBLASLt
		hipFFT	1.0.14	ROCm/hipFFT
		hipRAND	2.10.17	ROCm/hipRAND
		hipSOLVER	2.1.1	ROCm/hipSOLVER
		hipSPARSE	3.0.1	ROCm/hipSPARSE
		hipSPARSELt	0.2.0	ROCm/hipSPARSELt
		rocALUTION	3.1.1	ROCm/rocALUTION
		rocBLAS	4.1.2	ROCm/rocBLAS
		rocFFT	1.0.27	ROCm/rocFFT
		rocRAND	3.0.1	ROCm/rocRAND
		rocSOLVER	3.25.0	ROCm/rocSOLVER
		rocSPARSE	3.1.2	ROCm/rocSPARSE
		rocWMMA	1.4.0	ROCm/rocWMMA
		Tensile	4.40.0	ROCm/Tensile
	Primitives	hipCUB	3.1.0	ROCm/hipCUB
		hipTensor	1.2.0	ROCm/hipTensor
		rocPRIM	3.1.0	ROCm/rocPRIM
		rocThrust	3.0.1	ROCm/rocThrust
		ROCdbgapi	0.71.0	ROCm/ROCdbgapi
		rocm-cmake	0.12.0	ROCm/rocm-cmake

AMDMIGraphX

MIGraphX 2.9 for ROCm 6.1.2

Additions

Added FP8 support
Created a dockerfile with MIGraphX+ONNX Runtime EP+Torch
Added support for the Hardmax, DynamicQuantizeLinear, QLinearConcat, Unique, QLinearAveragePool, QLinearSigmoid, QLinearLeakyRelu, QLinearMul, and IsInf operators
Created web site examples for Whisper, Llama-2, and Stable Diffusion 2.1
Created examples of using the ONNX Runtime MIGraphX Execution Provider with the InceptionV3 and Resnet50 models
Updated operators to support ONNX Opset 19
Enabled fuse_pointwise and fuse_reduce in the driver
Added support for dot-(mul)-softmax-dot offloads to MLIR
Added BLAS auto-tuning for GEMMs
Added dynamic shape support for the multinomial operator
Added fp16 to accuracy checker
Added initial code for running on Windows OS

Optimizations

Improved the output of the migraphx-driver command
Documentation now shows all environment variables
Updates needed for general stride support
Enabled Asymmetric Quantization
Added support for previously unsupported ScatterND reduction modes
Rewrote softmax for better performance
General improvement to how quantization is performed to support INT8
Used problem_cache for gemm tuning
Improved performance by always using rocMLIR for quantized convolution
Improved group convolutions by using rocMLIR
Improved accuracy of fp16 models
Added support for previously unsupported ScatterElements reduction modes
Added concat fusions
Improved INT8 support to include UINT8
Allowed reshape ops between dq and quant_op
Improved dpp reductions on Navi
Made the accuracy checker print the whole final buffer
Added support for handling dynamic Slice and ConstantOfShape ONNX operators
Added support for the dilations attribute to Pooling ops
Added layout attribute support for the LSTM operator
Improved performance by removing contiguous for reshapes
Handled all slice input variations
Added scales attribute parsing in upsample for older opset versions
Added support for uneven Split operations
Improved unit testing to run in python virtual environments

Fixes

Fixed outstanding issues in autogenerated documentation
Updated model zoo paths for examples
Fixed promote_literals_test by using additional if condition
Fixed export API symbols from dynamic library
Fixed bug in pad operator from dimension reduction
Fixed the use of LD to embed files and enabled it by default when building shared libraries on Linux
Fixed get_version()
Fixed Round operator inaccuracy
Fixed wrong size check when axes not present for slice
Set the .SO version correctly

Changes

Cleaned up LSTM and RNN activation functions
Placed gemm_pointwise at a higher priority than layernorm_pointwise
Updated README to mention the need to include GPU_TARGETS when building MIGraphX

Removals

Removed unused device kernels from Gather and Pad operators
Removed int8x4 format

composable_kernel

CK 0.2.0 for ROCm 6.1.2

Fixes

Fixed a bug in 6-dimensional kernels (#555)
Fixed a test case failure with grouped convolution backward weight (#524)

Optimizations

Improved the performance of the normalization kernel

Additions

New CMake flags:
- "DL_KERNELS" -- Must be set to "ON" in order to build the gemm_dl and batched_gemm_multi_d_dl instances
- "DTYPES" -- Can be set to any subset of "fp64;fp32;fp16;fp8;bf16;int8" to build an instance of the specified data types
- "INSTANCES_ONLY" -- Only builds CK library and instances without tests, examples, or profiler
New feature: if GPU_TARGETS is not set in the CMake command line, CK will be built for all targets supported by the compiler
Support for MI300A/MI300X
Support for AMD RDNA 3
New user tutorial (#563)
Additional instances for irregular GEMM sizes (#560)
New inter-wave consumer-producer programming model for GEMM kernels (#310)
GEMM with support for multiple elementwise fusions (multi-D) (#534)
Multi-embeddings support (#542)
AMD RDNA 3 blockwise GEMM and real GEMM support (#541)
AMD RDNA grouped convolution backward weight support (#505)
MaxPool and AvgPool forward (#815); MaxPool backward (#750)
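
The CMake flags listed at the start of these additions can be assembled programmatically, which makes it easy to reject a DTYPES entry outside the documented set before CMake runs. The sketch below is illustrative only: the helper name is hypothetical, the -D<name>=<value> form is standard CMake cache syntax, and setting INSTANCES_ONLY to ON is an assumption, since the notes do not state its accepted values.

```python
# Data types that the notes above list as valid DTYPES entries.
ALLOWED_DTYPES = ("fp64", "fp32", "fp16", "fp8", "bf16", "int8")


def ck_cmake_args(dtypes=(), dl_kernels=False, instances_only=False,
                  gpu_targets=()):
    """Build a list of -D arguments for a composable_kernel configure step."""
    unknown = [d for d in dtypes if d not in ALLOWED_DTYPES]
    if unknown:
        raise ValueError(f"unsupported DTYPES entries: {unknown}")

    args = []
    if dl_kernels:
        args.append("-DDL_KERNELS=ON")
    if dtypes:
        # CMake lists are semicolon-separated.
        args.append("-DDTYPES=" + ";".join(dtypes))
    if instances_only:
        # Assumption: ON is the enabling value for this flag.
        args.append("-DINSTANCES_ONLY=ON")
    if gpu_targets:
        # Leaving GPU_TARGETS unset builds for all compiler-supported targets.
        args.append("-DGPU_TARGETS=" + ";".join(gpu_targets))
    return args


if __name__ == "__main__":
    print(ck_cmake_args(dtypes=["fp16", "bf16"], dl_kernels=True))
```

When passing the result to a shell, quote any argument that contains a semicolon so the shell does not split it.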

Changes

None

hipBLAS

hipBLAS 2.1.0 for ROCm 6.1.2

Additions

New build option to automatically use hipconfig --platform to determine HIP platform
Level 1 functions have additional ILP64 API for both C and Fortran (_64 name suffix) with int64_t function arguments
New functions hipblasGetMathMode and hipblasSetMathMode
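
The ILP64 variants matter when a size or increment does not fit in a 32-bit signed integer. As a rough sketch of that decision, the snippet below picks between a base function name and its _64 counterpart; the helper names are hypothetical, and the function name in the usage lines is only an example string, not a call into hipBLAS.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def needs_ilp64(*int_args):
    """Return True if any argument falls outside the int32 range."""
    return any(not (INT32_MIN <= a <= INT32_MAX) for a in int_args)


def pick_symbol(base_name, *int_args):
    """Append the _64 suffix only when a 64-bit argument is required."""
    return base_name + "_64" if needs_ilp64(*int_args) else base_name


if __name__ == "__main__":
    print(pick_symbol("hipblasSaxpy", 1_000, 1))
    print(pick_symbol("hipblasSaxpy", 3_000_000_000, 1))
```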

Deprecations

USE_CUDA build option; use HIP_PLATFORM=amd or HIP_PLATFORM=nvidia to override hipconfig

Changes

Some Level 2 function argument names have changed from m to n to match legacy BLAS; there was no change in implementation.
Updated client code to use YAML-based testing
Renamed .doxygen and .sphinx folders to doxygen and sphinx, respectively
Added CMake support for documentation

hipBLASLt

hipBLASLt 0.7.0 for ROCm 6.1.2

Additions

Added hipblasltExtSoftmax extension API
Added hipblasltExtLayerNorm extension API
Added hipblasltExtAMax extension API
Added GemmTuning extension parameter to set split-k by user
Support for mixed-precision data types: fp16/fp8 input with fp16 output

Deprecations

algoGetHeuristic() ext API for GroupGemm will be deprecated in a future release of hipBLASLt

hipCUB

hipCUB 3.1.0 for ROCm 6.1.2

Changed

CUB backend references CUB and Thrust version 2.1.0.
Updated HIPCUB_HOST_WARP_THREADS macro definition to match host_warp_size changes from rocPRIM 3.0.
Implemented __int128_t and __uint128_t support for radix_sort.

Fixed

Fixed build issues with rmake.py on Windows when using VS 2017 15.8 or later due to a breaking fix with extended aligned storage.

Added

Added interface DeviceMemcpy::Batched for batched memcpy from rocPRIM and CUB.

hipFFT

hipFFT 1.0.14 for ROCm 6.1.2

Changes

When building hipFFT from source, rocFFT code no longer needs to be initialized as a git submodule.

Fixes

Fixed error when creating length-1 plans.

hipRAND

hipRAND 2.10.17 for ROCm 6.1.2

Fixes

Fixed benchmark and unit test builds on Windows

hipSOLVER

hipSOLVER 2.1.1 for ROCm 6.1.2

Changed

BUILD_WITH_SPARSE now defaults to OFF on Windows.

Fixed

Fixed benchmark client build when BUILD_WITH_SPARSE is OFF.

hipSPARSE

hipSPARSE 3.0.1 for ROCm 6.1.2

Fixes

Fixes to the build chain

hipSPARSELt

hipSPARSELt 0.2.0 for ROCm 6.1.2

Added

Added support for cases where matrix B is a structured sparsity matrix.

hipTensor

hipTensor 1.2.0 for ROCm 6.1.2

Additions

API support for permutation of rank 4 tensors: f16 and f32
New datatype support in contractions of rank 4: f16, bf16, complex f32, complex f64
Added scale and bilinear contraction samples and tests for new supported data types
Added permutation samples and tests for f16, f32 types

Fixes

Fixed bug in contraction calculation with data type f32

MIOpen

MIOpen 3.1.0 for ROCm 6.1.2

Added

CK-based 2d/3d convolution solvers to support nchw/ncdhw layout
Fused solver for Fwd Convolution with Residual, Bias and activation
AI Based Parameter Prediction Model for conv_hip_igemm_group_fwd_xdlops Solver
Forward, backward data and backward weight convolution solver with fp8/bfp8
Check for packed tensors for convolution solvers
Integrate CK's layer norm
Combine gtests into single binary

Fixed

Fixed backward passes (bwd/wrw) for CK group conv 3d
Fixed out-of-bounds memory access in ConvOclDirectFwdGen
Fixed build failure due to hipRTC

Changed

Standardize workspace abstraction
Use split CK libraries

Removed

clamping to MAX from CastTensor used in Bwd and WrW convolution

MIVisionX

MIVisionX for ROCm 6.1.2

Additions

CTest: Tests for install verification
Hardware support updates
Doxygen support for API documentation

Optimizations

CMakeList Cleanup
Readme

Changes

rocAL: PyBind Link to prebuilt library
- PyBind11
- RapidJSON
Setup Updates
RPP - Use package install
Dockerfiles: Updates & bugfix
CuPy - No longer installed with setup.py

Fixes

rocAL bug fix and updates

Tested Configurations

Windows 10 / 11
Linux distribution
- Ubuntu - 20.04 / 22.04
- CentOS - 7 / 8
- RHEL - 8 / 9
- SLES - 15-SP4
ROCm: rocm-core - 5.7.0.50700-6
miopen-hip - 2.20.0.50700-63
MIGraphX - 2.7.0.50700-63
Protobuf - V3.12.4
OpenCV - 4.6.0
RPP - [1.5.0]
FFMPEG - n4.4.2
Dependencies for all preceding packages
MIVisionX setup script - V2.6.1

Known Issues

OpenCV 4.X support for some applications is missing
MIVisionX package install requires manual prerequisites installation

rccl

RCCL 2.18.6 for ROCm 6.1.2

Changed

Reduced NCCL_TOPO_MAX_NODES to limit stack usage and avoid overflow

rocALUTION

rocALUTION 3.1.1 for ROCm 6.1.2

Additions

TripleMatrixProduct functionality for GlobalMatrix
Multi-Node/GPU support for UA-AMG, SA-AMG and RS-AMG
Iterative ILU0 preconditioner ItILU0
Iterative triangular solve, selectable via SolverDecr class

Deprecations

LocalMatrix::AMGConnect
LocalMatrix::AMGAggregate
LocalMatrix::AMGPMISAggregate
LocalMatrix::AMGSmoothedAggregation
LocalMatrix::AMGAggregation
PairwiseAMG

Known Issues

PairwiseAMG does not currently support matrix sizes that exceed int32 range
PairwiseAMG might fail building the hierarchy on certain input matrices

rocBLAS

rocBLAS 4.1.2 for ROCm 6.1.2

Fixes

Fixed BF16 TT get_solutions

Optimizations

Tuned gfx942 BBS TN, TT

ROCdbgapi

rocm-dbgapi 0.71.0 for ROCm 6.1.2

Added

Added support for gfx940, gfx941 and gfx942 architectures.

rocFFT

rocFFT 1.0.27 for ROCm 6.1.2

Fixes

Fixed kernel launch failure on execute of very large odd-length real-complex transforms.

Additions

Enabled multi-GPU testing on systems without direct GPU interconnects

rocm-cmake

rocm-cmake 0.12.0 for ROCm 6.1.2

Changed

ROCMSphinxDoc: Allow separate source and config directories.
ROCMCreatePackage: Allow additional PROVIDES on header-only packages.
ROCMInstallTargets: Don't install executable targets by default for ASAN builds.
ROCMTest: Add RPATH for installed tests.
Finalize rename to ROCmCMakeBuildTools

Fixed

ROCMClangTidy: Fixed invalid list index.
Test failures when ROCM_CMAKE_GENERATOR is empty.

rocPRIM

rocPRIM 3.1.0 for ROCm 6.1.2

Additions

New primitive: block_run_length_decode
New primitive: batch_memcpy

Changes

Renamed:
- scan_config_v2 to scan_config
- scan_by_key_config_v2 to scan_by_key_config
- radix_sort_config_v2 to radix_sort_config
- reduce_by_key_config_v2 to reduce_by_key_config
Removed support for custom config types for device algorithms
host_warp_size() was moved into rocprim/device/config_types.hpp; it now uses either device_id or a stream parameter to query the proper device and a device_id out parameter
- The return type is hipError_t
Added support for __int128_t in device_radix_sort and block_radix_sort
Improved the performance of match_any, and block_histogram which uses it

Deprecations

Removed reduce_by_key_config, MatchAny, scan_config, scan_by_key_config, and radix_sort_config

Fixes

Build issues with rmake.py on Windows when using VS 2017 15.8 or later (due to a breaking fix with extended aligned storage)

rocRAND

rocRAND 3.0.1 for ROCm 6.1.2

Fixes

Implemented workaround for regressions in XORWOW and LFSR on MI200

rocSOLVER

rocSOLVER 3.25.0 for ROCm 6.1.2

Added

Eigensolver routines for symmetric/hermitian matrices using Divide & Conquer and Jacobi algorithm:
- SYEVDJ (with batched and strided_batched versions)
- HEEVDJ (with batched and strided_batched versions)
Generalized symmetric/hermitian-definite eigensolvers using Divide & Conquer and Jacobi algorithm:
- SYGVDJ (with batched and strided_batched versions)
- HEGVDJ (with batched and strided_batched versions)

Changed

Relaxed array length requirements for GESVDX with rocblas_srange_index.

Removed

Removed gfx803 and gfx900 from default build targets.

Fixed

Corrected singular vector normalization in BDSVDX and GESVDX
Fixed potential memory access fault in STEIN, SYEVX/HEEVX, SYGVX/HEGVX, BDSVDX and GESVDX

rocSPARSE

rocSPARSE 3.1.2 for ROCm 6.1.2

Additions

New LRB algorithm to SpMV, supporting CSR format
rocBLAS is now an optional dependency for SDDMM algorithms
Additional verbose output for csrgemm and bsrgemm

Optimizations

Triangular solve with multiple right-hand sides (SpSM, csrsm, ...) now calls SpSV, csrsv, and so on when nrhs equals 1
Improved user manual section Installation and Building for Linux and Windows
Improved SpMV in CSR format on MI300
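
The nrhs dispatch described in the first optimization can be pictured with a small dense example: a multi-right-hand-side solver that hands off to the single-vector routine when only one column is supplied. This is a pure-Python sketch of the dispatch idea with hypothetical function names; it is not rocSPARSE code and does not use sparse storage.

```python
def solve_lower_vector(L, b):
    """Forward substitution for one right-hand side (the single-vector path)."""
    x = []
    for i, row in enumerate(L):
        acc = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - acc) / row[i])
    return x


def solve_lower_multi(L, B):
    """Solve L X = B, where B is a list of right-hand-side columns."""
    nrhs = len(B)
    if nrhs == 1:
        # Single right-hand side: reuse the vector routine directly.
        return [solve_lower_vector(L, B[0])]

    # General path: advance all columns together, one row at a time.
    X = [[] for _ in range(nrhs)]
    for i, row in enumerate(L):
        for k in range(nrhs):
            acc = sum(row[j] * X[k][j] for j in range(i))
            X[k].append((B[k][i] - acc) / row[i])
    return X


if __name__ == "__main__":
    L = [[2.0, 0.0], [1.0, 1.0]]
    print(solve_lower_multi(L, [[2.0, 3.0]]))
    print(solve_lower_multi(L, [[2.0, 3.0], [4.0, 5.0]]))
```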

rocThrust

rocThrust 3.0.1 for ROCm 6.1.2

Fixes

Ported a fix from thrust 2.2 that ensures thrust::optional is trivially copyable.

rocWMMA

rocWMMA 1.4.0 for ROCm 6.1.2

Additions

Added bf16 support for hipRTC sample

Changes

Changed Clang C++ version to C++17
Updated rocwmma_coop API
Linked rocWMMA to hiprtc

Fixes

Fixed compile/runtime arch checks
Built all tests in the large code model
Removed inefficient branching in layout loop unrolling

rpp

rpp for ROCm 6.1.2

Changes

Prerequisites

Tested Configurations

Linux distribution
- Ubuntu - 20.04 / 22.04
- CentOS - 7
- RHEL - 8/9
ROCm: rocm-core - 5.5.0.50500-63
Clang - Version 5.0.1 and above
CMake - Version 3.22.3
IEEE 754-based half-precision floating-point library - Version 1.12.0

Tensile

Tensile 4.40.0 for ROCm 6.1.2

Additions

new DisableKernelPieces values to invalidate local read, local write, and global read
stream-K kernel generation, including two-tile stream-k algorithm by setting StreamK=3
feature to allow testing stream-k grid multipliers
debug output to check occupancy for Stream-K
reject condition for FractionalLoad + DepthU!=power of 2
new TENSILE_DB debugging value to dump the common kernel parameters
predicate for APU libs
new parameter (ClusterLocalRead) to turn on/off wider local read opt for TileMajorLDS
new parameter (ExtraLatencyForLR) to add extra interval between local read and wait
new logic to check LDS size with auto LdsPad(=1) and change LdsPad to 0 if LDS overflows
initialization type and general batched options to the rocblas-bench input creator script

Optimizations

enabled MFMA + LocalSplitU=4 for MT16x16
enabled (DirectToVgpr + MI4x4) and supported skinny MacroTile
optimized postGSU kernel: separate postGSU kernels for different GSU values, loop unroll for GSU loop, wider global load depending on array size, and parallel reduction depending on array size
auto LdsPad calculation for TileMajorLds + MI16x16
auto LdsPad calculation for UnrollMajorLds + MI16x16 + VectorWidth

Changes

cleared hipErrorNotFound error since it is an expected part of the search
modified hipcc search path for Linux
changed PCI ID from 32bit to 64bit for ROCm SMI HW monitor
changed LdsBlockSizePerPad to LdsBlockSizePerPadA, B to specify LBSPP separately
changed the default value of LdsPadA, B, LdsBlockSizePerPadA, B from 0 to -1
updated test cases according to parameter changes for LdsPad, LBSPP and ClusterLocalRead
Replaced std::regex with fnmatch()/PathMatchSpec as a workaround to std::regex stack overflow known bug
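
To illustrate the kind of glob-style matching that fnmatch() provides in place of a regular expression, here is a short sketch using Python's standard fnmatch module. It is not Tensile's C++ code, and the helper name and sample names are made up for the example.

```python
import fnmatch


def match_names(names, pattern):
    """Return the names that match a shell-style wildcard pattern.

    fnmatchcase compares case-sensitively on every platform, so the
    result does not depend on the operating system.
    """
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


if __name__ == "__main__":
    names = ["gemm_fp16_tn", "gemm_fp16_tt", "gemm_bf16_tn"]
    print(match_names(names, "gemm_fp16_*"))
    print(match_names(names, "gemm_*_tn"))
```

Wildcard patterns cover only simple cases (`*`, `?`, and bracket sets), which is usually enough for name filtering and avoids the deep recursion that some regular-expression engines can hit on long inputs.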

Fixes

hipcc compile append flag parallel-jobs=4
race condition in Stream-K that appeared with large grids and small sizes
mismatch issue with LdsPad + LdsBlockSizePerPad!=0 and TailLoop
mismatch issue with LdsPad + LdsBlockSizePerPad!=0 and SplitLds
incorrect reject condition check for DirectToLds + LdsBlockSizePerPad=-1 case
small fix for LdsPad optimization (LdsElement calculation)
