mirror of
https://github.com/ROCm/ROCm.git
synced 2026-01-09 14:48:06 -05:00
rearranging (#2718)
This commit is contained in:
287
CHANGELOG.md
287
CHANGELOG.md
@@ -1,4 +1,4 @@
|
||||
# Release notes
|
||||
# Changelog
|
||||
<!-- Do not edit this file! This file is autogenerated with -->
|
||||
<!-- tools/autotag/tag_script.py -->
|
||||
|
||||
@@ -16,19 +16,82 @@ This page contains the release notes for AMD ROCm Software.
|
||||
-------------------
|
||||
|
||||
## ROCm 6.0.0
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
|
||||
### New in this release:
|
||||
ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
|
||||
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
|
||||
MI300 series. Future releases will further enable and optimize this new platform. Key features include:
|
||||
|
||||
* Added AMD Instinct™ MI300A and MI300X GPU support
|
||||
* Added support for the following operating systems:
|
||||
* RHEL 9.3
|
||||
* RHEL 8.9
|
||||
* Improved performance in areas like lower precision math and attention layers.
|
||||
* New hipSPARSELt library to accelerate AI workloads via AMD's sparse matrix core technique.
|
||||
* Latest upstream support for popular AI frameworks like PyTorch, TensorFlow, and JAX.
|
||||
* New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
|
||||
* Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
|
||||
tutorials on the [AMD ROCm Docs](https://rocm.docs.amd.com) site.
|
||||
* Consolidated developer resources and training on the new AMD ROCm Developer Hub.
|
||||
|
||||
#### Documentation
|
||||
The following section provide a release overview for ROCm 6.0. For additional details, you can refer to
|
||||
the [Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html).
|
||||
|
||||
CMake support is added for the documentation in the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm).
|
||||
### OS and GPU support changes
|
||||
|
||||
AMD Instinct™ MI300A and MI300X Accelerator support has been enabled for limited operating
|
||||
systems.
|
||||
|
||||
* Ubuntu 22.04.5 (MI300A and MI300X)
|
||||
* RHEL 8.9 (MI300A)
|
||||
* SLES 15 SP5
|
||||
|
||||
We've added support for the following operating systems:
|
||||
|
||||
* RHEL 9.3
|
||||
* RHEL 8.9
|
||||
|
||||
Note that, of ROCm 6.2, we've planned for end-of-support (EoS) for the following operating systems:
|
||||
|
||||
* Ubuntu 20.04.5
|
||||
* SLES 15 SP4
|
||||
* RHEL/CentOS 7.9
|
||||
|
||||
### New ROCm meta package
|
||||
|
||||
We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
|
||||
libraries. For example, the following command will install the full ROCm package: `apt-get install rocm`
|
||||
(Ubuntu), or `yum install rocm` (RHEL).
|
||||
|
||||
### Filesystem Hierarchy Standard
|
||||
|
||||
ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
|
||||
the backward compatibility support for old file locations.
|
||||
|
||||
### Compiler location change
|
||||
|
||||
* The installation path of LLVM has been changed from `/opt/rocm-<rel>/llvm` to
|
||||
`/opt/rocm-<rel>/lib/llvm`. For backward compatibility, a symbolic link is provided to the old
|
||||
location and will be removed in a future release.
|
||||
* The installation path of the device library bitcode has changed from `/opt/rocm-<rel>/amdgcn` to
|
||||
`/opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn`. For backward compatibility, a symbolic link
|
||||
is provided and will be removed in a future release.
|
||||
|
||||
### Documentation
|
||||
|
||||
CMake support has been added for documentation in the
|
||||
[ROCm repository](https://github.com/RadeonOpenCompute/ROCm).
|
||||
|
||||
### AMD Instinct™ MI50 end-of-support notice
|
||||
|
||||
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enters
|
||||
maintenance mode in ROCm 6.0.
|
||||
|
||||
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 was the
|
||||
final release for gfx906 GPUs in a fully supported state.
|
||||
|
||||
* Henceforth, no new features and performance optimizations will be supported for the gfx906 GPUs.
|
||||
* Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
|
||||
2024 (end of maintenance \[EOM] will be aligned with the closest ROCm release).
|
||||
* Bug fixes will be made up to the next ROCm point release.
|
||||
* Bug fixes will not be backported to older ROCm releases for gfx906.
|
||||
* Distribution and operating system updates will continue per the ROCm release cadence for gfx906
|
||||
GPUs until EOM.
|
||||
|
||||
### Library changes
|
||||
|
||||
@@ -64,7 +127,6 @@ MIGraphX 2.8 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Support for AMD Instinct™ MI300 series GPUs
|
||||
* Support for TorchMIGraphX via PyTorch
|
||||
* Boosted overall performance by integrating rocMLIR
|
||||
* INT8 support for ONNX Runtime
|
||||
@@ -110,6 +172,16 @@ MIGraphX 2.8 for ROCm 6.0.0
|
||||
|
||||
* Removed building Python 2.7 bindings
|
||||
|
||||
#### AMD SMI
|
||||
|
||||
* Integrated the E-SMI library: You can now query CPU-related information directly through AMD SMI.
|
||||
Metrics include power, energy, performance, and other system details.
|
||||
|
||||
* Added support for gfx942 metrics: You can now query MI300 device metrics to get real-time
|
||||
information. Metrics include power, temperature, energy, and performance.
|
||||
|
||||
* Added support for compute and memory partitions
|
||||
|
||||
#### HIP 6.0.0
|
||||
|
||||
HIP 6.0.0 for ROCm 6.0.0
|
||||
@@ -127,35 +199,38 @@ HIP 6.0.0 for ROCm 6.0.0
|
||||
* `hipExternalMemoryHandleType_enum`
|
||||
|
||||
* New environment variable `HIP_LAUNCH_BLOCKING`
|
||||
* For serialization on kernel execution. The default value is 0 (disable); kernel will execute normally as defined in the queue. When this environment variable is set as 1 (enable), HIP runtime will serialize kernel enqueue; behaves the same as AMD_SERIALIZE_KERNEL.
|
||||
* For serialization on kernel execution. The default value is 0 (disable); kernel will execute normally as
|
||||
defined in the queue. When this environment variable is set as 1 (enable), HIP runtime will
|
||||
serialize kernel enqueue; behaves the same as AMD_SERIALIZE_KERNEL.
|
||||
|
||||
* More members are added in HIP struct `hipDeviceProp_t`, for new feature capabilities including:
|
||||
* Texture
|
||||
* `int maxTexture1DMipmap;`
|
||||
* `int maxTexture2DMipmap[2];`
|
||||
* `int maxTexture2DLinear[3];`
|
||||
* `int maxTexture2DGather[2];`
|
||||
* `int maxTexture3DAlt[3];`
|
||||
* `int maxTextureCubemap;`
|
||||
* `int maxTexture1DLayered[2];`
|
||||
* `int maxTexture2DLayered[3];`
|
||||
* `int maxTextureCubemapLayered[2];`
|
||||
* `int maxTexture1DMipmap;`
|
||||
* `int maxTexture2DMipmap[2];`
|
||||
* `int maxTexture2DLinear[3];`
|
||||
* `int maxTexture2DGather[2];`
|
||||
* `int maxTexture3DAlt[3];`
|
||||
* `int maxTextureCubemap;`
|
||||
* `int maxTexture1DLayered[2];`
|
||||
* `int maxTexture2DLayered[3];`
|
||||
* `int maxTextureCubemapLayered[2];`
|
||||
* Surface
|
||||
* `int maxSurface1D;`
|
||||
* `int maxSurface2D[2];`
|
||||
* `int maxSurface3D[3];`
|
||||
* `int maxSurface1DLayered[2];`
|
||||
* `int maxSurface2DLayered[3];`
|
||||
* `int maxSurfaceCubemap;`
|
||||
* `int maxSurfaceCubemapLayered[2];`
|
||||
* `int maxSurface1D;`
|
||||
* `int maxSurface2D[2];`
|
||||
* `int maxSurface3D[3];`
|
||||
* `int maxSurface1DLayered[2];`
|
||||
* `int maxSurface2DLayered[3];`
|
||||
* `int maxSurfaceCubemap;`
|
||||
* `int maxSurfaceCubemapLayered[2];`
|
||||
* Device
|
||||
* `hipUUID uuid;`
|
||||
* `char luid[8];` this is an 8-byte unique identifier. Only valid on Windows
|
||||
* `unsigned int luidDeviceNodeMask;`
|
||||
* `hipUUID uuid;`
|
||||
* `char luid[8];` this is an 8-byte unique identifier. Only valid on Windows
|
||||
* `unsigned int luidDeviceNodeMask;`
|
||||
|
||||
* LUID (Locally Unique Identifier) is supported for interoperability between devices. In HIP, more members are added in the struct `hipDeviceProp_t`, as properties to identify each device:
|
||||
* `char luid[8];`
|
||||
* `unsigned int luidDeviceNodeMask;`
|
||||
* LUID (Locally Unique Identifier) is supported for interoperability between devices. In HIP, more
|
||||
members are added in the struct `hipDeviceProp_t`, as properties to identify each device:
|
||||
* `char luid[8];`
|
||||
* `unsigned int luidDeviceNodeMask;`
|
||||
|
||||
Note: HIP supports LUID only on Windows OS.
|
||||
|
||||
@@ -166,11 +241,13 @@ Note: HIP supports LUID only on Windows OS.
|
||||
* `hipGraphicsGLRegisterBuffer`
|
||||
* `hipGraphicsGLRegisterImage`
|
||||
|
||||
###### Changes Impacting Backward Incompatibility
|
||||
###### Changes impacting backward incompatibility
|
||||
|
||||
* Data types for members in `HIP_MEMCPY3D` structure are changed from `unsigned int` to `size_t`.
|
||||
* The value of the flag `hipIpcMemLazyEnablePeerAccess` is changed to `0x01`, which was previously defined as `0`.
|
||||
* Some device property attributes are not currently supported in HIP runtime. In order to maintain consistency, the following related enumeration names are changed in `hipDeviceAttribute_t`
|
||||
* The value of the flag `hipIpcMemLazyEnablePeerAccess` is changed to `0x01`, which was previously
|
||||
defined as `0`
|
||||
* Some device property attributes are not currently supported in HIP runtime. In order to maintain
|
||||
consistency, the following related enumeration names are changed in `hipDeviceAttribute_t`
|
||||
* `hipDeviceAttributeName` is changed to `hipDeviceAttributeUnused1`
|
||||
* `hipDeviceAttributeUuid` is changed to `hipDeviceAttributeUnused2`
|
||||
* `hipDeviceAttributeArch` is changed to `hipDeviceAttributeUnused3`
|
||||
@@ -178,13 +255,14 @@ Note: HIP supports LUID only on Windows OS.
|
||||
* `hipDeviceAttributeGcnArchName` is changed to `hipDeviceAttributeUnused5`
|
||||
* HIP struct `hipArray` is removed from driver type header to comply with CUDA
|
||||
* `hipArray_t` replaces `hipArray*`, as the pointer to array.
|
||||
* This allows `hipMemcpyAtoH` and `hipMemcpyHtoA` to have the correct array type which is equivalent to corresponding CUDA driver APIs.
|
||||
* This allows `hipMemcpyAtoH` and `hipMemcpyHtoA` to have the correct array type which is
|
||||
equivalent to corresponding CUDA driver APIs.
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Kernel launch maximum dimension validation is added specifically on gridY and gridZ in the HIP API `hipModule-LaunchKernel`. As a result,when `hipGetDeviceAttribute` is called for the value of `hipDeviceAttributeMaxGrid-Dim`, the behavior on the AMD platform is equivalent to NVIDIA.
|
||||
|
||||
* The HIP stream synchronisation behaviour is changed in internal stream functions, in which a flag "wait" is added and set when the current stream is null pointer while executing stream synchronisation on other explicitly created streams. This change avoids blocking of execution on null/default stream. The change won't affect usage of applications, and makes them behave the same on the AMD platform as NVIDIA.
|
||||
* The HIP stream synchronization behavior is changed in internal stream functions, in which a flag "wait" is added and set when the current stream is null pointer while executing stream synchronization on other explicitly created streams. This change avoids blocking of execution on null/default stream. The change won't affect usage of applications, and makes them behave the same on the AMD platform as NVIDIA.
|
||||
|
||||
* Error handling behavior on unsupported GPU is fixed, HIP runtime will log out error message, instead of creating signal abortion error which is invisible to developers but continued kernel execution process. This is for the case when developers compile any application via hipcc, setting the option `--offload-arch` with GPU ID which is different from the one on the system.
|
||||
|
||||
@@ -203,7 +281,7 @@ Note: These complex operations are equivalent to corresponding types/functions o
|
||||
* Compilation flags for the platform definitions
|
||||
* AMD platform
|
||||
* `HIP_PLATFORM_HCC`
|
||||
* `HCC`
|
||||
* `HCC`
|
||||
* `HIP_ROCclr`
|
||||
* NVIDIA platform
|
||||
* `HIP_PLATFORM_NVCC`
|
||||
@@ -300,7 +378,7 @@ hipTensor 1.1.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Architecture support for gfx940, gfx941, and gfx942
|
||||
* Architecture support for gfx942
|
||||
* Client tests configuration parameters now support YAML file input format
|
||||
|
||||
##### Changes
|
||||
@@ -334,6 +412,55 @@ MIOpen 2.19.0 for ROCm 6.0.0
|
||||
* `[HOTFIX][MI200][FP16]` has been disabled for `ConvHipImplicitGemmBwdXdlops` when FP16_ALT is
|
||||
required
|
||||
|
||||
#### MIVisionX
|
||||
|
||||
* Added Comprehensive CTests to aid developers
|
||||
* Introduced Doxygen support for complete API documentation
|
||||
* Simplified dependencies for rocAL
|
||||
|
||||
#### OpenMP
|
||||
|
||||
* MI300:
|
||||
* Added support for gfx942 targets
|
||||
* Fixed declare target variable access in unified_shared_memory mode
|
||||
* Enabled OMPX_APU_MAPS environment variable for MI200 and gfx942
|
||||
* Handled global pointers in forced USM (`OMPX_APU_MAPS`)
|
||||
|
||||
* Nextgen AMDGPU plugin:
|
||||
* Respect `GPU_MAX_HW_QUEUES` in the AMDGPU Nextgen plugin, which takes precedence over the
|
||||
standard `LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES` environment variable
|
||||
* Changed the default for `LIBOMPTARGET_AMDGPU_TEAMS_PER_CU` from 4 to 6
|
||||
* Fixed the behavior of the `OMPX_FORCE_SYNC_REGIONS` environment variable, which is used to
|
||||
force synchronous target regions (the default is to use an asynchronous implementation)
|
||||
* Added support for and enabled default of code object version 5
|
||||
* Implemented target OMPT callbacks and trace records support in the nextgen plugin
|
||||
|
||||
* Specialized kernels:
|
||||
* Removes redundant copying of arrays when xteam reductions are active but not offloaded
|
||||
* Tuned the number of teams for BigJumpLoop
|
||||
* Enables specialized kernel generation with nested OpenMP pragma, as long as there is no nested
|
||||
omp-parallel directive
|
||||
|
||||
##### Additions
|
||||
|
||||
* `-fopenmp-runtimelib={lib,lib-perf,lib-debug}` to select libs
|
||||
* Warning if mixed HIP / OpenMP offloading (i.e., if HIP language mode is active, but OpenMP target
|
||||
directives are encountered)
|
||||
* Introduced compile-time limit for the number of GPUs supported in a system: 16 GPUs in a single
|
||||
node is currently the maximum supported
|
||||
|
||||
##### Changes
|
||||
|
||||
* Correctly compute number of waves when workgroup size is less than the wave size
|
||||
* Implemented `LIBOMPTARGET_KERNEL_TRACE=3`, which prints DEVID traces and API timings
|
||||
* ASAN support for openmp release, debug, and perf libraries
|
||||
* Changed LDS lowering default to hybrid
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Fixed RUNPATH for gdb plugin
|
||||
* Fixed hang in OMPT support if flush trace is called when there are no helper threads
|
||||
|
||||
#### rccl 2.15.5
|
||||
|
||||
RCCL 2.15.5 for ROCm 6.0.0
|
||||
@@ -358,7 +485,7 @@ RCCL 2.15.5 for ROCm 6.0.0
|
||||
|
||||
##### Removals
|
||||
|
||||
- Removed TransferBench from tools as it exists in standalone repo:
|
||||
* Removed TransferBench from tools as it exists in standalone repo:
|
||||
[https://github.com/ROCmSoftwarePlatform/TransferBench](https://github.com/ROCmSoftwarePlatform/TransferBench)
|
||||
|
||||
#### rocALUTION 3.0.3
|
||||
@@ -406,7 +533,7 @@ rocBLAS 4.0.0 for ROCm 6.0.0
|
||||
##### Additions
|
||||
|
||||
* Beta API `rocblas_gemm_batched_ex3` and `rocblas_gemm_strided_batched_ex3`
|
||||
*Input/output type f16_r/bf16_r and execution type f32_r support for Level 2 gemv_batched and
|
||||
* Input/output type f16_r/bf16_r and execution type f32_r support for Level 2 gemv_batched and
|
||||
gemv_strided_batched
|
||||
* Use of `rocblas_status_excluded_from_build` when calling functions that require Tensile (when using
|
||||
rocBLAS built without Tensile)
|
||||
@@ -461,7 +588,7 @@ rocFFT 1.0.25 for ROCm 6.0.0
|
||||
* `rocfft_field_add_brick` can be called to describe the brick decomposition of an FFT field, where each
|
||||
brick can be assigned a different device
|
||||
|
||||
These interfaces are still experimental and subject to change. We are interested in getting feedback.
|
||||
These interfaces are still experimental and subject to change. Your feedback is appreciated.
|
||||
You can raise questions and concerns by opening issues in the
|
||||
[rocFFT issue tracker](https://github.com/ROCmSoftwarePlatform/rocFFT/issues).
|
||||
|
||||
@@ -504,19 +631,22 @@ ROCgdb 13.2 for ROCm 6.0.0
|
||||
|
||||
* Support for watchpoints on scratch memory addresses.
|
||||
* Added support for gfx1100, gfx1101, and gfx1102.
|
||||
* Added support for gfx940, gfx941 and gfx942.
|
||||
* Added support for gfx942.
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* Improved performances when handling the end of a process with a large number of threads.
|
||||
|
||||
##### Known Issues
|
||||
##### Known issues
|
||||
|
||||
* On certain configurations, ROCgdb can show the following warning message:
|
||||
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
|
||||
This does not affect ROCgdb's functionalities.
|
||||
|
||||
* ROCgdb cannot debug a program on an AMDGPU device past a `s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)` instruction. If an exception is reported after this instruction has been executed (including asynchronous exceptions), the wave is killed and the exceptions are only reported by the ROCm runtime.
|
||||
* ROCgdb cannot debug a program on an AMDGPU device past a
|
||||
`s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)` instruction. If an exception is reported after this
|
||||
instruction has been executed (including asynchronous exceptions), the wave is killed and the
|
||||
exceptions are only reported by the ROCm runtime.
|
||||
|
||||
#### rocm-cmake 0.11.0
|
||||
|
||||
@@ -531,6 +661,18 @@ rocm-cmake 0.11.0 for ROCm 6.0.0
|
||||
* Fixed extra `make` flags passed for Clang-Tidy (ROCMClangTidy).
|
||||
* Fixed issues with ROCMTest when using a module in a subdirectory
|
||||
|
||||
#### ROCm Compiler
|
||||
|
||||
* On MI300, kernel arguments can be preloaded into SGPRs rather than passed in memory. This
|
||||
feature is enabled with a compiler option, which also controls the number of arguments to pass in
|
||||
SGPRs.
|
||||
|
||||
* Improved register allocation at -O0: Avoid compiler crashes ( 'ran out of registers during register allocation' )
|
||||
|
||||
* Improved generation of debug information:
|
||||
* Improve compile time
|
||||
* Avoid compiler crashes
|
||||
|
||||
#### rocPRIM 3.0.0
|
||||
|
||||
rocPRIM 3.0.0 for ROCm 6.0.0
|
||||
@@ -551,9 +693,9 @@ rocPRIM 3.0.0 for ROCm 6.0.0
|
||||
* Fixed `rocprim::MatchAny` for devices with 64-bit warp size
|
||||
* Note that `rocprim::MatchAny` is deprecated; use `rocprim::match_any` instead
|
||||
|
||||
#### rocprofiler 2.0.0
|
||||
#### Roc Profiler 2.0.0
|
||||
|
||||
rocprofiler 2.0.0 for ROCm 6.0.0
|
||||
Roc Profiler 2.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
@@ -675,7 +817,7 @@ rocWMMA 1.3.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Support for gfx940, gfx941, and gfx942 targets
|
||||
* Support for gfx942
|
||||
* Support for f8, bf8, and xfloat32 data types
|
||||
* support for `HIP_NO_HALF`, `__ HIP_NO_HALF_CONVERSIONS__`, and
|
||||
`__ HIP_NO_HALF_OPERATORS__` (e.g., PyTorch environment)
|
||||
@@ -697,7 +839,7 @@ Tensile 4.39.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Added `aquavanjaram` support: gfx940/gfx941/gfx942, fp8/bf8 datatype, xf32 datatype, and
|
||||
* Added `aquavanjaram` support: gfx942, fp8/bf8 datatype, xf32 datatype, and
|
||||
stochastic rounding for various datatypes
|
||||
* Added and updated tuning scripts
|
||||
* Added `DirectToLds` support for larger data types with 32-bit global load (old parameter `DirectToLds`
|
||||
@@ -712,12 +854,11 @@ Tensile 4.39.0 for ROCm 6.0.0
|
||||
|
||||
* Enabled `InitAccVgprOpt` for `MatrixInstruction` cases
|
||||
* Implemented local read related parameter calculations with `DirectToVgpr`
|
||||
* Adjusted `miIssueLatency` for gfx940
|
||||
* Enabled dedicated vgpr allocation for local read + pack
|
||||
* Optimized code initialization
|
||||
* Optimized sgpr allocation
|
||||
* Supported DGEMM TLUB + RLVW=2 for odd N (edge shift change)
|
||||
* Enabled `miLatency` optimization for (gfx940/gfx941 + MFMA) for specific data types, and fixed
|
||||
* Enabled `miLatency` optimization for specific data types, and fixed
|
||||
instruction scheduling
|
||||
|
||||
##### Changes
|
||||
@@ -737,7 +878,6 @@ Tensile 4.39.0 for ROCm 6.0.0
|
||||
* Supported multi-gpu for different architectures in lazy library loading
|
||||
* Enabled dtree library for batch > 1
|
||||
* Added problem scale feature for dtree selection
|
||||
* Enabled ROCm SMI for gfx940/941
|
||||
* Modified non-lazy load build to skip experimental logic
|
||||
|
||||
##### Fixes
|
||||
@@ -751,34 +891,9 @@ Tensile 4.39.0 for ROCm 6.0.0
|
||||
* Adding missing headers
|
||||
* Boost link for a clean build on Ubuntu 22
|
||||
* Bug in `forcestoresc1` arch selection
|
||||
* Compiler directive for gfx941 and gfx942
|
||||
* Compiler directive for gfx942
|
||||
* Formatting for `DecisionTree_test.cpp`
|
||||
|
||||
#### AMD Instinct™ MI50 end-of-support notice
|
||||
|
||||
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) will enter
|
||||
maintenance mode starting Q3 2023.
|
||||
|
||||
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 will be the
|
||||
final release for gfx906 GPUs to be in a fully supported state.
|
||||
|
||||
* ROCm 6.0 shows MI50s as "under maintenance" for
|
||||
[Linux](../about/compatibility/linux-support.md) and
|
||||
[Windows](../about/compatibility/windows-support.md)
|
||||
|
||||
* No new features and performance optimizations will be supported for the gfx906 GPUs beyond
|
||||
ROCm 5.7.
|
||||
|
||||
* Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
|
||||
2024 (end of maintenance \[EOM] will be aligned with the closest ROCm release).
|
||||
|
||||
* Bug fixes during the maintenance will be made to the next ROCm point release.
|
||||
|
||||
* Bug fixes will not be backported to older ROCm releases for gfx906.
|
||||
|
||||
* Distribution and operating system updates will continue per the ROCm release cadence for gfx906
|
||||
GPUs until EOM.
|
||||
|
||||
-------------------
|
||||
|
||||
## ROCm 5.7.1
|
||||
@@ -2378,7 +2493,7 @@ MIOpen 2.19.0 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- ROCm 5.5 support for gfx1101 (Navi32)
|
||||
- ROCm 5.5 support for gfx1101 (Navi 32)
|
||||
|
||||
##### Changed
|
||||
|
||||
@@ -2422,7 +2537,7 @@ rocALUTION 2.1.8 for ROCm 5.5.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added build support for Navi32
|
||||
- Added build support for Navi 32
|
||||
|
||||
##### Improved
|
||||
|
||||
@@ -2592,7 +2707,7 @@ rocSPARSE 2.5.1 for ROCm 5.5.0
|
||||
|
||||
- Added bsrgemm and spgemm for BSR format
|
||||
- Added bsrgeam
|
||||
- Added build support for Navi32
|
||||
- Added build support for Navi 32
|
||||
- Added experimental hipGraph support for some rocSPARSE routines
|
||||
- Added csritsv, spitsv csr iterative triangular solve
|
||||
- Added mixed precisions for SpMV
|
||||
@@ -3402,7 +3517,7 @@ rocALUTION 2.1.3 for ROCm 5.4.0
|
||||
|
||||
##### Added
|
||||
|
||||
- Added build support for Navi31 and Navi33
|
||||
- Added build support for Navi 31 and Navi 33
|
||||
- Added support for non-squared global matrices
|
||||
|
||||
##### Improved
|
||||
@@ -3529,7 +3644,7 @@ rocSPARSE 2.4.0 for ROCm 5.4.0
|
||||
- Added rocsparse_spmv_ex routine
|
||||
- Added rocsparse_bsrmv_ex_analysis and rocsparse_bsrmv_ex routines
|
||||
- Added csritilu0 routine
|
||||
- Added build support for Navi31 and Navi 33
|
||||
- Added build support for Navi 31 and Navi 33
|
||||
|
||||
##### Improved
|
||||
|
||||
@@ -5063,7 +5178,7 @@ rocBLAS 2.44.0 for ROCm 5.2.0
|
||||
|
||||
##### Removed
|
||||
|
||||
- Remove Navi12 (gfx1011) from fat binary.
|
||||
- Remove Navi 12 (gfx1011) from fat binary.
|
||||
|
||||
#### rocFFT 1.0.17
|
||||
|
||||
|
||||
927
RELEASE.md
927
RELEASE.md
File diff suppressed because it is too large
Load Diff
@@ -1,99 +0,0 @@
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="description" content="What's new in ROCm">
|
||||
<meta name="keywords" content="what's new">
|
||||
</head>
|
||||
|
||||
# What's new in ROCm?
|
||||
|
||||
ROCm is now supported on Windows.
|
||||
|
||||
## Windows support
|
||||
|
||||
Starting with ROCm 5.5, the HIP SDK brings a subset of ROCm to developers on Windows.
|
||||
The collection of features enabled on Windows is referred to as the HIP SDK.
|
||||
These features allow developers to use the HIP runtime, HIP math libraries
|
||||
and HIP Primitive libraries. The following table shows the differences
|
||||
between Windows and Linux releases.
|
||||
|
||||
|Component|Linux|Windows|
|
||||
|---------|-----|-------|
|
||||
|Driver|Radeon Software for Linux |AMD Software Pro Edition|
|
||||
|Compiler|`hipcc`/`amdclang++`|`hipcc`/`clang++`|
|
||||
|Debugger|`rocgdb`|no debugger available|
|
||||
|Profiler|`rocprof`|[Radeon GPU Profiler](https://gpuopen.com/rgp/)|
|
||||
|Porting Tools|HIPIFY|Coming Soon|
|
||||
|Runtime|HIP (Open Sourced)|HIP (closed source)|
|
||||
|Math Libraries|Supported|Supported|
|
||||
|Primitives Libraries|Supported|Supported|
|
||||
|Communication Libraries|Supported|Not Available|
|
||||
|AI Libraries|MIOpen, MIGraphX|Not Available|
|
||||
|System Management|`rocm-smi-lib`, RDC, `rocminfo`|`amdsmi`, `hipInfo`|
|
||||
|AI Frameworks|PyTorch, TensorFlow, etc.|Not Available|
|
||||
|CMake HIP Language|Enabled|Unsupported|
|
||||
|Visual Studio| Not applicable| Plugin Available|
|
||||
|HIP Ray Tracing| Supported|Supported|
|
||||
|
||||
AMD is continuing to invest in Windows support and AMD plans to release enhanced
|
||||
features in subsequent revisions.
|
||||
|
||||
```{note}
|
||||
The 5.5 Windows Installer collectively groups the Math and Primitives
|
||||
libraries.
|
||||
```
|
||||
|
||||
```{note}
|
||||
GPU support on Windows and Linux may differ. You must refer to
|
||||
Windows and Linux GPU support tables separately.
|
||||
```
|
||||
|
||||
```{note}
|
||||
HIP Ray Tracing is not distributed via ROCm in Linux.
|
||||
```
|
||||
|
||||
## ROCm release versioning
|
||||
|
||||
Linux OS releases set the canonical version numbers for ROCm. Windows will
|
||||
follow Linux version numbers as Windows releases are based on Linux ROCm
|
||||
releases. However, not all Linux ROCm releases will have a corresponding Windows
|
||||
release. The following table shows the ROCm releases on Windows and Linux. Releases
|
||||
with both Windows and Linux are referred to as a joint release. Releases with
|
||||
only Linux support are referred to as a skipped release from the Windows
|
||||
perspective.
|
||||
|
||||
|Release version|Linux|Windows|
|
||||
|---------------|-----|-------|
|
||||
|5.5|✅|✅|
|
||||
|5.6|✅|❌|
|
||||
|
||||
ROCm Linux releases are versioned with following the Major.Minor.Patch
|
||||
version number system. Windows releases will only be versioned with Major.Minor.
|
||||
|
||||
In general, Windows releases will trail Linux releases. Software developers that
|
||||
wish to support both Linux and Windows using a single ROCm version should
|
||||
refrain from upgrading ROCm unless there is a joint release.
|
||||
|
||||
## Windows documentation implications
|
||||
|
||||
The ROCm documentation website contains both Windows and Linux documentation.
|
||||
Just below each article title, a convenient article information section states
|
||||
whether the page applies to Linux only, Windows only or both OSes. To find the
|
||||
exact Windows documentation for a release of the HIP SDK, please view the ROCm documentation with the same
|
||||
Major.Minor version number while ignoring the Patch version. The Patch version
|
||||
only matters for Linux releases. For convenience,
|
||||
Windows documentation will continue to be included in the overall ROCm
|
||||
documentation for the skipped Windows releases.
|
||||
|
||||
Windows release notes will contain only information pertinent to Windows.
|
||||
The software developer must read all the previous ROCm release notes (including)
|
||||
skipped ROCm versions on Windows for information on all the changes present in
|
||||
the Windows release.
|
||||
|
||||
## Windows builds from source
|
||||
|
||||
Not all source code required to build Windows from source is available under a
|
||||
permissive open source license. Build instructions on Windows is only provided
|
||||
for projects that can be built from source on Windows using a toolchain that
|
||||
has closed source build prerequisites. The ROCm manifest file is not valid for
|
||||
Windows. AMD does not release a manifest or tag our components in Windows.
|
||||
Users may use corresponding Linux tags to build on Windows.
|
||||
@@ -85,6 +85,7 @@ article_pages = [
|
||||
|
||||
{"file":"rocm-a-z", "os":["linux", "windows"]},
|
||||
|
||||
{"file":"about/release-notes", "os":["linux"]},
|
||||
]
|
||||
|
||||
exclude_patterns = ['temp']
|
||||
|
||||
@@ -11,7 +11,6 @@ Welcome to the ROCm docs home page! If you're new to ROCm, you can review the fo
|
||||
resources to learn more about our products and what we support:
|
||||
|
||||
* [What is ROCm?](./what-is-rocm.md)
|
||||
* [What's new?](about/whats-new/whats-new)
|
||||
* [Release notes](./about/release-notes.md)
|
||||
|
||||
Our documentation is organized into the following categories:
|
||||
|
||||
@@ -10,18 +10,17 @@ subtrees:
|
||||
- file: about/release-notes.md
|
||||
title: Release notes
|
||||
subtrees:
|
||||
- entries:
|
||||
- entries:
|
||||
- file: about/CHANGELOG.md
|
||||
title: Changelog
|
||||
- file: about/whats-new/whats-new.md
|
||||
- url: https://github.com/RadeonOpenCompute/ROCm/labels/Verified%20Issue
|
||||
title: Known issues
|
||||
|
||||
- caption: Install
|
||||
entries:
|
||||
- url: https://rocm.docs.amd.com/projects/linux-install/en/${branch}/
|
||||
- url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/
|
||||
title: ROCm on Linux
|
||||
- url: https://rocm.docs.amd.com/projects/linux-install/en/${branch}/
|
||||
- url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/
|
||||
title: HIP SDK on Windows
|
||||
|
||||
- caption: Supported configurations
|
||||
|
||||
@@ -1,8 +1,880 @@
|
||||
<!-- markdownlint-disable first-line-h1 -->
|
||||
<!-- markdownlint-disable no-duplicate-header -->
|
||||
|
||||
### What's new in this release
|
||||
ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
|
||||
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
|
||||
MI300 series. Future releases will further enable and optimize this new platform. Key features include:
|
||||
|
||||
#### Documentation
|
||||
* Improved performance in areas like lower precision math and attention layers.
|
||||
* New hipSPARSELt library to accelerate AI workloads via AMD's sparse matrix core technique.
|
||||
* Latest upstream support for popular AI frameworks like PyTorch, TensorFlow, and JAX.
|
||||
* New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
|
||||
* Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
|
||||
tutorials on the [AMD ROCm Docs](https://rocm.docs.amd.com) site.
|
||||
* Consolidated developer resources and training on the new AMD ROCm Developer Hub.
|
||||
|
||||
The following section provide a release overview for ROCm 6.0. For additional details, you can refer to
|
||||
the [Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html).
|
||||
|
||||
### OS and GPU support changes
|
||||
|
||||
AMD Instinct™ MI300A and MI300X Accelerator support has been enabled for limited operating
|
||||
systems.
|
||||
|
||||
* Ubuntu 22.04.5 (MI300A and MI300X)
|
||||
* RHEL 8.9 (MI300A)
|
||||
* SLES 15 SP5
|
||||
|
||||
We've added support for the following operating systems:
|
||||
|
||||
* RHEL 9.3
|
||||
* RHEL 8.9
|
||||
|
||||
Note that, of ROCm 6.2, we've planned for end-of-support (EoS) for the following operating systems:
|
||||
|
||||
* Ubuntu 20.04.5
|
||||
* SLES 15 SP4
|
||||
* RHEL/CentOS 7.9
|
||||
|
||||
### New ROCm meta package
|
||||
|
||||
We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
|
||||
libraries. For example, the following command will install the full ROCm package: `apt-get install rocm`
|
||||
(Ubuntu), or `yum install rocm` (RHEL).
|
||||
|
||||
### Filesystem Hierarchy Standard
|
||||
|
||||
ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
|
||||
the backward compatibility support for old file locations.
|
||||
|
||||
### Compiler location change
|
||||
|
||||
* The installation path of LLVM has been changed from `/opt/rocm-<rel>/llvm` to
|
||||
`/opt/rocm-<rel>/lib/llvm`. For backward compatibility, a symbolic link is provided to the old
|
||||
location and will be removed in a future release.
|
||||
* The installation path of the device library bitcode has changed from `/opt/rocm-<rel>/amdgcn` to
|
||||
`/opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn`. For backward compatibility, a symbolic link
|
||||
is provided and will be removed in a future release.
|
||||
|
||||
### Documentation
|
||||
|
||||
CMake support has been added for documentation in the
|
||||
[ROCm repository](https://github.com/RadeonOpenCompute/ROCm).
|
||||
|
||||
### AMD Instinct™ MI50 end-of-support notice
|
||||
|
||||
AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enters
|
||||
maintenance mode in ROCm 6.0.
|
||||
|
||||
As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 was the
|
||||
final release for gfx906 GPUs in a fully supported state.
|
||||
|
||||
* Henceforth, no new features and performance optimizations will be supported for the gfx906 GPUs.
|
||||
* Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
|
||||
2024 (end of maintenance \[EOM] will be aligned with the closest ROCm release).
|
||||
* Bug fixes will be made up to the next ROCm point release.
|
||||
* Bug fixes will not be backported to older ROCm releases for gfx906.
|
||||
* Distribution and operating system updates will continue per the ROCm release cadence for gfx906
|
||||
GPUs until EOM.
|
||||
|
||||
### Library changes
|
||||
|
||||
| Library | Version |
|
||||
|---------|---------|
|
||||
| AMDMIGraphX | ⇒ [2.8](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/releases/tag/rocm-6.0.0) |
|
||||
| HIP | [6.0.0](https://github.com/ROCm/HIP/releases/tag/rocm-6.0.0) |
|
||||
| hipBLAS | ⇒ [2.0.0](https://github.com/ROCmSoftwarePlatform/hipBLAS/releases/tag/rocm-6.0.0) |
|
||||
| hipCUB | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/hipCUB/releases/tag/rocm-6.0.0) |
|
||||
| hipFFT | ⇒ [1.0.13](https://github.com/ROCmSoftwarePlatform/hipFFT/releases/tag/rocm-6.0.0) |
|
||||
| hipSOLVER | ⇒ [2.0.0](https://github.com/ROCmSoftwarePlatform/hipSOLVER/releases/tag/rocm-6.0.0) |
|
||||
| hipSPARSE | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/hipSPARSE/releases/tag/rocm-6.0.0) |
|
||||
| hipTensor | ⇒ [1.1.0](https://github.com/ROCmSoftwarePlatform/hipTensor/releases/tag/rocm-6.0.0) |
|
||||
| MIOpen | ⇒ [2.19.0](https://github.com/ROCmSoftwarePlatform/MIOpen/releases/tag/rocm-6.0.0) |
|
||||
| rccl | ⇒ [2.15.5](https://github.com/ROCmSoftwarePlatform/rccl/releases/tag/rocm-6.0.0) |
|
||||
| rocALUTION | ⇒ [3.0.3](https://github.com/ROCmSoftwarePlatform/rocALUTION/releases/tag/rocm-6.0.0) |
|
||||
| rocBLAS | ⇒ [4.0.0](https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/rocm-6.0.0) |
|
||||
| rocFFT | ⇒ [1.0.25](https://github.com/ROCmSoftwarePlatform/rocFFT/releases/tag/rocm-6.0.0) |
|
||||
| ROCgdb | [13.2](https://github.com/ROCm/ROCgdb/releases/tag/rocm-6.0.0) |
|
||||
| rocm-cmake | ⇒ [0.11.0](https://github.com/RadeonOpenCompute/rocm-cmake/releases/tag/rocm-6.0.0) |
|
||||
| rocPRIM | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocPRIM/releases/tag/rocm-6.0.0) |
|
||||
| rocprofiler | [2.0.0](https://github.com/ROCm/rocprofiler/releases/tag/rocm-6.0.0) |
|
||||
| rocRAND | ⇒ [2.10.17](https://github.com/ROCmSoftwarePlatform/rocRAND/releases/tag/rocm-6.0.0) |
|
||||
| rocSOLVER | ⇒ [3.24.0](https://github.com/ROCmSoftwarePlatform/rocSOLVER/releases/tag/rocm-6.0.0) |
|
||||
| rocSPARSE | ⇒ [3.0.2](https://github.com/ROCmSoftwarePlatform/rocSPARSE/releases/tag/rocm-6.0.0) |
|
||||
| rocThrust | ⇒ [3.0.0](https://github.com/ROCmSoftwarePlatform/rocThrust/releases/tag/rocm-6.0.0) |
|
||||
| rocWMMA | ⇒ [1.3.0](https://github.com/ROCmSoftwarePlatform/rocWMMA/releases/tag/rocm-6.0.0) |
|
||||
| Tensile | ⇒ [4.39.0](https://github.com/ROCmSoftwarePlatform/Tensile/releases/tag/rocm-6.0.0) |
|
||||
|
||||
#### AMDMIGraphX 2.8
|
||||
|
||||
MIGraphX 2.8 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Support for TorchMIGraphX via PyTorch
|
||||
* Boosted overall performance by integrating rocMLIR
|
||||
* INT8 support for ONNX Runtime
|
||||
* Support for ONNX version 1.14.1
|
||||
* Added new operators: `Qlinearadd`, `QlinearGlobalAveragePool`, `Qlinearconv`, `Shrink`, `CastLike`,
|
||||
and `RandomUniform`
|
||||
* Added an error message for when `gpu_targets` is not set during MIGraphX compilation
|
||||
* Added parameter to set tolerances with `migraphx-driver` verify
|
||||
* Added support for MXR files > 4 GB
|
||||
* Added `MIGRAPHX_TRACE_MLIR` flag
|
||||
* BETA added capability for using ROCm Composable Kernels via the `MIGRAPHX_ENABLE_CK=1`
|
||||
environment variable
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* Improved performance support for INT8
|
||||
* Improved time precision while benchmarking candidate kernels from CK or MLIR
|
||||
* Removed contiguous from reshape parsing
|
||||
* Updated the `ConstantOfShape` operator to support Dynamic Batch
|
||||
* Simplified dynamic shapes-related operators to their static versions, where possible
|
||||
* Improved debugging tools for accuracy issues
|
||||
* Included a print warning about `miopen_fusion` while generating `mxr`
|
||||
* General reduction in system memory usage during model compilation
|
||||
* Created additional fusion opportunities during model compilation
|
||||
* Improved debugging for matchers
|
||||
* Improved general debug messages
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Fixed scatter operator for nonstandard shapes with some models from ONNX Model Zoo
|
||||
* Provided a compile option to improve the accuracy of some models by disabling Fast-Math
|
||||
* Improved layernorm + pointwise fusion matching to ignore argument order
|
||||
* Fixed accuracy issue with `ROIAlign` operator
|
||||
* Fixed computation logic for the `Trilu` operator
|
||||
* Fixed support for the DETR model
|
||||
|
||||
##### Changes
|
||||
|
||||
* Changed MIGraphX version to 2.8
|
||||
* Extracted the test packages into a separate deb file when building MIGraphX from source
|
||||
|
||||
##### Removals
|
||||
|
||||
* Removed building Python 2.7 bindings
|
||||
|
||||
#### AMD SMI
|
||||
|
||||
* Integrated the E-SMI library: You can now query CPU-related information directly through AMD SMI.
|
||||
Metrics include power, energy, performance, and other system details.
|
||||
|
||||
* Added support for gfx942 metrics: You can now query MI300 device metrics to get real-time
|
||||
information. Metrics include power, temperature, energy, and performance.
|
||||
|
||||
* Added support for compute and memory partitions
|
||||
|
||||
#### HIP 6.0.0
|
||||
|
||||
HIP 6.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* New fields and structs for external resource interoperability
|
||||
* `hipExternalMemoryHandleDesc_st`
|
||||
* `hipExternalMemoryBufferDesc_st`
|
||||
* `hipExternalSemaphoreHandleDesc_st`
|
||||
* `hipExternalSemaphoreSignalParams_st`
|
||||
* `hipExternalSemaphoreWaitParams_st Enumerations`
|
||||
* `hipExternalMemoryHandleType_enum`
|
||||
* `hipExternalSemaphoreHandleType_enum`
|
||||
* `hipExternalMemoryHandleType_enum`
|
||||
|
||||
* New environment variable `HIP_LAUNCH_BLOCKING`
|
||||
* For serialization on kernel execution. The default value is 0 (disable); kernel will execute normally as
|
||||
defined in the queue. When this environment variable is set as 1 (enable), HIP runtime will
|
||||
serialize kernel enqueue; behaves the same as AMD_SERIALIZE_KERNEL.
|
||||
|
||||
* More members are added in HIP struct `hipDeviceProp_t`, for new feature capabilities including:
|
||||
* Texture
|
||||
* `int maxTexture1DMipmap;`
|
||||
* `int maxTexture2DMipmap[2];`
|
||||
* `int maxTexture2DLinear[3];`
|
||||
* `int maxTexture2DGather[2];`
|
||||
* `int maxTexture3DAlt[3];`
|
||||
* `int maxTextureCubemap;`
|
||||
* `int maxTexture1DLayered[2];`
|
||||
* `int maxTexture2DLayered[3];`
|
||||
* `int maxTextureCubemapLayered[2];`
|
||||
* Surface
|
||||
* `int maxSurface1D;`
|
||||
* `int maxSurface2D[2];`
|
||||
* `int maxSurface3D[3];`
|
||||
* `int maxSurface1DLayered[2];`
|
||||
* `int maxSurface2DLayered[3];`
|
||||
* `int maxSurfaceCubemap;`
|
||||
* `int maxSurfaceCubemapLayered[2];`
|
||||
* Device
|
||||
* `hipUUID uuid;`
|
||||
* `char luid[8];` this is an 8-byte unique identifier. Only valid on Windows
|
||||
* `unsigned int luidDeviceNodeMask;`
|
||||
|
||||
* LUID (Locally Unique Identifier) is supported for interoperability between devices. In HIP, more
|
||||
members are added in the struct `hipDeviceProp_t`, as properties to identify each device:
|
||||
* `char luid[8];`
|
||||
* `unsigned int luidDeviceNodeMask;`
|
||||
|
||||
Note: HIP supports LUID only on Windows OS.
|
||||
|
||||
##### Changes
|
||||
|
||||
* Some OpenGL Interop HIP APIs are moved from the hip_runtime_api header to a new header file hip_gl_interop.h for the AMD platform, as follows:
|
||||
* `hipGLGetDevices`
|
||||
* `hipGraphicsGLRegisterBuffer`
|
||||
* `hipGraphicsGLRegisterImage`
|
||||
|
||||
###### Changes impacting backward incompatibility
|
||||
|
||||
* Data types for members in `HIP_MEMCPY3D` structure are changed from `unsigned int` to `size_t`.
|
||||
* The value of the flag `hipIpcMemLazyEnablePeerAccess` is changed to `0x01`, which was previously
|
||||
defined as `0`
|
||||
* Some device property attributes are not currently supported in HIP runtime. In order to maintain
|
||||
consistency, the following related enumeration names are changed in `hipDeviceAttribute_t`
|
||||
* `hipDeviceAttributeName` is changed to `hipDeviceAttributeUnused1`
|
||||
* `hipDeviceAttributeUuid` is changed to `hipDeviceAttributeUnused2`
|
||||
* `hipDeviceAttributeArch` is changed to `hipDeviceAttributeUnused3`
|
||||
* `hipDeviceAttributeGcnArch` is changed to `hipDeviceAttributeUnused4`
|
||||
* `hipDeviceAttributeGcnArchName` is changed to `hipDeviceAttributeUnused5`
|
||||
* HIP struct `hipArray` is removed from driver type header to comply with CUDA
|
||||
* `hipArray_t` replaces `hipArray*`, as the pointer to array.
|
||||
* This allows `hipMemcpyAtoH` and `hipMemcpyHtoA` to have the correct array type which is
|
||||
equivalent to corresponding CUDA driver APIs.
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Kernel launch maximum dimension validation is added specifically on gridY and gridZ in the HIP API `hipModule-LaunchKernel`. As a result,when `hipGetDeviceAttribute` is called for the value of `hipDeviceAttributeMaxGrid-Dim`, the behavior on the AMD platform is equivalent to NVIDIA.
|
||||
|
||||
* The HIP stream synchronization behavior is changed in internal stream functions, in which a flag "wait" is added and set when the current stream is null pointer while executing stream synchronization on other explicitly created streams. This change avoids blocking of execution on null/default stream. The change won't affect usage of applications, and makes them behave the same on the AMD platform as NVIDIA.
|
||||
|
||||
* Error handling behavior on unsupported GPU is fixed, HIP runtime will log out error message, instead of creating signal abortion error which is invisible to developers but continued kernel execution process. This is for the case when developers compile any application via hipcc, setting the option `--offload-arch` with GPU ID which is different from the one on the system.
|
||||
|
||||
* HIP complex vector type multiplication and division operations. On AMD platform, some duplicated complex operators are removed to avoid compilation failures. In HIP, `hipFloatComplex` and `hipDoubleComplex` are defined as complex data types: `typedef float2 hipFloatComplex; typedef double2 hipDoubleComplex;` Any application that uses complex multiplication and division operations needs to replace '*' and '/' operators with the following:
|
||||
* `hipCmulf()` and `hipCdivf()` for `hipFloatComplex`
|
||||
* `hipCmul()` and `hipCdiv()` for `hipDoubleComplex`
|
||||
Note: These complex operations are equivalent to corresponding types/functions on NVIDIA platform.
|
||||
|
||||
##### Removals
|
||||
|
||||
* Deprecated Heterogeneous Compute (HCC) symbols and flags are removed from the HIP source code, including:
|
||||
* Build options on obsolete `HCC_OPTIONS` were removed from cmake.
|
||||
* Micro definitions are removed:
|
||||
* `HIP_INCLUDE_HIP_HCC_DETAIL_DRIVER_TYPES_H`
|
||||
* `HIP_INCLUDE_HIP_HCC_DETAIL_HOST_DEFINES_H`
|
||||
* Compilation flags for the platform definitions
|
||||
* AMD platform
|
||||
* `HIP_PLATFORM_HCC`
|
||||
* `HCC`
|
||||
* `HIP_ROCclr`
|
||||
* NVIDIA platform
|
||||
* `HIP_PLATFORM_NVCC`
|
||||
* File directories in the clr repository are removed, for more details see https://github.com/ROCm-Developer-Tools/clr/blob/develop/hipamd/include/hip/hcc_detail and https://github.com/ROCm-Developer-Tools/clr/blob/develop/hipamd/include/hip/nvcc_detail
|
||||
* Deprecated gcnArch is removed from hip device struct `hipDeviceProp_t`.
|
||||
* Deprecated `enum hipMemoryType memoryType;` is removed from HIP struct `hipPointerAttribute_t` union.
|
||||
|
||||
#### hipBLAS 2.0.0
|
||||
|
||||
hipBLAS 2.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* New option to define `HIPBLAS_USE_HIP_BFLOAT16` to switch API to use the `hip_bfloat16` type
|
||||
* New `hipblasGemmExWithFlags` API
|
||||
|
||||
##### Deprecations
|
||||
|
||||
* `hipblasDatatype_t`; use `hipDataType` instead
|
||||
* `hipblasComplex`; use `hipComplex` instead
|
||||
* `hipblasDoubleComplex`; use `hipDoubleComplex` instead
|
||||
* Use of `hipblasDatatype_t` for `hipblasGemmEx` for compute-type; use `hipblasComputeType_t` instead
|
||||
|
||||
##### Removals
|
||||
|
||||
* `hipblasXtrmm` (calculates B <- alpha * op(A) * B) has been replaced with `hipblasXtrmm` (calculates
|
||||
C <- alpha * op(A) * B)
|
||||
|
||||
|
||||
#### hipCUB 3.0.0
|
||||
|
||||
hipCUB 3.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Changes
|
||||
|
||||
* Removed `DOWNLOAD_ROCPRIM`: you can force rocPRIM to download using
|
||||
`DEPENDENCIES_FORCE_DOWNLOAD`
|
||||
|
||||
#### hipFFT 1.0.13
|
||||
|
||||
hipFFT 1.0.13 for ROCm 6.0.0
|
||||
|
||||
##### Changes
|
||||
|
||||
* `hipfft-rider` has been renamed to `hipfft-bench`; it is controlled by the `BUILD_CLIENTS_BENCH`
|
||||
CMake option (note that a link for the old file name is installed, and the old `BUILD_CLIENTS_RIDER`
|
||||
CMake option is accepted for backwards compatibility, but both will be removed in a future release)
|
||||
* Binaries in debug builds no longer have a `-d` suffix
|
||||
* The minimum rocFFT required version has been updated to 1.0.21
|
||||
|
||||
##### Additions
|
||||
|
||||
* `hipfftXtSetGPUs`, `hipfftXtMalloc, hipfftXtMemcpy`, `hipfftXtFree`, and `hipfftXtExecDescriptor` APIs
|
||||
have been implemented to allow FFT computing on multiple devices in a single process
|
||||
|
||||
#### hipSOLVER 2.0.0
|
||||
|
||||
hipSOLVER 2.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Added hipBLAS as an optional dependency to `hipsolver-test`
|
||||
* You can use the `BUILD_HIPBLAS_TESTS` CMake option to test the compatibility between hipSOLVER
|
||||
and hipBLAS
|
||||
|
||||
##### Changes
|
||||
|
||||
* The `hipsolverOperation_t` type is now an alias of `hipblasOperation_t`
|
||||
* The `hipsolverFillMode_t` type is now an alias of `hipblasFillMode_t`
|
||||
* The `hipsolverSideMode_t` type is now an alias of `hipblasSideMode_t`
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Tests for hipSOLVER info updates in `ORGBR/UNGBR`, `ORGQR/UNGQR`, `ORGTR/UNGTR`,
|
||||
`ORMQR/UNMQR`, and `ORMTR/UNMTR`
|
||||
|
||||
#### hipSPARSE 3.0.0
|
||||
|
||||
hipSPARSE 3.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Added `hipsparseGetErrorName` and `hipsparseGetErrorString`
|
||||
|
||||
##### Changes
|
||||
|
||||
* Changed the `hipsparseSpSV_solve()` API function to match the cuSPARSE API
|
||||
* Changed generic API functions to use const descriptors
|
||||
* Improved documentation
|
||||
|
||||
#### hipTensor 1.1.0
|
||||
|
||||
hipTensor 1.1.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Architecture support for gfx942
|
||||
* Client tests configuration parameters now support YAML file input format
|
||||
|
||||
##### Changes
|
||||
|
||||
* Doxygen now treats warnings as errors
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Client tests output redirections now behave accordingly
|
||||
* Removed dependency static library deployment
|
||||
* Security issues for documentation
|
||||
* Compile issues in debug mode
|
||||
* Corrected soft link for ROCm deployment
|
||||
|
||||
#### MIOpen 2.19.0
|
||||
|
||||
MIOpen 2.19.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* ROCm 5.5 support for gfx1101 (Navi32)
|
||||
|
||||
##### Changes
|
||||
|
||||
* Tuning results for MLIR on ROCm 5.5
|
||||
* Bumped MLIR commit to 5.5.0 release tag
|
||||
|
||||
##### Fixes
|
||||
|
||||
* 3-D convolution host API bug
|
||||
* `[HOTFIX][MI200][FP16]` has been disabled for `ConvHipImplicitGemmBwdXdlops` when FP16_ALT is
|
||||
required
|
||||
|
||||
#### MIVisionX
|
||||
|
||||
* Added Comprehensive CTests to aid developers
|
||||
* Introduced Doxygen support for complete API documentation
|
||||
* Simplified dependencies for rocAL
|
||||
|
||||
#### OpenMP
|
||||
|
||||
* MI300:
|
||||
* Added support for gfx942 targets
|
||||
* Fixed declare target variable access in unified_shared_memory mode
|
||||
* Enabled OMPX_APU_MAPS environment variable for MI200 and gfx942
|
||||
* Handled global pointers in forced USM (`OMPX_APU_MAPS`)
|
||||
|
||||
* Nextgen AMDGPU plugin:
|
||||
* Respect `GPU_MAX_HW_QUEUES` in the AMDGPU Nextgen plugin, which takes precedence over the
|
||||
standard `LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES` environment variable
|
||||
* Changed the default for `LIBOMPTARGET_AMDGPU_TEAMS_PER_CU` from 4 to 6
|
||||
* Fixed the behavior of the `OMPX_FORCE_SYNC_REGIONS` environment variable, which is used to
|
||||
force synchronous target regions (the default is to use an asynchronous implementation)
|
||||
* Added support for and enabled default of code object version 5
|
||||
* Implemented target OMPT callbacks and trace records support in the nextgen plugin
|
||||
|
||||
* Specialized kernels:
|
||||
* Removes redundant copying of arrays when xteam reductions are active but not offloaded
|
||||
* Tuned the number of teams for BigJumpLoop
|
||||
* Enables specialized kernel generation with nested OpenMP pragma, as long as there is no nested
|
||||
omp-parallel directive
|
||||
|
||||
##### Additions
|
||||
|
||||
* `-fopenmp-runtimelib={lib,lib-perf,lib-debug}` to select libs
|
||||
* Warning if mixed HIP / OpenMP offloading (i.e., if HIP language mode is active, but OpenMP target
|
||||
directives are encountered)
|
||||
* Introduced compile-time limit for the number of GPUs supported in a system: 16 GPUs in a single
|
||||
node is currently the maximum supported
|
||||
|
||||
##### Changes
|
||||
|
||||
* Correctly compute number of waves when workgroup size is less than the wave size
|
||||
* Implemented `LIBOMPTARGET_KERNEL_TRACE=3`, which prints DEVID traces and API timings
|
||||
* ASAN support for openmp release, debug, and perf libraries
|
||||
* Changed LDS lowering default to hybrid
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Fixed RUNPATH for gdb plugin
|
||||
* Fixed hang in OMPT support if flush trace is called when there are no helper threads
|
||||
|
||||
#### rccl 2.15.5
|
||||
|
||||
RCCL 2.15.5 for ROCm 6.0.0
|
||||
|
||||
##### Changes
|
||||
|
||||
* Compatibility with NCCL 2.15.5
|
||||
* Renamed the unit test executable to `rccl-UnitTests`
|
||||
|
||||
##### Additions
|
||||
|
||||
* HW-topology-aware binary tree implementation
|
||||
* Experimental support for MSCCL
|
||||
* New unit tests for hipGraph support
|
||||
* NPKit integration
|
||||
|
||||
##### Fixes
|
||||
|
||||
* rocm-smi ID conversion
|
||||
* Support for `HIP_VISIBLE_DEVICES` for unit tests
|
||||
* Support for p2p transfers to non (HIP) visible devices
|
||||
|
||||
##### Removals
|
||||
|
||||
* Removed TransferBench from tools as it exists in standalone repo:
|
||||
[https://github.com/ROCmSoftwarePlatform/TransferBench](https://github.com/ROCmSoftwarePlatform/TransferBench)
|
||||
|
||||
#### rocALUTION 3.0.3
|
||||
|
||||
rocALUTION 3.0.3 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Support for 64bit integer vectors
|
||||
* Inclusive and exclusive sum functionality for vector classes
|
||||
* Transpose functionality for `GlobalMatrix` and `LocalMatrix`
|
||||
* `TripleMatrixProduct` functionality for `LocalMatrix`
|
||||
* `Sort()` function for `LocalVector` class
|
||||
* Multiple stream support to the HIP backend
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* `GlobalMatrix::Apply()` now uses multiple streams to better hide communication
|
||||
|
||||
##### Changes
|
||||
|
||||
* Matrix dimensions and number of non-zeros are now stored using 64-bit integers
|
||||
* Improved the ILUT preconditioner
|
||||
|
||||
##### Removals
|
||||
|
||||
* `LocalVector::GetIndexValues(ValueType*)`
|
||||
* `LocalVector::SetIndexValues(const ValueType*)`
|
||||
* `LocalMatrix::RSDirectInterpolation(const LocalVector&, const LocalVector&, LocalMatrix*, LocalMatrix*)`
|
||||
* `LocalMatrix::RSExtPIInterpolation(const LocalVector&, const LocalVector&, bool, float, LocalMatrix*, LocalMatrix*)`
|
||||
* `LocalMatrix::RugeStueben()`
|
||||
* `LocalMatrix::AMGSmoothedAggregation(ValueType, const LocalVector&, const LocalVector&, LocalMatrix*, LocalMatrix*, int)`
|
||||
* `LocalMatrix::AMGAggregation(const LocalVector&, LocalMatrix*, LocalMatrix*)`
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Unit tests no longer ignore BCSR block dimension
|
||||
* Fixed documentation typos
|
||||
* Bug in multi-coloring for non-symmetric matrix patterns
|
||||
|
||||
#### rocBLAS 4.0.0
|
||||
|
||||
rocBLAS 4.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Beta API `rocblas_gemm_batched_ex3` and `rocblas_gemm_strided_batched_ex3`
|
||||
* Input/output type f16_r/bf16_r and execution type f32_r support for Level 2 gemv_batched and
|
||||
gemv_strided_batched
|
||||
* Use of `rocblas_status_excluded_from_build` when calling functions that require Tensile (when using
|
||||
rocBLAS built without Tensile)
|
||||
* System for asynchronous kernel launches that set a `rocblas_status` failure based on a
|
||||
`hipPeekAtLastError` discrepancy
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* TRSM performance for small sizes (m < 32 && n < 32)
|
||||
|
||||
##### Deprecations
|
||||
|
||||
* Atomic operations will be disabled by default in a future release of rocBLAS (you can enable atomic
|
||||
operations using the `rocblas_set_atomics_mode` function)
|
||||
|
||||
##### Removals
|
||||
|
||||
* `rocblas_gemm_ext2` API function
|
||||
* In-place trmm API from Legacy BLAS is replaced by an API that supports both in-place and
|
||||
out-of-place trmm
|
||||
* int8x4 support is removed (int8 support is unchanged)
|
||||
* `#define __STDC_WANT_IEC_60559_TYPES_EXT__` is removed from `rocblas-types.h` (if you want
|
||||
ISO/IEC TS 18661-3:2015 functionality, you must define `__STDC_WANT_IEC_60559_TYPES_EXT__`
|
||||
before including `float.h`, `math.h`, and `rocblas.h`)
|
||||
* The default build removes device code for gfx803 architecture from the fat binary
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Made offset calculations for 64-bit rocBLAS functions safe
|
||||
* Fixes for very large leading dimension or increment potentially causing overflow:
|
||||
* Level2: `gbmv`, `gemv`, `hbmv`, `sbmv`, `spmv`, `tbmv`, `tpmv`, `tbsv`, and `tpsv`
|
||||
* Lazy loading supports heterogeneous architecture setup and load-appropriate tensile library files,
|
||||
based on device architecture
|
||||
* Guards against no-op kernel launches that result in a potential `hipGetLastError`
|
||||
|
||||
##### Changes
|
||||
|
||||
* Reduced the default verbosity of `rocblas-test` (you can see all tests by setting the
|
||||
`GTEST_LISTENER=PASS_LINE_IN_LOG` environment variable)
|
||||
|
||||
#### rocFFT 1.0.25
|
||||
|
||||
rocFFT 1.0.25 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Implemented experimental APIs to allow computing FFTs on data distributed across multiple devices
|
||||
in a single process
|
||||
|
||||
* `rocfft_field` is a new type that can be added to a plan description to describe the layout of FFT
|
||||
input or output
|
||||
* `rocfft_field_add_brick` can be called to describe the brick decomposition of an FFT field, where each
|
||||
brick can be assigned a different device
|
||||
|
||||
These interfaces are still experimental and subject to change. Your feedback is appreciated.
|
||||
You can raise questions and concerns by opening issues in the
|
||||
[rocFFT issue tracker](https://github.com/ROCmSoftwarePlatform/rocFFT/issues).
|
||||
|
||||
Note that multi-device FFTs currently have several limitations (we plan to address these in future
|
||||
releases):
|
||||
|
||||
* Real-complex (forward or inverse) FFTs are not supported
|
||||
* Planar format fields are not supported
|
||||
* Batch (the `number_of_transforms` provided to `rocfft_plan_create`) must be 1
|
||||
* FFT input is gathered to the current device at run time, so all FFT data must fit on that device
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* Improved the performance of several 2D/3D real FFTs supported by `2D_SINGLE` kernel. Offline
|
||||
tuning provides more optimization for fx90a
|
||||
* Removed an extra kernel launch from even-length, real-complex FFTs that use callbacks
|
||||
|
||||
##### Changes
|
||||
|
||||
* Built kernels in a solution map to the library kernel cache
|
||||
* Real forward transforms (real-to-complex) no longer overwrite input; rocFFT may still overwrite real
|
||||
inverse (complex-to-real) input, as this allows for faster performance
|
||||
|
||||
* `rocfft-rider` and `dyna-rocfft-rider` have been renamed to `rocfft-bench` and `dyna-rocfft-bench`;
|
||||
these are controlled by the `BUILD_CLIENTS_BENCH` CMake option
|
||||
* Links for the former file names are installed, and the former `BUILD_CLIENTS_RIDER` CMake option
|
||||
is accepted for compatibility, but both will be removed in a future release
|
||||
* Binaries in debug builds no longer have a `-d` suffix
|
||||
|
||||
##### Fixes
|
||||
|
||||
* rocFFT now correctly handles load callbacks that convert data from a smaller data type (e.g., 16-bit
|
||||
integers -> 32-bit float)
|
||||
|
||||
#### ROCgdb 13.2
|
||||
|
||||
ROCgdb 13.2 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Support for watchpoints on scratch memory addresses.
|
||||
* Added support for gfx1100, gfx1101, and gfx1102.
|
||||
* Added support for gfx942.
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* Improved performances when handling the end of a process with a large number of threads.
|
||||
|
||||
##### Known issues
|
||||
|
||||
* On certain configurations, ROCgdb can show the following warning message:
|
||||
`warning: Probes-based dynamic linker interface failed. Reverting to original interface.`
|
||||
This does not affect ROCgdb's functionalities.
|
||||
|
||||
* ROCgdb cannot debug a program on an AMDGPU device past a
|
||||
`s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)` instruction. If an exception is reported after this
|
||||
instruction has been executed (including asynchronous exceptions), the wave is killed and the
|
||||
exceptions are only reported by the ROCm runtime.
|
||||
|
||||
#### rocm-cmake 0.11.0
|
||||
|
||||
rocm-cmake 0.11.0 for ROCm 6.0.0
|
||||
|
||||
##### Changes
|
||||
|
||||
* Improved validation, documentation, and rocm-docs-core integration for ROCMSphinxDoc
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Fixed extra `make` flags passed for Clang-Tidy (ROCMClangTidy).
|
||||
* Fixed issues with ROCMTest when using a module in a subdirectory
|
||||
|
||||
#### ROCm Compiler
|
||||
|
||||
* On MI300, kernel arguments can be preloaded into SGPRs rather than passed in memory. This
|
||||
feature is enabled with a compiler option, which also controls the number of arguments to pass in
|
||||
SGPRs.
|
||||
|
||||
* Improved register allocation at -O0: Avoid compiler crashes ( 'ran out of registers during register allocation' )
|
||||
|
||||
* Improved generation of debug information:
|
||||
* Improve compile time
|
||||
* Avoid compiler crashes
|
||||
|
||||
#### rocPRIM 3.0.0
|
||||
|
||||
rocPRIM 3.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* `block_sort::sort()` overload for keys and values with a dynamic size, for all block sort algorithms
|
||||
* All `block_sort::sort()` overloads with a dynamic size are now supported for
|
||||
`block_sort_algorithm::merge_sort` and `block_sort_algorithm::bitonic_sort`
|
||||
* New two-way partition primitive `partition_two_way`, which can write to two separate iterators
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* Improved `partition` performance
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Fixed `rocprim::MatchAny` for devices with 64-bit warp size
|
||||
* Note that `rocprim::MatchAny` is deprecated; use `rocprim::match_any` instead
|
||||
|
||||
#### Roc Profiler 2.0.0
|
||||
|
||||
Roc Profiler 2.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Updated supported GPU architectures in README with profiler versions
|
||||
* Automatic ISA dumping for ATT. See README.
|
||||
* CSV mode for ATT. See README.
|
||||
* Added option to control kernel name truncation.
|
||||
* Limit rocprof(v1) script usage to only supported architectures.
|
||||
* Added Tool versioning to be able to run rocprofv2 using rocprof. See README for more information.
|
||||
* Added Plugin Versioning way in rocprofv2. See README for more details.
|
||||
* Added `--version` in rocprof and rocprofv2 to be able to see the current rocprof/v2 version along with ROCm version information.
|
||||
|
||||
#### rocRAND 2.10.17
|
||||
|
||||
rocRAND 2.10.17 for ROCm 6.0.0
|
||||
|
||||
### Changes
|
||||
|
||||
* Generator classes from `rocrand.hpp` are no longer copyable (in previous versions these copies
|
||||
would copy internal references to the generators and would lead to double free or memory leak
|
||||
errors)
|
||||
* These types should be moved instead of copied; move constructors and operators are now
|
||||
defined
|
||||
|
||||
### Optimizations
|
||||
|
||||
* Improved MT19937 initialization and generation performance
|
||||
|
||||
### Removals
|
||||
|
||||
* Removed the hipRAND submodule from rocRAND; hipRAND is now only available as a separate
|
||||
package
|
||||
* Removed references to, and workarounds for, the deprecated hcc
|
||||
|
||||
### Fixes
|
||||
|
||||
* `mt19937_engine` from `rocrand.hpp` is now move-constructible and move-assignable (the move
|
||||
constructor and move assignment operator was deleted for this class)
|
||||
* Various fixes for the C++ wrapper header `rocrand.hpp`
|
||||
* The name of `mrg31k3p` it is now correctly spelled (was incorrectly named `mrg31k3a` in previous
|
||||
versions)
|
||||
* Added the missing `order` setter method for `threefry4x64`
|
||||
* Fixed the default ordering parameter for `lfsr113`
|
||||
* Build error when using Clang++ directly resulting from unsupported `amdgpu-target` references
|
||||
|
||||
#### rocSOLVER 3.24.0
|
||||
|
||||
rocSOLVER 3.24.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Cholesky refactorization for sparse matrices: `CSRRF_REFACTCHOL`
|
||||
* Added `rocsolver_rfinfo_mode` and the ability to specify the desired refactorization routine (see `rocsolver_set_rfinfo_mode`)
|
||||
|
||||
##### Changes
|
||||
|
||||
* `CSRRF_ANALYSIS` and `CSRRF_SOLVE` now support sparse Cholesky factorization
|
||||
|
||||
#### rocSPARSE 3.0.2
|
||||
|
||||
rocSPARSE 3.0.2 for ROCm 6.0.0
|
||||
|
||||
##### Changes
|
||||
|
||||
* Function arguments for `rocsparse_spmv`
|
||||
* Function arguments for `rocsparse_xbsrmv` routines
|
||||
* When using host pointer mode, you must now call `hipStreamSynchronize` following `doti`, `dotci`,
|
||||
`spvv`, and `csr2ell`
|
||||
* Improved documentation
|
||||
* Improved verbose output during argument checking on API function calls
|
||||
|
||||
##### Removals
|
||||
|
||||
* Auto stages from `spmv`, `spmm`, `spgemm`, `spsv`, `spsm`, and `spitsv`
|
||||
* Formerly deprecated `rocsparse_spmm_ex` routine
|
||||
|
||||
### Fixes
|
||||
|
||||
* Bug in `rocsparse-bench` where the SpMV algorithm was not taken into account in CSR format
|
||||
* BSR and GEBSR routines (`bsrmv`, `bsrsv`, `bsrmm`, `bsrgeam`, `gebsrmv`, `gebsrmm`) didn't always
|
||||
show `block_dim==0` as an invalid size
|
||||
* Passing `nnz = 0` to `doti` or `dotci` wasn't always returning a dot product of 0
|
||||
|
||||
### Additions
|
||||
|
||||
* `rocsparse_inverse_permutation`
|
||||
* Mixed-precisions for SpVV
|
||||
* Uniform int8 precision for gather and scatter
|
||||
|
||||
#### rocThrust 3.0.0
|
||||
|
||||
rocThrust 3.0.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Updated to match upstream Thrust 2.0.1
|
||||
* `NV_IF_TARGET` macro from libcu++ for NVIDIA backend and HIP implementation for HIP backend
|
||||
|
||||
##### Changes
|
||||
|
||||
* The CMake build system now accepts `GPU_TARGETS` in addition to `AMDGPU_TARGETS` for
|
||||
setting targeted GPU architectures
|
||||
* `GPU_TARGETS=all` compiles for all supported architectures
|
||||
* `AMDGPU_TARGETS` is only provided for backwards compatibility (`GPU_TARGETS` is preferred)
|
||||
* Removed CUB symlink from the root of the repository
|
||||
* Removed support for deprecated macros (`THRUST_DEVICE_BACKEND` and
|
||||
`THRUST_HOST_BACKEND`)
|
||||
|
||||
##### Known issues
|
||||
|
||||
* The `THRUST_HAS_CUDART` macro, which is no longer used in Thrust (it's provided only for legacy
|
||||
support) is replaced with `NV_IF_TARGET` and `THRUST_RDC_ENABLED` in the NVIDIA backend. The
|
||||
HIP backend doesn't have a `THRUST_RDC_ENABLED` macro, so some branches in Thrust code may
|
||||
be unreachable in the HIP backend.
|
||||
|
||||
#### rocWMMA 1.3.0
|
||||
|
||||
rocWMMA 1.3.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Support for gfx942
|
||||
* Support for f8, bf8, and xfloat32 data types
|
||||
* support for `HIP_NO_HALF`, `__ HIP_NO_HALF_CONVERSIONS__`, and
|
||||
`__ HIP_NO_HALF_OPERATORS__` (e.g., PyTorch environment)
|
||||
|
||||
##### Changes
|
||||
|
||||
* rocWMMA with hipRTC now supports `bfloat16_t` data type
|
||||
* gfx11 WMMA now uses lane swap instead of broadcast for layout adjustment
|
||||
* Updated samples GEMM parameter validation on host arch
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Disabled GoogleTest static library deployment
|
||||
* Extended tests now build in large code model
|
||||
|
||||
#### Tensile 4.39.0
|
||||
|
||||
Tensile 4.39.0 for ROCm 6.0.0
|
||||
|
||||
##### Additions
|
||||
|
||||
* Added `aquavanjaram` support: gfx942, fp8/bf8 datatype, xf32 datatype, and
|
||||
stochastic rounding for various datatypes
|
||||
* Added and updated tuning scripts
|
||||
* Added `DirectToLds` support for larger data types with 32-bit global load (old parameter `DirectToLds`
|
||||
is replaced with `DirectToLdsA` and `DirectToLdsB`), and the corresponding test cases
|
||||
* Added the average of frequency, power consumption, and temperature information for the winner
|
||||
kernels to the CSV file
|
||||
* Added asmcap check for MFMA + const src
|
||||
* Added support for wider local read + pack with v_perm (with `VgprForLocalReadPacking=True`)
|
||||
* Added a new parameter to increase `miLatencyLeft`
|
||||
|
||||
##### Optimizations
|
||||
|
||||
* Enabled `InitAccVgprOpt` for `MatrixInstruction` cases
|
||||
* Implemented local read related parameter calculations with `DirectToVgpr`
|
||||
* Enabled dedicated vgpr allocation for local read + pack
|
||||
* Optimized code initialization
|
||||
* Optimized sgpr allocation
|
||||
* Supported DGEMM TLUB + RLVW=2 for odd N (edge shift change)
|
||||
* Enabled `miLatency` optimization for specific data types, and fixed
|
||||
instruction scheduling
|
||||
|
||||
##### Changes
|
||||
|
||||
* Removed old code for DTL + (bpe * GlobalReadVectorWidth > 4)
|
||||
* Changed/updated failed CI tests for gfx11xx, InitAccVgprOpt, and DTLds
|
||||
* Removed unused `CustomKernels` and `ReplacementKernels`
|
||||
* Added a reject condition for DTVB + TransposeLDS=False (not supported so far)
|
||||
* Removed unused code for DirectToLds
|
||||
* Updated test cases for DTV + TransposeLDS=False
|
||||
* Moved the `MinKForGSU` parameter from `globalparameter` to `BenchmarkCommonParameter` to
|
||||
support smaller K
|
||||
* Changed how to calculate `latencyForLR` for miLatency
|
||||
* Set minimum value of `latencyForLRCount` for 1LDSBuffer to avoid getting rejected by
|
||||
overflowedResources=5 (related to miLatency)
|
||||
* Refactored allowLRVWBforTLUandMI and renamed it as VectorWidthB
|
||||
* Supported multi-gpu for different architectures in lazy library loading
|
||||
* Enabled dtree library for batch > 1
|
||||
* Added problem scale feature for dtree selection
|
||||
* Modified non-lazy load build to skip experimental logic
|
||||
|
||||
##### Fixes
|
||||
|
||||
* Predicate ordering for fp16alt impl round near zero mode to unbreak distance modes
|
||||
* Boundary check for mirror dims and re-enable disabled mirror dims test cases
|
||||
* Merge error affecting i8 with WMMA
|
||||
* Mismatch issue with DTLds + TSGR + TailLoop
|
||||
* Bug with `InitAccVgprOpt` + GSU>1 and a mismatch issue with PGR=0
|
||||
* Override for unloaded solutions when lazy loading
|
||||
* Adding missing headers
|
||||
* Boost link for a clean build on Ubuntu 22
|
||||
* Bug in `forcestoresc1` arch selection
|
||||
* Compiler directive for gfx942
|
||||
* Formatting for `DecisionTree_test.cpp`
|
||||
|
||||
CMake support is added for the documentation in the [ROCm repository](https://github.com/RadeonOpenCompute/ROCm).
|
||||
|
||||
Reference in New Issue
Block a user