Update RELEASE.md (#515)

* Update RELEASE.md

Add logical reduction changes to ROCm 7.0 Release Notes

* Update RELEASE.md

Added description of DebugFission option for llvm-project

* Update RELEASE.md

update definition of __builtin_amdgcn_is_invocable

* Update RELEASE.md

Removed Perl Scripts from HIPCC
This commit is contained in:
randyh62
2025-08-21 06:18:35 -07:00
committed by GitHub
parent 6b93d7a75a
commit 0d5f17a58b

@@ -854,7 +854,8 @@ See the full [AMD SMI changelog](https://github.com/ROCm/amdsmi/blob/release/roc
- `hipDrvLaunchKernelEx` dispatches the device kernel represented by a HIP function object.
- `hipMemGetHandleForAddressRange` gets a handle for the address range requested.
- `num_threads` returns the total number of threads in the group. The legacy API `size` is an alias.
- `__reduce_add_sync`, `__reduce_min_sync`, and `__reduce_max_sync` functions added for reduction across lanes of a warp. For details, see [Warp cross-lane functions](https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_cpp_language_extensions.html#warp-cross-lane-functions).
- `__reduce_add_sync`, `__reduce_min_sync`, and `__reduce_max_sync` functions added for arithmetic reduction across lanes of a warp, and `__reduce_and_sync`, `__reduce_or_sync`, and `__reduce_xor_sync`
functions added for logical reduction. For details, see [Warp cross-lane functions](https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/hip_cpp_language_extensions.html#warp-cross-lane-functions).
* New support for Open Compute Project (OCP) floating-point `FP4`/`FP6`/`FP8` types, as follows. For details, see [Low precision floating point document](https://rocm.docs.amd.com/projects/HIP/en/latest/reference/low_fp_types.html).
- Data types for `FP4`/`FP6`/`FP8`.
- HIP APIs for `FP4`/`FP6`/`FP8`, which are compatible with corresponding CUDA APIs.
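
As a rough illustration of what the cross-lane reductions listed above compute, the following is a host-side reference model in plain C++. It is not HIP device code and does not call the HIP functions themselves. The helper names (`reduce_lanes`, `model_reduce_add`, and so on), the 64-bit mask type, and the choice to return the identity value for an empty mask are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical host-side model (not a HIP API): combine the values held by
// every lane whose bit is set in `mask`, starting from `identity`.
template <typename T, typename Op>
T reduce_lanes(std::uint64_t mask, const std::vector<T>& lane_values,
               T identity, Op op) {
    // This sketch models at most 64 lanes, one per mask bit.
    assert(lane_values.size() <= 64);
    T result = identity;
    for (std::size_t lane = 0; lane < lane_values.size(); ++lane) {
        if (mask & (std::uint64_t{1} << lane)) {
            result = op(result, lane_values[lane]);
        }
    }
    return result;
}

// Arithmetic reduction: sum of the participating lanes.
unsigned model_reduce_add(std::uint64_t mask, const std::vector<unsigned>& v) {
    return reduce_lanes(mask, v, 0u, std::plus<unsigned>{});
}

// Logical (bitwise) reductions over the participating lanes.
unsigned model_reduce_and(std::uint64_t mask, const std::vector<unsigned>& v) {
    return reduce_lanes(mask, v, ~0u, std::bit_and<unsigned>{});
}

unsigned model_reduce_or(std::uint64_t mask, const std::vector<unsigned>& v) {
    return reduce_lanes(mask, v, 0u, std::bit_or<unsigned>{});
}

unsigned model_reduce_xor(std::uint64_t mask, const std::vector<unsigned>& v) {
    return reduce_lanes(mask, v, 0u, std::bit_xor<unsigned>{});
}
```

In this model, each set bit in `mask` selects one participating lane, and every participating lane would receive the same combined result. The linked Warp cross-lane functions page remains the reference for the actual device-side signatures and constraints.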
@@ -1278,15 +1279,16 @@ HIP runtime has the following functional improvements which improves runtime per
#### Added
* Added compiler support for separate debug file generation for device code.
* Added the compiler `-gsplit-dwarf` option to enable the generation of separate debug information files at compile time. When used, separate debug information files are generated for the host and for each offload architecture. For additional information, see [DebugFission](https://gcc.gnu.org/wiki/DebugFission).
* Added `llvm-flang`, AMD's next-generation Fortran compiler. It is a re-implementation of the Fortran frontend that can be found at `llvm/llvm-project/flang` on GitHub.
* Added Comgr support for an in-memory virtual file system (VFS) for storing temporary files generated during intermediate compilation steps to improve performance in the device library link step.
* Added compiler support of a new target-specific builtin `__builtin_amdgcn_processor_is` for late or deferred queries of the current target processor, and `__builtin_amdgcn_is_invocable` enabling fine-grained target-specific feature availability.
* Added compiler support for a new target-specific builtin `__builtin_amdgcn_processor_is` for late or deferred queries of the current target processor, and `__builtin_amdgcn_is_invocable` to determine the current target processor's ability to invoke a particular builtin.
* Added HIPIFY support for NVIDIA CUDA 12.9.1 APIs. Added support for all new device and host APIs, including FP4, FP6, and FP128, and support for the corresponding ROCm HIP equivalents.
#### Changed
* Updated clang/llvm to AMD clang version 20.0.0 (equivalent to LLVM 20.0.0 with additional out-of-tree patches).
* HIPCC Perl scripts (`hipcc.pl` and `hipconfig.pl`) have been removed from this release.
#### Optimized