# Release notes for AMD ROCm™ 6.0

ROCm 6.0 is a major release with new performance optimizations, expanded frameworks and library
support, and improved developer experience. This includes initial enablement of the AMD Instinct™
MI300 series. Future releases will further enable and optimize this new platform. Key features include:

* Improved performance in areas like lower precision math and attention layers.
* New hipSPARSELt library accelerates AI workloads via AMD's sparse matrix core technique.
* Upstream support is now available for popular AI frameworks like TensorFlow, JAX, and PyTorch.
* New support for libraries, such as DeepSpeed, ONNX-RT, and CuPy.
* Prepackaged HPC and AI containers on AMD Infinity Hub, with improved documentation and
  tutorials on the [AMD ROCm Docs](https://rocm.docs.amd.com) site.
* Consolidated developer resources and training on the new
  [AMD ROCm Developer Hub](https://www.amd.com/en/developer/resources/rocm-hub.html).

The following sections provide a release overview for ROCm 6.0. For additional details, you can refer to
the [Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html). We list known
issues on [GitHub](https://github.com/ROCm/ROCm/issues).

## OS and GPU support changes

ROCm 6.0 enables the use of MI300A and MI300X Accelerators with limited operating system
support. Future releases will add more operating systems to match our general offering.

| Operating Systems | MI300A | MI300X |
|:---:|:---:|:---:|
| Ubuntu 22.04.3 | Supported | Supported |
| RHEL 8.9 | Supported |  |
| SLES15 SP5 | Supported |  |

For older generations of supported Instinct products, we've added support for the following operating systems:

* RHEL 9.3
* RHEL 8.9

Note: For ROCm 6.2 and beyond, we've planned for end-of-support (EoS) for the following operating
systems:

* Ubuntu 20.04.5
* SLES 15 SP4
* RHEL/CentOS 7.9

## New ROCm meta package

We've added a new ROCm meta package for easy installation of all ROCm core packages, tools, and
libraries. For example, the following command will install the full ROCm package: `apt-get install rocm`
(Ubuntu), or `yum install rocm` (RHEL).

## Filesystem Hierarchy Standard

ROCm 6.0 fully adopts the Filesystem Hierarchy Standard (FHS) reorganization goals. We've removed
the backward compatibility support for old file locations.

## Compiler location change

* The installation path of LLVM has been changed from `/opt/rocm-<rel>/llvm` to
  `/opt/rocm-<rel>/lib/llvm`. For backward compatibility, a symbolic link is provided to the old
  location and will be removed in a future release.
* The installation path of the device library bitcode has changed from `/opt/rocm-<rel>/amdgcn` to
  `/opt/rocm-<rel>/lib/llvm/lib/clang/<ver>/lib/amdgcn`. For backward compatibility, a symbolic link
  is provided and will be removed in a future release.
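
Build scripts that hard-code the old LLVM path keep working only while the symbolic link exists. The following is a minimal sketch, not part of ROCm, of how a script might probe both locations. The helper name `find_llvm_dir` is hypothetical, and the install root you pass in (for example, your `/opt/rocm-<rel>` directory) is an assumption about your system.

```python
from pathlib import Path
from typing import Optional


def find_llvm_dir(rocm_root: Path) -> Optional[Path]:
    """Return the LLVM directory under a ROCm install root, or None if absent.

    Checks the ROCm 6.0 location (<root>/lib/llvm) first, then falls back to
    the legacy location (<root>/llvm), which ROCm 6.0 still provides as a
    symbolic link. find_llvm_dir is a hypothetical helper, not a ROCm API.
    """
    for candidate in (rocm_root / "lib" / "llvm", rocm_root / "llvm"):
        if candidate.is_dir():  # is_dir() follows symbolic links
            return candidate
    return None
```

Preferring the new location first means the script continues to work after the compatibility link is removed.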

## Documentation

CMake support has been added for documentation in the
[ROCm repository](https://github.com/RadeonOpenCompute/ROCm).

## AMD Instinct™ MI50 end-of-support notice

AMD Instinct MI50, Radeon Pro VII, and Radeon VII products (collectively gfx906 GPUs) enter
maintenance mode in ROCm 6.0.

As outlined in [5.6.0](https://rocm.docs.amd.com/en/docs-5.6.0/release.html), ROCm 5.7 was the
final release for gfx906 GPUs in a fully supported state.

  * Henceforth, no new features or performance optimizations will be supported for the gfx906 GPUs.
  * Bug fixes and critical security patches will continue to be supported for the gfx906 GPUs until Q2
    2024 (end of maintenance \[EOM] will be aligned with the closest ROCm release).
  * Bug fixes will be made up to the next ROCm point release.
  * Bug fixes will not be backported to older ROCm releases for gfx906.
  * Distribution and operating system updates will continue per the ROCm release cadence for gfx906
    GPUs until EOM.

## ROCm projects

The following sections contain project-specific release notes for ROCm 6.0. For additional details, you
can refer to the [Changelog](https://rocm.docs.amd.com/en/develop/about/CHANGELOG.html).

### AMD SMI

* **Integrated the E-SMI (EPYC-SMI) library**.
    You can now query CPU-related information directly through AMD SMI. Metrics include power,
    energy, performance, and other system details.

* **Added support for gfx942 metrics**.
    You can now query MI300 device metrics to get real-time information. Metrics include power,
    temperature, energy, and performance.

### HIP

* **New features to improve resource interoperability**.
  * For external resource interoperability, we've added new structs and enums.
  * We've added new members to HIP struct `hipDeviceProp_t` for surfaces, textures, and device
    identifiers.

* **Changes impacting backward compatibility**.
    There are several changes impacting backward compatibility: we changed some struct members and
    some enum values, and removed some deprecated flags. For additional information, please refer to
    the Changelog.

### hipCUB

* **Additional CUB API support**.
    The hipCUB backend is updated to CUB and Thrust 2.1.

### HIPIFY

* **Enhanced CUDA2HIP document generation**.
    API versions are now listed in the CUDA2HIP documentation. To see if the application binary
    interface (ABI) has changed, refer to the
    [*C* column](https://rocm.docs.amd.com/projects/HIPIFY/en/latest/tables/CUDA_Runtime_API_functions_supported_by_HIP.html)
    in our API documentation.

* **Hipified rocSPARSE**.
    We've implemented support for the direct hipification of additional cuSPARSE APIs into rocSPARSE
    APIs under the `--roc` option. This covers a major milestone in the roadmap towards complete
    cuSPARSE-to-rocSPARSE hipification.

### hipRAND

* **Official release**.
    hipRAND is now a *standalone project*--it's no longer available as a submodule for rocRAND.

### hipTensor

* **Added architecture support**.
    We've added contraction support for gfx942 architectures, and f32 and f64 data
    types.

* **Upgraded testing infrastructure**.
    hipTensor will now support dynamic parameter configuration with input YAML config.

### MIGraphX

* **Added TorchMIGraphX**.
    We introduced a Dynamo backend for Torch, which allows PyTorch to use MIGraphX directly
    without first requiring a model to be converted to the ONNX model format. With a single line of
    code, PyTorch users can utilize the performance and quantization benefits provided by MIGraphX.

* **Boosted overall performance with rocMLIR**.
    We've integrated the rocMLIR library for ROCm-supported RDNA and CDNA GPUs. This
    technology provides MLIR-based convolution and GEMM kernel generation.

* **Added INT8 support across the MIGraphX portfolio**.
    We now support the INT8 data type. MIGraphX can perform the quantization or ingest
    prequantized models. INT8 support extends to the MIGraphX execution provider for ONNX Runtime.

### ROCgdb

* **Added support for additional GPU architectures**.
  * Navi 3 series: gfx1100, gfx1101, and gfx1102.
  * MI300 series: gfx942.

### rocm-smi-lib

* **Improved accessibility to GPU partition nodes**.
    You can now view, set, and reset the compute and memory partitions. You'll also get notifications of
    a GPU busy state, which helps you avoid partition set or reset failure.

* **Upgraded GPU metrics to version 1.4**.
    The upgraded GPU metrics binary has an improved metric version format with a content version
    appended to it. You can read each metric within the binary without the full `rsmi_gpu_metric_t` data
    structure.

* **Updated GPU index sorting**.
    We made GPU index sorting consistent with other ROCm software tools by optimizing it to use
    `Bus:Device.Function` (BDF) instead of the card number.
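
To illustrate what ordering by BDF means in practice, here is a minimal sketch that sorts devices by PCI address instead of by card number. It is not the rocm-smi-lib implementation. The function names, the `cardN` labels, and the sample addresses are hypothetical, and the sketch assumes addresses in the common `[domain:]bus:device.function` hexadecimal form.

```python
from typing import Dict, List, Tuple


def parse_bdf(bdf: str) -> Tuple[int, int, int, int]:
    """Parse '[domain:]bus:device.function' (hex fields) into a sortable tuple."""
    location, function = bdf.rsplit(".", 1)
    fields = location.split(":")
    if len(fields) == 2:  # no domain given; assume domain 0
        fields.insert(0, "0")
    domain, bus, device = (int(field, 16) for field in fields)
    return (domain, bus, device, int(function, 16))


def index_by_bdf(card_to_bdf: Dict[str, str]) -> List[str]:
    """Return card names ordered by PCI address rather than by card number."""
    return sorted(card_to_bdf, key=lambda card: parse_bdf(card_to_bdf[card]))


# Hypothetical sample data: card numbers do not follow PCI address order.
cards = {
    "card0": "0000:c1:00.0",
    "card1": "0000:03:00.0",
    "card2": "0000:83:00.0",
}
ordered = index_by_bdf(cards)  # ["card1", "card2", "card0"]
```

Comparing the parsed integer fields rather than the raw strings avoids surprises when addresses differ in letter case or omit the domain.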

### ROCm Compiler

* **Added kernel argument optimization on gfx942**.
    With the new feature, you can preload kernel arguments into Scalar General-Purpose Registers
    (SGPRs) rather than pass them in memory. This feature is enabled with a compiler option, which also
    controls the number of arguments to pass in SGPRs. For more information, see:
    [https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments](https://llvm.org/docs/AMDGPUUsage.html#preloaded-kernel-arguments)

* **Improved register allocation at -O0**.
    We've improved the register allocator used at -O0 to avoid compiler crashes (when the signature is
    'ran out of registers during register allocation').

* **Improved generation of debug information**.
    We've improved compile time when generating debug information for certain corner cases. We've
    also improved the compiler to eliminate compiler crashes when generating debug information.

### ROCmValidationSuite

* **Added GPU and operating system support**.
    We added support for the MI300X GPU in the GPU Stress Test (GST).

### Roc Profiler

* **Added option to specify desired Roc Profiler version**.
    You can now choose between rocProfV1 and rocProfV2 by specifying your desired version. The legacy
    rocProf (`rocprofv1`) provides an option to use the latest version (`rocprofv2`).

* **Automated the ISA dumping process by Advance Thread Tracer**.
    Advance Thread Tracer (ATT) no longer depends on a user-supplied Instruction Set Architecture (ISA)
    or on the compilation process (using `hipcc --save-temps`) to dump ISA from the running kernels.

* **Added ATT support for parallel kernels**.
    The automatic ISA dumping process also helps ATT successfully parse multiple kernels running in
    parallel, and provide cycle-accurate occupancy information for multiple kernels at the same time.

### ROCr

* **Support for SDMA link aggregation**.
    If multiple XGMI links are available when making SDMA copies between GPUs, the copy is
    distributed over multiple links to increase peak bandwidth.
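
As a purely conceptual illustration of distributing one copy over several links, the sketch below splits a byte range into near-equal chunks, one per link. This is not how ROCr schedules SDMA transfers. The function `split_copy` and its even-split policy are assumptions made for the example only.

```python
from typing import List, Tuple


def split_copy(total_bytes: int, num_links: int) -> List[Tuple[int, int]]:
    """Split a copy into (offset, length) chunks, one per available link.

    Chunk sizes differ by at most one byte; zero-length chunks are dropped.
    Illustrative only: the even-split policy is an assumption, not ROCr's.
    """
    if total_bytes < 0 or num_links < 1:
        raise ValueError("total_bytes must be >= 0 and num_links >= 1")
    base, extra = divmod(total_bytes, num_links)
    chunks: List[Tuple[int, int]] = []
    offset = 0
    for link in range(num_links):
        # The first `extra` links each carry one additional byte.
        length = base + (1 if link < extra else 0)
        if length:
            chunks.append((offset, length))
        offset += length
    return chunks
```

With each chunk moving concurrently over its own link, the achievable peak bandwidth scales with the number of links used, which is the effect the release note describes.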

### rocThrust

* **Added Thrust 2.1 API support**.
    The rocThrust backend is updated to Thrust and CUB 2.1.

### rocWMMA

* **Added new architecture support**.
    We added support for gfx942 architectures.

* **Added data type support**.
    We added support for f8, bf8, and xf32 data types on supporting architectures, and for bf16 in the HIP RTC
    environment.

* **Added support for the PyTorch kernel plugin**.
    We added awareness of `__HIP_NO_HALF_CONVERSIONS__` to support PyTorch users.

### TransferBench (beta)

* **Improved ordering control**.
    You can now set the thread block size (`BLOCK_SIZE`) and the thread block order (`BLOCK_ORDER`)
    in which thread blocks from different transfers are run when using a single stream.

* **Added comprehensive reports**.
    We modified individual transfers to report X Compute Clusters (XCC) ID when `SHOW_ITERATIONS`
    is set to 1.

* **Improved accuracy in result validation**.
    You can now validate results for each iteration instead of just once for all iterations.