diff --git a/.wordlist.txt b/.wordlist.txt
index ba79181cb..849ce2ccc 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -13,6 +13,7 @@ AMDMIGraphX
 AMI
 AOCC
 AOMP
+AOTriton
 APBDIS
 APIC
 APIs
@@ -158,6 +159,7 @@ HWS
 Haswell
 Higgs
 Hyperparameters
+ICD
 ICV
 IDE
 IDEs
@@ -208,6 +210,7 @@ MiB
 MIGraphX
 MIOpen
 MIOpenGEMM
+MIOpen's
 MIVisionX
 MLM
 MMA
@@ -295,7 +298,9 @@ PipelineParallel
 PnP
 PowerEdge
 PowerShell
+Profiler's
 PyPi
+Pytest
 PyTorch
 Qcycles
 Qwen
@@ -303,6 +308,7 @@ RAII
 RAS
 RCCL
 RDC
+RDC's
 RDMA
 RDNA
 README
@@ -342,6 +348,7 @@ SENDMSG
 SGPR
 SGPRs
 SHA
+SHARK's
 SIGQUIT
 SIMD
 SIMDs
@@ -521,6 +528,7 @@ devsel
 dimensionality
 disambiguates
 distro
+dkms
 el
 embeddings
 enablement
@@ -686,6 +694,7 @@ rocALUTION
 rocBLAS
 rocDecode
 rocFFT
+rocJPEG
 rocLIB
 rocMLIR
 rocPRIM
@@ -778,6 +787,7 @@ vectorize
 vectorized
 vectorizer
 vectorizes
+virtualized
 vjxb
 voxel
 walkthrough
diff --git a/RELEASE.md b/RELEASE.md
index ea9f62b5a..fe3d40dbc 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,4 +1,16 @@
-# ROCm 6.2.4 release notes
+
+
+
+
+
+
+
+
+
+
+
+
+# ROCm 6.3.0 release notes

 The release notes provide a summary of notable changes since the previous ROCm release.

@@ -12,6 +24,8 @@ The release notes provide a summary of notable changes since the previous ROCm r

 - [ROCm known issues](#rocm-known-issues)

+- [ROCm resolved issues](#rocm-resolved-issues)
+
 - [ROCm upcoming changes](#rocm-upcoming-changes)

 ```{note}
@@ -23,49 +37,175 @@ documentation to verify compatibility and system requirements.

 ## Release highlights

-The following are notable new features and improvements in ROCm 6.2.4. For changes to individual components, see
+The following are notable new features and improvements in ROCm 6.3.0. For changes to individual components, see
 [Detailed component changes](#detailed-component-changes).

-#### ROCm documentation updates
+### rocJPEG added

-ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for
-a wider variety of user needs and use cases.
+ROCm 6.3.0 introduces the rocJPEG library to the ROCm software stack. rocJPEG is a high performance
+JPEG decode SDK for AMD GPUs. For more information, see the [rocJPEG
+documentation](https://rocm.docs.amd.com/projects/rocJPEG/en/docs-6.3.0/index.html).

-* Added a new GPU cluster networking guide. See
-  [Cluster network performance validation for AMD Instinct accelerators](https://rocm.docs.amd.com/projects/gpu-cluster-networking/en/docs-6.2.4/index.html).
-  This documentation provides guidelines on validating network configurations
-  in single-node and multi-node environments to attain optimal speed and bandwidth
-  in AMD Instinct-powered clusters.
+### ROCm Compute Profiler and ROCm Systems Profiler

-* Updated the HIP runtime documentation.
+These ROCm components have been renamed to reflect their new direction as part of the ROCm software
+stack.

-  * Added a new section on how to use [HIP graphs](https://rocm.docs.amd.com/projects/HIP/en/docs-6.2.4/how-to/hipgraph.html).
+- **ROCm Compute Profiler**, formerly Omniperf. For more information, see the [ROCm Compute Profiler
+  documentation](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/docs-6.3.0/index.html) and
+  [https://github.com/ROCm/rocprofiler-compute](https://github.com/ROCm/rocprofiler-compute) on GitHub.

-  * Added a new section about the [Stream ordered memory allocator (SOMA)](https://rocm.docs.amd.com/projects/HIP/en/docs-6.2.4/how-to/stream_ordered_allocator.html).
+- **ROCm Systems Profiler**, formerly Omnitrace. For more information, see the [ROCm Systems Profiler
+  documentation](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/docs-6.3.0/index.html) and
+  [https://github.com/ROCm/rocprofiler-systems](https://github.com/ROCm/rocprofiler-systems) on GitHub.
+  For future compatibility, the Omnitrace project is available at [https://github.com/ROCm/omnitrace](https://github.com/ROCm/omnitrace).
+  See the [Omnitrace documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html).

-  * Updated the [Porting CUDA driver API](https://rocm.docs.amd.com/projects/HIP/en/docs-6.2.4/how-to/hip_porting_driver_api.html) section.
+  ```{note}
+  Update any references to the old binary names `omniperf` and `omnitrace` to
+  ensure compatibility with the new `rocprof-compute` and `rocprof-sys-*` binaries.
+  This might include updating environment variables, commands, and paths as
+  needed to avoid disruptions to your profiling or tracing workflows.

-* Updated the [Post-installation instructions](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.4/install/post-install.html)
-  with guidance on using the `update-alternatives` utility and environment modules to help you manage multiple ROCm
-  versions and streamline PATH configuration.
+  See [ROCm Compute Profiler](#rocm-compute-profiler-3-0-0) and [ROCm Systems
+  Profiler](#rocm-systems-profiler-0-1-0).
+  ```

-* Updated the [LLM inference performance validation on AMD Instinct
-  MI300X](https://rocm.docs.amd.com/en/docs-6.2.4/how-to/performance-validation/mi300x/vllm-benchmark.html)
-  documentation with more detailed guidance, new models, and the `float8` data type.
+### SHARK AI toolkit for high-speed inferencing and serving introduced
+
+SHARK is an open-source toolkit for high-performance serving of popular generative AI and large
+language models. In its initial release, SHARK contains the [Shortfin high-performance serving
+engine](https://github.com/nod-ai/shark-ai/tree/main/shortfin), which is the SHARK inferencing
+library that includes example server applications for popular models.
+
+This initial release includes support for serving the Stable Diffusion XL model on AMD Instinct™
+MI300 devices using ROCm. See SHARK's [release
+page](https://github.com/nod-ai/shark-ai/releases/tag/v3.0.0) on GitHub to get started.
+
+### PyTorch 2.4 support added
+
+ROCm 6.3.0 adds support for PyTorch 2.4. See the [Compatibility
+matrix](https://rocm.docs.amd.com/en/docs-6.3.0/compatibility/compatibility-matrix.html#framework-support-compatibility-matrix)
+for the complete list of PyTorch versions tested for compatibility with ROCm.
+
+### Flash Attention kernels in Triton and Composable Kernel (CK) added to Transformer Engine
+
+Composable Kernel-based and Triton-based Flash Attention kernels have been integrated into
+Transformer Engine via the ROCm Composable Kernel and AOTriton libraries. The
+Transformer Engine can now optionally select a flexible and optimized Attention
+solution for AMD GPUs. For more information, see [Fused Attention Backends on
+ROCm](https://github.com/ROCm/TransformerEngine/tree/dev?tab=readme-ov-file#fused-attention-backends-on-rocm)
+on GitHub.
+
+### HIP compatibility
+
+HIP now includes the `hipStreamLegacy` API. It's equivalent to NVIDIA `cudaStreamLegacy`. For more
+information, see [Global enum and
+defines](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/hip_runtime_api/global_defines_enums_structs_files/global_enum_and_defines.html#c.hipStreamLegacy)
+in the HIP runtime API documentation.
+
+### Unload active amdgpu-dkms module without a system reboot
+
+On Instinct MI200 and MI300 systems, you can now unload the active `amdgpu-dkms` modules, and reinstall
+and reload newer modules without a system reboot. If the new `dkms` package includes newer firmware
+components, the driver will first reset the device and then load newer firmware components.
+
+### ROCm Offline Installer Creator updates
+
+The ROCm Offline Installer Creator 6.3 introduces a new feature to uninstall the previous version of
+ROCm on the non-connected target system before installing a new version. This feature is only supported
+on the Ubuntu distribution. See the [ROCm Offline Installer
+Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/install/rocm-offline-installer.html)
+documentation for more information.
+
+### OpenCL ICD loader separated from ROCm
+
+The OpenCL ICD loader is no longer delivered as part of ROCm, and must be installed separately
+as part of the [ROCm installation
+process](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0). For Ubuntu and RHEL
+installations, the required package is installed as part of the setup described in
+[Prerequisites](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/install/prerequisites.html).
+In other supported Linux distributions like SUSE, the required package must be installed in separate steps, which are included in the installation instructions.
+
+Because the OpenCL path is now separate from the ROCm installation for versioned and multi-version
+installations, you must manually define the `LD_LIBRARY_PATH` to point to the ROCm
+installation library as described in the [Post-installation
+instructions](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/install/post-install.html).
+If the `LD_LIBRARY_PATH` is not set as needed for versioned or multi-version installations, OpenCL
+applications like `clinfo` will fail to run and return an error.
+
+### ROCT Thunk Interface integrated into ROCr runtime
+
+The ROCT Thunk Interface package is now integrated into the ROCr runtime. As a result, the ROCT package
+is no longer included as a separate package in the ROCm software stack.
+
+### ROCm documentation updates
+
+ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a
+wider variety of user needs and use cases.
+
+- Documentation for Tensile is now available. Tensile is a library that creates
+  benchmark-driven backend implementations for GEMMs, serving primarily as a
+  backend component of rocBLAS. See the [Tensile
+  documentation](https://rocm.docs.amd.com/projects/Tensile/en/docs-6.3.0/src/index.html).
+
+- New documentation has been added to explain the advantages of enabling the IOMMU in passthrough
+  mode for Instinct accelerators and Radeon GPUs. See [Input-Output Memory Management
+  Unit](https://rocm.docs.amd.com/en/docs-6.3.0/conceptual/iommu.html).
+
+- The HIP documentation has been updated and includes the following new topics:
+
+  - [What is HIP?](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/what_is_hip.html)
+  - [HIP environment variables](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/env_variables.html)
+  - [Initialization](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/initialization.html)
+    and [error handling](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/error_handling.html)
+  - [Hardware features](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/hardware_features.html)
+  - [Call stack](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/call_stack.html)
+  - [External resource interoperability](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api/external_interop.html)
+
+- The following HIP documentation topics have been updated:
+
+  - [HIP FAQ](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/faq.html)
+  - [Deprecated APIs](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/deprecated_api_list.html)
+  - [Performance guidelines](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/performance_guidelines.html)
+
+- The following HIP documentation topics have been reorganized to improve usability:
+
+  - [HIP documentation landing page](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/index.html)
+  - [HIP runtime API reference topics](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/reference/hip_runtime_api_reference.html)
+  - [Programming guide](https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.0/how-to/hip_runtime_api.html)

 ## Operating system and hardware support changes

-ROCm 6.2.4 adds support for the [AMD Radeon PRO V710](https://www.amd.com/en/products/accelerators/radeon-pro/amd-radeon-pro-v710.html) GPU for compute workloads. See
-[Supported GPUs](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.4/reference/system-requirements.html#supported-gpus)
-for more information.
+ROCm 6.3.0 adds support for the following operating system and kernel versions:

-This release maintains the same operating system support as 6.2.2.
+- Ubuntu 24.04.2 (kernel: 6.8 [GA], 6.11 [HWE])
+- Ubuntu 22.04.5 (kernel: 5.15 [GA], 6.8 [HWE])
+- RHEL 9.5 (kernel: 5.14.0)
+- Oracle Linux 8.10 (kernel: 5.15.0)
+
+See installation instructions at [ROCm installation for
+Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.3.0/).
+
+ROCm 6.3.0 marks the end of support (EoS) for:
+
+- Ubuntu 24.04.1
+- Ubuntu 22.04.4
+- RHEL 9.3
+- RHEL 8.9
+- Oracle Linux 8.9
+
+Hardware support remains unchanged in this release.
+
+See the [Compatibility
+matrix](https://rocm.docs.amd.com/en/docs-6.3.0/compatibility/compatibility-matrix.html)
+for more information about operating system and hardware compatibility.

 ## ROCm components

-The following table lists the versions of ROCm components for ROCm 6.2.4, including any version changes from 6.2.2 to 6.2.4.
-
-Click the component's updated version to go to a detailed list of its changes. Click to go to the component's source code on GitHub.
+The following table lists the versions of ROCm components for ROCm 6.3.0, including any version
+changes from 6.2.4 to 6.3.0. Click the component's updated version to go to a list of its changes.
+Click {fab}`github` to go to the component's source code on GitHub.

| Category | Group | Component | 6.2.4 notes (`-`) | 6.3.0 notes (`+`) |
|---|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 | 1.1.0 |
| Libraries | Machine learning and computer vision | MIGraphX | 2.10 | 2.11.0 |
| Libraries | Machine learning and computer vision | MIOpen | 3.2.0 | 3.2.0 ⇒ 3.3.0 |
| Libraries | Machine learning and computer vision | MIVisionX | 3.0.0 | 3.0.0 ⇒ 3.1.0 |
| Libraries | Machine learning and computer vision | rocAL | 2.0.0 | 2.0.0 ⇒ 2.1.0 |
| Libraries | Machine learning and computer vision | rocDecode | 0.6.0 | 0.6.0 ⇒ 0.8.0 |
| Libraries | Machine learning and computer vision | rocJPEG | (not listed) | 0.6.0 |
| Libraries | Machine learning and computer vision | rocPyDecode | 0.1.0 | 0.1.0 ⇒ 0.2.0 |
| Libraries | Machine learning and computer vision | RPP | 1.8.0 | 1.8.0 ⇒ 1.9.1 |
| Libraries | Communication | RCCL | 2.20.5 | 2.20.5 ⇒ 2.21.5 |
| Libraries | Math | hipBLAS | 2.2.0 | 2.2.0 ⇒ 2.3.0 |
| Libraries | Math | hipBLASLt | 0.8.0 | 0.8.0 ⇒ 0.10.0 |
| Libraries | Math | hipFFT | 1.0.15 ⇒ 1.0.16 | 1.0.16 ⇒ 1.0.17 |
| Libraries | Math | hipfort | 0.4.0 | 0.4.0 ⇒ 0.5.0 |
| Libraries | Math | hipRAND | 2.11.0 ⇒ 2.11.1 | 2.11.1 ⇒ 2.11.0 * |
| Libraries | Math | hipSOLVER | 2.2.0 | 2.2.0 ⇒ 2.3.0 |
| Libraries | Math | hipSPARSE | 3.1.1 | 3.1.1 ⇒ 3.1.2 |
| Libraries | Math | hipSPARSELt | 0.2.1 | 0.2.1 ⇒ 0.2.2 |
| Libraries | Math | rocALUTION | 3.2.0 ⇒ 3.2.1 | 3.2.0 ⇒ 3.2.1 |
| Libraries | Math | rocBLAS | 4.2.1 ⇒ 4.2.4 | 4.2.4 ⇒ 4.3.0 |
| Libraries | Math | rocFFT | 1.0.29 ⇒ 1.0.30 | 1.0.30 ⇒ 1.0.31 |
| Libraries | Math | rocRAND | 3.1.0 ⇒ 3.1.1 | 3.1.1 ⇒ 3.2.0 |
| Libraries | Math | rocSOLVER | 3.26.0 ⇒ 3.26.2 | 3.26.2 ⇒ 3.27.0 |
| Libraries | Math | rocSPARSE | 3.2.0 ⇒ 3.2.1 | 3.2.1 ⇒ 3.3.0 |
| Libraries | Math | rocWMMA | 1.5.0 | 1.5.0 ⇒ 1.6.0 |
| Libraries | Math | Tensile | 4.41.0 | 4.41.0 ⇒ 4.42.0 |
| Libraries | Primitives | hipCUB | 3.2.0 ⇒ 3.2.1 | 3.2.1 ⇒ 3.3.0 |
| Libraries | Primitives | hipTensor | 1.3.0 | 1.3.0 ⇒ 1.4.0 |
| Libraries | Primitives | rocPRIM | 3.2.1 ⇒ 3.2.2 | 3.2.2 ⇒ 3.3.0 |
| Libraries | Primitives | rocThrust | 3.1.0 ⇒ 3.1.1 | 3.1.1 ⇒ 3.3.0 |
| Tools | System management | AMD SMI | 24.6.3 ⇒ 24.6.3 | 24.6.3 ⇒ 24.7.1 |
| Tools | System management | ROCm Data Center Tool | 0.3.0 | 0.3.0 ⇒ 0.3.0 |
| Tools | System management | rocminfo | 1.0.0 | 1.0.0 |
| Tools | System management | ROCm SMI | 7.3.0 | 7.3.0 ⇒ 7.4.0 |
| Tools | System management | ROCmValidationSuite (ROCm Validation Suite in 6.2.4) | 1.0.0 | 1.0.0 ⇒ 1.1.0 |
| Tools | Performance | ROCm Bandwidth Test | 1.4.0 | 1.4.0 |
| Tools | Performance | ROCm Compute Profiler (Omniperf in 6.2.4) | 2.0.1 | 2.0.1 ⇒ 3.0.0 |
| Tools | Performance | ROCm Systems Profiler (Omnitrace in 6.2.4) | 1.11.2 | 1.11.2 ⇒ 0.1.0 |
| Tools | Performance | ROCProfiler | 2.0.0 | 2.0.0 ⇒ 2.0.0 |
| Tools | Performance | ROCprofiler-SDK | 0.4.0 | 0.4.0 ⇒ 0.5.0 |
| Tools | Performance | ROCTracer | 4.1.0 | 4.1.0 |
| Tools | Development | HIPIFY | 18.0.0 | 18.0.0 ⇒ 18.0.0 |
| Tools | Development | ROCdbgapi | 0.76.0 | 0.76.0 ⇒ 0.77.0 |
| Tools | Development | ROCm CMake | 0.13.0 | 0.14.0 |
| Tools | Development | ROCm Debugger (ROCgdb) | 14.2 | 14.2 ⇒ 15.2 |
| Tools | Development | ROCr Debug Agent | 2.0.3 | 2.0.3 |
| Compilers | | HIPCC | 1.1.1 | 1.1.1 |
| Compilers | | llvm-project | 18.0.0 | 18.0.0 ⇒ 18.0.0 |
| Runtimes | | HIP | 6.2.4 | 6.2.4 ⇒ 6.3.0 |
| Runtimes | | ROCr Runtime | 1.14.0 | 1.14.0 |
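
The `old ⇒ new` version cells in the component table can be checked mechanically, for example to spot entries whose version number went down (such as hipRAND and ROCm Systems Profiler). The following is a minimal sketch, not part of ROCm: the helper names `parse_version` and `classify` are hypothetical, and it assumes each cell holds either a single dotted version or two versions separated by `⇒`, optionally followed by a footnote marker.

```python
import re

ARROW = "⇒"  # separator used in the component table's version cells


def parse_version(text):
    """Turn a cell fragment such as '2.11.0 *' into a tuple like (2, 11, 0).

    Anything after the dotted number (for example a footnote marker) is ignored.
    """
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def classify(cell):
    """Classify a version cell as 'raised', 'lowered', or 'unchanged'."""
    if ARROW not in cell:
        return "unchanged"  # a single version means no change was recorded
    old_text, new_text = cell.split(ARROW, 1)
    old, new = parse_version(old_text), parse_version(new_text)
    if new > old:
        return "raised"
    if new < old:
        return "lowered"
    return "unchanged"


# A few cells copied from the 6.3.0 side of the table.
cells = {
    "MIOpen": "3.2.0 ⇒ 3.3.0",
    "hipRAND": "2.11.1 ⇒ 2.11.0 *",
    "HIPIFY": "18.0.0 ⇒ 18.0.0",
    "ROCm Systems Profiler": "1.11.2 ⇒ 0.1.0",
    "Composable Kernel": "1.1.0",
}
results = {name: classify(cell) for name, cell in cells.items()}
for name, status in results.items():
    print(f"{name}: {status}")
```

Comparing versions as integer tuples rather than strings keeps `0.10.0` ordered after `0.8.0`, which a plain string comparison would get wrong.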