## ROCm Version History
This file contains archived version history information for the [ROCm project](https://github.com/RadeonOpenCompute/ROCm).

### Current ROCm Version: 2.6
- [New features and enhancements in ROCm 2.5](#new-features-and-enhancements-in-rocm-25)
- [New features and enhancements in ROCm 2.4](#new-features-and-enhancements-in-rocm-24)
- [New features and enhancements in ROCm 2.3](#new-features-and-enhancements-in-rocm-23)
- [New features and enhancements in ROCm 2.2](#new-features-and-enhancements-in-rocm-22)
- [New features and enhancements in ROCm 2.1](#new-features-and-enhancements-in-rocm-21)
- [New features and enhancements in ROCm 2.0](#new-features-and-enhancements-in-rocm-20)
- [New features and enhancements in ROCm 1.9.2](#new-features-and-enhancements-in-rocm-192)
- [New features and enhancements in ROCm 1.9.1](#new-features-and-enhancements-in-rocm-191)
- [New features and enhancements in ROCm 1.9.0](#new-features-and-enhancements-in-rocm-190)
- [New features as of ROCm 1.8.3](#new-features-as-of-rocm-183)
- [New features as of ROCm 1.8](#new-features-as-of-rocm-18)
- [New Features as of ROCm 1.7](#new-features-as-of-rocm-17)
- [New Features as of ROCm 1.5](#new-features-as-of-rocm-15)

### New features and enhancements in ROCm 2.5

#### UCX 1.6 support
Support for UCX version 1.6 has been added.

#### BFloat16 GEMM in rocBLAS/Tensile
Software support for BFloat16 on Radeon Instinct MI50 and MI60 has been added. This includes:
- Mixed-precision GEMM with BFloat16 input and output matrices, with all arithmetic performed in IEEE 32-bit floating point
- Input matrix values are converted from BFloat16 to IEEE 32-bit; all arithmetic and accumulation is IEEE 32-bit; output values are rounded from IEEE 32-bit to BFloat16
- Accuracy should be correct to 0.5 ULP

#### ROCm-SMI enhancements
CLI support for querying the memory size, driver version, and firmware version has been added to rocm-smi.

#### [PyTorch] multi-GPU functional support (CPU aggregation/Data Parallel)
Multi-GPU support is enabled in PyTorch using the DataParallel path for versions of PyTorch built from commit 06c8aa7a3bbd91cda2fd6255ec82aad21fa1c0d5 or later.

#### rocSparse optimization on Radeon Instinct MI50 and MI60
This release includes performance optimizations for csrsv routines in the rocSparse library.

#### [Thrust] Preview
Preview release for early adopters. rocThrust is a port of Thrust, a parallel algorithm library, to the HIP/ROCm platform. The ported library uses rocPRIM and works on HIP/ROCm platforms.

Note: This library will replace https://github.com/ROCmSoftwarePlatform/thrust in a future release. The package for rocThrust (this library) currently conflicts with the version 2.5 package of thrust, so the two should not be installed together.

#### Support overlapping kernel execution in same HIP stream
The HIP API has been enhanced to allow independent kernels to run in parallel on the same stream.

#### AMD Infinity Fabric&#x2122; Link enablement
The ability to connect four Radeon Instinct MI60 or Radeon Instinct MI50 boards in one hive via AMD Infinity Fabric™ Link GPU interconnect technology has been added.

### New features and enhancements in ROCm 2.4

#### TensorFlow 2.0 support
ROCm 2.4 includes an enhanced compilation toolchain and a set of bug fixes to support TensorFlow 2.0 features natively.

#### AMD Infinity Fabric&#x2122; Link enablement
ROCm 2.4 adds support to connect two Radeon Instinct MI60 or Radeon Instinct MI50 boards via AMD Infinity Fabric&#x2122; Link GPU interconnect technology.

### New features and enhancements in ROCm 2.3

#### Mem usage per GPU
Per-GPU memory usage reporting has been added to rocm-smi.
The `--showmeminfo` flag displays used and total bytes for VRAM, visible VRAM, and GTT.
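
As a rough illustration of the flag described above, the sketch below queries memory usage only when the tool is on the `PATH`. The memory-type arguments (`vram`, `vis_vram`, `gtt`), the `ROCM_SMI` override variable, and the `show_mem_info` helper are assumptions not stated in these notes; check `rocm-smi --help` for the exact syntax on your release.

```shell
#!/bin/sh
# Minimal sketch: print per-GPU memory usage if rocm-smi is installed.
# ROCM_SMI is a hypothetical override for non-default install locations.
ROCM_SMI="${ROCM_SMI:-rocm-smi}"

show_mem_info() {
    # Assumption: --showmeminfo takes one or more memory types as arguments.
    if command -v "$ROCM_SMI" >/dev/null 2>&1; then
        "$ROCM_SMI" --showmeminfo vram vis_vram gtt
    else
        echo "rocm-smi not found; skipping memory query"
        return 0
    fi
}

show_mem_info
```

Guarding the call with `command -v` keeps the script usable on machines without ROCm installed, such as CI hosts.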

#### MIVisionX, v1.1 - ONNX
ONNX parser changes to adjust to new file formats

#### MIGraphX, v0.2
MIGraphX 0.2 supports the following new features:
* New Python API
* Support for additional ONNX operators and fixes that now enable a large set of ImageNet models
* Support for RNN operators
* Support for multi-stream execution
* [Experimental] Support for TensorFlow frozen protobuf files

See: [Getting-started:-using-the-new-features-of-MIGraphX-0.2](https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/wiki/Getting-started:-using-the-new-features-of-MIGraphX-0.2) for more details

#### MIOpen, v1.8 - 3d convolutions and int8
* This release contains full 3-D convolution support and int8 support for inference.
* Additionally, there are major updates in the performance database for major models including those found in Torchvision.

See: [MIOpen releases](https://github.com/ROCmSoftwarePlatform/MIOpen/releases)

#### Caffe2 - mGPU support
Multi-GPU support is enabled for Caffe2.

#### rocTracer library, ROCm tracing API for collecting runtimes API and asynchronous GPU activity traces
Support for the HIP and HCC domains is introduced in the rocTracer library.

#### BLAS -  Int8 GEMM performance, Int8 functional and performance
Introduces support and performance optimizations for Int8 GEMM, implements TRSV support, and includes improvements and optimizations with Tensile.

#### Prioritized L1/L2/L3 BLAS (functional)
Functional implementation of BLAS L1/L2/L3 functions

#### BLAS - Tensile optimization
Improvements and optimizations with Tensile

#### MIOpen Int8 support
Support for int8

### New features and enhancements in ROCm 2.2

#### rocSparse Optimization on Vega20
Cache usage optimizations for csrsv (sparse triangular solve), coomv
(SpMV in COO format) and ellmv (SpMV in ELL format) are available.

#### DGEMM and DTRSM Optimization
Improved DGEMM performance for reduced matrix sizes (k=384, k=256)

#### Caffe2
Added support for multi-GPU training

### New features and enhancements in ROCm 2.1

#### RocTracer v1.0 preview release – 'rocprof' HSA runtime tracing and statistics support
Supports tracing of the HSA API and of asynchronous HSA GPU activity, including kernel execution and memory copies

#### Improvements to ROCm-SMI tool
Added support to show real-time PCIe bandwidth usage via the `-b`/`--showbw` flag

#### DGEMM Optimizations
Improved DGEMM performance for large square and reduced matrix sizes (k=384, k=256)

### New features and enhancements in ROCm 2.0

#### Adds support for RHEL 7.6 / CentOS 7.6 and Ubuntu 18.04.1

#### Adds support for Vega 7nm, Polaris 12 GPUs

#### Introduces MIVisionX
* A comprehensive set of computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit.

#### Improvements to ROCm Libraries
* rocSPARSE & hipSPARSE
* rocBLAS with improved DGEMM efficiency on Vega 7nm

#### MIOpen
* This release contains general bug fixes and an updated performance database
* Group convolutions backwards weights performance has been improved
* RNNs now support fp16

#### TensorFlow multi-GPU and TensorFlow FP16 support for Vega 7nm
* TensorFlow v1.12 is enabled with fp16 support

#### PyTorch/Caffe2 with Vega 7nm Support
* fp16 support is enabled
* Several bug fixes and performance enhancements
* Known Issue: breaking changes introduced in ROCm 2.0 have not yet been addressed upstream. In the meantime, please continue to use the ROCm fork at https://github.com/ROCmSoftwarePlatform/pytorch

#### Improvements to ROCProfiler tool
* Support for Vega 7nm

#### Support for hipStreamCreateWithPriority
* Creates a stream with the specified priority. Kernels enqueued on this stream have a different execution priority from kernels enqueued on normal-priority streams. The priority can be higher or lower than that of normal-priority streams.

#### OpenCL 2.0 support
* ROCm 2.0 introduces full support for kernels written in the OpenCL 2.0 C language on certain devices and systems. Applications can detect this support by calling the `clGetDeviceInfo` query function with the `param_name` argument set to `CL_DEVICE_OPENCL_C_VERSION`. To use OpenCL 2.0 C language features, the application must include the option `-cl-std=CL2.0` in the options passed to the runtime API calls responsible for compiling or building device programs. The complete specification for the OpenCL 2.0 C language is available at https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf

#### Improved Virtual Addressing (48 bit VA) management for Vega 10 and later GPUs
* Fixes the use of Clang AddressSanitizer, and potentially other third-party memory debugging tools, with ROCm
* Small performance improvement on workloads that do a lot of memory management
* Removes virtual address space limitations on systems with more VRAM than system memory

#### Kubernetes support

### New features and enhancements in ROCm 1.9.2
#### RDMA(MPI) support on Vega 7nm
* Support ROCnRDMA based on Mellanox InfiniBand

#### Improvements to HCC
* Improved link time optimization

#### Improvements to ROCProfiler tool
* General bug fixes and implemented versioning APIs

#### Critical bug fixes

### New features and enhancements in ROCm 1.9.1
#### Added DPM support to Vega 7nm
* Dynamic Power Management feature is enabled on Vega 7nm.

#### Fix for 'ROCm profiling' that used to fail with a “Version mismatch between HSA runtime and libhsa-runtime-tools64.so.1” error

### New features and enhancements in ROCm 1.9.0

#### Preview for Vega 7nm
* Enables developer preview support for Vega 7nm

#### System Management Interface
* Adds support for the ROCm SMI (System Management Interface) library, which provides monitoring and management capabilities for AMD GPUs.

#### Improvements to HIP/HCC
* Support for gfx906
* Added deprecation warning for C++AMP.  This will be the last version of HCC supporting C++AMP.
* Improved optimization for global address space pointers passing into a GPU kernel
* Fixed several race conditions in the HCC runtime
* Performance tuning to the unpinned copy engine
* Several codegen enhancement fixes in the compiler backend

#### Preview for rocprof Profiling Tool
Developer preview (alpha) of profiling tool rocProfiler. It includes a command-line front-end, `rpl_run.sh`, which enables:
* Command-line tool for dumping public per-kernel performance counters/metrics and kernel timestamps
* Input file with a counter list and kernel selection parameters
* Multiple counter groups and application runs supported
* Output results in CSV format

The tool can be installed from the `rocprofiler-dev` package. It will be installed into: `/opt/rocm/bin/rpl_run.sh`
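
A hedged sketch of how the pieces listed above might fit together: an input file naming the counters, a CSV output file, and the application to profile. The `-i` and `-o` options, the `pmc:` line format, and the counter names are assumptions not confirmed by these notes, and `RPL_RUN`, `APP`, and `WORKDIR` are hypothetical variable names; consult the tool's own help output before relying on any of them.

```shell
#!/bin/sh
# Minimal sketch: prepare a counters input file and run rpl_run.sh on an app.
# RPL_RUN, APP, and WORKDIR are hypothetical names used for illustration.
RPL_RUN="${RPL_RUN:-/opt/rocm/bin/rpl_run.sh}"
WORKDIR="$(mktemp -d)"

# Assumption: a text input file lists one counter group per 'pmc:' line;
# the counter names below are placeholders.
printf 'pmc: Wavefronts VALUInsts\n' > "$WORKDIR/input.txt"

if [ -x "$RPL_RUN" ] && [ -n "${APP:-}" ]; then
    # Assumption: -i selects the input file and -o the CSV output file.
    "$RPL_RUN" -i "$WORKDIR/input.txt" -o "$WORKDIR/results.csv" "$APP"
else
    echo "rpl_run.sh or APP not available; input file left at $WORKDIR/input.txt"
fi
```

Set `APP` to the path of the executable to profile before running the script; without it the script only writes the input file.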

#### Preview for rocr Debug Agent rocr_debug_agent
The ROCr Debug Agent is a library that can be loaded by ROCm Platform Runtime to provide the following functionality:
* Prints the state of wavefronts that report a memory violation or that execute an "s_trap 2" instruction.
* Allows SIGINT (`ctrl c`) or SIGTERM (`kill -15`) to print the wavefront state of aborted GPU dispatches.
* It is enabled on Vega10 GPUs in ROCm 1.9.

The ROCm 1.9 release installs the ROCr Debug Agent library at `/opt/rocm/lib/librocr_debug_agent64.so`

#### New distribution support

* Binary package support for Ubuntu 18.04

#### ROCm 1.9 is ABI compatible with KFD in upstream Linux kernels.
Upstream Linux kernels support the following GPUs in these releases:
* 4.17: Fiji, Polaris 10, Polaris 11
* 4.18: Fiji, Polaris 10, Polaris 11, Vega10

Some ROCm features are not available in the upstream KFD:
* More system memory available to ROCm applications
* Interoperability between graphics and compute
* RDMA
* IPC

To try ROCm with an upstream kernel, install ROCm as normal, but do not install the rock-dkms package. Also add a udev rule to control `/dev/kfd` permissions:

```
    echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
```
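
After writing the rule, udev typically needs to reload its rules before the new permissions take effect. The following is a minimal sketch, assuming a system where `udevadm` is available; the helper names `apply_kfd_rule` and `check_kfd` are hypothetical and not part of ROCm.

```shell
#!/bin/sh
# Hypothetical helper: reload udev rules and re-trigger device events
# so the new rule is applied without a reboot. Requires root.
apply_kfd_rule() {
    sudo udevadm control --reload-rules && sudo udevadm trigger
}

# Hypothetical helper: report ownership and permissions of a device node.
check_kfd() {
    if [ -e "$1" ]; then
        ls -l "$1"
    else
        echo "$1 not present"
        return 1
    fi
}

# Inspect the node; call apply_kfd_rule first if the rule was just added.
check_kfd /dev/kfd || echo "amdkfd may not be loaded on this system"
```

If the listing shows the `video` group on `/dev/kfd`, users in that group should be able to open the device.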

### New features as of ROCm 1.8.3

* ROCm 1.8.3 is a minor update meant to fix compatibility issues on Ubuntu releases running kernel 4.15.0-33

### New features as of ROCm 1.8

#### DKMS driver installation

 * Debian packages are provided for DKMS on Ubuntu
 * RPM packages are provided for CentOS/RHEL 7.4 and 7.5 support
 * See the [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/roc-1.8.x) and [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/roc-1.8.x) for additional documentation on driver setup

#### New distribution support

 * Binary package support for Ubuntu 16.04 and 18.04
 * Binary package support for CentOS 7.4 and 7.5
 * Binary package support for RHEL 7.4 and 7.5

#### Improved OpenMPI via UCX support

 * UCX support for OpenMPI
 * ROCm RDMA

### New Features as of ROCm 1.7

#### DKMS driver installation

 * New driver installation uses Dynamic Kernel Module Support (DKMS)
 * Only amdkfd and amdgpu kernel modules are installed to support AMD hardware
 * Currently only Debian packages are provided for DKMS (no Fedora support available)
 * See the [ROCT-Thunk-Interface](https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/tree/roc-1.7.x) and [ROCK-Kernel-Driver](https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/tree/roc-1.7.x) for additional documentation on driver setup

### New Features as of ROCm 1.5

#### Developer preview of the new OpenCL 1.2 compatible language runtime and compiler

 * OpenCL 2.0 compatible kernel language support with OpenCL 1.2 compatible
   runtime
 * Supports offline ahead of time compilation today;
   during the Beta phase we will add in-process/in-memory compilation.

#### Binary Package support for Ubuntu 16.04

#### Binary Package support for Fedora 24 is not currently available

#### Dropping binary package support for Ubuntu 14.04, Fedora 23

#### IPC support