mirror of https://github.com/ROCm/ROCm.git, synced 2026-01-10 07:08:08 -05:00
Spellcheck fixes in release notes templates (#3526)
* fix spelling in 5.4.x templates
* add to wordlist
* update templates update wordlist
* remove extra_components rm extra_components
* fix spelling
File diff suppressed because it is too large
@@ -16,7 +16,7 @@ long long int wall_clock64();
 ```

 It returns wall clock count at a constant frequency on the device, which can be queried via HIP API with
-the hipDeviceAttributeWallClockRate attribute of the device in the HIP application code.
+the `hipDeviceAttributeWallClockRate` attribute of the device in the HIP application code.

 Example:

@@ -25,7 +25,7 @@ int wallClkRate = 0; //in kilohertz
 +HIPCHECK(hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId));
 ```

-Where hipDeviceAttributeWallClockRate is a device attribute.
+Where `hipDeviceAttributeWallClockRate` is a device attribute.

 :::{note}
 The wall clock frequency is a per-device attribute.

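The hunks above describe `wall_clock64()` as returning a tick count at a constant frequency, with the rate reported in kilohertz through `hipDeviceAttributeWallClockRate`. As a minimal host-side sketch of the arithmetic this implies (not code from the commit), a tick delta can be converted to milliseconds by dividing by the rate in kHz. The helper name `wall_clock_ticks_to_ms` is hypothetical, and the sketch assumes the two tick values were captured on the same device.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not part of HIP): convert a wall_clock64() tick delta
// to milliseconds. rate_khz is assumed to be the value read for
// hipDeviceAttributeWallClockRate, which the source documents as kilohertz.
// A rate of N kHz means N ticks per millisecond, so ms = ticks / N.
double wall_clock_ticks_to_ms(std::int64_t start_ticks,
                              std::int64_t end_ticks,
                              int rate_khz) {
    if (rate_khz <= 0) {
        throw std::invalid_argument("wall clock rate must be positive");
    }
    return static_cast<double>(end_ticks - start_ticks) /
           static_cast<double>(rate_khz);
}
```

Because the note above says the frequency is a per-device attribute, the rate should be queried for the same device that produced the tick values.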
@@ -16,7 +16,7 @@ binary counterpart. No user action is required. Once these are available, users

 #### `hipcc` options deprecation

-The following hipcc options are being deprecated and will be removed in a future release:
+The following `hipcc` options are being deprecated and will be removed in a future release:

 * The `--amdgpu-target` option is being deprecated, and user must use the `–offload-arch` option to
 specify the GPU architecture.

@@ -1,29 +1,3 @@
-
-The release notes provide a comprehensive summary of changes since the previous ROCm release.
-
-- [Release highlights](release-highlights)
-
-- [Operating system and hardware support changes](operating-system-and-hardware-support-changes)
-
-- [ROCm components versioning](rocm-components)
-
-- [Detailed component changes](detailed-component-changes)
-
-- [ROCm known issues](rocm-known-issues)
-
-- [ROCm upcoming changes](rocm-upcoming-changes)
-
-The [Compatibility matrix](https://rocm.docs.amd.com/en/latest/release/docs/6.2.0/compatibility/compatibility-matrix)
-provides an overview of operating system, hardware, ecosystem, and ROCm component support across ROCm releases.
-
-Release notes for previous ROCm releases are available in earlier versions of the documentation.
-See the [ROCm documentation release history](https://rocm.docs.amd.com/en/latest/release/versions).
-
-## Release highlights
-
-This section introduces notable new features and improvements in ROCm 6.2. See the
-[Detailed component changes](#detailed-component-changes) for individual component changes.
-
 ### New components

 ROCm 6.2.0 introduces the following new components to the ROCm software stack.
@@ -41,7 +15,7 @@ ROCm 6.2.0 introduces the following new components to the ROCm software stack.
 - **rocPyDecode** -- A tool to access rocDecode APIs in Python. It connects Python and C/C++ libraries,
 enabling function calling and data passing between the two languages. The `rocpydecode.so` library, a wrapper, uses
 rocDecode APIs written primarily in C/C++ within Python. For more information, see
-[rocPyDecode](https://rocm.docs.amd.com/projects/rocpydecode/en/latest).
+[rocPyDecode](https://rocm.docs.amd.com/projects/rocPyDecode/en/latest).

 - **ROCprofiler-SDK** -- ROCprofiler-SDK is a profiling and tracing library for HIP and ROCm applications on AMD ROCm software
 used to identify application performance bottlenecks and optimize their performance. The new APIs add restrictions for more
@@ -75,14 +49,14 @@ multiple unique configurations for use when installing ROCm on a target. Other n
 * Resolution and inclusion of dependency packages for offline installation

 For more information, see
-[ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/rocm-install-on-linux/en/latest/install/rocm-offline-installer.html).
+[ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/install/rocm-offline-installer.html).

 ### Math libraries default to Clang instead of HIPCC

 The default compiler used to build the math libraries on Linux changes from `hipcc` to `amdclang++`.
 Appropriate compiler flags are added to ensure these compilations build correctly. This change only applies when
 building the libraries. Applications using the libraries can continue to be compiled using `hipcc` or `amdclang++` as
-described in [ROCm compiler reference](https://rocm.docs.amd.com/projects/llvm-project/en/latest/reference/rocmcc.html).
+described in [ROCm compiler reference](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.2.0/reference/rocmcc.html).
 The math libraries can also be built with `hipcc` using any of the previously available methods (for example, the `CXX`
 environment variable, the `CMAKE_CXX_COMPILER` CMake variable, and so on). This change shouldn't affect performance or
 functionality.
@@ -95,27 +69,27 @@ This section highlights updates to supported deep learning frameworks and notabl

 ROCm 6.2.0 supports PyTorch versions 2.2 and 2.3 and TensorFlow version 2.16.

-See [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
-and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html)
+See [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/pytorch-install.html)
+and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/tensorflow-install.html)
 for installation instructions.

 Refer to the
-[Third-party support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html#deep-learning)
+[Third-party support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/reference/3rd-party-support-matrix.html#deep-learning)
 for a comprehensive list of third-party frameworks and libraries supported by ROCm.

 #### Optimized framework support for OpenXLA

 PyTorch for ROCm and TensorFlow for ROCm now provide native support for OpenXLA. OpenXLA is an open-source ML compiler
 ecosystem that enables developers to compile and optimize models from all leading ML frameworks. For more information, see
-[Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
-and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html).
+[Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/pytorch-install.html)
+and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/tensorflow-install.html).

 #### PyTorch support for Autocast (automatic mixed precision)

 PyTorch now supports Autocast for recurrent neural networks (RNNs) on ROCm. This can help to reduce computational
 workloads and improve performance. Based on the information about the magnitude of values, Autocast can substitute the
 original `float32` linear layers and convolutions with their `float16` or `bfloat16` variants. For more information, see
-[Automatic mixed precision](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/train-a-model#automatic-mixed-precision-amp).
+[Automatic mixed precision](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-ai/train-a-model.html#automatic-mixed-precision-amp).

 #### Memory savings for bitsandbytes model quantization

@@ -125,9 +99,9 @@ ROCm 6.2.0 introduces the following bitsandbytes changes:

 - `Int8` matrix multiplication is enabled, and it includes the following functions:
 - `extract-outliers` – extracts rows and columns that have outliers in the inputs. They’re later used for matrix multiplication without quantization.
-- `transform` – row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
+- `transform` – row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after `matmul` computation.
 - `igemmlt` – new function for GEMM computation A*B^T. It uses
-[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
+[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.2.0/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
 - `dequant_mm` – dequantizes output matrix to original data type using scaling factors from vector-wise quantization.
 - Blockwise quantization – input tensors are quantized for a fixed block size.
 - 4-bit quantization and dequantization functions – normalized `Float4` quantization, quantile estimation, and quantile quantization functions are enabled.
@@ -138,7 +112,7 @@ These functions are included in bitsandbytes. They are not part of ROCm. However
 features to run them.
 ```

-For more information, see [Model quantization techniques](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/model-quantization.html).
+For more information, see [Model quantization techniques](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/model-quantization.html).

 #### Improved vLLM support

@@ -146,14 +120,10 @@ ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, add
 capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
 ROCm 6.2.0 adds support for the following vLLM features:

-- MP:
-
-Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP,
+- MP: Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP,
 use `--distributed-executor-backed=mp`. The default depends on the commit in flux.

-- FP8 KV cache:
-
-Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
+- FP8 KV cache: Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
 The QUARK quantizer currently only supports Llama.

 - Triton Flash Attention:
@@ -166,7 +136,7 @@ ROCm 6.2.0 adds support for the following vLLM features:
 Improved optimization and tuning of GEMMs. It requires Docker with PyTorch 2.3 or later.

 For more information about enabling these features, see
-[vLLM inference](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
+[vLLM inference](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).

 ROCm has a vLLM branch for experimental features. This includes performance improvements, accuracy, and correctness testing.
 These features include:
@@ -178,44 +148,44 @@ These features include:
 computation in large-scale models. This benefits all workloads in `FP16` configurations.

 To enable these experimental new features, see
-[vLLM inference](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
+[vLLM inference](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
 Use the `rocm/vllm` branch when cloning the GitHub repo. The `vllm/ROCm_performance.md` document outlines
 all the accessible features, and the `vllm/Dockerfile.rocm` file can be used.

 ### Enhanced performance tuning on AMD Instinct accelerators

-ROCm is pretuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
+ROCm is pre-tuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
 The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct accelerators. It includes
 detailed instructions on system settings and application tuning suggestions to help you fully leverage the capabilities of these
 accelerators for optimal performance. For more information, see
-[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) and
-[AMD MI300A system optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html).
+[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/index.html) and
+[AMD MI300A system optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html).

 ### Removed clang-ocl

-As of version 6.2, ROCm no longer provides the `clang-ocl` package. The project will be archived in the future.
+As of version 6.2, ROCm no longer provides the `clang-ocl` package.
 See the [clang-ocl README](https://github.com/ROCm/clang-ocl).

 ### ROCm documentation changes

 The documentation for the ROCm components has been reorganized and reformatted in a standard look and feel. This
 improves the usability and readability of the documentation. For more information about the ROCm components, see
-[What is ROCm?](https://rocm.docs.amd.com/en/latest/what-is-rocm.html).
+[What is ROCm?](https://rocm.docs.amd.com/en/docs-6.2.0/what-is-rocm.html).

 Since the release of ROCm 6.1, the documentation has added some key topics including:

-- [AMD Instinct MI300X workload tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html)
-- [AMD Instinct MI300X system tuning guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html)
-- [AMD Instinct MI300A system tuning guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300a.html)
-- [Using ROCm for AI](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/index.html)
-- [Using ROCm for HPC](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-hpc/index.html)
-- [Fine-tuning LLMs and inference optimization](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/index.html)
-- [LLVM reference documentation](https://rocm.docs.amd.com/projects/llvm-project/en/latest/)
+- [AMD Instinct MI300X workload tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/workload.html)
+- [AMD Instinct MI300X system tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html)
+- [AMD Instinct MI300A system tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300a.html)
+- [Using ROCm for AI](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-ai/index.html)
+- [Using ROCm for HPC](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-hpc/index.html)
+- [Fine-tuning LLMs and inference optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/index.html)
+- [LLVM reference documentation](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.2.0/)

 The following topics have been significantly improved, expanded, or both:

-- [HIP programming manual](https://rocm.docs.amd.com/projects/HIP/en/latest/)
-- [Compatibility matrix](https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html)
+- [HIP documentation](https://rocm.docs.amd.com/projects/HIP/en/docs-6.2.0/)
+- [Compatibility matrix](https://rocm.docs.amd.com/en/docs-6.2.0/compatibility/compatibility-matrix.html)

 ```{note}
 All ROCm projects are open source and available on GitHub. To contribute to ROCm documentation, see the

@@ -1,4 +1,3 @@
-
 ## Operating system and hardware support changes

 ROCm 6.2.0 adds support for the following operating system and kernel versions.
@@ -23,5 +22,5 @@ ROCm 6.2.0 marks the end of support (EoS) for:

 ROCm 6.2.0 has been tested against pre-release Ubuntu 22.04.5 (kernel: 6.5 [HWE]).

-See the [Compatibility matrix](https://rocm-stg.amd.com/en/docs/6.2.0/compatibility/compatibility-matrix.html) for an
+See the [Compatibility matrix](https://rocm.docs.amd.com/en/docs-6.2.0/compatibility/compatibility-matrix.html) for an
 overview of supported operating systems and hardware architectures.

@@ -1,9 +1,3 @@
-
-## ROCm known issues
-
-ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
-issues related to individual components, review the [Detailed component changes](detailed-component-changes).
-
 ### Default processor affinity behavior for helper threads

 Processor affinity is a critical setting to ensure that ROCm helper threads run on the correct cores. By default, ROCm
@@ -98,4 +92,3 @@ functionality provided by the closed-source compiler should transition to the op
 Once the `rocm-llvm-alt` package is removed, any compilation requesting functionality provided by
 the closed-source compiler will result in a Clang warning: "*[AMD] proprietary optimization compiler
 has been removed*".
-