Spellcheck fixes in release notes templates (#3526)

* fix spelling in 5.4.x templates

* add to wordlist

* update templates

update wordlist

* remove extra_components

rm extra_components

* fix spelling
Peter Park
2024-08-08 00:13:08 -04:00
committed by GitHub
parent f47afb7c66
commit 9c874ce984
8 changed files with 81 additions and 1389 deletions

File diff suppressed because it is too large


@@ -16,7 +16,7 @@ long long int wall_clock64();
```
It returns wall clock count at a constant frequency on the device, which can be queried via HIP API with
the hipDeviceAttributeWallClockRate attribute of the device in the HIP application code.
the `hipDeviceAttributeWallClockRate` attribute of the device in the HIP application code.
Example:
@@ -25,7 +25,7 @@ int wallClkRate = 0; //in kilohertz
+HIPCHECK(hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId));
```
Where hipDeviceAttributeWallClockRate is a device attribute.
Where `hipDeviceAttributeWallClockRate` is a device attribute.
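As a rough sketch of how the queried rate might be used: because the rate is reported in kilohertz (ticks per millisecond), dividing a `wall_clock64()` tick difference by it gives the elapsed time in milliseconds. The helper name `ticks_to_milliseconds` is hypothetical and not part of the HIP API. The example is host-side C++ only and assumes the start and end counts were already read on the device and that the counter did not wrap around.

```cpp
#include <cassert>

// Hypothetical helper (not part of HIP): converts a wall_clock64() tick
// difference into milliseconds. wall_clk_rate_khz is the value queried via
// hipDeviceAttributeWallClockRate, in kilohertz (ticks per millisecond).
double ticks_to_milliseconds(long long start_ticks, long long end_ticks,
                             int wall_clk_rate_khz) {
    assert(wall_clk_rate_khz > 0);     // the rate must have been queried successfully
    assert(end_ticks >= start_ticks);  // assumes no counter wrap-around
    return static_cast<double>(end_ticks - start_ticks) /
           static_cast<double>(wall_clk_rate_khz);
}
```

For example, with a rate of 100000 kHz, a difference of 250000 ticks corresponds to 2.5 ms.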
:::{note}
The wall clock frequency is a per-device attribute.


@@ -16,7 +16,7 @@ binary counterpart. No user action is required. Once these are available, users
#### `hipcc` options deprecation
The following hipcc options are being deprecated and will be removed in a future release:
The following `hipcc` options are being deprecated and will be removed in a future release:
* The `--amdgpu-target` option is being deprecated, and users must use the `--offload-arch` option to
specify the GPU architecture.


@@ -1,29 +1,3 @@
The release notes provide a comprehensive summary of changes since the previous ROCm release.
- [Release highlights](release-highlights)
- [Operating system and hardware support changes](operating-system-and-hardware-support-changes)
- [ROCm components versioning](rocm-components)
- [Detailed component changes](detailed-component-changes)
- [ROCm known issues](rocm-known-issues)
- [ROCm upcoming changes](rocm-upcoming-changes)
The [Compatibility matrix](https://rocm.docs.amd.com/en/latest/release/docs/6.2.0/compatibility/compatibility-matrix)
provides an overview of operating system, hardware, ecosystem, and ROCm component support across ROCm releases.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the [ROCm documentation release history](https://rocm.docs.amd.com/en/latest/release/versions).
## Release highlights
This section introduces notable new features and improvements in ROCm 6.2. See the
[Detailed component changes](#detailed-component-changes) for individual component changes.
### New components
ROCm 6.2.0 introduces the following new components to the ROCm software stack.
@@ -41,7 +15,7 @@ ROCm 6.2.0 introduces the following new components to the ROCm software stack.
- **rocPyDecode** -- A tool to access rocDecode APIs in Python. It connects Python and C/C++ libraries,
enabling function calling and data passing between the two languages. The `rocpydecode.so` library, a wrapper, uses
rocDecode APIs written primarily in C/C++ within Python. For more information, see
[rocPyDecode](https://rocm.docs.amd.com/projects/rocpydecode/en/latest).
[rocPyDecode](https://rocm.docs.amd.com/projects/rocPyDecode/en/latest).
- **ROCprofiler-SDK** -- ROCprofiler-SDK is a profiling and tracing library for HIP and ROCm applications on AMD ROCm software
used to identify application performance bottlenecks and optimize their performance. The new APIs add restrictions for more
@@ -75,14 +49,14 @@ multiple unique configurations for use when installing ROCm on a target. Other n
* Resolution and inclusion of dependency packages for offline installation
For more information, see
[ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/rocm-install-on-linux/en/latest/install/rocm-offline-installer.html).
[ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/install/rocm-offline-installer.html).
### Math libraries default to Clang instead of HIPCC
The default compiler used to build the math libraries on Linux changes from `hipcc` to `amdclang++`.
Appropriate compiler flags are added to ensure these compilations build correctly. This change only applies when
building the libraries. Applications using the libraries can continue to be compiled using `hipcc` or `amdclang++` as
described in [ROCm compiler reference](https://rocm.docs.amd.com/projects/llvm-project/en/latest/reference/rocmcc.html).
described in [ROCm compiler reference](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.2.0/reference/rocmcc.html).
The math libraries can also be built with `hipcc` using any of the previously available methods (for example, the `CXX`
environment variable, the `CMAKE_CXX_COMPILER` CMake variable, and so on). This change shouldn't affect performance or
functionality.
@@ -95,27 +69,27 @@ This section highlights updates to supported deep learning frameworks and notabl
ROCm 6.2.0 supports PyTorch versions 2.2 and 2.3 and TensorFlow version 2.16.
See [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html)
See [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/tensorflow-install.html)
for installation instructions.
Refer to the
[Third-party support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html#deep-learning)
[Third-party support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/reference/3rd-party-support-matrix.html#deep-learning)
for a comprehensive list of third-party frameworks and libraries supported by ROCm.
#### Optimized framework support for OpenXLA
PyTorch for ROCm and TensorFlow for ROCm now provide native support for OpenXLA. OpenXLA is an open-source ML compiler
ecosystem that enables developers to compile and optimize models from all leading ML frameworks. For more information, see
[Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html).
[Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/tensorflow-install.html).
#### PyTorch support for Autocast (automatic mixed precision)
PyTorch now supports Autocast for recurrent neural networks (RNNs) on ROCm. This can help to reduce computational
workloads and improve performance. Based on the information about the magnitude of values, Autocast can substitute the
original `float32` linear layers and convolutions with their `float16` or `bfloat16` variants. For more information, see
[Automatic mixed precision](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/train-a-model#automatic-mixed-precision-amp).
[Automatic mixed precision](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-ai/train-a-model.html#automatic-mixed-precision-amp).
#### Memory savings for bitsandbytes model quantization
@@ -125,9 +99,9 @@ ROCm 6.2.0 introduces the following bitsandbytes changes:
- `Int8` matrix multiplication is enabled, and it includes the following functions:
- `extract-outliers`: extracts rows and columns that have outliers in the inputs. They're later used for matrix multiplication without quantization.
- `transform`: row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
- `transform`: row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after `matmul` computation.
- `igemmlt`: new function for GEMM computation A*B^T. It uses
[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.2.0/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
- `dequant_mm`: dequantizes output matrix to original data type using scaling factors from vector-wise quantization.
- Blockwise quantization: input tensors are quantized for a fixed block size.
- 4-bit quantization and dequantization functions: normalized `Float4` quantization, quantile estimation, and quantile quantization functions are enabled.
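To make the A*B^T computation and the vector-wise dequantization step more concrete, here is a minimal CPU reference sketch. It is not the bitsandbytes or hipBLASLt implementation. The function names `int8_gemm_abt` and `dequantize`, the row-major layout, and the per-row and per-column scale convention are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference sketch (assumption: row-major storage). Computes C = A * B^T,
// where A is m x k and B is n x k, both int8, accumulating in int32.
std::vector<std::int32_t> int8_gemm_abt(const std::vector<std::int8_t>& a,
                                        const std::vector<std::int8_t>& b,
                                        std::size_t m, std::size_t n,
                                        std::size_t k) {
    std::vector<std::int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                // Row i of A times row j of B, which is column j of B^T.
                acc += static_cast<std::int32_t>(a[i * k + p]) *
                       static_cast<std::int32_t>(b[j * k + p]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Assumed vector-wise convention: each row of A and each row of B was
// quantized with its own scale, so out[i][j] = c[i][j] * a_scale[i] * b_scale[j].
std::vector<float> dequantize(const std::vector<std::int32_t>& c,
                              const std::vector<float>& a_scale,
                              const std::vector<float>& b_scale,
                              std::size_t m, std::size_t n) {
    std::vector<float> out(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            out[i * n + j] =
                static_cast<float>(c[i * n + j]) * a_scale[i] * b_scale[j];
        }
    }
    return out;
}
```

Accumulating in `int32` avoids overflow that 8-bit products would otherwise cause; the scales are applied only once, after the integer GEMM.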
@@ -138,7 +112,7 @@ These functions are included in bitsandbytes. They are not part of ROCm. However
features to run them.
```
For more information, see [Model quantization techniques](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/model-quantization.html).
For more information, see [Model quantization techniques](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/model-quantization.html).
#### Improved vLLM support
@@ -146,14 +120,10 @@ ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, add
capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
ROCm 6.2.0 adds support for the following vLLM features:
- MP:
Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP,
- MP: Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP,
use `--distributed-executor-backend=mp`. The default depends on the commit in flux.
- FP8 KV cache:
Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
- FP8 KV cache: Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
The QUARK quantizer currently only supports Llama.
- Triton Flash Attention:
@@ -166,7 +136,7 @@ ROCm 6.2.0 adds support for the following vLLM features:
Improved optimization and tuning of GEMMs. It requires Docker with PyTorch 2.3 or later.
For more information about enabling these features, see
[vLLM inference](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
[vLLM inference](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
ROCm has a vLLM branch for experimental features. This includes performance improvements, accuracy, and correctness testing.
These features include:
@@ -178,44 +148,44 @@ These features include:
computation in large-scale models. This benefits all workloads in `FP16` configurations.
To enable these experimental new features, see
[vLLM inference](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
[vLLM inference](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
Use the `rocm/vllm` branch when cloning the GitHub repo. The `vllm/ROCm_performance.md` document outlines
all the accessible features, and the `vllm/Dockerfile.rocm` file can be used.
### Enhanced performance tuning on AMD Instinct accelerators
ROCm is pretuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
ROCm is pre-tuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct accelerators. It includes
detailed instructions on system settings and application tuning suggestions to help you fully leverage the capabilities of these
accelerators for optimal performance. For more information, see
[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) and
[AMD MI300A system optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html).
[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/index.html) and
[AMD MI300A system optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html).
### Removed clang-ocl
As of version 6.2, ROCm no longer provides the `clang-ocl` package. The project will be archived in the future.
As of version 6.2, ROCm no longer provides the `clang-ocl` package.
See the [clang-ocl README](https://github.com/ROCm/clang-ocl).
### ROCm documentation changes
The documentation for the ROCm components has been reorganized and reformatted in a standard look and feel. This
improves the usability and readability of the documentation. For more information about the ROCm components, see
[What is ROCm?](https://rocm.docs.amd.com/en/latest/what-is-rocm.html).
[What is ROCm?](https://rocm.docs.amd.com/en/docs-6.2.0/what-is-rocm.html).
Since the release of ROCm 6.1, the documentation has added some key topics including:
- [AMD Instinct MI300X workload tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html)
- [AMD Instinct MI300X system tuning guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html)
- [AMD Instinct MI300A system tuning guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300a.html)
- [Using ROCm for AI](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/index.html)
- [Using ROCm for HPC](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-hpc/index.html)
- [Fine-tuning LLMs and inference optimization](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/index.html)
- [LLVM reference documentation](https://rocm.docs.amd.com/projects/llvm-project/en/latest/)
- [AMD Instinct MI300X workload tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/workload.html)
- [AMD Instinct MI300X system tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html)
- [AMD Instinct MI300A system tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300a.html)
- [Using ROCm for AI](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-ai/index.html)
- [Using ROCm for HPC](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-hpc/index.html)
- [Fine-tuning LLMs and inference optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/index.html)
- [LLVM reference documentation](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.2.0/)
The following topics have been significantly improved, expanded, or both:
- [HIP programming manual](https://rocm.docs.amd.com/projects/HIP/en/latest/)
- [Compatibility matrix](https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html)
- [HIP documentation](https://rocm.docs.amd.com/projects/HIP/en/docs-6.2.0/)
- [Compatibility matrix](https://rocm.docs.amd.com/en/docs-6.2.0/compatibility/compatibility-matrix.html)
```{note}
All ROCm projects are open source and available on GitHub. To contribute to ROCm documentation, see the


@@ -1,4 +1,3 @@
## Operating system and hardware support changes
ROCm 6.2.0 adds support for the following operating system and kernel versions.
@@ -23,5 +22,5 @@ ROCm 6.2.0 marks the end of support (EoS) for:
ROCm 6.2.0 has been tested against pre-release Ubuntu 22.04.5 (kernel: 6.5 [HWE]).
See the [Compatibility matrix](https://rocm-stg.amd.com/en/docs/6.2.0/compatibility/compatibility-matrix.html) for an
See the [Compatibility matrix](https://rocm.docs.amd.com/en/docs-6.2.0/compatibility/compatibility-matrix.html) for an
overview of supported operating systems and hardware architectures.


@@ -1,9 +1,3 @@
## ROCm known issues
ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
issues related to individual components, review the [Detailed component changes](detailed-component-changes).
### Default processor affinity behavior for helper threads
Processor affinity is a critical setting to ensure that ROCm helper threads run on the correct cores. By default, ROCm
@@ -98,4 +92,3 @@ functionality provided by the closed-source compiler should transition to the op
Once the `rocm-llvm-alt` package is removed, any compilation requesting functionality provided by
the closed-source compiler will result in a Clang warning: "*[AMD] proprietary optimization compiler
has been removed*".