Spellcheck fixes in release notes templates (#3526)

* fix spelling in 5.4.x templates

* add to wordlist

* update templates

update wordlist

* remove extra_components

rm extra_components

* fix spelling
Peter Park
2024-08-08 00:13:08 -04:00
committed by GitHub
parent f47afb7c66
commit 9c874ce984
8 changed files with 81 additions and 1389 deletions


@@ -26,11 +26,12 @@ ATI
AddressSanitizer
AlexNet
Arb
Autocast
BARs
BLAS
BMC
BitCode
Blit
Blockwise
Bluefield
Bootloader
CCD
@@ -67,6 +68,7 @@ CommonMark
Concretized
Conda
ConnectX
CuPy
DDR
DF
DGEMM
@@ -86,6 +88,7 @@ DataLoader
DataParallel
DeepSpeed
Dependabot
Deprecations
DevCap
Dockerfile
Doxygen
@@ -93,6 +96,7 @@ ELMo
ENDPGM
EPYC
ESXi
EoS
FFT
FFTs
FFmpeg
@@ -143,6 +147,7 @@ HPCG
HPE
HPL
HSA
HW
HWE
HWS
Haswell
@@ -166,12 +171,14 @@ ImageNet
InfiniBand
Inlines
IntelliSense
Interop
Intersphinx
Intra
Ioffe
JSON
Jupyter
KFD
KFDTest
KiB
KV
KVM
@@ -208,6 +215,7 @@ MVFFR
Makefile
Makefiles
Matplotlib
Matrox
Megatrends
Megatron
Mellanox
@@ -258,6 +266,7 @@ OpenMP
OpenMPI
OpenSSL
OpenVX
OpenXLA
PCC
PCI
PCIe
@@ -274,6 +283,7 @@ PerfDb
Perfetto
PipelineParallel
PnP
PowerEdge
PowerShell
PyPi
PyTorch
@@ -284,7 +294,10 @@ RCCL
RDC
RDMA
RDNA
README
RHEL
RNN
RNNs
ROC
ROCProfiler
ROCTracer
@@ -296,6 +309,7 @@ ROCm
ROCmCC
ROCmSoftwarePlatform
ROCmValidationSuite
ROCprofiler
ROCr
RST
RW
@@ -430,6 +444,7 @@ backends
benchmarking
bfloat
bilinear
bitcode
bitsandbytes
blit
bootloader
@@ -455,8 +470,10 @@ composable
concretization
config
conformant
constructible
convolutional
convolves
copyable
cpp
csn
cuBLAS
@@ -478,6 +495,8 @@ denoise
denoised
denoises
denormalize
dequantization
dequantizes
deserializers
detections
dev
@@ -489,8 +508,9 @@ distro
el
embeddings
enablement
endpgm
encodings
endpgm
enqueue
env
epilog
etcetera
@@ -501,6 +521,7 @@ ffmpeg
filesystem
fortran
fp
gRPC
galb
gcc
gdb
@@ -508,6 +529,7 @@ gfortran
gfx
githooks
github
globals
gnupg
grayscale
gzip
@@ -537,6 +559,7 @@ hpp
hsa
hsakmt
hyperparameter
iDRAC
ib_core
inband
incrementing
@@ -547,6 +570,7 @@ init
initializer
inlining
installable
interop
interprocedural
intra
invariants
@@ -574,6 +598,7 @@ mivisionx
mkdir
mlirmiopen
mtypes
mutex
mvffr
namespace
namespaces
@@ -596,26 +621,35 @@ pragma
pre
prebuilt
precompiled
preconditioner
preconfigured
prefetch
prefetchable
prefill
prefills
preloaded
preprocess
preprocessed
preprocessing
preprocessor
prequantized
prerequisites
profiler
profilers
protobuf
pseudorandom
py
quantile
quantizer
quasirandom
queueing
rccl
rdc
reStructuredText
redirections
refactorization
reformats
repo
repos
representativeness
req
@@ -627,10 +661,12 @@ roc
rocAL
rocALUTION
rocBLAS
rocDecode
rocFFT
rocLIB
rocMLIR
rocPRIM
rocPyDecode
rocRAND
rocSOLVER
rocSPARSE
@@ -668,11 +704,14 @@ spack
src
stochastically
strided
subcommand
subdirectory
subexpression
subfolder
subfolders
submodule
supercomputing
symlink
td
tensorfloat
th
@@ -692,6 +731,7 @@ txt
uarch
uncached
uncorrectable
unhandled
uninstallation
unsqueeze
unstacking
@@ -713,10 +753,13 @@ vectorized
vectorizer
vectorizes
vjxb
voxel
walkthrough
walkthroughs
watchpoints
wavefront
wavefronts
whitespace
whitespaces
workgroup
workgroups


@@ -137,7 +137,7 @@ ROCm 6.2.0 introduces the following bitsandbytes changes:
- `Int8` matrix multiplication is enabled, and it includes the following functions:
- `extract-outliers` extracts rows and columns that have outliers in the inputs. They're later used for matrix multiplication without quantization.
- `transform` row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
- `transform` row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after `matmul` computation.
- `igemmlt` new function for GEMM computation A*B^T. It uses
[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.2.0/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
- `dequant_mm` dequantizes output matrix to original data type using scaling factors from vector-wise quantization.
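The `igemmlt` and `dequant_mm` steps listed above can be illustrated with a small pure-Python sketch. It is not the bitsandbytes or hipBLASLt implementation: the helper names `quantize_rows` and `int8_gemm_abt` are hypothetical, the absmax scaling scheme is an assumption, and Python integers stand in for the `int32` accumulators a real 8-bit GEMM would use.

```python
def quantize_rows(matrix):
    """Vector-wise (per-row) absmax quantization to int8-range codes.

    Returns the integer codes and one scale per row (assumed scheme).
    """
    codes, scales = [], []
    for row in matrix:
        absmax = max(abs(v) for v in row) or 1.0  # guard against all-zero rows
        scales.append(absmax / 127)
        codes.append([round(v / absmax * 127) for v in row])
    return codes, scales


def int8_gemm_abt(a_codes, b_codes):
    """Integer GEMM computing A * B^T on the quantized codes."""
    return [
        [sum(x * y for x, y in zip(a_row, b_row)) for b_row in b_codes]
        for a_row in a_codes
    ]


def dequant_mm(c_int, a_scales, b_scales):
    """Rescale the integer result back to floats with the row scales."""
    return [
        [c * sa * sb for c, sb in zip(row, b_scales)]
        for row, sa in zip(c_int, a_scales)
    ]


a = [[1.0, -2.0], [0.5, 0.25]]
b = [[3.0, 1.0], [-1.0, 4.0]]
a_codes, a_scales = quantize_rows(a)
b_codes, b_scales = quantize_rows(b)
approx = dequant_mm(int8_gemm_abt(a_codes, b_codes), a_scales, b_scales)
```

Here `approx` is close to, but not exactly, the floating-point product `A * B^T`; the gap is the quantization error.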
@@ -192,7 +192,7 @@ all the accessible features, and the `vllm/Dockerfile.rocm` file can be used.
### Enhanced performance tuning on AMD Instinct accelerators
ROCm is pretuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
ROCm is pre-tuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct accelerators. It includes
detailed instructions on system settings and application tuning suggestions to help you fully leverage the capabilities of these
accelerators for optimal performance. For more information, see

File diff suppressed because it is too large.


@@ -16,7 +16,7 @@ long long int wall_clock64();
```
It returns wall clock count at a constant frequency on the device, which can be queried via HIP API with
the hipDeviceAttributeWallClockRate attribute of the device in the HIP application code.
the `hipDeviceAttributeWallClockRate` attribute of the device in the HIP application code.
Example:
@@ -25,7 +25,7 @@ int wallClkRate = 0; //in kilohertz
+HIPCHECK(hipDeviceGetAttribute(&wallClkRate, hipDeviceAttributeWallClockRate, deviceId));
```
Where hipDeviceAttributeWallClockRate is a device attribute.
Where `hipDeviceAttributeWallClockRate` is a device attribute.
:::{note}
The wall clock frequency is a per-device attribute.
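Because the rate is reported in kilohertz, it equals the number of ticks per millisecond, so converting a `wall_clock64()` delta is a single division. The following host-side sketch shows only that arithmetic; the function name `ticks_to_milliseconds` and the 100 MHz rate in the usage line are illustrative assumptions, not values from the HIP API.

```python
def ticks_to_milliseconds(start_ticks, end_ticks, wall_clock_rate_khz):
    """Convert a wall_clock64() tick delta to milliseconds.

    A rate in kHz is the number of ticks per millisecond, so the
    elapsed time is the tick delta divided by the rate.
    """
    if wall_clock_rate_khz <= 0:
        raise ValueError("wall clock rate must be positive")
    return (end_ticks - start_ticks) / wall_clock_rate_khz


# Hypothetical device with a 100 MHz (100000 kHz) wall clock.
elapsed_ms = ticks_to_milliseconds(1_000, 251_000, 100_000)
```

Since the frequency is per device, the rate should be queried for the same device on which the ticks were recorded.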


@@ -16,7 +16,7 @@ binary counterpart. No user action is required. Once these are available, users
#### `hipcc` options deprecation
The following hipcc options are being deprecated and will be removed in a future release:
The following `hipcc` options are being deprecated and will be removed in a future release:
* The `--amdgpu-target` option is being deprecated, and users must use the `--offload-arch` option to
specify the GPU architecture.


@@ -1,29 +1,3 @@
The release notes provide a comprehensive summary of changes since the previous ROCm release.
- [Release highlights](release-highlights)
- [Operating system and hardware support changes](operating-system-and-hardware-support-changes)
- [ROCm components versioning](rocm-components)
- [Detailed component changes](detailed-component-changes)
- [ROCm known issues](rocm-known-issues)
- [ROCm upcoming changes](rocm-upcoming-changes)
The [Compatibility matrix](https://rocm.docs.amd.com/en/latest/release/docs/6.2.0/compatibility/compatibility-matrix)
provides an overview of operating system, hardware, ecosystem, and ROCm component support across ROCm releases.
Release notes for previous ROCm releases are available in earlier versions of the documentation.
See the [ROCm documentation release history](https://rocm.docs.amd.com/en/latest/release/versions).
## Release highlights
This section introduces notable new features and improvements in ROCm 6.2. See the
[Detailed component changes](#detailed-component-changes) for individual component changes.
### New components
ROCm 6.2.0 introduces the following new components to the ROCm software stack.
@@ -41,7 +15,7 @@ ROCm 6.2.0 introduces the following new components to the ROCm software stack.
- **rocPyDecode** -- A tool to access rocDecode APIs in Python. It connects Python and C/C++ libraries,
enabling function calling and data passing between the two languages. The `rocpydecode.so` library, a wrapper, uses
rocDecode APIs written primarily in C/C++ within Python. For more information, see
[rocPyDecode](https://rocm.docs.amd.com/projects/rocpydecode/en/latest).
[rocPyDecode](https://rocm.docs.amd.com/projects/rocPyDecode/en/latest).
- **ROCprofiler-SDK** -- ROCprofiler-SDK is a profiling and tracing library for HIP and ROCm applications on AMD ROCm software
used to identify application performance bottlenecks and optimize their performance. The new APIs add restrictions for more
@@ -75,14 +49,14 @@ multiple unique configurations for use when installing ROCm on a target. Other n
* Resolution and inclusion of dependency packages for offline installation
For more information, see
[ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/rocm-install-on-linux/en/latest/install/rocm-offline-installer.html).
[ROCm Offline Installer Creator](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/install/rocm-offline-installer.html).
### Math libraries default to Clang instead of HIPCC
The default compiler used to build the math libraries on Linux changes from `hipcc` to `amdclang++`.
Appropriate compiler flags are added to ensure these compilations build correctly. This change only applies when
building the libraries. Applications using the libraries can continue to be compiled using `hipcc` or `amdclang++` as
described in [ROCm compiler reference](https://rocm.docs.amd.com/projects/llvm-project/en/latest/reference/rocmcc.html).
described in [ROCm compiler reference](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.2.0/reference/rocmcc.html).
The math libraries can also be built with `hipcc` using any of the previously available methods (for example, the `CXX`
environment variable, the `CMAKE_CXX_COMPILER` CMake variable, and so on). This change shouldn't affect performance or
functionality.
@@ -95,27 +69,27 @@ This section highlights updates to supported deep learning frameworks and notabl
ROCm 6.2.0 supports PyTorch versions 2.2 and 2.3 and TensorFlow version 2.16.
See [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html)
See [Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/tensorflow-install.html)
for installation instructions.
Refer to the
[Third-party support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/3rd-party-support-matrix.html#deep-learning)
[Third-party support matrix](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/reference/3rd-party-support-matrix.html#deep-learning)
for a comprehensive list of third-party frameworks and libraries supported by ROCm.
#### Optimized framework support for OpenXLA
PyTorch for ROCm and TensorFlow for ROCm now provide native support for OpenXLA. OpenXLA is an open-source ML compiler
ecosystem that enables developers to compile and optimize models from all leading ML frameworks. For more information, see
[Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html).
[Installing PyTorch for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/pytorch-install.html)
and [Installing TensorFlow for ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.2.0/how-to/3rd-party/tensorflow-install.html).
#### PyTorch support for Autocast (automatic mixed precision)
PyTorch now supports Autocast for recurrent neural networks (RNNs) on ROCm. This can help to reduce computational
workloads and improve performance. Based on the information about the magnitude of values, Autocast can substitute the
original `float32` linear layers and convolutions with their `float16` or `bfloat16` variants. For more information, see
[Automatic mixed precision](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/train-a-model#automatic-mixed-precision-amp).
[Automatic mixed precision](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-ai/train-a-model.html#automatic-mixed-precision-amp).
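To give a feel for the numeric trade-off between the two reduced-precision formats, here is a standard-library sketch that converts values to `float16` and `bfloat16`. It does not use PyTorch and is not how Autocast is implemented; the helper names are hypothetical, and the `bfloat16` conversion truncates rather than rounds, which is a simplification.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_bfloat16(value):
    """Truncate a float32 to bfloat16 by keeping its upper 16 bits.

    Simplification: real conversions usually round instead of truncating.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


half = to_float16(0.1)         # finer precision, but max finite value is 65504
brain = to_bfloat16(70000.0)   # coarser precision, but float32-like range
```

`float16` keeps more significand bits, while `bfloat16` keeps the `float32` exponent range, which is one reason a framework might prefer one or the other depending on the magnitude of the values involved.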
#### Memory savings for bitsandbytes model quantization
@@ -125,9 +99,9 @@ ROCm 6.2.0 introduces the following bitsandbytes changes:
- `Int8` matrix multiplication is enabled, and it includes the following functions:
- `extract-outliers` extracts rows and columns that have outliers in the inputs. They're later used for matrix multiplication without quantization.
- `transform` row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after matmul computation.
- `transform` row-to-column and column-to-row transformations are enabled, along with transpose operations. These are used before and after `matmul` computation.
- `igemmlt` new function for GEMM computation A*B^T. It uses
[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
[hipblasLtMatMul](https://rocm.docs.amd.com/projects/hipBLASLt/en/docs-6.2.0/api-reference.html#hipblasltmatmul) and performs 8-bit GEMM operations.
- `dequant_mm` dequantizes output matrix to original data type using scaling factors from vector-wise quantization.
- Blockwise quantization input tensors are quantized for a fixed block size.
- 4-bit quantization and dequantization functions normalized `Float4` quantization, quantile estimation, and quantile quantization functions are enabled.
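As a rough illustration of the blockwise quantization item above, the sketch below quantizes a flat list of values with one scale per fixed-size block. It assumes a simple absmax-to-int8 scheme and uses hypothetical function names; the actual bitsandbytes kernels may differ in data layout, code format, and rounding.

```python
def quantize_blockwise(values, block_size):
    """Quantize floats to int8-range codes with one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # guard against all-zero blocks
        scales.append(absmax)
        codes.extend(round(v / absmax * 127) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size):
    """Map each code back to a float using the scale of its block."""
    return [code / 127 * scales[i // block_size] for i, code in enumerate(codes)]


values = [0.5, -1.0, 0.25, 100.0, -50.0, 25.0]
codes, scales = quantize_blockwise(values, block_size=3)
restored = dequantize_blockwise(codes, scales, block_size=3)
```

Keeping a separate scale per block means the large values in the second block do not wipe out the resolution available to the small values in the first block.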
@@ -138,7 +112,7 @@ These functions are included in bitsandbytes. They are not part of ROCm. However
features to run them.
```
For more information, see [Model quantization techniques](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/model-quantization.html).
For more information, see [Model quantization techniques](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/model-quantization.html).
#### Improved vLLM support
@@ -146,14 +120,10 @@ ROCm 6.2.0 enhances vLLM support for inference on AMD Instinct accelerators, add
capabilities for `FP16`/`BF16` precision for LLMs, and `FP8` support for Llama.
ROCm 6.2.0 adds support for the following vLLM features:
- MP:
Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP,
- MP: Multi-GPU execution. Choose between MP and Ray using a flag. To set it to MP,
use `--distributed-executor-backend=mp`. The default depends on the commit in flux.
- FP8 KV cache:
Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
- FP8 KV cache: Enhances computational efficiency and performance by significantly reducing memory usage and bandwidth requirements.
The QUARK quantizer currently only supports Llama.
- Triton Flash Attention:
@@ -166,7 +136,7 @@ ROCm 6.2.0 adds support for the following vLLM features:
Improved optimization and tuning of GEMMs. It requires Docker with PyTorch 2.3 or later.
For more information about enabling these features, see
[vLLM inference](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
[vLLM inference](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
ROCm has a vLLM branch for experimental features. This includes performance improvements, accuracy, and correctness testing.
These features include:
@@ -178,44 +148,44 @@ These features include:
computation in large-scale models. This benefits all workloads in `FP16` configurations.
To enable these experimental new features, see
[vLLM inference](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
[vLLM inference](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/llm-inference-frameworks.html#vllm-inference).
Use the `rocm/vllm` branch when cloning the GitHub repo. The `vllm/ROCm_performance.md` document outlines
all the accessible features, and the `vllm/Dockerfile.rocm` file can be used.
### Enhanced performance tuning on AMD Instinct accelerators
ROCm is pretuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
ROCm is pre-tuned for high-performance computing workloads including large language models, generative AI, and scientific computing.
The ROCm documentation provides comprehensive guidance on configuring your system for AMD Instinct accelerators. It includes
detailed instructions on system settings and application tuning suggestions to help you fully leverage the capabilities of these
accelerators for optimal performance. For more information, see
[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) and
[AMD MI300A system optimization](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html).
[AMD MI300X tuning guides](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/index.html) and
[AMD MI300A system optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html).
### Removed clang-ocl
As of version 6.2, ROCm no longer provides the `clang-ocl` package. The project will be archived in the future.
As of version 6.2, ROCm no longer provides the `clang-ocl` package.
See the [clang-ocl README](https://github.com/ROCm/clang-ocl).
### ROCm documentation changes
The documentation for the ROCm components has been reorganized and reformatted in a standard look and feel. This
improves the usability and readability of the documentation. For more information about the ROCm components, see
[What is ROCm?](https://rocm.docs.amd.com/en/latest/what-is-rocm.html).
[What is ROCm?](https://rocm.docs.amd.com/en/docs-6.2.0/what-is-rocm.html).
Since the release of ROCm 6.1, the documentation has added some key topics including:
- [AMD Instinct MI300X workload tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html)
- [AMD Instinct MI300X system tuning guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300x.html)
- [AMD Instinct MI300A system tuning guide](https://rocm.docs.amd.com/en/latest/how-to/system-optimization/mi300a.html)
- [Using ROCm for AI](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/index.html)
- [Using ROCm for HPC](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-hpc/index.html)
- [Fine-tuning LLMs and inference optimization](https://rocm.docs.amd.com/en/latest/how-to/llm-fine-tuning-optimization/index.html)
- [LLVM reference documentation](https://rocm.docs.amd.com/projects/llvm-project/en/latest/)
- [AMD Instinct MI300X workload tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/tuning-guides/mi300x/workload.html)
- [AMD Instinct MI300X system tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300x.html)
- [AMD Instinct MI300A system tuning guide](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/system-optimization/mi300a.html)
- [Using ROCm for AI](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-ai/index.html)
- [Using ROCm for HPC](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/rocm-for-hpc/index.html)
- [Fine-tuning LLMs and inference optimization](https://rocm.docs.amd.com/en/docs-6.2.0/how-to/llm-fine-tuning-optimization/index.html)
- [LLVM reference documentation](https://rocm.docs.amd.com/projects/llvm-project/en/docs-6.2.0/)
The following topics have been significantly improved, expanded, or both:
- [HIP programming manual](https://rocm.docs.amd.com/projects/HIP/en/latest/)
- [Compatibility matrix](https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html)
- [HIP documentation](https://rocm.docs.amd.com/projects/HIP/en/docs-6.2.0/)
- [Compatibility matrix](https://rocm.docs.amd.com/en/docs-6.2.0/compatibility/compatibility-matrix.html)
```{note}
All ROCm projects are open source and available on GitHub. To contribute to ROCm documentation, see the


@@ -1,4 +1,3 @@
## Operating system and hardware support changes
ROCm 6.2.0 adds support for the following operating system and kernel versions.
@@ -23,5 +22,5 @@ ROCm 6.2.0 marks the end of support (EoS) for:
ROCm 6.2.0 has been tested against pre-release Ubuntu 22.04.5 (kernel: 6.5 [HWE]).
See the [Compatibility matrix](https://rocm-stg.amd.com/en/docs/6.2.0/compatibility/compatibility-matrix.html) for an
See the [Compatibility matrix](https://rocm.docs.amd.com/en/docs-6.2.0/compatibility/compatibility-matrix.html) for an
overview of supported operating systems and hardware architectures.


@@ -1,9 +1,3 @@
## ROCm known issues
ROCm known issues are noted on {fab}`github` [GitHub](https://github.com/ROCm/ROCm/labels/Verified%20Issue). For known
issues related to individual components, review the [Detailed component changes](detailed-component-changes).
### Default processor affinity behavior for helper threads
Processor affinity is a critical setting to ensure that ROCm helper threads run on the correct cores. By default, ROCm
@@ -98,4 +92,3 @@ functionality provided by the closed-source compiler should transition to the op
Once the `rocm-llvm-alt` package is removed, any compilation requesting functionality provided by
the closed-source compiler will result in a Clang warning: "*[AMD] proprietary optimization compiler
has been removed*".